Subject: Re: Thread benchmarks, round 2
To: John Nemeth <jnemeth@victoria.tc.ca>
From: Kris Kennaway <kris@FreeBSD.org>
List: tech-kern
Date: 10/06/2007 11:02:27

John Nemeth wrote:
> On Feb 25, 5:54am, Kris Kennaway wrote:
> } Andrew Doran wrote:
> } > So, I learned a few things since I put up the previous set of benchmarks:
> } >
> } > - The erratic behaviour from Linux is due to the glibc memory allocator.
> } > Using Google's tcmalloc, the problem disappears.
> }
> } Well you have to be careful there, tcmalloc apparently defers frees, and
> } is not really a general-purpose malloc. The Linux performance problems
> } are (were? I haven't tried recent kernels) real though.
>
> I would also argue that the average end user isn't likely to be
> doing things such as replacing the malloc library and that the
> benchmark should be run on a system that most users would be running
> (i.e. pick a popular distribution and run it out of the box).

I would agree with this.

> } > Kris Kennaway has kindly offered to try NetBSD on an 8-way system. I expect
> } > that NetBSD will hit a fairly clear ceiling due to poll, fcntl and socket
> } > I/O causing contention on kernel_lock. It will be interesting to see.
> }
> } Here is the initial run with CVS HEAD sources (I took out the obvious
>                                    ^^^^
> } things from GENERIC.MP like I386_CPU support, etc, and removed the
> } default datasize and stack size limits). Same benchmark config that
> } Andrew is using, etc.
> }
> } http://people.freebsd.org/~kris/scaling/netbsd.png
> }
> } There are a couple of things to note:
> }
> } * the drop-off above 8 threads on FreeBSD is due to non-scalability of
> } MySQL itself, i.e. it comes from pthread mutex contention in userland.
> } This is the only relevant lock contention point in the FreeBSD kernel
> } on this workload. There are some things we can do in libpthread to
> } mitigate the performance loss in the over-contended pthread situation,
> } but we haven't done them yet.
> }
> } * The tail end of the graph is somewhat noisy, which is the reason for
> } the jump at 19 threads (I only graphed a single run). The distribution
> } at 20 clients looks like:
> }
> } +------------------------------------------------------------+
> } | x x |
> } |x x x xxx x x xx x x xxx x xx|
> } | |_______________A_M_____________| |
> } +------------------------------------------------------------+
> }     N           Min           Max        Median           Avg        Stddev
> } x  20       2326.01       2758.86       2586.47      2572.856     116.69937
> }
> } Next, to try and reproduce Andrew's result, I disabled 4 CPUs (using
> } cpuctl in NetBSD) and compared FreeBSD and NetBSD again. I didn't do a
> } full graph yet, but the results are consistent with what I saw on 8 CPUs.
>
> cpuctl doesn't truly disable the cpus. You would probably need to
> disable them in the BIOS or build a custom kernel.

How do I disable them in the kernel?

> } This measurement shows that FreeBSD is performing 70-80% better than
> } NetBSD in this 4 CPU configuration. This is in contrast to Andrew's
> } findings which seem to show NetBSD performing 10% better than FreeBSD on
> } a 4 CPU system (a very old one though).
> }
> } I will try later with the experimental kernel Andrew sent me (which
> } includes the new scheduler). If it indeed gives a 100% performance
> } improvement that would be a significant result :-)
>
> Up above, you said that you used HEAD. In NetBSD, HEAD is still
> big lock / giant lock with only some minor exceptions. Given that a
> database benchmark would be very heavy on I/O, I would expect to see a
> major difference between HEAD and vmlocking.

Fine, but this kernel is what Andrew asked me to benchmark :)

Kris