tech-kern: Re: Thread benchmarks, round 2

Subject: Re: Thread benchmarks, round 2
To: Kris Kennaway <kris@FreeBSD.org>
From: John Nemeth <jnemeth@victoria.tc.ca>
List: tech-kern
Date: 10/05/2007 18:48:10
On Feb 25,  5:54am, Kris Kennaway wrote:
} Andrew Doran wrote:
} > So, I learned a few things since I put up the previous set of benchmarks:
} > 
} > - The erratic behaviour from Linux is due to the glibc memory allocator.
} >   Using Google's tcmalloc, the problem disappears.
} 
} Well, you have to be careful there: tcmalloc apparently defers frees,
} and is not really a general-purpose malloc.  The Linux performance
} problems are (were? I haven't tried recent kernels) real, though.

     I would also argue that the average end user isn't likely to be
doing things such as replacing the malloc library and that the
benchmark should be run on a system that most users would be running
(i.e. pick a popular distribution and run it out of the box).

} > Kris Kennaway has kindly offered to try NetBSD on an 8-way system. I expect
} > that NetBSD will hit a fairly clear ceiling due to poll, fcntl and socket
} > I/O causing contention on kernel_lock. It will be interesting to see.
} 
} Here is the initial run with CVS HEAD sources (I took out the obvious 
                                   ^^^^
} things from GENERIC.MP like I386_CPU support, etc., and removed the 
} default datasize and stack size limits).  Same benchmark config that 
} Andrew is using, etc.
} 
}    http://people.freebsd.org/~kris/scaling/netbsd.png
} 
} There are a couple of things to note:
} 
} * The drop-off above 8 threads on FreeBSD is due to the non-scalability
} of MySQL itself, i.e., it comes from pthread mutex contention in
} userland.  This is the only relevant lock contention point in the
} FreeBSD kernel on this workload.  There are some things we can do in
} libpthread to mitigate the performance loss in the over-contended
} pthread situation, but we haven't done them yet.
} 
} * The tail end of the graph is somewhat noisy, which is the reason for
} the jump at 19 threads (I only graphed a single run).  The distribution
} at 20 clients looks like:
} 
} +------------------------------------------------------------+
} |                                        x  x                |
} |x      x   x          xxx   x x  xx  x  x  xxx      x     xx|
} |                  |_______________A_M_____________|         |
} +------------------------------------------------------------+
}      N           Min           Max        Median           Avg     Stddev
} x  20       2326.01       2758.86       2586.47      2572.856  116.69937
} 
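The summary above appears to be ministat(1) output: in the plot, A marks the average, M the median, and the |___| bar spans one standard deviation either side of the average.  As a rough sketch (Python here is only an illustration, and the helper names are hypothetical), the same columns can be computed from raw per-run results.  The 20 individual samples weren't posted, so the usage line only reuses the reported Avg and Stddev, which put the run-to-run spread at roughly 4.5% of the mean.

```python
import statistics

def summarize(samples):
    """Return ministat-style summary columns for a list of per-run results."""
    xs = sorted(samples)
    if len(xs) < 2:
        raise ValueError("need at least two samples for a sample stddev")
    return {
        "N": len(xs),
        "Min": xs[0],
        "Max": xs[-1],
        # statistics.median averages the two middle values when N is even;
        # ministat's own convention may differ.
        "Median": statistics.median(xs),
        "Avg": statistics.fmean(xs),
        # Sample standard deviation (n - 1 denominator).
        "Stddev": statistics.stdev(xs),
    }

def coefficient_of_variation(avg, stddev):
    """Relative spread: standard deviation as a fraction of the mean."""
    return stddev / avg

# The raw 20 samples were not posted, so reuse the reported Avg and Stddev.
cv = coefficient_of_variation(2572.856, 116.69937)
print(f"coefficient of variation at 20 clients: {cv:.1%}")  # about 4.5%
```

A single run per thread count with a spread of this size is enough to explain an isolated jump like the one at 19 threads.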
} Next, to try and reproduce Andrew's result, I disabled 4 CPUs (using 
} cpuctl in NetBSD) and compared FreeBSD and NetBSD again.  I didn't do a 
} full graph yet, but the results are consistent with what I saw on 8 CPUs.

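     cpuctl doesn't truly disable the CPUs: per cpuctl(8), an offline CPU
still runs LWPs bound to it and still handles device interrupts routed
to it.  You would probably need to disable them in the BIOS or build a
custom kernel.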

} This measurement shows that FreeBSD is performing 70-80% better than 
} NetBSD in this 4-CPU configuration.  This is in contrast to Andrew's 
} findings, which seem to show NetBSD performing 10% better than FreeBSD
} on a 4-CPU system (a very old one, though).
} 
} I will try later with the experimental kernel Andrew sent me (which 
} includes the new scheduler).  If it indeed gives a 100% performance 
} improvement, that would be a significant result :-)
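To make the arithmetic behind that remark concrete: "70-80% better" means FreeBSD's throughput is 1.7 to 1.8 times NetBSD's, so a 2x speedup for NetBSD would leave the NetBSD/FreeBSD ratio at 2/1.8 to 2/1.7.  A minimal sketch, assuming both figures are measured on the same 4-CPU configuration and the speedup applies to the HEAD result (the function name is hypothetical):

```python
def relative_after_speedup(freebsd_advantage, netbsd_speedup):
    """NetBSD/FreeBSD throughput ratio after speeding NetBSD up.

    freebsd_advantage: FreeBSD throughput / NetBSD throughput (1.7 for "70% better").
    netbsd_speedup: factor applied to NetBSD throughput (2.0 for "100% improvement").
    """
    return netbsd_speedup / freebsd_advantage

for adv in (1.7, 1.8):
    ratio = relative_after_speedup(adv, 2.0)
    # prints 1.18 for adv=1.7, then 1.11 for adv=1.8
    print(f"FreeBSD {adv:.1f}x ahead -> NetBSD at {ratio:.2f}x FreeBSD after a 2x speedup")
```

By the same arithmetic, NetBSD would need a 1.7-1.8x speedup just to break even with FreeBSD on this setup.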

     Up above, you said that you used HEAD.  In NetBSD, HEAD still runs
almost everything under the big (giant) kernel_lock, with only some
minor exceptions.  Given that a database benchmark would be very heavy
on I/O, I would expect to see a major difference between HEAD and the
vmlocking branch.

}-- End of excerpt from Kris Kennaway