Subject: Re: Thread benchmarks, round 2
To: Kris Kennaway <kris@FreeBSD.org>
From: John Nemeth <jnemeth@victoria.tc.ca>
List: tech-kern
Date: 10/05/2007 18:48:10
On Feb 25, 5:54am, Kris Kennaway wrote:
} Andrew Doran wrote:
} > So, I learned a few things since I put up the previous set of benchmarks:
} >
} > - The erratic behaviour from Linux is due to the glibc memory allocator.
} > Using Google's tcmalloc, the problem disappears.
}
} Well, you have to be careful there: tcmalloc apparently defers frees, and
} is not really a general purpose malloc. The Linux performance problems
} are (were? I haven't tried recent kernels) real though.
I would also argue that the average end user isn't likely to be
doing things such as replacing the malloc library and that the
benchmark should be run on a system that most users would be running
(i.e. pick a popular distribution and run it out of the box).
} > Kris Kennaway has kindly offered to try NetBSD on an 8-way system. I expect
} > that NetBSD will hit a fairly clear ceiling due to poll, fcntl and socket
} > I/O causing contention on kernel_lock. It will be interesting to see.
}
} Here is the initial run with CVS HEAD sources (I took out the obvious
                                   ^^^^
} things from GENERIC.MP like I386_CPU support, etc, and removed the
} default datasize and stack size limits). Same benchmark config that
} Andrew is using, etc.
}
} http://people.freebsd.org/~kris/scaling/netbsd.png
}
} There are a couple of things to note:
}
} * the drop-off above 8 threads on FreeBSD is due to non-scalability of
} MySQL itself, i.e. it comes from pthread mutex contention in userland.
} This is the only relevant lock contention point in the FreeBSD kernel
} on this workload. There are some things we can do in libpthread to
} mitigate the performance loss in the over-contended pthread situation,
} but we haven't done them yet.
}
} * The tail end of the graph is somewhat noisy, which is the reason for
} the jump at 19 threads (I only graphed a single run). The distribution
} at 20 clients looks like:
}
} +------------------------------------------------------------+
} | x x |
} |x x x xxx x x xx x x xxx x xx|
} | |_______________A_M_____________| |
} +------------------------------------------------------------+
}     N           Min           Max        Median           Avg        Stddev
} x  20       2326.01       2758.86       2586.47      2572.856     116.69937
}
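Since a single run at the noisy end of the graph can mislead, here is a
minimal sketch of how a summary like the table above could be computed
from repeated runs. The sample values are made up for illustration and
are not data from this thread; the function name summarize() is mine,
and I am assuming the Stddev column is a sample standard deviation.

```python
import statistics


def summarize(samples):
    """Return N, min, max, median, mean and sample standard deviation."""
    return {
        "n": len(samples),
        "min": min(samples),
        "max": max(samples),
        "median": statistics.median(samples),
        "avg": statistics.fmean(samples),
        # statistics.stdev uses the n-1 (sample) denominator.
        "stddev": statistics.stdev(samples),
    }


# Hypothetical transactions/sec figures, NOT measurements from the thread.
runs = [2400.0, 2550.0, 2600.0, 2700.0, 2500.0]
summary = summarize(runs)
for key, value in summary.items():
    print(f"{key:>7}: {value}")
```

Graphing the median of several runs per thread count, rather than one
run, would probably smooth out jumps like the one at 19 threads.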
} Next, to try and reproduce Andrew's result, I disabled 4 CPUs (using
} cpuctl in NetBSD) and compared FreeBSD and NetBSD again. I didn't do a
} full graph yet, but the results are consistent with what I saw on 8 CPUs.
cpuctl doesn't truly disable the CPUs. You would probably need to
disable them in the BIOS or build a custom kernel.
} This measurement shows that FreeBSD is performing 70-80% better than
} NetBSD in this 4 CPU configuration. This is in contrast to Andrew's
} findings which seem to show NetBSD performing 10% better than FreeBSD on
} a 4 CPU system (a very old one though).
}
} I will try later with the experimental kernel Andrew sent me (which
} includes the new scheduler). If it indeed gives a 100% performance
} improvement that would be a significant result :-)
Up above, you said that you used HEAD. In NetBSD, HEAD is still
big lock / giant lock with only some minor exceptions. Given that a
database benchmark would be very heavy on I/O, I would expect to see a
major difference between HEAD and vmlocking.
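On the percentages quoted above, it may help to put them on one scale.
The sketch below is pure arithmetic on the figures in this thread
(FreeBSD 70-80% ahead of NetBSD, and a possible 100% NetBSD gain), not
a measurement; the function name and the normalisation of NetBSD's
current throughput to 1.0 are my own choices.

```python
def netbsd_vs_freebsd_after_gain(freebsd_lead, netbsd_gain):
    """Ratio of NetBSD to FreeBSD throughput after NetBSD improves.

    freebsd_lead: FreeBSD's current advantage as a fraction (0.7 = 70% better).
    netbsd_gain:  NetBSD's improvement as a fraction (1.0 = 100% faster).
    """
    netbsd_now = 1.0  # normalise NetBSD's current throughput
    freebsd = netbsd_now * (1.0 + freebsd_lead)
    netbsd_after = netbsd_now * (1.0 + netbsd_gain)
    return netbsd_after / freebsd


for lead in (0.70, 0.80):
    ratio = netbsd_vs_freebsd_after_gain(lead, 1.0)
    print(f"FreeBSD lead {lead:.0%}: NetBSD/FreeBSD after gain = {ratio:.3f}")
```

If the 100% figure held up, NetBSD would end up roughly 11% to 18% ahead
of FreeBSD on this workload, which is in the same neighbourhood as the
10% Andrew reported. That is only a consistency check on the quoted
numbers, of course.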
}-- End of excerpt from Kris Kennaway