Subject: Re: Thread benchmarks
To: Andrew Doran <ad@netbsd.org>
From: Kris Kennaway <kris@FreeBSD.org>
List: tech-kern
Date: 10/02/2007 00:26:55
Andrew Doran wrote:
>> In tests that have been run on p4 hardware, the FreeBSD system's graph
>> looks more like NetBSD's than the one presented here. FreeBSD's kernel
>> has a lot of debugging options that hurt performance on by default. Also,
>> FreeBSD's malloc defaults to 'AJ' in head, which would result in reduced
>> performance.
>
> I can try turning off debugging in the allocator. What else would you like
> me to try? I would like to provide remote access to the two systems but
> unfortunately my Internet link is unreliable and I'm not in a position to
> leave them on 24x7. Some details on the test. I grabbed my.cnf from Jeff
> Roberson's weblog:
You should rebuild malloc with MALLOC_PRODUCTION defined (edit
lib/libc/stdlib/malloc.c) and make sure that /etc/malloc.conf is either
removed or symlinked to 'aj'. This is pretty important.
Could you also provide a copy of your FreeBSD kernel configuration file
just so we can double-check?
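A minimal sketch of the malloc.conf step, in case it helps. The function
name set_malloc_conf and its optional path argument are only for
illustration and are not part of FreeBSD; the default path and the 'aj'
value are the ones mentioned above. It needs to run as root to touch
/etc.

```sh
# Hypothetical helper: point a malloc.conf symlink at the given option string.
# Usage: set_malloc_conf OPTS [PATH]   (PATH defaults to /etc/malloc.conf)
set_malloc_conf() {
    opts=$1
    conf=${2:-/etc/malloc.conf}
    rm -f "$conf" || return 1          # drop any existing file or symlink
    ln -s "$opts" "$conf" || return 1  # the link target itself carries the options
    readlink "$conf"                   # print what the link now points at
}
```

Calling `set_malloc_conf aj` should print `aj`; rerun the benchmark
afterwards so that mysqld starts with the new setting.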
> http://people.freebsd.org/~jeff/bsd.cnf
OK, the only difference to my config is that I have
innodb_log_file_size=900M
instead of 100M.
> Relevant bits of dmesg from the MySQL host:
>
> total memory = 2047 MB
> avail memory = 2008 MB
> cpu0: Intel Pentium III Xeon (686-class), 701.64 MHz, id 0x6a1
> cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
> cpu0: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
> cpu0: features 383fbff<FXSR,SSE>
> cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
> cpu0: L2 cache 1 MB 32B/line 8-way
> cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
> cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
> fxp0 at pci1 dev 6 function 0: i82559 Ethernet, rev 8
> fxp0: interrupting at ioapic0 pin 3 (irq 3)
> fxp0: Ethernet address 00:02:a5:45:a6:48
> inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>
> The disk subsystem doesn't matter since I was running the read-only test,
> and with 10000 rows everything fits in core. I compiled MySQL by hand on
> each system:
>
> ./configure --prefix=/local/mysql --with-pthread --with-innodb
OK. The FreeBSD port also defines
--enable-thread-safe-client
--without-debug
--enable-assembler
(and some other options that don't look relevant). --with-pthread might
enable the first option, but if it doesn't, that could cause performance
anomalies (this is relevant for the client side, of course). For
example, I accidentally built postgresql without threaded client support
recently and spent a while trying to work out why sysbench suddenly ran
at half speed.
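For comparison, here is a sketch of your configure line with the three
port options listed above added. The variable name CONFIGURE_ARGS is
just for illustration, and this only combines the flags already
mentioned in this thread; it is not the port's full option list.

```sh
# Options from the original build plus the three the FreeBSD port adds
# (as listed above).
CONFIGURE_ARGS="--prefix=/local/mysql --with-pthread --with-innodb \
--enable-thread-safe-client --without-debug --enable-assembler"

# Dry run: print the command first; drop the 'echo' to actually configure.
echo ./configure $CONFIGURE_ARGS
```

Run it from the top of the MySQL source tree once the printed command
looks right.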
> Everything but necessary processes were killed on the two systems, so they
> were running at most sshd, screen, sysbench and the minimum to be able to
> log in. I did a warm-up run and then started testing:
>
> for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do
> echo "=> ${i} THREADS"
> sysbench --test=oltp --db-driver=mysql --mysql-host=${HOST} \
> --mysql-user=root --mysql-table-engine=innodb --num-threads=${i} \
> --max-time=60 --max-requests=0 --oltp-read-only=on run | \
> tee -a ${HOST}.txt
> done
I use
sysbench --test=oltp --num-threads=$1 --mysql-user=root --max-time=120 \
    --max-requests=0 --oltp-read-only=on --db-driver=mysql \
    --mysql-host=192.168.5.120 run
which seems to be equivalent (the default table engine is innodb in our
config).
Can you run 'vmstat -w 1' for, say, 30 seconds on your FreeBSD system
while the test is running? I see total CPU usage at 100%, with system at
20-25% and the rest user.
> The two systems are connected via 100Mbps switch. The sysbench host was
> running NetBSD/i386 4.99.30 and has a dual core CPU:
I tested on a quad 500 MHz p3 (i.e. 30% slower clock speed than your
system), via 100Mbps em0. Performance was already at the level of the
FreeBSD curve on your graph (about 320 tps across a range of loads), and
if I scale up by 700/500 then it's about the same as your NetBSD curve.
I suspect that this will actually underestimate performance a bit
because the CPU is an older generation than yours, so the difference is
not just clock speed. One thing that is kind of interesting is that
some of the locking optimizations that we have not yet committed don't
make a difference on this machine and workload; apparently they are only
important at 8 CPUs and above.
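Spelling out the 700/500 scaling above as a quick sketch. It assumes
throughput scales linearly with clock speed, which is only a rough
approximation; the variable names are just for illustration and the
figures are the ones quoted above.

```sh
# Figures from the text above: about 320 tps measured on 500 MHz CPUs,
# scaled to the 700 MHz clock of the other system.
measured_tps=320
my_mhz=500
your_mhz=700

# Naive linear scaling by clock ratio (integer arithmetic).
scaled_tps=$((measured_tps * your_mhz / my_mhz))
echo "scaled estimate: ${scaled_tps} tps"
```

That works out to 448 tps, before allowing for the older CPU generation.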
Anyway, this all suggests to me that something is going wrong on your
system, so if the above doesn't help then we'll have to look closer.
One other possibility is that your NIC may be misbehaving.
Kris