NetBSD-Users archive
Re: Testing memory performance
On Tue, 20 Nov 2018 00:27:22 +0100
Michael van Elst <mlelstv%serpens.de@localhost> wrote:
> There is a global lock for the page freelist.
OK, I've changed my bench tool to synchronize all threads before
each stage. Threads now wait for all other threads to finish
pre-faulting pages before they all start memcpy at the same time. This
makes it clearer where the time is lost.
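For readers who want to reproduce the idea, here is a minimal sketch of per-stage synchronization using a barrier. It is not the sv_mem source; run_stages and the stage callables are hypothetical names made up for illustration.

```python
import threading
import time

def run_stages(n_threads, stages):
    """Run each stage in every thread, with a barrier before each stage.

    Hypothetical helper: returns timings[thread][stage] = (start, end).
    """
    barrier = threading.Barrier(n_threads)
    timings = [[] for _ in range(n_threads)]

    def worker(idx):
        for stage in stages:
            barrier.wait()                 # wait until every thread is ready for this stage
            start = time.perf_counter()
            stage(idx)
            end = time.perf_counter()
            timings[idx].append((start, end))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timings

# Two dummy stages standing in for "pre-fault" and "memcpy".
timings = run_stages(4, [lambda i: time.sleep(0.01 * (i + 1)),
                         lambda i: time.sleep(0.01)])
```

Because no thread leaves the barrier until all have reached it, the second stage cannot start anywhere before the slowest thread has finished the first one, so the per-stage times do not bleed into each other.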
Did some more tests on Solaris, Linux and NetBSD. Looks like NetBSD
memcpy is actually a bit faster than Linux, but NetBSD is quite slow at
servicing page faults. The latency when pre-faulting those pages is
about 15 times longer on NetBSD than on Linux (18131.97 vs 1211.96 msec
aggregate), which results in longer overall execution time.
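To illustrate what "pre-faulting" means here, a minimal sketch: write one byte to every page of a fresh anonymous mapping so the kernel has to service a page fault for each page, and time the loop. The prefault function is a hypothetical name, and this is only an assumption about the general technique, not how sv_mem implements it.

```python
import mmap
import time

def prefault(buf, page_size=mmap.PAGESIZE):
    """Write one byte per page of buf; return (pages touched, elapsed msec)."""
    start = time.perf_counter()
    pages = 0
    for off in range(0, len(buf), page_size):
        buf[off] = 0                      # a write to an untouched page triggers a fault
        pages += 1
    elapsed_ms = (time.perf_counter() - start) * 1e3
    return pages, elapsed_ms

buf = mmap.mmap(-1, 64 * mmap.PAGESIZE)   # anonymous mapping, 64 pages
pages, msec = prefault(buf)
print(f"touched {pages} pages in {msec:.3f} msec")
```

With many threads doing this at once, the time is dominated by the kernel's page allocation path rather than by memory bandwidth, which is why it is reported separately from memcpy.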
Anyway, this has been an interesting exercise.
Solaris 11.3, x1 UltraSPARC-T2 1415 MHz, 8 cores per CPU, 8 hw threads per core
$ ./sv_mem -mode=wr -size=1g -block=1K -threads=16
Per-thread metrics:
T 16 mlock 0.00 msec, preflt 1880.88 msec, memcpy 1521.74 msec (672.91 MiB/sec)
T 14 mlock 0.00 msec, preflt 1896.63 msec, memcpy 1522.38 msec (672.63 MiB/sec)
T 10 mlock 0.00 msec, preflt 1872.01 msec, memcpy 1522.73 msec (672.48 MiB/sec)
T 2 mlock 0.00 msec, preflt 1889.55 msec, memcpy 1522.43 msec (672.61 MiB/sec)
T 8 mlock 0.00 msec, preflt 1862.79 msec, memcpy 1523.32 msec (672.22 MiB/sec)
T 6 mlock 0.00 msec, preflt 1875.76 msec, memcpy 1523.68 msec (672.06 MiB/sec)
T 5 mlock 0.00 msec, preflt 1869.91 msec, memcpy 1524.26 msec (671.80 MiB/sec)
T 12 mlock 0.00 msec, preflt 1880.11 msec, memcpy 1525.13 msec (671.42 MiB/sec)
T 4 mlock 0.00 msec, preflt 1884.96 msec, memcpy 1525.37 msec (671.31 MiB/sec)
T 1 mlock 0.00 msec, preflt 1885.92 msec, memcpy 1525.54 msec (671.24 MiB/sec)
T 9 mlock 0.00 msec, preflt 1875.25 msec, memcpy 1526.15 msec (670.97 MiB/sec)
T 13 mlock 0.00 msec, preflt 1869.48 msec, memcpy 1526.74 msec (670.71 MiB/sec)
T 15 mlock 0.00 msec, preflt 1869.14 msec, memcpy 1527.30 msec (670.46 MiB/sec)
T 7 mlock 0.00 msec, preflt 1889.29 msec, memcpy 1527.45 msec (670.40 MiB/sec)
T 3 mlock 0.00 msec, preflt 1880.53 msec, memcpy 1529.22 msec (669.62 MiB/sec)
T 11 mlock 0.00 msec, preflt 1876.53 msec, memcpy 1530.20 msec (669.19 MiB/sec)
Aggregate metrics, 16 threads, 16384.00 MiB:
mlock 0.00 msec
preflt 1897.69 msec
memcpy 1530.59 msec (10704.36 MiB/sec)
Linux 4.9.0, x2 Intel Xeon E5620 2395 MHz, 4 cores per CPU, 2 hw threads per core
$ ./sv_mem -mode=wr -size=1g -block=1K -threads=16
Per-thread metrics:
T 5 mlock 0.00 msec, preflt 1192.80 msec, memcpy 1141.42 msec (897.13 MiB/sec)
T 7 mlock 0.00 msec, preflt 1211.61 msec, memcpy 1144.62 msec (894.62 MiB/sec)
T 16 mlock 0.00 msec, preflt 1211.59 msec, memcpy 1145.37 msec (894.04 MiB/sec)
T 3 mlock 0.00 msec, preflt 1207.33 msec, memcpy 1146.42 msec (893.21 MiB/sec)
T 2 mlock 0.00 msec, preflt 1211.02 msec, memcpy 1146.36 msec (893.26 MiB/sec)
T 1 mlock 0.00 msec, preflt 1210.36 msec, memcpy 1146.57 msec (893.10 MiB/sec)
T 13 mlock 0.00 msec, preflt 1208.53 msec, memcpy 1146.67 msec (893.02 MiB/sec)
T 9 mlock 0.00 msec, preflt 1209.00 msec, memcpy 1146.33 msec (893.28 MiB/sec)
T 15 mlock 0.00 msec, preflt 1210.63 msec, memcpy 1147.20 msec (892.61 MiB/sec)
T 14 mlock 0.00 msec, preflt 1190.98 msec, memcpy 1147.90 msec (892.06 MiB/sec)
T 4 mlock 0.00 msec, preflt 1193.98 msec, memcpy 1147.89 msec (892.07 MiB/sec)
T 6 mlock 0.00 msec, preflt 1194.16 msec, memcpy 1148.72 msec (891.43 MiB/sec)
T 12 mlock 0.00 msec, preflt 1191.37 msec, memcpy 1149.35 msec (890.94 MiB/sec)
T 8 mlock 0.00 msec, preflt 1196.99 msec, memcpy 1149.30 msec (890.98 MiB/sec)
T 10 mlock 0.00 msec, preflt 1197.32 msec, memcpy 1149.37 msec (890.92 MiB/sec)
T 11 mlock 0.00 msec, preflt 1197.75 msec, memcpy 1152.12 msec (888.79 MiB/sec)
Aggregate metrics, 16 threads, 16384.00 MiB:
mlock 0.00 msec
preflt 1211.96 msec
memcpy 1152.58 msec (14215.02 MiB/sec)
NetBSD-8, x2 Intel Xeon E5620 2395 MHz, 4 cores per CPU, 2 hw threads per core
$ ./sv_mem -mode=wr -size=1g -block=1K -threads=16
Per-thread metrics:
T 16 mlock 0.00 msec, preflt 18116.24 msec, memcpy 945.99 msec (1082.46 MiB/sec)
T 9 mlock 0.00 msec, preflt 18112.29 msec, memcpy 949.79 msec (1078.13 MiB/sec)
T 10 mlock 0.00 msec, preflt 18131.93 msec, memcpy 955.33 msec (1071.88 MiB/sec)
T 8 mlock 0.00 msec, preflt 17868.22 msec, memcpy 959.28 msec (1067.46 MiB/sec)
T 4 mlock 0.00 msec, preflt 17437.47 msec, memcpy 958.71 msec (1068.11 MiB/sec)
T 6 mlock 0.00 msec, preflt 16743.15 msec, memcpy 958.53 msec (1068.31 MiB/sec)
T 3 mlock 0.00 msec, preflt 18130.67 msec, memcpy 944.33 msec (1084.36 MiB/sec)
T 2 mlock 0.00 msec, preflt 18060.20 msec, memcpy 958.34 msec (1068.51 MiB/sec)
T 11 mlock 0.00 msec, preflt 17655.18 msec, memcpy 957.95 msec (1068.95 MiB/sec)
T 12 mlock 0.00 msec, preflt 18058.58 msec, memcpy 957.50 msec (1069.45 MiB/sec)
T 15 mlock 0.00 msec, preflt 17168.35 msec, memcpy 957.22 msec (1069.77 MiB/sec)
T 7 mlock 0.00 msec, preflt 17579.01 msec, memcpy 951.91 msec (1075.73 MiB/sec)
T 1 mlock 0.00 msec, preflt 17644.25 msec, memcpy 952.11 msec (1075.51 MiB/sec)
T 13 mlock 0.00 msec, preflt 17778.01 msec, memcpy 952.61 msec (1074.94 MiB/sec)
T 14 mlock 0.00 msec, preflt 18120.07 msec, memcpy 952.99 msec (1074.51 MiB/sec)
T 5 mlock 0.00 msec, preflt 18088.50 msec, memcpy 952.96 msec (1074.55 MiB/sec)
Aggregate metrics, 16 threads, 16384.00 MiB:
mlock 0.00 msec
preflt 18131.97 msec
memcpy 960.60 msec (17056.01 MiB/sec)
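The aggregate figures can be cross-checked with a few lines of arithmetic. This sketch only uses the aggregate numbers printed above; it assumes the aggregate MiB/sec is the total size divided by the aggregate memcpy time, which matches the reported values to within rounding.

```python
TOTAL_MIB = 16384.00

# Aggregate times in msec and reported MiB/sec, copied from the runs above.
runs = {
    "Solaris 11.3": {"preflt": 1897.69, "memcpy": 1530.59, "reported": 10704.36},
    "Linux 4.9.0": {"preflt": 1211.96, "memcpy": 1152.58, "reported": 14215.02},
    "NetBSD-8": {"preflt": 18131.97, "memcpy": 960.60, "reported": 17056.01},
}

def throughput(total_mib, msec):
    """MiB/sec for total_mib copied in msec milliseconds."""
    return total_mib / (msec / 1000.0)

for name, r in runs.items():
    total = r["preflt"] + r["memcpy"]
    print(f"{name}: {throughput(TOTAL_MIB, r['memcpy']):.2f} MiB/sec memcpy, "
          f"{total:.2f} msec preflt+memcpy")

# Pre-fault time of NetBSD relative to Linux on the same hardware.
preflt_ratio = runs["NetBSD-8"]["preflt"] / runs["Linux 4.9.0"]["preflt"]
print(f"NetBSD/Linux preflt ratio: {preflt_ratio:.2f}")
```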