Subject: Some kernel profiling on a 5000/133 with -current
To: port-pmax <port-pmax@netbsd.org>
From: Ken Wellsch <kwellsch@tampabay.rr.com>
List: port-pmax
Date: 08/24/2001 09:41:43
With the recent discussion about disk performance, I thought I would
run another kernel profile and see what may be happening.
I'm running a -current kernel on a 5000/133 (128Mb) with a Jun 1/01
userland. The profiling kernel is custom (e.g. MIPS1 only) via the
usual "comment out" stuff using a GENERIC kernel as a base.
The summary is that a remarkable amount of time is spent in the
assembler memcpy/memmove function. The profile I did back at the
end of January (with 1.5R) also shows this as a top consumer.
Does UBC and/or scsipi require a lot of buffer bouncing?
Hopefully the profiling itself is not the culprit here.
I suppose I should try and split memcpy & memmove for better profiling,
and also instrument them so they provide proper call-graph data etc. I'm
also curious about src/dst alignment distribution as well as the copy
size distribution... but I'm not knowledgeable enough to instrument that
so that I can tug the results out of a running kernel.
Here are the details from the profiling.
The first profile output is that produced after building GENERIC
overnight (with about 4 hours of idle time after it completed):
% cumulative self self total
time seconds seconds calls us/call us/call name
25.01 573.68 573.68 70264 8164.69 8164.69 mips_idle
8.58 770.42 196.74 _mcount
7.00 930.91 160.49 memmove
6.59 1082.07 151.15 __mcount
4.75 1190.98 108.91 13355734 8.15 8.15 lockmgr
4.69 1298.54 107.56 577608 186.22 186.22 mips_pagezero
4.02 1390.83 92.29 _splset_noprof
2.78 1454.60 63.77 2185474 29.18 38.93 uvn_attach
2.37 1508.97 54.37 1277593 42.56 277.91 uvm_fault
1.82 1550.64 41.67 1347762 30.92 44.64 pmap_enter
1.23 1578.89 28.25 533670 52.94 403.19 lookup
1.15 1605.17 26.28 1880346 13.98 226.94 syscall_plain
1.08 1629.96 24.79 _splraise_noprof
1.03 1653.56 23.60 1492532 15.81 90.38 ufs_lookup
1.00 1676.54 22.98 1950 11782.85 50933.92 ffs_sync
...
0.00 2293.80 0.00 1 0.00 47.84 vn_fcntl
I then ran "bonnie -s 800" and that took over an hour, producing:
% cumulative self self total
time seconds seconds calls us/call us/call name
11.43 433.44 433.44 memmove
9.99 812.10 378.66 _mcount
8.40 1130.36 318.27 105108 3027.99 3027.99 mips_idle
8.22 1442.12 311.76 __mcount
4.73 1621.33 179.21 _splset_noprof
4.56 1794.27 172.94 18343012 9.43 9.43 lockmgr
2.91 1904.57 110.30 2310538 47.74 64.96 uvn_attach
2.85 2012.43 107.85 579127 186.23 186.23 mips_pagezero
2.24 2097.50 85.07 4048512 21.01 28.16 pmap_enter
2.11 2177.43 79.93 2003441 39.90 214.02 uvm_fault
1.89 2249.16 71.73 38764271 1.85 4.49 st_scsibus_cmprss
1.54 2307.53 58.37 1401437 41.65 41.65 mips1_FlushDCache
1.48 2363.55 56.02 4751875 11.79 11.79 memset
1.42 2417.37 53.82 _splraise_noprof
1.31 2467.15 49.78 1734271 28.70 48.90 genfs_getpages
1.14 2510.38 43.23 13678369 3.16 3.16 pmap_is_modified
1.13 2553.33 42.95 1129678 38.02 213.59 ncr53c9x_intr
1.08 2594.36 41.02 2645605 15.51 382.60 syscall_plain
1.05 2633.97 39.62 1839952 21.53 67.28 uvn_findpages
...
0.00 3790.57 0.00 1 0.00 73.84 vn_fcntl
I then ran "bonnie -s 800" again, and this is my final result:
% cumulative self self total
time seconds seconds calls us/call us/call name
12.20 702.29 702.29 memmove
9.77 1264.69 562.40 _mcount
8.73 1767.05 502.36 140244 3582.07 3582.07 mips_idle
8.21 2239.42 472.37 __mcount
5.10 2532.87 293.45 _splset_noprof
4.13 2770.59 237.73 23342045 10.18 10.19 lockmgr
2.79 2931.02 160.43 2436080 65.85 90.39 uvn_attach
2.25 3060.25 129.23 6750366 19.14 25.29 pmap_enter
2.10 3180.97 120.73 63173285 1.91 4.92 st_scsibus_cmprss
1.91 3290.66 109.69 2662591 41.20 41.20 mips1_FlushDCache
1.88 3398.83 108.17 580935 186.20 186.20 mips_pagezero
1.83 3503.85 105.02 2730621 38.46 205.57 uvm_fault
1.80 3607.41 103.57 7750359 13.36 13.36 memset
1.57 3697.68 90.26 2767455 32.62 57.45 genfs_getpages
1.44 3780.31 82.63 _splraise_noprof
1.20 3849.46 69.16 1988694 34.77 210.07 ncr53c9x_intr
1.16 3916.46 66.99 21491923 3.12 3.12 pmap_is_modified
1.02 3975.18 58.72 2977164 19.72 56.78 uvn_findpages
...
0.00 5754.21 0.00 1 0.00 83.83 vn_fcntl