NetBSD-Users archive
continued zfs-related lockups
I have been having continued zfs-related lockups on two systems and am
posting some anecdata/comments. I am building a LOCKDEBUG kernel to see
if that changes anything. Both systems are up-to-date netbsd-10.
System 1 is bare metal, 32G RAM.
System 2 is Xen, 4000M RAM in the dom0. The issues described are
provoked in the dom0, without the domUs doing much.
I am carrying a patch to reduce the ARC size based on memory size.
Something like this should be committed, because the current approach
just uses too much memory. The details are paged out of my head, but
here are my boot printfs on system 1:
ARCI 002 arc_abs_min 16777216
ARCI 002 arc_c_min 1067485440
ARCI 005 arc_c_max 4269941760
ARCI 010 arc_c_min 1067485440
ARCI 010 arc_p 2134970880
ARCI 010 arc_c 4269941760
ARCI 010 arc_c_max 4269941760
ARCI 011 arc_meta_limit 1067485440
Basically you can see about 4G of data and 1G for meta.
On system 2:
ARCI 002 arc_abs_min 16777216
ARCI 002 arc_c_min 131072000
ARCI 005 arc_c_max 524288000
ARCI 010 arc_c_min 131072000
ARCI 010 arc_p 262144000
ARCI 010 arc_c 524288000
ARCI 010 arc_c_max 524288000
ARCI 011 arc_meta_limit 131072000
Basically you can see about 500MB for arc data and 125MB for meta.
These values should not mess up a 32G or a 4000 MB system. One can of
course argue about whether they should be somewhat bigger or somewhat
smaller, and more importantly about how memory pressure from other
subsystems should interact with the ARC. IMHO the ARC should be
considered part of the file cache.
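For what it's worth, those numbers look consistent with sizing off
physmem at roughly 1/8 for arc_c_max, 1/16 for arc_p, and 1/32 for
arc_c_min and arc_meta_limit. A quick check against the system 2 values
(this is just my arithmetic from the printfs, not the patch itself):

physmem=$((4000 * 1024 * 1024))   # the dom0's 4000 MB
echo $((physmem / 8))             # 524288000  = arc_c_max
echo $((physmem / 16))            # 262144000  = arc_p
echo $((physmem / 32))            # 131072000  = arc_c_min and arc_meta_limit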
Also, due to past experience I have the following in sysctl.conf. As I
recall, processes were getting paged out to make room for the file
cache, which resulted in performance I didn't like.
# \todo Reconsider and document
vm.filemin=5
vm.filemax=10
vm.anonmin=5
vm.anonmax=80
vm.execmin=5
vm.execmax=50
vm.bufcache=5
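In case anyone wants to compare, these can be inspected (or tried
temporarily) with sysctl at runtime; the value below is just one of the
settings from my sysctl.conf above:

sysctl vm.filemin vm.filemax vm.anonmin vm.anonmax vm.execmin vm.execmax vm.bufcache
sysctl -w vm.filemax=10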
* system 1 (32G bare metal)
The problem smells like "processes running out of RAM and asking for
more while the system is doing lots of zfs operations": something like
pkg_rr or build.sh running while flipping tabs in firefox. The lockup
starts gradually and gets worse. If I don't leave firefox running
overnight, and especially if I don't leave piggy tabs open, crashes are
much less frequent.
I managed to catch it early, flip out of X to the text console, and
then drop into ddb. I am still learning how to interpret things, but:

There were several processes in tstile. The underlying locks seem to
be:
- zfs:buf_hash_table+0x1300
- netbsd:vfs_suspend_lock (from a rename system call IIRC)

Some of the wchans are flt_noram5; I realize that is normal.

Several pools were very big:
- zfs_znode_cache: size 240 npages 822221
- zio_buf_512: size 512 npages 240926 nitems 735236 nout 1187132
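Back-of-the-envelope, assuming 4 KiB pool pages for these small-item
pools, those npage counts come to about 4 GB between just these two:

echo $((822221 * 4096 / 1048576))   # zfs_znode_cache: ~3211 MiB
echo $((240926 * 4096 / 1048576))   # zio_buf_512: ~941 MiB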
I interpret this as:
- zfs has allocated too much RAM
- something did an fsop which requires vfs suspend
- something else tried to operate during that suspend, perhaps
  deadlocking with RAM acquisition
The mystery is why others aren't seeing this.
* system 2 (4000 MB dom0)
This machine is used for building packages, in the dom0 and in 4 domUs,
and holds distfiles, pkgsrc trees for about a year of quarterly branches
plus current and wip, binary packages, and ccache dirs.
The real issue seems to be the ccache dirs. In total there are 16G of
cache files across 5 cpu/os/version tuples (the 4 this machine uses,
plus netbsd-10-aarch64, used over NFS from an RPI4). There are about
1.5M files. Just running find over that pushes total pool allocation
from about 657,000K to about 3,635,000K:
$ vmstat -m|tail -5; find /tank0/ccache -type f|wc -l; vmstat -m|tail -5
[white space adjusted to make this easier to follow]
zio_link_cache 48 3695 0 0 44 0 44 44 0 inf 0
Totals 763425 0 83834 82688 0 82688
In use 636990K, total allocated 657236K; utilization 96.9%
1510577
zio_link_cache 48 3695 0 0 44 0 44 44 0 inf 0
Totals 5223892 0 365030 539962 0 539962
In use 3539172K, total allocated 3635060K; utilization 97.4%
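Dividing the growth in total allocation by the number of files found,
that is roughly 2K of pool memory per file:

echo $(((3635060 - 657236) * 1024 / 1510577))   # ~2018 bytes per file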
On this system, the symptom is that the system just stops responding. I
am running
/sbin/wdogctl -x -p 367 tco0
and it recovers automatically.
It has been crashing on /etc/daily. I am able to provoke the crash with
"find /big/place -type f | wc -l". It feels worse lately, and I wondered
about hardware. I also rolled back from a 7/1 kernel to one from 6/26.
It could be that my data has simply gotten bigger.
I find that total pool usage as reported by vmstat -m goes up as I run
find, to unreasonable levels. As an example, after running find over
ccache and pkgsrc-current, I see
In use 1645282K, total allocated 1751476K; utilization 93.9%
which is way too much for a 4000M machine.
It seems like if I do
find /place1
wait many minutes
find /place2
then the second find does not vastly increase pools. But if I don't
wait, it does. I have seen the pool total (as reported by top) as high
as 3465M.
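A sketch of that experiment (/place1 and /place2 stand for any two
large zfs trees, as above; the sleep length is a guess at "many
minutes"):

find /place1 -type f > /dev/null
vmstat -m | tail -1
sleep 1800
find /place2 -type f > /dev/null
vmstat -m | tail -1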
Eventually, when pushed too far, the machine locks up; entering ddb, I
saw nothing in tstile.
Stay tuned for LOCKDEBUG info on system 2. I can crash that one without
causing myself extra effort.
Overall, it feels like zfs has some kind of cache that is not the ARC,
and that it has unreasonable limits.
I realize some of you think 4000M is low memory, but the system should
be stable, if slow, anyway. And 32G is really a healthy amount of RAM
these days.
So: if you have a system with a lot of files in zfs, and you don't mind
crashing it, running find would be interesting. Even if it doesn't
crash, the vmstat -m output would be interesting.
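Something like this would capture the before/after totals (the path is
a placeholder for your own zfs filesystem; the last line of vmstat -m is
the "In use / total allocated" summary):

vmstat -m | tail -1
find /your/zfs/filesystem -type f | wc -l
vmstat -m | tail -1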
Here are the pools on the 32G system, up 5 days:
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
zio_buf_2048 2048 155225 0 153857 60590 59530 1060 17041 0 inf 1
arc_buf_t 32 960575 0 919523 1417 147 1270 1417 0 inf 0
uarea 24576 3507 0 2694 2148 866 1282 1282 0 inf 469
i915_vma 704 14335 0 6162 2362 727 1635 1635 0 inf 0
zio_buf_4096 4096 33584 0 31722 21870 20008 1862 5335 0 inf 0
range_seg_cache 64 735359 0 679561 3968 2062 1906 2985 0 inf 0
drm_i915_gem_ob 768 15917 0 6368 2656 746 1910 1910 0 inf 0
kmem-00256 256 339404 0 309748 18365 16446 1919 16939 0 inf 65
kmem-00008 8 1437175 0 1307639 1959 12 1947 1959 0 inf 19
dirhashblk 2048 4768 0 379 2269 74 2195 2229 0 inf 0
kmem-00768 768 38965 0 26829 3343 915 2428 2428 0 inf 0
kmem-00128 128 515001 0 469640 14110 11490 2620 14110 0 inf 767
zio_buf_2560 2560 177067 0 173838 152692 149463 3229 46055 0 inf 0
vmmpepl 128 326633 0 277087 3821 558 3263 3263 0 inf 284
ractx 32 618097 0 208209 3782 482 3300 3383 0 inf 10
vmembt 64 644698 0 492932 3492 0 3492 3492 0 inf 0
zio_cache 984 65228 0 64252 7865 4122 3743 4476 0 inf 3499
bufpl 272 82024 0 25305 4636 665 3971 3971 0 inf 11
kmem-00032 32 1100883 0 807683 4978 811 4167 4978 0 inf 116
buf16k 16384 29586 0 14699 5639 1422 4217 4430 0 inf 0
zio_data_buf_13 131072 109221 0 104477 92654 87910 4744 17380 0 inf 0
kmem-01024 1024 80367 0 55194 9091 2051 7040 7424 0 inf 746
kva-16384 16384 431364 3 290889 9652 289 9363 9634 0 inf 0
pcglarge 1024 1280332 0 1236977 44969 34130 10839 11712 0 inf 0
kmem-00384 384 517294 0 408327 20823 9926 10897 10897 0 inf 0
phpool-64 56 2038826 4 1494992 12943 1023 11920 12917 0 inf 20
pvpage 4096 31460 1 21252 24435 11658 12777 14270 0 inf 2569
zio_buf_3584 3584 283333 0 270526 253663 240856 12807 71768 0 inf 0
kmem-00064 64 2452484 0 1843883 15528 1886 13642 15528 0 inf 2091
radixnode 128 857688 0 410392 17462 2971 14491 14491 0 inf 0
zio_buf_3072 3072 355898 0 340436 306458 290996 15462 96733 0 inf 0
buf2k 2048 59752 0 18803 27664 7188 20476 20477 0 inf 0
anonpl 32 10356573 0 7483518 36255 13452 22803 24315 0 inf 0
pcgnormal 256 3758866 0 3660775 72144 49266 22878 36859 0 inf 1271
kmem-00192 192 1066440 0 735229 25691 336 25355 25691 0 inf 1414
arc_buf_hdr_t_f 200 2852920 0 2738056 46724 21056 25668 38169 0 inf 4
mutex 64 2386139 0 717605 26491 4 26487 26487 0 inf 0
ffsdino2 256 690230 0 206450 33605 475 33130 33130 0 inf 1268
ffsino 272 690230 0 206450 35861 522 35339 35339 0 inf 1377
zio_buf_16384 16384 704287 32 667026 644883 607622 37261 149436 0 inf 0
sa_cache 104 3115636 0 2072844 42952 1936 41016 42952 0 inf 30
rwlock 64 3849631 0 561234 52220 23 52197 52197 0 inf 0
namecache 128 2437368 0 850070 53742 0 53742 53742 0 inf 0
kmem-02048 2048 805665 0 663183 253638 182078 71560 251451 0 inf 319
zfs_znode_cache 240 3088835 0 2046043 109029 12668 96361 102010 0 inf 23
dmu_buf_impl_t 208 6425103 0 5303519 124086 2184 121902 124086 0 inf 2697
vcachepl 576 2252152 0 725245 240158 1866 238292 238292 0 inf 1
zio_buf_512 512 7094849 0 6011939 343777 96177 247600 279141 0 inf 0
dnode_t 632 5534168 0 4451068 372058 39269 332789 364352 0 inf 10837
In contrast, on a 5G domU with no zfs, I see 800M of pools. The top
users by npages are:
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
vmmpepl 144 80217 0 53575 1028 13 1015 1015 0 inf 12
kmem-2048 2048 8529 0 6268 1729 598 1131 1194 0 inf 0
kmem-1024 1024 17696 0 12741 1450 182 1268 1300 0 inf 4
pcglarge 1024 62614 0 58086 8374 7097 1277 1277 0 inf 145
kva-4096 4096 149007 0 42472 1989 172 1817 1817 0 inf 0
anonpl 32 1574629 0 1152567 3711 240 3471 3611 0 inf 0
buf1k 1024 156309 0 29550 4702 729 3973 4096 1 1 0
mutex 64 784470 0 518929 6424 977 5447 5470 0 inf 1
pvpl 40 1271682 0 726803 5734 121 5613 5703 0 inf 0
buf8k 8192 40441 42 17360 7585 1814 5771 6869 1 1 0
bufpl 296 175399 0 24677 12733 1139 11594 12114 0 inf 0
ncache 192 522831 0 271515 12258 47 12211 12211 0 inf 0
ffsdino2 256 797229 0 542806 27135 8784 18351 20976 0 inf 0
ffsino 256 790479 0 536056 27110 8736 18374 20976 0 inf 0
vcachepl 336 786851 0 531358 37452 13970 23482 27972 0 inf 0
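For anyone wanting to produce a similar list: sorting vmstat -m output
numerically on the Npage column (column 8 in the output above) should
do it, modulo the header and totals lines sorting oddly:

vmstat -m | sort -nk 8 | tail -50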