NetBSD-Bugs archive
Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang
The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
From: Brad Spencer <brad%anduin.eldar.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: port-evbarm-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost, pjledge%me.com@localhost
Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
VM results in kernel thread running away and filesystem hang
Date: Thu, 28 Jul 2022 08:36:34 -0400
Chuck Silvers <chuq%chuq.com@localhost> writes:
[snip]
> with the arbitrary limit on kernel virtual space removed and
> zfs_arc_free_target fixed, this doesn't appear to be a problem in practice.
> I suspect this is because enough kernel memory is accessed via the direct map
> rather than being mapped in the kernel heap that the system always runs out
> of free pages before it runs out of free kva.
>
> my current patch with both of these changes is attached.
>
> -Chuck
>
[patch snipped]
I applied the patch to a Xen amd64 DOMU and performed the test that
hangs. It will still cause the system to hang, but instead of a
complete hard hang, there is something more akin to a soft hang.
Nothing really responds any more on the guest (I can't log into the
console, for example, though you can still type a username), but at least
CTRL-T still works. A shell was stuck in "flt_noram5" and another in
"km_getwait2". In DDB on the guest console the UVM stats are thus:
db{0}> show uvmexp
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free
pages 8893 anon, 3648 file, 3010 exec
freemin=256, free-target=341, wired-max=82512
resv-pg=1, resv-kernel=5
bootpages=7737, poolpages=228145
faults=118126, traps=113048, intrs=426958, ctxswitch=527493
softint=143156, syscalls=2102209
fault counts:
noram=3, noanon=0, pgwait=0, pgrele=0
ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
daemon and swap counts:
woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
busy=0, freed=10736, reactivate=179, deactivate=26203
pageouts=145, pending=2156, nswget=5
nswapdev=1, swpgavail=1048575
swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16
In the hard hang case, the number of "free" pages would be much larger,
so I suspect something else is running out of resources at this point
(the small free count here hints at that, perhaps pointing to your free
page comment). I also noticed that the pool called "zio_data_buf_51" of
size 1024 didn't grow much above 16,100 with this patch, as opposed to
around 30,000 with the hard hang. Limiting the number of vnodes didn't
seem to affect the behavior of the softer hang. I also noticed that the
system was paging to swap even though the only activity was a zfs
receive over an ssh connection.
--
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org