NetBSD-Bugs archive
Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang
I just tried Chuck’s latest patch and was able to transfer data for about 3 hours before the kernel thread got into the loop, up from about 15 minutes. So it's an improvement, but the problem isn't resolved.
I’ll see if I can get a DDB session running next time.
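
Roughly what I have in mind for that DDB session, assuming the kernel
was built with "options DDB" and the VM's console will actually pass a
BREAK through (the part I still need to confirm):

    # allow a console BREAK to drop the system into DDB
    sysctl -w ddb.fromconsole=1

    # once it wedges, send a BREAK on the console, then:
    db{0}> show uvmexp
    db{0}> ps
    db{0}> bt
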
> On Jul 28, 2022, at 8:40 AM, Brad Spencer <brad%anduin.eldar.org@localhost> wrote:
>
> The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
>
> From: Brad Spencer <brad%anduin.eldar.org@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc: port-evbarm-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
> netbsd-bugs%netbsd.org@localhost, pjledge%me.com@localhost
> Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
> VM results in kernel thread running away and filesystem hang
> Date: Thu, 28 Jul 2022 08:36:34 -0400
>
> Chuck Silvers <chuq%chuq.com@localhost> writes:
>
> [snip]
>
>> with the arbitrary limit on kernel virtual space removed and
>> zfs_arc_free_target fixed, this doesn't appear to be a problem in practice.
>> I suspect this is because enough kernel memory is accessed via the direct map
>> rather than being mapped in the kernel heap that the system always runs out
>> of free pages before it runs out of free kva.
>>
>> my current patch with both of these changes is attached.
>>
>> -Chuck
>>
>
> [patch snipped]
>
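> (Since the actual patch is snipped here: the two knobs discussed
> above can at least be inspected from DDB on a wedged system, e.g.
>
>    db{0}> x/x zfs_arc_free_target
>    db{0}> show uvmexp    (compare against "free-target=")
>
> assuming the symbol is resolvable from the kernel's symbol table;
> the names come from the discussion above, not from the snipped
> patch.)
>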
> I applied the patch to a Xen amd64 DOMU and performed the test that
> hangs. It will still cause the system to hang, but instead of a
> complete hard hang, there is something more akin to a soft hang.
> Nothing really responds any more on the guest (you can't log in on the
> console, for example, although you can type your username), but at
> least CTRL-T still works. A shell was stuck in "flt_noram5" and
> another in "km_getwait2". In DDB on the guest console the UVM stats
> are as follows:
>
> db{0}> show uvmexp
> Current UVM status:
> pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
> 247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free
> pages 8893 anon, 3648 file, 3010 exec
> freemin=256, free-target=341, wired-max=82512
> resv-pg=1, resv-kernel=5
> bootpages=7737, poolpages=228145
> faults=118126, traps=113048, intrs=426958, ctxswitch=527493
> softint=143156, syscalls=2102209
> fault counts:
> noram=3, noanon=0, pgwait=0, pgrele=0
> ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
> neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
> cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
> daemon and swap counts:
> woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
> busy=0, freed=10736, reactivate=179, deactivate=26203
> pageouts=145, pending=2156, nswget=5
> nswapdev=1, swpgavail=1048575
> swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16
>
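> (The same counters can be sampled from userland while the test runs,
> without dropping into DDB, e.g.
>
>    vmstat -s | egrep 'pages (free|active|inactive)'
>
> which makes it easier to watch how fast "free" drops.)
>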
> In the hard hang case, the number of "free" pages would be much
> larger, so I suspect something else is running out of resources at
> this point (the very low free count here hints at that, and perhaps
> points to your free page comment). I also noticed that the pool
> called "zio_data_buf_51" of size 1024 didn't grow much above 16,100
> with this patch, as opposed to around 30,000 with the hard hang.
> Limiting the number of vnodes didn't seem to affect the behavior of
> the softer hang. I may have also noticed that the system was paging
> to swap even though all that was going on was a zfs receive over an
> ssh connection.
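>
> (Pool growth like "zio_data_buf_51" can be watched from userland with
> vmstat -m, and the vnode limit adjusted with the kern.maxvnodes
> sysctl, e.g.
>
>    vmstat -m | grep zio_data_buf
>    sysctl -w kern.maxvnodes=100000
>
> where 100000 is only an example value, not what I actually used.)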
>
>
>
> --
> Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org
>