NetBSD-Bugs archive
Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang
The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
From: Paul Lavoie <pjledge%me.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: port-evbarm-maintainer%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost
Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM
results in kernel thread running away and filesystem hang
Date: Thu, 28 Jul 2022 12:43:46 -0400
I just tried Chuck's latest patch, and was able to transfer data for
about 3 hours before the kernel thread got into the loop, up from about
15 minutes. So an improvement, but not resolved.

I'll see if I can get a DDB session running next time.
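For context, this is roughly the feedback loop involved: the ARC reclaim
thread keeps shrinking the cache while free pages sit below
zfs_arc_free_target, and only goes back to sleep once that target is met,
so a target the pager can never satisfy turns the thread into a busy loop.
The sketch below is a simplified user-space illustration of that shape,
not the NetBSD/OpenZFS sources; the function names echo the real ARC code,
but the bodies and the numbers are hypothetical stand-ins.

/*
 * Illustrative user-space sketch only -- not the NetBSD or OpenZFS sources.
 * It models the shape of the ARC reclaim loop: reclaim keeps running while
 * free pages sit below zfs_arc_free_target, and the thread only sleeps once
 * the target is reached.  A miscalibrated target that the pager can never
 * reach therefore looks exactly like a kernel thread spinning at 100%.
 */
#include <stdio.h>

static long free_pages = 300;           /* stand-in for uvmexp.free */
static long zfs_arc_free_target = 400;  /* reclaim target, in pages */

/* Negative return means "memory pressure": keep shrinking the ARC. */
static long
arc_available_memory(void)
{
	return free_pages - zfs_arc_free_target;
}

/*
 * Pretend shrink pass: the ARC hands pages back, but other allocators
 * (pools, other consumers) take them again, so free_pages never climbs
 * above the target in this scenario.
 */
static void
arc_shrink(void)
{
	free_pages += 10;
	free_pages -= 10;
}

int
main(void)
{
	long spins = 0;

	while (arc_available_memory() < 0) {
		arc_shrink();
		if (++spins == 1000000) {   /* the real thread has no bail-out */
			printf("still spinning: free=%ld target=%ld\n",
			    free_pages, zfs_arc_free_target);
			return 0;
		}
	}
	printf("pressure relieved after %ld passes\n", spins);
	return 0;
}

Presumably the zfs_arc_free_target part of Chuck's patch recalibrates that
target so the loop can actually be satisfied; the sketch only shows why a
wrong value manifests as a runaway kernel thread.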
> On Jul 28, 2022, at 8:40 AM, Brad Spencer <brad%anduin.eldar.org@localhost> wrote:
>
> The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
>
> From: Brad Spencer <brad%anduin.eldar.org@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc: port-evbarm-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
>     netbsd-bugs%netbsd.org@localhost, pjledge%me.com@localhost
> Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
>     VM results in kernel thread running away and filesystem hang
> Date: Thu, 28 Jul 2022 08:36:34 -0400
>
> Chuck Silvers <chuq%chuq.com@localhost> writes:
>
> [snip]
>
>> with the arbitrary limit on kernel virtual space removed and
>> zfs_arc_free_target fixed, this doesn't appear to be a problem in practice.
>> I suspect this is because enough kernel memory is accessed via the direct map
>> rather than being mapped in the kernel heap that the system always runs out
>> of free pages before it runs out of free kva.
>>
>> my current patch with both of these changes is attached.
>>
>> -Chuck
>>
>
> [patch snipped]
>
> I applied the patch to a Xen amd64 DOMU and performed the test that
> hangs. It will still cause the system to hang, but instead of a
> complete hard hang, there is something more akin to a soft hang.
> Nothing really responds any more on the guest (can't log into the
> console, for example, but you can type your username), but at least
> CTRL-T still works. A shell was stuck in "flt_noram5" and another in
> "km_getwait2". In DDB on the guest console the UVM stats are thus:
>
> db{0}> show uvmexp
> Current UVM status:
> pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
> 247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free
> pages 8893 anon, 3648 file, 3010 exec
> freemin=256, free-target=341, wired-max=82512
> resv-pg=1, resv-kernel=5
> bootpages=7737, poolpages=228145
> faults=118126, traps=113048, intrs=426958, ctxswitch=527493
> softint=143156, syscalls=2102209
> fault counts:
> noram=3, noanon=0, pgwait=0, pgrele=0
> ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
> neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
> cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
> daemon and swap counts:
> woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
> busy=0, freed=10736, reactivate=179, deactivate=26203
> pageouts=145, pending=2156, nswget=5
> nswapdev=1, swpgavail=1048575
> swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16
>
> In the hard hang case, the number of "free" would be much larger, so I
> suspect something else is running out of resources at this point (the
> number for free hints at that, perhaps pointing to your free page
> comment). I also noticed that the pool called "zio_data_buf_51" of size
> 1024 didn't grow much above 16,100 with this patch, as opposed to around
> 30,000 with the hard hang. Limiting the number of vnodes didn't seem to
> affect the behavior of the softer hang. I may have also noticed that
> the system was paging to swap even though all that was going on was a
> zfs receive over a ssh connection.
>
>
>
> --
> Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org
>