NetBSD-Bugs archive
Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang
Tobias Nygren <tnn%NetBSD.org@localhost> writes:
> The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
>
> From: Tobias Nygren <tnn%NetBSD.org@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc:
> Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
> VM results in kernel thread running away and filesystem hang
> Date: Wed, 27 Jul 2022 19:18:27 +0200
>
> When pagedaemon is spinning it is indicative of a memory pressure
> situation that is unresolvable. The interaction between pagedaemon and
> zfs is primarily ARC reclamation. Some observations:
I believe I can reproduce this problem on demand with a Xen PVH amd64
guest. I have one on which a particular zfs receive triggers, within a
couple of minutes, a hang that appears to match what is being described
here.
I watched the system with top and "vmstat -m" while performing the zfs
receive that hangs. top did not catch the pagedaemon running away
before the hang (although that may just be a display issue, with top
unable to update once the system hung), but I did notice that the pool
named "zio_data_buf_51", which is of size 1024 (there appear to be two
by that name), grew quite a lot during the receive. The hang happened
when that pool reached around 30000 requests. When the hang occurs I
can get into ddb on the guest's Xen console, and a ps there shows a ">"
next to the pagedaemon process, which I think means it was running. I
should probably mention that this is a 9.99.98 guest, so not the most
recent -current.
> 1) It doesn't look like we initialise zfs_arc_free_target, unlike FreeBSD.
> 2) FreeBSD has additional code to check for kva fragmentation which
> we do not.
>
> So it might be worthwhile to experiment with zfs_arc_free_target to
> preemptively avoid the situation where the kernel fails to reclaim enough
> pages to continue working. Here's a patch for zfs.kmod you could try:
I tried this patch on the Xen guest mentioned above, and as far as I
can tell it did not help the situation I am seeing. The system is
running with an otherwise unmodified arc.c.
> --- external/cddl/osnet/dist/uts/common/fs/zfs/arc.c 4 May 2022 15:49:55 -0000 1.21
> +++ external/cddl/osnet/dist/uts/common/fs/zfs/arc.c 27 Jul 2022 17:10:16 -0000
> @@ -387,7 +387,7 @@ int zfs_arc_grow_retry = 0;
> int zfs_arc_shrink_shift = 0;
> int zfs_arc_p_min_shift = 0;
> uint64_t zfs_arc_average_blocksize = 8 * 1024; /* 8KB */
> -u_int zfs_arc_free_target = 0;
> +u_int zfs_arc_free_target = 32 * 1024 * 1024;
>
> /* Absolute min for arc min / max is 16MB. */
> static uint64_t arc_abs_min = 16 << 20;
> @@ -3919,6 +3919,14 @@ arc_available_memory(void)
> r = FMR_LOTSFREE;
> }
>
> +#ifdef __NetBSD__
> + n = PAGESIZE * ((int64_t)freemem - desfree);
> + if (n < lowest) {
> + lowest = n;
> + r = FMR_LOTSFREE;
> + }
> +#endif
> +
>
I should also mention that if I let /etc/daily run on this guest it
will also hang the system, probably when the core file check or
something similar runs across the ZFS fileset. I have not been able to
narrow down exactly which part of the daily cron run trips the hang,
but I do know that the hang disappears if I comment /etc/daily out of
root's crontab. I also see this exact same hang on a fairly new 9.x Xen
guest that has ZFS filesets on it, with the same "solution" of
commenting out /etc/daily.
--
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org