
Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang



Chuck Silvers <chuq%chuq.com@localhost> writes:

[snip]

>  with the arbitrary limit on kernel virtual space removed and
>  zfs_arc_free_target fixed, this doesn't appear to be a problem in practice.
>  I suspect this is because enough kernel memory is accessed via the direct map
>  rather than being mapped in the kernel heap that the system always runs out
>  of free pages before it runs out of free kva.
>  
>  my current patch with both of these changes is attached.
>  
>  -Chuck
>  

[patch snipped]
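
(For readers without the snipped patch: the zfs_arc_free_target part of the
change presumably ties the ARC's reclaim threshold to the pagedaemon's
free-page target rather than an arbitrary constant.  A minimal illustrative
sketch of that idea, assuming uvmexp.freetarg is the value of interest; the
variable's exact name and type in the in-tree ZFS code may differ, and this
is not the actual patch:

    #include <uvm/uvm_extern.h>

    /* ARC reclaim threshold in pages; declared in the ZFS code. */
    extern unsigned int zfs_arc_free_target;

    static void
    arc_free_target_init(void)
    {
            /*
             * Track the pagedaemon's free-page target so the ARC starts
             * shrinking before the rest of the system is starved of pages.
             */
            zfs_arc_free_target = uvmexp.freetarg;
    }
)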

I applied the patch to a Xen amd64 DOMU and performed the test that
hangs.  The test still hangs the system, but instead of a complete hard
hang it is now something more akin to a soft hang.  Nothing on the guest
really responds any more (you can't log in on the console, for example,
although you can still type your username), but at least CTRL-T still
works.  One shell was stuck in "flt_noram5" and another in
"km_getwait2".  In DDB on the guest console the UVM stats are as follows:

db{0}> show uvmexp
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
  247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free
  pages  8893 anon, 3648 file, 3010 exec
  freemin=256, free-target=341, wired-max=82512
  resv-pg=1, resv-kernel=5
  bootpages=7737, poolpages=228145
  faults=118126, traps=113048, intrs=426958, ctxswitch=527493
   softint=143156, syscalls=2102209
  fault counts:
    noram=3, noanon=0, pgwait=0, pgrele=0
    ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
    neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
    cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
  daemon and swap counts:
    woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
    busy=0, freed=10736, reactivate=179, deactivate=26203
    pageouts=145, pending=2156, nswget=5
    nswapdev=1, swpgavail=1048575
    swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16

In the hard hang case the number of "free" pages would be much larger,
so I suspect something else is running out of resources at this point;
the very low free count here (5 pages, well below freemin=256) seems to
point to your comment about running out of free pages.  I also noticed
that the pool called "zio_data_buf_51" of size 1024 didn't grow much
above 16,100 with this patch, as opposed to around 30,000 with the hard
hang.  Limiting the number of vnodes didn't seem to affect the behavior
of the softer hang.  I may also have noticed the system paging to swap
even though the only thing going on was a zfs receive over an ssh
connection.
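
(The vnode-limiting test above is assumed to have been done through the
kern.maxvnodes sysctl; for anyone wanting to repeat it, a small sketch of
reading and lowering that knob programmatically, purely illustrative:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
            int cur, newval;
            size_t len = sizeof(cur);

            /* Read the current vnode limit. */
            if (sysctlbyname("kern.maxvnodes", &cur, &len, NULL, 0) == -1)
                    err(1, "sysctlbyname(kern.maxvnodes)");
            printf("kern.maxvnodes = %d\n", cur);

            /* Optionally lower it (needs root). */
            if (argc > 1) {
                    newval = atoi(argv[1]);
                    if (sysctlbyname("kern.maxvnodes", NULL, NULL,
                        &newval, sizeof(newval)) == -1)
                            err(1, "set kern.maxvnodes");
            }
            return 0;
    }
)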



-- 
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org

