tech-kern archive


Re: continued zfs-related lockups



Hi Greg,

Greg Troxel wrote:

> The good news is that the problem is not subtle and I have been able to
> reproduce the lockup.  And several times, if just barely provoked, the system
> came back. At least once, it didn't come back.
>
> I created a netbsd-current domU (pvhvm) with
>
>   6G RAM 
>   xbd0: 32G ffs2 root
>   xbd1: 8G swap
>   xbd2: 32G gpt with one big zfs partition
>
>   tank11: pool with just dk0 from xbd2
>
> Not sure it matters, but the backing disks for the xbdN are zvol in zfs on
> dom0, on a not particularly new Sandisk 1T SATA SSD.
>
> I wrote a script:
>
>   create 100 dirs with 100 files each
>   sync
>   sleep 10
>   remove the files
>   sync
>
> Long ago I wrote a program "touchmem" to allocate a specific amount of memory,
> writing into each page to force allocation.

I can reproduce this behaviour with:

	for d in $(seq 0 99); do
	  echo dir $d; mkdir dir$d
	  seq 0 99 | xargs -n 1 -I % sh -c "echo $d % > dir$d/%"
	done
	rm -rf dir? dir?? &
	vmstat
	   [ check how many kB are free ]
	dd if=/dev/zero of=/dev/null bs=820000k count=50
	   [ where 820000 kB was just under the amount of memory free ]

After creating the files, this also works to trigger the messages:

	vmstat
	   [ check how many kB are free ]
	dd if=/dev/zero of=/dev/null bs=820000k count=50
	   [ where 820000 kB was just under the amount of memory free ]
	find dir* -type f | xargs cat > /dev/null


The "dd if=/dev/zero of=/dev/null bs=XXX" trick is a good way to
allocate a chunk of user memory, probably quite similar to what your
"touchmem" program does in practice.
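
For repeatability, the "just under the free memory" block size can be
computed rather than read off by hand. A sketch (the free-memory figure
here is a canned stand-in for vmstat output, and the 95% margin is my
choice, not anything from the original recipe):

```shell
# Derive a dd block size just under the free memory, like the manual
# "820000k" above.  free_kb is a stand-in for a value parsed out of
# vmstat, so the sketch runs anywhere.
free_kb=860000
bs_kb=$((free_kb * 95 / 100))   # leave a small safety margin
echo "dd if=/dev/zero of=/dev/null bs=${bs_kb}k count=50"
```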

> I found that the removal process was slow, and if I ran touchmem 6000 (to
> allocate 6000K) I would get on the console (this is an example where it
> came back).

zfs rm is known to be slow, and that is not simple to fix :/  It
effectively does a synchronous write for each unlink.
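
A quick way to see the per-unlink cost is to time removing a batch of
small files (nothing ZFS-specific in the script itself; run it on a ZFS
mount and the wall-clock time reflects the synchronous writes):

```shell
# Create 200 tiny files, then time their removal.  On ZFS the rm
# phase scales with file count because each unlink is effectively a
# synchronous transaction.
dir=$(mktemp -d)
for i in $(seq 0 199); do echo x > "$dir/f$i"; done
n_before=$(ls "$dir" | wc -l)
time rm -f "$dir"/f*
n_after=$(ls "$dir" | wc -l)
echo "removed $((n_before - n_after)) files"
rmdir "$dir"
```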


> [ 2247.3254720] arc_reclaim_thread: negative free_memory -15888384

Doesn't this mean "Can you try to free 15888384 bytes if possible"?

On my test host I see a number of messages like this:

[ 21715.5174433] arc_reclaim_thread: free memory = -2420736

and

   # vmstat -s | awk '/target/ { print ; print $1 * 4096, "bytes" }'
        2730 target free pages
   11182080 bytes

which means we'd like to free up about 2.3MB (591 pages) to reach the
system target of 2730 pages.
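
The pages-to-bytes arithmetic checks out with plain shell arithmetic
(values copied from the output above):

```shell
# Verify the page/byte conversions quoted in the mail:
#   target of 2730 pages at 4096 bytes/page, and a shortfall of
#   2420736 bytes expressed in pages.
page_sz=4096
target_pages=2730
deficit_bytes=2420736
echo "target: $((target_pages * page_sz)) bytes"
echo "shortfall: $((deficit_bytes / page_sz)) pages"
```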

Running a few "sysctl kstat.zfs.misc.arcstats.size" shows:

 - before the "dd" and "find ... cat":
	kstat.zfs.misc.arcstats.size = 31990280
 - during the "dd":
	kstat.zfs.misc.arcstats.size = 31991240
	kstat.zfs.misc.arcstats.size = 31996984
 - after "dd" and "find ... cat" finishes
	kstat.zfs.misc.arcstats.size = 31995776
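
Shell arithmetic on those samples confirms how little the ARC moves
across the whole run (values copied from the sysctl output above):

```shell
# ARC size deltas between the before/during/after samples.
before=31990280
during=31996984
after=31995776
echo "growth during dd: $((during - before)) bytes"
echo "net change:       $((after - before)) bytes"
```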

I think this is ZFS noticing free memory is low and trying to do
something about it, but perhaps not very successfully?

> I wonder if others who have problems also see this kernel message.

This is on an amd64 qemu VM with 1GB of RAM and a 384MB disk (all ZFS).

Cheers,
Simon.

