tech-kern archive
Re: continued zfs-related lockups
Hi Greg,
Greg Troxel wrote:
> The good news is that the problem is not subtle and I have been able to
> reproduce the lockup. And several times, if just barely provoked, the system
> came back. At least once, it didn't come back.
>
> I created a netbsd-current domU (pvhvm) with
>
> 6G RAM
> xbd0: 32G ffs2 root
> xbd1: 8G swap
> xbd2: 32G gpt with one big zfs partition
>
> tank11: pool with just dk0 from xbd2
>
> Not sure it matters, but the backing disks for the xbdN are zvol in zfs on
> dom0, on a not particularly new Sandisk 1T SATA SSD.
>
> I wrote a script:
>
> create 100 dirs with 100 files each
> sync
> sleep 10
> remove the files
> sync
>
> Long ago I wrote a program "touchmem" to allocate a specific amount of memory,
> writing into each page to force allocation.
I can reproduce this behaviour with:
for d in $(seq 0 99); do
  echo dir $d; mkdir dir$d
  seq 0 99 | xargs -n 1 -I % sh -c "echo $d % > dir$d/%"
done
rm -rf dir? dir?? &
vmstat
[ check how many kB are free ]
dd if=/dev/zero of=/dev/null bs=820000k count=50
[ where 820000 kB was just under the amount of memory free ]
After creating the files, this also works to trigger the messages:
vmstat
[ check how many kB are free ]
dd if=/dev/zero of=/dev/null bs=820000k count=50
[ where 820000 kB was just under the amount of memory free ]
find dir* -type f | xargs cat > /dev/null
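The "dd if=/dev/zero of=/dev/null bs=XXX" trick is a handy way to
allocate a chunk of user memory: dd allocates a single buffer of bs
bytes and fills all of it on each read from /dev/zero, which is probably
quite similar to what your "touchmem" program does in practice.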
> I found that the removal process was slow, and if I ran touchmem 6000 (to
> allocate 6000K) I would get on the console (this is an example where it
> came back).
Removing files on zfs is known to be slow, and that is not simple to
fix :/ It effectively does a synchronous write for each unlink.
> [ 2247.3254720] arc_reclaim_thread: negative free_memory -15888384
Doesn't this mean "Can you try to free 15888384 bytes if possible"?
On my test host I see a number of these types of messages:
[ 21715.5174433] arc_reclaim_thread: free memory = -2420736
and
# vmstat -s | awk '/target/ { print ; print $1 * 4096, "bytes" }'
2730 target free pages
11182080 bytes
which means we'd like to free up about 2.3MB (591 pages) to reach the
system target of 2730 pages.
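As a quick check of that arithmetic, here is a small sh sketch. The
function name deficit_pages is made up for illustration, and the
4096-byte page size is assumed from the vmstat calculation above rather
than queried from the system.

```sh
# Convert a byte deficit as printed by arc_reclaim_thread into pages.
# Assumption: 4096-byte pages, as in the vmstat calculation above.
deficit_pages() {
    bytes=${1#-}                # drop a leading minus sign, if present
    echo $(( bytes / 4096 ))
}

deficit_pages -2420736          # the value from my test host: 591
```

Feeding it the -15888384 from your console message gives 3879 pages.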
Running a few "sysctl kstat.zfs.misc.arcstats.size" shows:
- before the "dd" and "find ... cat":
kstat.zfs.misc.arcstats.size = 31990280
- during the "dd":
kstat.zfs.misc.arcstats.size = 31991240
kstat.zfs.misc.arcstats.size = 31996984
- after "dd" and "find ... cat" finish:
kstat.zfs.misc.arcstats.size = 31995776
I think this is ZFS noticing free memory is low and trying to do
something about it, but perhaps not very successfully?
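To put a number on "not very successfully", the change in ARC size
between two samples can be computed like this. This is only a sketch:
arc_delta is a hypothetical helper name, and the arguments are the
"before" and largest "during" values quoted above.

```sh
# Print the change in ARC size between two samples, in bytes and KiB.
# Arguments: <earlier size> <later size>, both in bytes.
arc_delta() {
    diff=$(( $2 - $1 ))
    echo "$diff bytes ($(( diff / 1024 )) KiB)"
}

arc_delta 31990280 31996984     # prints: 6704 bytes (6 KiB)
```

On a live system the two arguments could come from
"sysctl -n kstat.zfs.misc.arcstats.size" run before and during the dd.
In my samples the ARC size moved by only a few kB, and upwards, while
the reported shortfall was about 2.3MB.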
> I wonder if others who have problems also see this kernel message.
This is on an amd64 qemu VM with 1GB of RAM and a 384MB disk (all ZFS).
Cheers,
Simon.