I have seen two machines lock up, and some experimentation implicates WAPBL and USB-connected big disks. I am curious if anyone else has seen this. Note that wapbl does a cache flush on committing the journal (vfs.wapbl.flush_disk_cache = 1).

system 1:
  observed with netbsd-5 around a year ago
  Soekris net5501, 512M RAM, 2T WD Elements external USB drive (ufs2)
  this disk is known to take a long time (0.5s) to flush the cache

  When doing rdiff-backup to the disk, or other write-heavy workloads, the machine froze and it appeared that all processes were in tstile. I believe ping still worked.

  I stopped mounting the external disk with wapbl (an internal 40G disk still uses it, but it's not a backup target), and the machine has been 100% solid.

system 2:
  observed with netbsd-6 from this spring
  evbppc (p2020) with 2G RAM, 3T USB disk (ufs2)

  Lockups observed when doing a git clone of a huge repo. The clone went ok, but the subsequent checkout precipitated the problem. The watchdog reset worked fine, and I'm not sure what state things were in.

  I remembered the net5501 issue, and after turning off wapbl the system is stable, completing a self-hosting build and builds of packages.

The on-disk journal should be only 64M, so that wouldn't seem to really stress memory. So I wonder if there is something that backs up in RAM when there's a continuous stream of writes, and running out isn't handled gracefully.