I have a large number of systems, some running netbsd-5 and some netbsd-6, and (leaving ancient/flaky hardware aside) all but one are completely stable.

The problem machine is running a recent netbsd-5 on amd64. It has a SCSI tape drive, which basically works fine:

  ahc0 at pci1 dev 0 function 0: Adaptec 29160 Ultra160 SCSI adapter
  ahc0: interrupting at ioapic0 pin 21
  ahc0: aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
  scsibus0 at ahc0: 16 targets, 8 luns per target
  scsibus0: waiting 2 seconds for devices to settle...
  st0 at scsibus0 target 6 lun 0: <CERTANCE, ULTRIUM 2, 1914> tape removable
  st0: density code 66, 512-byte blocks, write-enabled
  st0: sync (12.50ns offset 127), 16-bit (160.000MB/s) transfers

The machine has 3 disks, all normal SATA: 2 in a raidframe RAID1 and one by itself.

The machine is prone to lockups in which it still forwards packets fine, and a top left running shows a lot of processes stuck in tstile. They happen to be check_disk from nagios_plugins, and I'm not at all sure whether that's the cause or the symptom.

So I think something is getting wedged in the kernel: either the biglock is staying locked, or something in the storage/block layer is stuck. Because the only unusual thing on this machine is st0, I suspect the SCSI layer's locking code. It's a production machine, so LOCKDEBUG seems a bit much. I realize we should try to get a kernel crash dump.

Has anyone else seen issues that look like this?