Subject: Re: SCSI probs on spork 10
To: NetBSD list <netbsd@mrynet.com>
From: Jim Bernard <jbernard@mines.edu>
List: port-sparc
Date: 06/19/2001 20:51:14

On Tue, Jun 19, 2001 at 04:53:20PM -0500, NetBSD list wrote:
> cvs versions beyond 1.5V of the NetBSD kernel hang on my Sparc 10.
>
> The last working kernel I have is:
> NetBSD dudley 1.5V NetBSD 1.5V (MRYSPARC) #3: Tue May 15 14:58:39 CDT 2001 root@dudley:/usr/src/sys/arch/sparc/compile/MRYSPARC sparc
>
> Since sometime after that date, disk activity will hang (I'm guessing)
> the SCSI bus. A typical way I can cause this to happen is by cvs'ing
> the source tree. In about 5 seconds, the machine stops all disk access.
> No errors are produced. Sessions continue to echo characters, but no
> commands are executed. It's just off in la-la land (much as I often
> aspire to be).
>
> Anyone else having such an issue? I've cleared out /usr/src many
> times in the last month since then and started over, but the problem
> persists here. The kernel I'm running is GENERIC modified for 128 users.
>
> Please help guide me in tracking this down if possible. Meanwhile,
> the 1.5V is working fine with the latest cvs userland. Help with
> the kernel debugger would be nice, as I'm not familiar with debugging
> NetBSD kernels.

Same here, also on a sparc 20. With recent kernels I occasionally see
SCSI parity errors on one disk (can't seem to find a real hardware fault,
though), and eventually the system just hangs. Some things continue to
work for a while after others stop working (logins via ssh seem to be one
of the first to go, while sendmail continues to work much longer). I also
found that if I happened to be logged in while it was in this state,
it would eventually just not execute commands, and after a bit more time
even window focus changes would stop working. The last working kernel
I have is 1.5U from mid April. The first one on which I observed the
failure was 1.5V from May 29. Unfortunately, that's all the info I
have on the problem so far.

BTW: I noticed that the most recent working kernel shows tagged queuing
rejected on all the disks, e.g.:

sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST34555N, 0930> SCSI2 0/direct fixed
sd0(esp0:0:0): max sync rate 10.00MB/s
esp0: tagged queuing rejected: target 0

whereas the problematic kernel I built June 16 shows it enabled:

sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST34555N, 0930> SCSI2 0/direct fixed
sd0: 4340 MB, 6300 cyl, 8 head, 176 sec, 512 bytes/sect x 8888924 sectors
sd0: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers, tagged queueing

I don't know whether this is related to the problem.
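
For anyone who wants to compare boot logs from several kernels, here is a
rough Python sketch (not anything from the NetBSD tree) that pulls the
tagged queueing state out of saved dmesg text. The function name, the
sample variable names, and the two regular expressions are my own
assumptions, fitted only to the message formats quoted above; other
drivers or kernel versions may word these lines differently.

```python
import re

# Patterns fitted to the two message styles quoted above (an assumption;
# other drivers or kernel versions may print something different).
REJECTED = re.compile(r"^(\w+): tagged queue?ing rejected: target (\d+)$")
ENABLED = re.compile(r"^(sd\d+): .*\btagged queue?ing$")


def tagged_queueing_status(dmesg_text):
    """Map each device found in dmesg_text to 'rejected' or 'enabled'."""
    status = {}
    for raw_line in dmesg_text.splitlines():
        line = raw_line.strip()
        match = REJECTED.match(line)
        if match:
            # Older style: the controller reports the rejection per target.
            status[f"{match.group(1)} target {match.group(2)}"] = "rejected"
            continue
        match = ENABLED.match(line)
        if match:
            # Newer style: the disk's transfer summary ends with the feature.
            status[match.group(1)] = "enabled"
    return status


# Sample input: the boot messages quoted in this message.
OLD_BOOT = """\
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST34555N, 0930> SCSI2 0/direct fixed
sd0(esp0:0:0): max sync rate 10.00MB/s
esp0: tagged queuing rejected: target 0
"""

NEW_BOOT = """\
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST34555N, 0930> SCSI2 0/direct fixed
sd0: 4340 MB, 6300 cyl, 8 head, 176 sec, 512 bytes/sect x 8888924 sectors
sd0: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers, tagged queueing
"""

for label, text in (("working", OLD_BOOT), ("problematic", NEW_BOOT)):
    for device, state in tagged_queueing_status(text).items():
        print(f"{label} kernel: {device}: tagged queueing {state}")
```

Fed the saved dmesg output from a good and a bad kernel, this only shows
where the tagged queueing state differs between the two boots; it says
nothing about whether that difference causes the hang.
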
--Jim