Subject: Re: ahc & mpt scsi timeouts
To: Tracy Di Marco White <netbsd@gendalia.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 05/27/2006 09:30:08
On Fri, May 26, 2006 at 11:36:43PM -0500, Tracy Di Marco White wrote:
>
> I have a machine with 4 tape drives attached, each on their own scsi
> chain, to do backups with. I regularly get these timeouts, hanging
> the process accessing a drive, and requiring me to restart the machine,
> and causing problems with backups. The tape drives are attached via
> ahc(4) cards. It also has two spool disks, attached via mpt(4).
>
> I am running a not exactly new current at this point. It is a
> multiprocessor machine that I am running UP in hopes that it would
> be more stable. The only modification I have to the kernel is
> that I doubled ST_IO_TIME in src/sys/dev/scsipi/stvar.h from
> 3 minutes to 6 minutes.
>
> Is there something I can do to make these stop happening, and
> allow backups to work more consistently?
>
> The mpt timeouts only prevent me from booting, and if I reboot
> it, possibly a few times, it'll eventually come up. They look
> like:
>
> probe(mpt0:0:0:0): command timeout
> mpt0: timeout on request index = 0xfe, seq = 0x00000068
> mpt0: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
> mpt0: request state: On Chip
> probe(mpt0:0:1:0): command timeout
>
> and are repeated over & over until I drop to the debugger
> to reboot, or it finally drops to single user mode, unable
> to mount the spool disks and I reboot it. These don't happen
> in any consistent fashion.
>
> As for the adaptec timeouts, I don't see them in any consistent
> fashion either. I believe they're usually on commands where a
> tape is being mounted for write.
>
> ahc4:SCB 0xe - timed out
Do you always see the timeouts on mpt0 and ahc4, or can they occur with any
adapter?
I notice that mpt0 and ahc4 share an interrupt with the PERC.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference