Subject: Re: Ultra2 FAS problems
To: john heasley <heas@shrubbery.net>
From: Andrey Petrov <petrov@netbsd.org>
List: port-sparc64
Date: 08/28/2002 23:02:12
On Thu, Aug 29, 2002 at 04:37:40AM +0000, john heasley wrote:
> Tue, Aug 27, 2002 at 12:46:32PM -0700, john heasley:
> > Mon, Aug 26, 2002 at 12:29:35PM -0700, Andrey Petrov:
> > > On Mon, Aug 26, 2002 at 12:02:46PM -0700, john heasley wrote:
> > > > Sun, Aug 25, 2002 at 03:25:24PM -0700, Andrey Petrov:
> > > > > I suspect that your drive start negotiating sync/wide mode
> > > > > first and this is not supported by esp driver. If you can
> > > >
> > > > what do you mean by "first"? before something in particular?
> > >
> > > Usually controller starts sync/wide negotiating after getting target
> > > capabilities. The logs you sent to me quite long ago showed
> > > that drive can also start negotiating on its own, that what I meant 'first'.
> >
> > thanks, btw. thought i might have been lost again. :)
> >
> > > I put your patch into esp driver, can you try it?
> > >
>
> btw, the DMA errors such as these (another u2):
>
> Aug 28 03:21:50 pine /netbsd: esp0: error: csr=b2930a13<INT,ERR,DRAINING=0,IEN,E
> NDMA,DSBL_SCSI_DRN,BURST=0,TCI
> Aug 28 03:21:50 pine /netbsd: esp0: DMA error; resetting
> Aug 28 03:21:50 pine /netbsd: esp0: SCSI bus reset
> Aug 28 03:21:50 pine /netbsd: esp0: error: csr=b2930a13<INT,ERR,DRAINING=0,IEN,E
> NDMA,DSBL_SCSI_DRN,BURST=0,TCI
> Aug 28 03:21:50 pine /netbsd: esp0: DMA error; resetting
> Aug 28 03:21:50 pine /netbsd: esp0: SCSI bus reset
>
> that get under load (eg: cvs update or build) seem to have stopped. but,
> now i'm seeing these under load.
>
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid adf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid 3bf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid 73f0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid 1bf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
> esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 0, resid cdf0
>
Other people have noticed it too. It was happening before the latest
change, and it's somehow related to tagged queuing. It usually helps
if you disable it:
esp* at sbus0 flags 0xff0000
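As a rough sketch of what that value means, assuming the flag layout
described in esp(4), where bits 16-23 each disable tagged queuing for
one SCSI target (0-7). The helper names below are made up for
illustration and are not part of the driver:

```c
#include <stdint.h>

/*
 * Assumption (taken from esp(4), not from the driver source quoted here):
 * bit (16 + target) of the config flags disables tagged queuing
 * for that SCSI target, for targets 0 through 7.
 */
#define TAGQ_SHIFT 16

/* Hypothetical helper: build flags that disable tagged queuing
 * for targets first..last (clamped to 0-7). */
uint32_t
tagq_disable_mask(unsigned first, unsigned last)
{
	uint32_t mask = 0;
	unsigned t;

	for (t = first; t <= last && t < 8; t++)
		mask |= (uint32_t)1 << (TAGQ_SHIFT + t);
	return mask;
}

/* Hypothetical helper: nonzero if flags disable tagged queuing
 * for the given target. */
int
tagq_disabled(uint32_t flags, unsigned target)
{
	return target < 8 && ((flags >> (TAGQ_SHIFT + target)) & 1) != 0;
}
```

Under that assumption, tagq_disable_mask(0, 7) gives 0xff0000, which
turns it off for every target, while flags 0x080000 would turn it off
for target 3 only.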
I find this problem quite challenging: it's very intermittent and
very timing-sensitive. Change the drive access pattern or turn on
driver tracing and it's gone. Help is always welcome, and you have
the docs, John.
Andrey