Subject: Re: crash dump failing on machine with 4GB
To: Chris Ross <cross+netbsd@distal.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-sparc64
Date: 09/30/2007 11:39:22
On Sat, Sep 29, 2007 at 11:17:34PM -0400, Chris Ross wrote:
>
> On Sep 29, 2007, at 22:04, Greg Oster wrote:
> >>On Sep 29, 2007, at 15:19, Martin Husemann wrote:
> >>>On Sat, Sep 29, 2007 at 02:53:23PM -0400, Chris Ross wrote:
> >>>> Any idea where the scsipi_xfer gets allocated or "hand-crafted" in
> >>>>the cmd_c before esiop_cmd_end() is called?
> >>>
> >>>sys/dev/scsipi/sd.c:1560
> >>>
> >>>It has XS_CTL_NOSLEEP|XS_CTL_POLL set in xs_control and I think in
> >>>this case the callout should not be touched.
> >>
> >> Okay. Cool. So, in the case that XS_CTL_POLL is set, it would make
> >>sense that there isn't (or "shouldn't be"?) a callout, right? I'll
> >>whack that into my kernel, which will be the "more correct" way to
> >>stop that problem, and let me get back to the original problem of
> >>figuring out why the crash-dump doesn't work. :-)
> >
> >Try adding the line:
> >
> > callout_init(&xs->xs_callout, 0);
> >
> >after the line:
> >
> > xs->datalen = nwrt * sectorsize;
> >
> >in sd.c:sddump().
> >
> >That got rid of the panic for me...
>
> So, I guess the question here is, since you're seeing this on an ahc,
> on an i386, is this a [eo]siop bug, or a scsipi bug? Your solution assumes

sddump() uses a static scsipi_xfer, so it's true that the callout isn't
initialized there. But see below.

> it's a bug that affects all (or at least most) things. It makes sense
> to me to fix it the way Martin and Manuel suggested, where you presume
> based on control bits whether the callout is "valid" or likely to be
> in-use. However, if that means changing all of the device drivers for
> most or all controllers, maybe what you suggest here might be better.
> I'll leave that decision to people who know the scsi subsystem better
> than I. But, something I wanted to mention.

I don't think HBA drivers should use the callout at all in polled mode.
Interrupts are blocked at that point, so a callout cannot be relied on to
fire; the drivers should handle the timeout in the poll loop itself.
Otherwise, if the device is unresponsive, the driver will hang on the
command forever.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--