Subject: Re: crash dump failing on machine with 4GB
To: matthew green <mrg@eterna.com.au>
From: Chris Ross <cross+netbsd@distal.com>
List: port-sparc64
Date: 09/27/2007 23:38:30
On Sep 27, 2007, at 20:04, matthew green wrote:
>> dumping to dev 7,1 offset 4310231
>> dump 4096 esiop0: unable to load cmd DMA map: -1i/o error
>> sd0(esiop0:0:0:0): polling command not done
>> panic: scsipi_execute_xs
>> cpu0: kdb breakpoint at 13f3e80
>> Stopped in pid 0.2 (system) at netbsd:cpu_Debugger+0x4: nop
>> db>
>
> So, does anyone have any suggestions on where I should go from here?
> I looked into the "unable to load cmd DMA map" error, which is
> returning an EIO from a call to bus_dmamap_load(). Should I try to
> track down into that function (via the macro, etc) and figure out if
> it's returning an EIO for some reason relating to the physical memory
> address it's given? Or, can someone look at the code in doshutdown()
> to see if the physical memory mapping calls "look right"? I was
> looking at amd64, figuring that it would be more likely to have this
> functionality working, and I notice that the pmap_* call(s) it uses
> are different, but that may not be unusual...
>
> you mean it's bus_dmamap_load() is different? yeah, that is gonna
> be expected..
No, actually I meant that the pmap_* calls in the dumpsys code in the
respective machdep.c's are different. amd64 uses pmap_map() before
calling the bdev's dump function. The sparc64 code uses
pmap_kenter_pa() followed by pmap_update(). Again, this isn't code I
know anything about; I was just asking whether it was expected that
different pmap*() routines would be used on the two architectures.
The bus_dmamap_load() call, in dev/ic/esiop.c, was just what I found
when looking at where that printed string came from...
> hmm, i don't see how sparc64 bus_dmamap_load() could return EIO?
> see machdep.c:_bus_dmamap_load(). oh, the message above says it
> returns -1... which also seems not possible...
>
> is the above text exactly what it says? i don't see where the
> "i/o error" comes from? there should be a newline after the -1.
> (perhaps you changed this?)
I hadn't yet chased down the code that becomes bus_dmamap_load() on
sparc64. That's what I was asking about. I see that it's a macro in
bus.h, but hadn't tracked it to real code past that.
The text is exactly what it says. The "i/o error" string is printed
after the call to the bdevsw's dump function in dumpsys(), in
machdep.c. It reflects the return value of the dump function, which is
the d_dump member of the bdevsw for the device being dumped to, I
think. That's what's returning the EIO.
Sorry to have confused things by asking about two different things in
the same paragraph. :-) I *presume* that the dump function is returning
the EIO *because* the underlying bus_dmamap_load() is failing, inside
of the esiop code. But, again, I hadn't tracked that out yet.
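To make the chain I'm describing concrete, here is a rough userland sketch of it. This is not the real dumpsys() code: fake_bdevsw, failing_dump, working_dump, dump_status and dump_one_block are all made-up names for illustration, and only the "i/o error" string is taken from the console output above. The other status strings are placeholders.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the d_dump member of a bdevsw entry. */
typedef int (*dump_fn)(int dev, long blkno, const void *va, size_t size);

struct fake_bdevsw {
    dump_fn d_dump;
};

/* Models a driver whose DMA map load fails, so the dump returns EIO. */
static int failing_dump(int dev, long blkno, const void *va, size_t size)
{
    (void)dev; (void)blkno; (void)va; (void)size;
    return EIO;
}

/* Models a driver that writes the block without trouble. */
static int working_dump(int dev, long blkno, const void *va, size_t size)
{
    (void)dev; (void)blkno; (void)va; (void)size;
    return 0;
}

/* Turns the dump function's return value into a console message.
 * Only "i/o error" comes from the log; the rest are placeholders. */
static const char *dump_status(int error)
{
    switch (error) {
    case 0:
        return "succeeded";
    case EIO:
        return "i/o error";
    default:
        return "error";
    }
}

/* Calls the device's dump function for one block and reports the result. */
static const char *dump_one_block(const struct fake_bdevsw *bdev)
{
    static const char page[4096];
    int error = bdev->d_dump(0, 0, page, sizeof(page));

    return dump_status(error);
}
```

If this model is right, the caller only ever sees the EIO from d_dump; whatever bus_dmamap_load() returned inside the driver is reported separately by the driver's own message.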
>> ps,
>> Is the last argument to sd_flush(), quoted in the backtrace above,
>> indicative of a problem? Just looks "odd" compared to the rest of
>> the parameters.
>
> it is just garbage on the stack. looking at sd.c:
>
> static int sd_flush(struct sd_softc *, int);
>
> so only the first two arguments are relevant.
Ahh. Cool. Thanks. That was information I needed. :-) That makes
much more sense now...
- Chris