Subject: re: crash dump failing on machine with 4GB
To: Chris Ross <cross+netbsd@distal.com>
From: matthew green <mrg@eterna.com.au>
List: port-sparc64
Date: 09/28/2007 10:04:09
On Sep 26, 2007, at 14:22, Chris Ross wrote:
> On Sep 26, 2007, at 13:51, matthew green wrote:
>> can you get a stack trace with symbols? or use gdb to
>> find them out from these values?
>
> Of course. Here's a backtrace after the failed "reboot 0x104"
> used to cause the dump attempt.
>
> dumping to dev 7,1 offset 4310231
> dump 4096 esiop0: unable to load cmd DMA map: -1i/o error
> sd0(esiop0:0:0:0): polling command not done
> panic: scsipi_execute_xs
> cpu0: kdb breakpoint at 13f3e80
> Stopped in pid 0.2 (system) at netbsd:cpu_Debugger+0x4: nop
> db> bt
> scsipi_execute_xs(5f89c00, e0016d96, a, 0, 0, 4) at netbsd:scsipi_execute_xs+0x318
> sd_flush(746fc00, 103, 0, 0, 0, 8000000000001034) at netbsd:sd_flush+0x84
> sd_shutdown(746fc00, 5, 0, 0, e0016fb8, 0) at netbsd:sd_shutdown+0x18
> doshutdownhooks(161eaa8, 5, 0, 10, 1857800, f) at netbsd:doshutdownhooks+0x30
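
For readers following along, the "reboot 0x104" argument in the quoted text can be decoded as a bit mask. The sketch below is only an illustration: the two constants are assumed values copied by hand from NetBSD's <sys/reboot.h> (renamed with an EX_ prefix so they are clearly local), and decode_howto() is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values, taken from NetBSD's <sys/reboot.h>; verify against your tree. */
#define EX_RB_NOSYNC 0x004	/* do not sync filesystems before rebooting */
#define EX_RB_DUMP   0x100	/* dump kernel memory before rebooting */

/* Hypothetical helper: write the names of the two flags above that are set. */
static void
decode_howto(int howto, char *buf, size_t len)
{
	int nosync = (howto & EX_RB_NOSYNC) != 0;
	int dump = (howto & EX_RB_DUMP) != 0;

	snprintf(buf, len, "%s%s%s",
	    nosync ? "RB_NOSYNC" : "",
	    (nosync && dump) ? "|" : "",
	    dump ? "RB_DUMP" : "");
}
```

Under those assumptions, decode_howto(0x104, ...) yields "RB_NOSYNC|RB_DUMP", i.e. a request to dump without syncing first.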
   So, does anyone have any suggestions on where I should go from
   here? I looked into the "unable to load cmd DMA map" error, which is
   returning an EIO from a call to bus_dmamap_load(). Should I try to
   track down into that function (via the macro, etc) and figure out if
   it's returning an EIO for some reason relating to the physical memory
   address it's given? Or, can someone look at the code in doshutdown()
   to see if the physical memory mapping calls "look right"? I was
   looking at amd64, figuring that it would be more likely to have this
   functionality working, and I notice that the pmap_* call(s) it uses
   are different, but that may not be unusual...

you mean its bus_dmamap_load() is different? yeah, that is gonna
be expected..
hmm, i don't see how sparc64 bus_dmamap_load() could return EIO?
see machdep.c:_bus_dmamap_load(). oh, the message above says it
returns -1... which also seems not possible...
is the above text exactly what it says? i don't see where the
"i/o error" comes from? there should be a newline after the -1.
(perhaps you changed this?)

   Thanks. I know not everyone has a 4GB sparc64 to play with, so
   I'm happy to work on this, but I will need to get this machine into
   production in the not-too-distant future, so need to keep moving.

   ps,
   Is the last argument to sd_flush(), quoted in the backtrace above,
   indicative of a problem? Just looks "odd" compared to the rest of
   the parameters.

it is just garbage on the stack. looking at sd.c:
static int sd_flush(struct sd_softc *, int);
so only the first two arguments are relevant.
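
to illustrate: the trace above prints six values per frame no matter how many parameters the function really takes (sparc passes the first six integer arguments in registers, so there are always six slots to show). a minimal sketch, in plain userland C rather than kernel code, of reading a frame against its prototype; format_frame() and DDB_SLOTS are hypothetical names made up for this example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DDB_SLOTS 6	/* assumption: six register slots printed per frame */

/*
 * Hypothetical helper: format only the first `nparams` slots of a frame,
 * where `nparams` is taken from the function's prototype.  The remaining
 * slots are whatever happened to be left in the registers.
 * Returns the number of characters written (excluding the NUL).
 */
static size_t
format_frame(char *buf, size_t len, const char *name,
    const unsigned long long *slots, int nparams)
{
	size_t off;
	int i;

	assert(nparams >= 0 && nparams <= DDB_SLOTS);
	assert(strlen(name) + 2 < len);

	off = (size_t)snprintf(buf, len, "%s(", name);
	for (i = 0; i < nparams && off < len; i++)
		off += (size_t)snprintf(buf + off, len - off, "%s%llx",
		    i > 0 ? ", " : "", slots[i]);
	if (off < len)
		off += (size_t)snprintf(buf + off, len - off, ")");
	return off;
}
```

fed the six sd_flush() values from the trace with nparams = 2, this produces "sd_flush(746fc00, 103)"; the trailing 8000000000001034 is simply dropped as not being an argument.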