Subject: Re: crash dump failing on machine with 4GB
To: NetBSD port-sparc64 mailing list <port-sparc64@NetBSD.org>
From: Chris Ross <cross+netbsd@distal.com>
List: port-sparc64
Date: 09/26/2007 13:29:06
On Sep 26, 2007, at 12:07, Chris Ross wrote:
>   Is this a known issue?  I have a sparc64 machine with 4GB of memory.

   Not unexpectedly, this appears to be an integer overflow issue.
Making the following change:

--- sys/arch/sparc64/sparc64/machdep.c  11 Sep 2007 16:00:06 -0000      1.202
+++ sys/arch/sparc64/sparc64/machdep.c  26 Sep 2007 17:24:50 -0000
@@ -759,7 +759,7 @@
         for (mp = &phys_installed[0], j = 0; j < phys_installed_size;
                         j++, mp = &phys_installed[j]) {
-               unsigned i = 0, n;
+               unsigned long i = 0, n;
                 paddr_t maddr = mp->start;
 #if 0
@@ -781,8 +781,7 @@
                                 printf("%ld ", todo / (1024*1024));
                         pmap_kenter_pa(dumpspace, maddr, VM_PROT_READ);
                         pmap_update(pmap_kernel());
-                       error = (*dump)(dumpdev, blkno,
-                                       (void *)dumpspace, (int)n);
+                       error = (*dump)(dumpdev, blkno, (void *)dumpspace, n);
                         pmap_kremove(dumpspace, n);
                         pmap_update(pmap_kernel());
                         if (error)


   causes it to produce a new error.  n is capped at 8192 by other
code, so the second hunk above (dropping the (int) cast) is probably
not even an issue.  I don't know enough about the lower-level device
code to tell what I'm hitting now, so I thought I'd ask.  This path
wasn't being reached before because n was 0, due to the overflow.
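
   To illustrate the overflow, here is a minimal sketch.  It is not
the actual machdep.c loop.  It assumes a memory segment whose size is
exactly 4 GB and a per-pass cap of 8192 bytes; the names chunk_len_32,
chunk_len_64 and DUMP_CHUNK are made up for the example.  On sparc64,
"unsigned" is 32 bits wide and "unsigned long" is 64 bits wide, so the
fixed-width types below stand in for the two declarations in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-pass cap, matching the 8192 mentioned above. */
#define DUMP_CHUNK 8192u

/* Chunk length held in a 32-bit variable, like "unsigned n". */
static uint32_t
chunk_len_32(uint64_t seg_size, uint64_t done)
{
	uint32_t n;

	assert(done <= seg_size);
	/* The conversion keeps only the low 32 bits of the remainder. */
	n = (uint32_t)(seg_size - done);
	if (n > DUMP_CHUNK)
		n = DUMP_CHUNK;
	return n;
}

/* Chunk length held in a 64-bit variable, like "unsigned long n". */
static uint64_t
chunk_len_64(uint64_t seg_size, uint64_t done)
{
	uint64_t n;

	assert(done <= seg_size);
	n = seg_size - done;
	if (n > DUMP_CHUNK)
		n = DUMP_CHUNK;
	return n;
}
```

   With these definitions, chunk_len_32(4ULL << 30, 0) evaluates to 0,
while chunk_len_64(4ULL << 30, 0) evaluates to 8192.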

   I'm seeing now:

db> reboot 0x104
Frame pointer is at 0xe0016651
Call traceback:
13ea690(1, d, 0, e00171e0, ffffffffffffffff, 0, e0016731) fp = e0016731
10be120(104, 0, e00170a8, 1860800, 1860b88, 188c7a8, e00167f1) fp = e00167f1
10bd658(1, 0, 4, e0017170, e0017298, 188c7a8, e00168c1) fp = e00168c1
10bdc88(180f2c8, 4, 0, 0, e0017388, 0, e0016a11) fp = e0016a11
10c163c(13f3f08, 0, 2, 1898819, 0, 0, e0016b01) fp = e0016b01
13f5264(0, 0, 0, 0, 4, 1000000, e0016bd1) fp = e0016bd1
13f2dd8(101, e0017b60, 98b31e1fa, 957d95e00000000, 1d00000000, 18a4800, e0017131) fp = e0017131
1008c1c(e0017b60, 101, 13f3f00, 1d0006, 400, 187a998, e00172b1) fp = e00172b1
13c234c(189b950, 187f3e0, ffffffff, 0, 1818c00, 1d, e0017491) fp = e0017491
13c29a8(61c4800, e0017e0c, a847c1a, 7477, ffff, 40, e0017551) fp = e0017551
100911c(0, 0, e0017ed0, 1877998, 13c2960, 1000000, e0017621) fp = e0017621
1288640(0, 0, 4, 6, 187a800, 1000000, ffbd561) fp = ffbd561

dumping to dev 7,1 offset 4310231
dump 4096 esiop0: unable to load cmd DMA map: -1i/o error
sd0(esiop0:0:0:0): polling command not done
panic: scsipi_execute_xs
cpu0: kdb breakpoint at 13f3f00
Stopped in pid 0.2 (system) at  netbsd:cpu_Debugger+0x4:        nop
db>

   So, back to "does anyone know anything about this?"  I'll keep
digging around in the meantime...

                                    - Chris