Subject: IP20 unsuccessful install
To: None <port-sgimips@netbsd.org>
From: Havard Eidnes <he@netbsd.org>
List: port-sgimips
Date: 03/28/2004 23:01:30
Hi,
We recently brought back an old IP20 machine with the
intent of installing/testing NetBSD on it. After having tried my
own cross-compiled install kernel 3 or 4 times, I've concluded
that there is something fishy going on, possibly triggered by our
hardware.
Typically, we walk through most of sysinst, tell it to partition
the disk and fetch the install sets, but when it comes time to
unpack the sets, it invariably drops to DDB after a while:
61% |********************** | 4064 KB 61.56 KB/s - stalled -
Stopped at 0x882e10c8: lw v0,8(s3)
db>
Inspection of the registers shows that s3 is 0, so this appears to
be a null pointer de-reference in the kernel.
The decoded stack backtrace (had to use the symbols file) is:
db> trace
wd33c93_abort+64 (1,bfb8011f,1,17) ra 882e3b3c sz 40
wd33c93_timeout+104 (1,bfb8011f,1,17) ra 881fa68c sz 72
softclock+2f8 (1,bfb8011f,1,17) ra 881d4314 sz 32
hardclock+258 (1,bfb8011f,1,17) ra 882b1774 sz 32
mips3_clock_intr+b0 (1,bfb8011f,1,17) ra 882aeb70 sz 48
cpu_intr+78 (1,bfb8011f,1,17) ra 8828ac50 sz 40
mips3_KernIntr+84 (8890f480,0,c3018000,0) ra 88069178 sz 128
cpu_switch+68 (8890f480,0,c3018000,0) ra 881f4514 sz 24
mi_switch+224 (8890f480,0,c3018000,0) ra 881f3af8 sz 56
ltsleep+258 (8890f480,0,c3018000,0) ra 88209814 sz 56
882095b8+25c (8890f480,0,c3018000,0) ra 88207770 sz 64
dofileread+b0 (88906cc0,0,c3018000,1005cbf0) ra 8820769c sz 96
sys_read+8c (88906cc0,0,c3018000,1005cbf0) ra 882918a4 sz 56
syscall_plain+1ec (88906cc0,0,c3018000,1005cbf0) ra 8828aa9c sz 80
mips3_SystemCall+b4 (88906cc0,0,c3018000,1005cbf0) ra 553420 sz 0
PC 0x553420: not in kernel space
0+553420 (88906cc0,0,c3018000,1005cbf0) ra 0 sz 0
User-level: curlwp NULL
db>
As can be seen from the "stalled" message above, it's been doing
approximately nothing for a while before this problem strikes.
We were slightly uncertain how the unit selector connector should
be installed on our 1.2GB <IBM OEM, 0663E15, eSfS> drive, because
the disk responds on all targets... However, all the other disk
writing done up to that point from within sysinst has apparently
worked OK.
The dmesg is attached below.
To my eyes (looking at the objdump output and the source of
wd33c93_abort()), this happens somewhere in
scsipi_printaddr(acb->xs->xs_periph);
near the top of the function, and it appears that it's acb that's
NULL; it's the
12f8: 8e620008 lw v0,8(s3)
instruction it stops at. Below is disassembly of the first part
of wd33c93_abort(), as well as "show reg" output from DDB.
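The reading above can be cross-checked mechanically. The following is a minimal sketch (the helper names decode_itype() and fault_offset() are made up for illustration, they are not part of the kernel or of DDB) that splits the instruction word 8e620008 into its MIPS I-type fields and adds the "+64" from the DDB trace, which DDB prints in hex, to the function's start offset in the objdump listing:

```c
#include <stdint.h>

/* Fields of a MIPS I-type instruction word. */
struct mips_itype {
    unsigned opcode;   /* bits 31..26 */
    unsigned rs;       /* bits 25..21: base register */
    unsigned rt;       /* bits 20..16: destination register for loads */
    int16_t  imm;      /* bits 15..0: signed offset */
};

/* Split a 32-bit instruction word into its I-type fields. */
static struct mips_itype decode_itype(uint32_t insn)
{
    struct mips_itype d;

    d.opcode = (insn >> 26) & 0x3f;
    d.rs     = (insn >> 21) & 0x1f;
    d.rt     = (insn >> 16) & 0x1f;
    d.imm    = (int16_t)(insn & 0xffff);
    return d;
}

/*
 * Offset of the faulting instruction inside the object file:
 * the function's start offset from objdump plus the hex offset
 * that DDB prints after the symbol name in the trace.
 */
static uint32_t fault_offset(uint32_t func_start, uint32_t trace_off)
{
    return func_start + trace_off;
}
```

Calling decode_itype(0x8e620008) gives opcode 0x23 (lw), base register 19 (s3), destination register 2 (v0) and offset 8, and fault_offset(0x1294, 0x64) gives 0x12f8, so the trace entry and the disassembly point at the same instruction.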
Now, why it decided that it needed to abort the I/O I have no
idea, and I also don't know why it didn't get any acb...
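While hunting for the root cause, a guard in front of the dereference would at least turn the trap into a diagnostic. The following is only a sketch under stated assumptions: struct model_acb and acb_usable() are hypothetical names, and the layout (two 32-bit link words followed by the xs pointer) is an assumption chosen to match the 8-byte offset in the faulting load, not something copied from the driver source. Pointers are modelled as uint32_t so the offsets match a 32-bit MIPS kernel even when built on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit model of the start of the per-command
 * control block. The field order is an assumption that is
 * merely consistent with "lw v0,8(s3)".
 */
struct model_acb {
    uint32_t chain_next;   /* assumed queue linkage */
    uint32_t chain_prev;   /* assumed queue linkage */
    uint32_t xs;           /* assumed: transfer pointer at offset 8 */
};

/*
 * Return 1 if it is safe to follow acb->xs, 0 otherwise.
 * A caller could log and return early instead of dereferencing.
 */
static int acb_usable(const struct model_acb *acb)
{
    return acb != NULL && acb->xs != 0;
}
```

With this model, offsetof(struct model_acb, xs) is 8, and acb_usable(NULL) returns 0, which is the case the abort path appears to hit here. Such a check would only hide the symptom; it does not explain why the timeout fired with no acb.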
Hints for further debugging gratefully accepted. I think our
next move will be to try another drive...
Regards,
- Håvard
00001294 <wd33c93_abort>:
1294: 27bdffd8 addiu sp,sp,-40
1298: afbf0024 sw ra,36(sp)
129c: afb40020 sw s4,32(sp)
12a0: afb3001c sw s3,28(sp)
12a4: afb20018 sw s2,24(sp)
12a8: afb10014 sw s1,20(sp)
12ac: afb00010 sw s0,16(sp)
12b0: 00808821 move s1,a0
12b4: 00a09821 move s3,a1
12b8: 8c840124 lw a0,292(a0)
12bc: 8e250128 lw a1,296(s1)
12c0: 00c0a021 move s4,a2
12c4: 0c000000 jal 0 <wd33c93_attach>
12c4: R_MIPS_26 bus_space_read_1
12c8: 00003021 move a2,zero
12cc: 8e240124 lw a0,292(s1)
12d0: 8e250128 lw a1,296(s1)
12d4: 24070017 li a3,23
12d8: 00003021 move a2,zero
12dc: 0c000000 jal 0 <wd33c93_attach>
12dc: R_MIPS_26 bus_space_write_1
12e0: 00409021 move s2,v0
12e4: 8e250128 lw a1,296(s1)
12e8: 8e240124 lw a0,292(s1)
12ec: 0c000000 jal 0 <wd33c93_attach>
12ec: R_MIPS_26 bus_space_read_1
12f0: 24060001 li a2,1
12f4: 00408021 move s0,v0
12f8: 8e620008 lw v0,8(s3)
12fc: 00000000 nop
1300: 8c440030 lw a0,48(v0)
1304: 00000000 nop
1308: 8c820004 lw v0,4(a0)
130c: 00000000 nop
1310: 8c420000 lw v0,0(v0)
1314: 00000000 nop
1318: 8c42000c lw v0,12(v0)
131c: 00000000 nop
1320: 0040f809 jalr v0
1324: 00000000 nop
1328: 3c040000 lui a0,0x0
1328: R_MIPS_HI16 .rodata
132c: 24840198 addiu a0,a0,408
132c: R_MIPS_LO16 .rodata
1330: 02003021 move a2,s0
1334: 02802821 move a1,s4
1338: 0c000000 jal 0 <wd33c93_attach>
1338: R_MIPS_26 printf
db> show reg
at 0x88320004
v0 0x16
v1 0xf800
a0 0x1
a1 0xbfb8011f
a2 0x1
a3 0x17
t0 0xc0046034
t1 0x882916b8
t2 0xffffffff
t3 0x880690ac
t4 0
t5 0
t6 0
t7 0
s0 0x16
s1 0xc0046000
s2 0
s3 0
s4 0x8832a798
s5 0
s6 0
s7 0x8830d790
t8 0
t9 0x5dcc00
k0 0
k1 0
gp 0x8863f1b0
sp 0xc3019c10
fp 0x1
ra 0x882e10c4
sr 0xf802
mdlo 0x5c26ce98
mdhi 0xbfb
bad 0
cs 0
pc 0x882e10c8
0x882e10c8: lw v0,8(s3)
db>
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 1.6ZK (INSTALL32_IP2x) #18: Wed Mar 24 22:17:13 CET 2004
he@stegg.urc.uninett.no:/usr/users/he/src/sys/arch/sgimips/compile/obj.sgimips/INSTALL32_IP2x
49152 KB memory, 40872 KB free, 768 KB for ARCS
mainbus0 (root): SGI-IP20 [SGI, 6906a2c8], 1 processor
cpu0 at mainbus0: MIPS R4000 CPU (0x422) Rev. 2.2 with MIPS R4010 FPC Rev. 0.0
cpu0: 8KB/16B direct-mapped L1 Instruction cache, 48 TLB entries
cpu0: 8KB/16B direct-mapped write-back L1 Data cache
cpu0: 1024KB/128B direct-mapped write-back L2 Unified cache
int0 at mainbus0 addr 0x1fb801c0: bus 50MHz, CPU 100MHz
imc0 at mainbus0 addr 0x1fa00000: revision 1
gio0 at imc0
unknown GIO card (product 0x7f revision 0xff) at gio0 slot 2 addr 0x1f000000 not configured
hpc0 at gio0 addr 0x1fb80000: SGI HPC1.5
zsc0 at hpc0 offset 0xd10
zstty0 at zsc0 channel 1 (console i/o)
zstty1 at zsc0 channel 0
zsc1 at hpc0 offset 0xd00
zsc1: channel 1 not configured
zsc1: channel 0 not configured
int0: cannot share interrupts yet.
sq0 at hpc0 offset 0x100: SGI Seeq 80c03
sq0: Ethernet address 08:00:69:06:a2:c8
wdsc0 at hpc0 offset 0x11f: WD33C93B SCSI, rev=0, target 0
scsibus0 at wdsc0: 8 targets, 8 luns per target
dpclock0 at hpc0 offset 0xe00
biomask 07 netmask 07 ttymask 0f clockmask bf
md0: internal 3072 KB image area
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 1 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd0: drive offline
sd0: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 2 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd1: drive offline
sd1: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd2 at scsibus0 target 3 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd2: drive offline
sd2: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd3 at scsibus0 target 4 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd3: drive offline
sd3: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd4 at scsibus0 target 5 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd4: drive offline
sd4: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd5 at scsibus0 target 6 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd5: drive offline
sd5: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd6 at scsibus0 target 7 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd6: drive offline
sd6: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
boot device: sd0
root on md0a dumps on md0b
WARNING: clock gained 3 days -- CHECK AND RESET THE DATE!
root file system type: ffs