Subject: Testing of sgivol etc.
To: None <port-sgimips@netbsd.org>
From: Havard Eidnes <he@netbsd.org>
List: port-sgimips
Date: 11/10/2001 23:58:24
Hi,
I just took some time to install a new disk on my SGI Indigo2,
together with some more memory. I upgraded to today's kernel, and
decided to test the new "sgivol" utility. The first two steps
worked as advertised:
viola# ./sgivol sd1
No SGI volumn header found, magic=6c6c6c6c
viola# ./sgivol -i sd1
disklabel shows 35843670 sectors
checksum: 00000000
root part: 0
swap part: 1
bootfile:
Volume header files:
SGI partitions:
0:a blocks 35840535 first 3135 type 7 (EFS)
8:i blocks 3135 first 0 type 0 (Volume Header)
10:k blocks 35843670 first 0 type 6 (Volume)
Do you want to update volume (y/n)? y
viola#
However, this one didn't:
viola# ./sgivol sd1
On the console appeared:
sd1: no disk label
sd1: no disk label
Stopped in pid 1245 (sgivol) at 0x8811bd4c: mfhi v0
db>
At this point the machine is still running diskless, of course. Some
digging resulted in identification of where it crashed; here's the DDB
trace with the subroutines identified by running gdb on the image
afterwards:
db> trace
8811bc68+e4 (200,5072df,a12,8ba75fd8) ra 880211a4 sz 32
sdstrategy
0x8811bd4c <sdstrategy+228>: mfhi $v0
88020f40+264 (8811bc68,5072df,a12,100000) ra 8811c408 sz 80
physio
8811c3d8+30 (8811bc68,d2d27e78,a12,100000) ra 88069ed4 sz 32
sdread
88069dac+128 (8811bc68,d2d27e78,a12,100000) ra 880bcb20 sz 96
spec_read
880bcaa4+7c (8811bc68,d2d27e78,a12,100000) ra 88060910 sz 24
nfsspec_read
88060804+10c (8811bc68,d2d27e78,d2d27e78,100000) ra 88036f44 sz 64
vn_read
88036e80+c4 (8811bc68,d2d27e78,d2d27e78,100003e0) ra 88036e64 sz 96
dofileread
88036dc0+a4 (8811bc68,d2d27e78,d2d27e78,100003e0) ra 880f9b74 sz 56
sys_read
880f9964+210 (8811bc68,d2d27e78,d2d27e78,100003e0) ra 8800305c sz 80
syscall_plain
mips3_SystemCall+b0 (8811bc68,d2d27e78,d2d27e78,100003e0) ra 3010c080 sz 0
PC 0x3010c080: not in kernel space
0+3010c080 (8811bc68,d2d27e78,d2d27e78,100003e0) ra 0 sz 0
User-level: pid 1245
db>
The offending line of code appears to be
    if (lp->d_secsize == DEV_BSIZE) {
        sector_aligned = (bp->b_bcount & (DEV_BSIZE - 1)) == 0;
    } else {
>>>     sector_aligned = (bp->b_bcount % lp->d_secsize) == 0; <<<
    }
I *think* lp->d_secsize is either initialized to 0 or read from the
disk.
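A minimal sketch of one way the alignment test could be made safe
against a zero sector size. The helper name sector_aligned_safe and
the DEV_BSIZE value of 512 are assumptions for illustration; this is
not the actual sd.c code, and rejecting the transfer is only one
possible policy:

```c
#include <stdint.h>

#define DEV_BSIZE 512u	/* assumed value for this sketch */

/*
 * Hypothetical helper: returns 1 if bcount is a whole number of
 * sectors, 0 otherwise. A zero secsize (for example from an
 * uninitialized label) is reported as "not aligned" instead of
 * being used as a divisor.
 */
static int
sector_aligned_safe(uint32_t bcount, uint32_t secsize)
{
	if (secsize == 0)
		return 0;
	if (secsize == DEV_BSIZE)
		return (bcount & (DEV_BSIZE - 1)) == 0;
	return (bcount % secsize) == 0;
}
```

With such a check the read would be rejected as unaligned rather
than trapping.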
The section of code for the marked line above appears to be:
0x8811bd34 <sdstrategy+204>: lw $a0,48($s0)
0x8811bd38 <sdstrategy+208>: nop
0x8811bd3c <sdstrategy+212>: divu $zero,$a0,$v1
0x8811bd40 <sdstrategy+216>: bnez $v1,0x8811bd4c <sdstrategy+228>
0x8811bd44 <sdstrategy+220>: nop
0x8811bd48 <sdstrategy+224>: break 0x7
0x8811bd4c <sdstrategy+228>: mfhi $v0
and sure enough, v1 is zero:
db> show reg
...
v1 0
a0 0x200
...
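To spell out what the instruction sequence does, here is a rough C
model of it. It assumes that a0 holds bp->b_bcount and v1 holds
lp->d_secsize, which is what the marked source line suggests. The
function name and the return convention are made up for this sketch:

```c
#include <stdint.h>

#define BREAK_DIVZERO 7	/* the code in "break 0x7" in the listing */

/*
 * Hypothetical model of the divu/bnez/break/mfhi sequence.
 * Returns 0 and stores the remainder in *v0 when the divisor is
 * non-zero; returns BREAK_DIVZERO and leaves *v0 untouched when the
 * divisor is zero, which is the path the crash took.
 */
static int
model_divu_mfhi(uint32_t a0, uint32_t v1, uint32_t *v0)
{
	if (v1 == 0)			/* bnez $v1 is not taken ... */
		return BREAK_DIVZERO;	/* ... so "break 0x7" is reached */
	*v0 = a0 % v1;			/* mfhi: HI holds the remainder */
	return 0;
}
```

With a0 = 0x200 and v1 = 0, as in the register dump, the model takes
the break path.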
I decided that the problem was the missing or uninitialized disk
label, and after some failed attempts I managed to wedge one in place.
This could not be done through an operation which would try to read
the missing disklabel, as that would hit the above problem as well, so
I ended up modifying a proto-file from one of my other systems and
doing
# disklabel -R -r sd1 new-label.sd1
whereafter the label became sufficiently initialized that I could
proceed with tuning the contents of the disk label.
The root cause for the problem may be insufficient provision of
default values for the in-core disklabel when the label on the disk is
missing.
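As an illustration of what "default values" could mean here, a small
sketch follows. It uses a stand-in struct with two field names
borrowed from struct disklabel, not the real kernel structure, and
the function name and the DEV_BSIZE value of 512 are assumptions:

```c
#include <stdint.h>

#define DEV_BSIZE 512u	/* assumed value for this sketch */

/* Stand-in for the in-core label; only the fields used below. */
struct mini_label {
	uint32_t d_secsize;	/* bytes per sector */
	uint32_t d_secperunit;	/* sectors on the whole unit */
};

/*
 * Hypothetical helper: fill in any field that is still zero so that
 * later code never divides by an unset sector size. Values that are
 * already set are left alone.
 */
static void
label_set_defaults(struct mini_label *lp, uint32_t total_sectors)
{
	if (lp->d_secsize == 0)
		lp->d_secsize = DEV_BSIZE;
	if (lp->d_secperunit == 0)
		lp->d_secperunit = total_sectors;
}
```

Called on a zeroed label with the 35843670 sectors reported above,
this would leave d_secsize at 512 and d_secperunit at 35843670.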
Regards,
- Håvard