Subject: kern/17150: userland program make sparc64 fall over
To: None <gnats-bugs@gnats.netbsd.org>
From: None <lha@stacken.kth.se>
List: netbsd-bugs
Date: 06/03/2002 03:28:32
>Number: 17150
>Category: kern
>Synopsis: userland program make sparc64 fall over
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jun 02 18:29:00 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator: Love
>Release: NetBSD 1.5ZC
>Organization:
Stacken Computer Club
>Environment:
System: NetBSD nutcracker.stacken.kth.se 1.5ZC NetBSD 1.5ZC (NUTCRACKER) #18: Wed May 29 13:16:51 CEST 2002 lha@nutcracker.stacken.kth.se:/usr/src/sys/arch/i386/compile/NUTCRACKER i386
Architecture: i386
Machine: i386
>Description:
Arla is an AFS implementation that uses a package called lwp
for context switching. Since I got/stole/borrowed an UltraSPARC,
I thought I should try it out.
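For readers unfamiliar with lwp: its savecontext() and returnto() switch between threads that each run on their own private stack. Below is a rough analogue built on POSIX ucontext (getcontext/makecontext/swapcontext). This is a swapped-in technique for illustration, not what lwp does. The names run_demo and thread_body are made up. The difference that matters here is that makecontext leaves the ABI-specific stack setup to the C library, while lwp sets up the stack in hand-written assembler for each platform.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Illustrative only: POSIX ucontext as a stand-in for lwp's
 * savecontext()/returnto(). */
static ucontext_t main_ctx, thread_ctx;
static int steps;

static void thread_body(void)
{
    steps++;                             /* runs on the private stack */
    swapcontext(&thread_ctx, &main_ctx); /* save self, resume main */
    steps++;
}   /* returning follows uc_link back to main_ctx */

int run_demo(void)
{
    size_t size = 64 * 1024;
    char *stack = malloc(size);
    if (stack == NULL)
        return -1;

    steps = 0;
    if (getcontext(&thread_ctx) == -1) {
        free(stack);
        return -1;
    }
    /* The C library, not our code, lays out the new stack. */
    thread_ctx.uc_stack.ss_sp = stack;
    thread_ctx.uc_stack.ss_size = size;
    thread_ctx.uc_link = &main_ctx;
    makecontext(&thread_ctx, thread_body, 0);

    swapcontext(&main_ctx, &thread_ctx); /* first switch into the thread */
    assert(steps == 1);
    swapcontext(&main_ctx, &thread_ctx); /* resume; thread runs to the end */

    free(stack);
    return steps;
}
```

Calling run_demo() from main switches into the thread twice and returns 2.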
We didn't support sparcv9 on NetBSD (only Linux and
Solaris), so it should be a simple hack to add NetBSD
support too.
The code we inherited was wrong. For example, it aligns the stack to
the wrong boundary. Linux programs never noticed, because they ran in
32-bit mode, at least when I tried it two years ago. I'm a lousy
assembler programmer and got it wrong too, and two wrongs don't make a
right.
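The two ABIs use different alignment rules. 32-bit SPARC code keeps %sp 8-byte (doubleword) aligned. 64-bit SPARC V9 code keeps %sp biased by 2047 bytes, and %sp + 2047 must be 16-byte aligned. The two checks can be sketched as follows. The function names are illustrative and do not come from lwp or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SPARC V9 64-bit ABI: the frame lives at %sp + 2047. */
#define SPARC64_STACK_BIAS 2047

/* An odd bias is why every valid 64-bit %sp is odd. */
static_assert((SPARC64_STACK_BIAS & 1) == 1, "the 64-bit stack bias is odd");

/* 32-bit SPARC ABI: %sp must be doubleword (8-byte) aligned. */
static bool sparc32_sp_ok(uint64_t sp)
{
    return (sp % 8) == 0;
}

/* 64-bit SPARC V9 ABI: %sp + bias must be 16-byte aligned,
 * which also forces %sp itself to be odd. */
static bool sparc64_sp_ok(uint64_t sp)
{
    return ((sp + SPARC64_STACK_BIAS) % 16) == 0;
}
```

Because the bias is odd, no stack pointer can pass both checks. Code that was only ever run in 32-bit mode therefore never exercises the 64-bit rule.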
So,
# pwd
/sources/arla-obj
# cd lwp
# ./testlwp
usage: ./testlwp cmd ...
Where cmd is one of:
pc Producer Consumer test
sleep Sleeptest
selectconsumer Select consumer
selectproducer (special case, just print a string on stdout repeatally)
cancel Test iomgr cancel
deadlock-write deadlockdetection
deadlock-read deadlockdetection
deadlock-read2 deadlockdetection
overrun-stack over run the stack
underrun-stack under run the stack
version Print version
Use several of these tests together to test their interopability
# ./testlwp pc
starting LWkdb breakpoint at 10086bc
1 tt=30 tstate=4411080405 tpc=0x1001498 tnpc=0x100149c
2 tt=30 tstate=82000603 tpc=0x1298d30 tnpc=0x1298d34
Stopped in pid 272 (testlwp) at winfixspill+0x1c8: nop
db> trace
end(trap type 0x34: pc=100ab30 npc=100ab34 pstate=800016<PEF,PRIV,IE>
kernel trap 34: mem address not aligned
Type 'go' to resume
ok go
Faulted in DDB; continuing...
db> reboot 100
syncing disks... P support
startin9 g I8 OMGR support
3 done
Frame pointer is at 0x1c08e01
Call traceback:
12b9ba4(0, 1, 1819400, 180c800, 1839e00, 180c800, 1c08ec1) fp = 1c08ec1
1136c34(100, 0, 1839de0, 1839c00, 13413e0, 187a800, 1c08f81) fp = 1c08f81
1136890(10086c0, 0, ffffffffffffffff, 1c09920, 1136bec, 187a960, 1c09051) fp = 1c09051
11363c8(180c9a8, 0, 1, f, f005b2f8, 0, 1c091b1) fp = 1c091b1
113ad24(10086c0, 10086c0, 187a800, 90d5c20, 1298d30, 1298d34, 1c09291) fp = 1c09291
12c47c4(0, 0, 0, 0, 30, 1298d34, 1c09361) fp = 1c09361
12c1c9c(101, 1c09e20, 90d5e6b, 0, 4002d, 0, 1c09421) fp = 1c09421
1008e40(1c09e20, 101, 10086bc, 140414, 1066f8, 0, 1c09571) fp = 1c09571
107248(18050e8, 6, 7, 0, 1093c0, 2095a0, 1c09751) fp = 1c09751
40203654(40210000, 2d0, 2d0, 3c, 0, 0, 40230a2f) fp = 40230a2f
dumping to dev 7,9 offset 262253
dump starting dump, blkno 262256
panic: dma0: cannot allocate DVMA address
kdb breakpoint at 12c4954
Stopped in pid 272 (testlwp) at cpu_Debugger+0x4: nop
The interesting functions are savecontext() and returnto(), but the
crash does not happen there; it happens later.
(gdb) file testlwp
Reading symbols from testlwp...done.
(gdb) disas savecontext
Dump of assembler code for function savecontext:
0x105b00 <savecontext>: save %sp, -192, %sp
0x105b04 <savecontext+4>: ta 3
0x105b08 <savecontext+8>: sethi %hi(0), %l0
0x105b0c <savecontext+12>: mov %l0, %l0 ! 0x0
0x105b10 <savecontext+16>: sethi %hi(0x209400), %g1
0x105b14 <savecontext+20>: or %g1, 0x1b0, %g1 ! 0x2095b0 <PRE_Block>
0x105b18 <savecontext+24>: sllx %l0, 0x20, %l0
0x105b1c <savecontext+28>: or %l0, %g1, %l0
0x105b20 <savecontext+32>: mov 1, %l1
0x105b24 <savecontext+36>: stb %l1, [ %l0 ]
0x105b28 <savecontext+40>: stx %fp, [ %i1 ]
0x105b2c <savecontext+44>: stx %g1, [ %i1 + 8 ]
0x105b30 <savecontext+48>: stx %g2, [ %i1 + 0x10 ]
0x105b34 <savecontext+52>: stx %g3, [ %i1 + 0x18 ]
0x105b38 <savecontext+56>: stx %g4, [ %i1 + 0x20 ]
0x105b3c <savecontext+60>: stx %g5, [ %i1 + 0x28 ]
0x105b40 <savecontext+64>: stx %g6, [ %i1 + 0x30 ]
0x105b44 <savecontext+68>: stx %g7, [ %i1 + 0x38 ]
0x105b48 <savecontext+72>: rd %y, %g1
0x105b4c <savecontext+76>: stx %g1, [ %i1 + 0x40 ]
0x105b50 <savecontext+80>: cmp %i2, 0
0x105b54 <savecontext+84>: be,a 0x105b70 <L1>
0x105b58 <savecontext+88>: nop
0x105b5c <savecontext+92>: restore
0x105b60 <savecontext+96>: add %o2, 7, %o2
0x105b64 <savecontext+100>: and %o2, -8, %o2
0x105b68 <savecontext+104>: call %o0
0x105b6c <savecontext+108>: sub %o2, 0xc1, %sp
End of assembler dump.
(gdb) disas returnto
Dump of assembler code for function returnto:
0x105b78 <returnto>: ta 3
0x105b7c <returnto+4>: ldx [ %o0 ], %g1
0x105b80 <returnto+8>: sub %g1, 0xc0, %fp
0x105b84 <returnto+12>: sub %fp, 0xc0, %sp
0x105b88 <returnto+16>: ldx [ %o0 + 0x40 ], %g1
0x105b8c <returnto+20>: mov %g1, %y
0x105b90 <returnto+24>: ldx [ %o0 + 8 ], %g1
0x105b94 <returnto+28>: ldx [ %o0 + 0x10 ], %g2
0x105b98 <returnto+32>: ldx [ %o0 + 0x18 ], %g3
0x105b9c <returnto+36>: ldx [ %o0 + 0x20 ], %g4
0x105ba0 <returnto+40>: ldx [ %o0 + 0x28 ], %g5
0x105ba4 <returnto+44>: ldx [ %o0 + 0x30 ], %g6
0x105ba8 <returnto+48>: ldx [ %o0 + 0x38 ], %g7
0x105bac <returnto+52>: sethi %hi(0), %l0
0x105bb0 <returnto+56>: mov %l0, %l0 ! 0x0
0x105bb4 <returnto+60>: sethi %hi(0x209400), %g1
0x105bb8 <returnto+64>: or %g1, 0x1b0, %g1 ! 0x2095b0 <PRE_Block>
0x105bbc <returnto+68>: sllx %l0, 0x20, %l0
0x105bc0 <returnto+72>: or %l0, %g1, %l0
0x105bc4 <returnto+76>: clr %l1
0x105bc8 <returnto+80>: stb %l1, [ %l0 ]
0x105bcc <returnto+84>: restore
0x105bd0 <returnto+88>: restore
0x105bd4 <returnto+92>: retl
0x105bd8 <returnto+96>: nop
End of assembler dump.
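The layout of the save area can be read off the stx offsets in savecontext (through %i1) and the matching ldx offsets in returnto (through %o0). The struct below is a hypothetical C view reconstructed from those offsets. Its name, lwp_save_area_sketch, is made up and is not lwp's declaration. The struct is paired with the frame arithmetic at returnto+4..+12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout inferred from the disassembly, not lwp's source. */
struct lwp_save_area_sketch {
    uint64_t fp;    /* +0x00: stx %fp, [%i1] */
    uint64_t g[7];  /* +0x08..+0x38: %g1..%g7 (g[0] is %g1) */
    uint64_t y;     /* +0x40: the %y register */
};

static_assert(offsetof(struct lwp_save_area_sketch, fp) == 0x00, "fp");
static_assert(offsetof(struct lwp_save_area_sketch, g) == 0x08, "g1");
static_assert(offsetof(struct lwp_save_area_sketch, y) == 0x40, "y");

/* returnto+4..+12:
 *   ldx [%o0], %g1 ; sub %g1, 0xc0, %fp ; sub %fp, 0xc0, %sp */
static void lwp_restore_frame(const struct lwp_save_area_sketch *sa,
                              uint64_t *fp, uint64_t *sp)
{
    *fp = sa->fp - 0xc0;
    *sp = *fp - 0xc0;
}
```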
It started to crash when I changed the last instruction of savecontext
(the delay slot of call %o0) from
0x105b6c <savecontext+108>: sub %o2, 0xc0, %sp
to
0x105b6c <savecontext+108>: sub %o2, 0xc1, %sp
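The sketch below shows what the change does to the new stack pointer. It assumes the V9 convention that an odd %sp marks a 64-bit frame biased by 2047 bytes and an even %sp marks a 32-bit frame. The first function mirrors savecontext+96..+108, with top standing for the third argument in %o2. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed V9 convention: a 64-bit frame lives at %sp + 2047. */
#define SPARC64_STACK_BIAS 2047

/* Model of savecontext+96..+108:
 *   add %o2, 7, %o2 ; and %o2, -8, %o2  -> round the stack top up to 8
 *   sub %o2, frame, %sp                 -> delay slot of "call %o0" */
static uint64_t lwp_new_sp(uint64_t top, uint64_t frame)
{
    uint64_t aligned = (top + 7) & ~(uint64_t)7;
    assert(aligned >= frame);
    return aligned - frame;
}

/* Odd %sp: biased 64-bit frame. Even %sp: 32-bit frame.
 * Returns the address the register window would be saved to. */
static uint64_t window_save_addr(uint64_t sp)
{
    return (sp & 1) ? sp + SPARC64_STACK_BIAS : sp;
}

/* ABI alignment of the save area: 16 bytes (64-bit) or 8 bytes (32-bit). */
static bool window_save_aligned(uint64_t sp)
{
    uint64_t align = (sp & 1) ? 16 : 8;
    return window_save_addr(sp) % align == 0;
}
```

With a stack top of 0x10000, 0xc0 gives an even, 8-byte-aligned 32-bit frame at 0xff40. 0xc1 gives the odd %sp 0xff3f, whose biased save address 0x1073e is 6 bytes off an 8-byte boundary. This is consistent with the "mem address not aligned" trap in winfixspill shown above.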
db> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
>218 216 218 0 7 0x5806 testlwp
216 210 216 0 3 0x4086 gdb wait
210 1 210 0 3 0x4086 csh pause
203 1 203 0 3 0x84 inetd pause
194 1 194 0 3 0x84 sshd select
132 1 132 0 3 0x84 mount_mfs mfsidl
102 1 102 0 2 0x84 syslogd
85 1 85 0 3 0x84 dhclient select
6 0 0 0 3 0x20204 aiodoned aiodone
5 0 0 0 3 0x20204 ioflush syncer
4 0 0 0 3 0x20204 reaper reaper
3 0 0 0 3 0x20204 pagedaemon pgdaemo
2 0 0 0 3 0x20204 scsibus0 sccomp
1 0 1 0 3 0x4084 init wait
0 -1 0 0 3 0x20204 swapper schedul
db> trace/t 0t218
trace: pid 218 at 0x92dd331
issignal(5, 5, 0, 9982008206, 0, 90d5de0) at issignal+0x198
trap(92dded0, 1874800, 105b08, 1899400, 0, 0) at trap+0x6e4
Lslowtrap_reenter(1, 2, 20, 22210b, ffffffffffffffff, 0) at Lslowtrap_reenter+0x70
db> c
panic: winfault: double invalid window at 0x3ff, nsaved=7
kdb breakpoint at 12c4954
1 tt=30 tstate=4411080403 tpc=0x1001498 tnpc=0x100149c
2 tt=30 tstate=82000601 tpc=0x1298d30 tnpc=0x1298d34
Stopped in pid 218 (testlwp) at cpu_Debugger+0x4: nop
db> c
syncing disks...
SIR Reset
Watchdog Reset, Rebooting.
Resetting ...
I'll keep my build/source tree for a couple of days (and after that,
until I run out of disk space).
Rather than just guessing, I've now fixed my original problem the
right way, so my NetBSD/sparc64 is running. Still, it would be great
if my sparc didn't crash when I did stupid things to it.
>How-To-Repeat:
ftp http://www.e.kth.se/~lha/testlwp
chmod +x testlwp
./testlwp pc
>Fix:
Dunno
>Release-Note:
>Audit-Trail:
>Unformatted: