Subject: mmap not working right on NetBSD/sparc 1.5? - segfault on write
To: None <port-sparc@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: port-sparc
Date: 12/08/2000 19:24:47
I am trying to run coda and having trouble with liblwp, their thread
package. The message below addresses 1.5_BETA2, but the problem is
still present with 1.5 release and the GENERIC kernel from the
release. I've now had the problem consistently on two machines - the
other is an IPX that has never had any sign of flakiness.
Short summary: lwp mmap's some memory for a stack, and then tries to
write into it. The first write segfaults. However, if I break in gdb
before the write, I can read/write the mmap'd stack from gdb with no
trouble.
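
For anyone who wants to check their own machine, here is a minimal
sketch of the pattern described above. It is not the lwp code: the
helper name map_and_touch, the NULL address hint, and the
MAP_ANON|MAP_PRIVATE flags are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not part of lwp: map `size` bytes of anonymous
 * read/write memory, store to the first and last byte, and read them back.
 * Returns 0 if both stores worked, -1 if mmap failed or a readback differed.
 * On an affected machine the first store would presumably raise SIGSEGV
 * instead of returning.
 */
int map_and_touch(size_t size)
{
    assert(size > 0);

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* volatile keeps the compiler from dropping the stores */
    volatile unsigned char *stack = p;
    stack[0] = 0;               /* byte store at the base, like the failing stb */
    stack[size - 1] = 0xff;

    int ok = (stack[0] == 0 && stack[size - 1] == 0xff);
    munmap(p, size);
    return ok ? 0 : -1;
}
```

Calling map_and_touch(4096) from a small main() and checking for a 0
return should be enough to tell whether the bare mmap-then-write
sequence fails outside of lwp.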
Could this have anything to do with the recent register window issues?
Any help/clues would be appreciated.
------- Forwarded Message
Message-Id: <200011281704.MAA31241@telemann.coda.cs.cmu.edu>
From: Greg Troxel <gdt@ir.bbn.com>
To: codalist@telemann.coda.cs.cmu.edu
Subject: lwp broken on NetBSD/sparc 1.5_BETA2 ?
Date: Tue, 28 Nov 2000 12:04:27 -0500
I have a sparc ELC running NetBSD 1.5_BETA2. I have a slight
suspicion that the hardware is not 100% ok (POST failure with no
explanation, occasional cc core dumps when compiling really huge
files), but I have built cvs, emacs, perl, kth-krb4, arla etc. and am
running X, so it is at least 99.999% ok. (I can use arla to
read/write afs servers at MIT, etc.) This problem is repeatable,
which none of the other weirdnesses are. I can try this on another
sparc sometime. [I have, and it has the same symptoms.]
I am trying to build coda, using the latest CVS. I am having a number
of problems that have not occurred when doing the same under NetBSD/i386
1.4.2 or FreeBSD/i386 {3.3,4.2-betaish}.
[trimmed]
1) testlwp-static dumps core. The problem is in Initialize_Stack. It
appears to have successfully mmap()d a stack at 0x45000000, and with
gdb I can read and write this memory space. I have appended a bunch
of gdb output. I recompiled that file without -O2, but I see no
important differences. The instruction which loses is a stb trying to
write a 0 to 0x45000000. I'm not enough of a sparc weenie to
know if I'm getting a delayed segfault from a prior instruction, but
it loses on this instruction with or without -O2.
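
For readers who do not read sparc assembly, the following is a rough C
reading of the non-optimized disassembly that follows. It is a
reconstruction, not the actual source in lwp.c: the parameter names
and types are taken from the backtrace, lwp_stackUseEnabled from the
symbol in the disassembly, and the memcpy stands in for what the
compiled code does with a single word store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Name taken from the symbol in the disassembly; initial value assumed. */
int lwp_stackUseEnabled = 1;

/* Reconstruction of Initialize_Stack as the disassembly appears to behave. */
void Initialize_Stack(char *stackptr, int stacksize)
{
    int i;

    assert(stackptr != NULL);

    if (lwp_stackUseEnabled) {
        /* Fill the stack with a byte pattern: the low 8 bits of the index.
         * The very first iteration stores 0 at stackptr[0], which matches
         * the faulting stb with o2 == 0 and o0 == stackptr. */
        for (i = 0; i < stacksize; i++)
            stackptr[i] = (char)(i & 0xff);
    } else {
        /* Otherwise only a marker word is written at the base. */
        uint32_t marker = 0xbadbadbaU;
        memcpy(stackptr, &marker, sizeof marker);
    }
}
```

If this reading is right, the fault happens on the first iteration of
the fill loop, before any other byte of the new stack has been
touched.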
[trimmed]
(gdb) disass Initialize_Stack
Dump of assembler code for function Initialize_Stack:
0x14a70 <Initialize_Stack>: save %sp, -112, %sp
0x14a74 <Initialize_Stack+4>: st %i0, [ %fp + 0x44 ]
0x14a78 <Initialize_Stack+8>: st %i1, [ %fp + 0x48 ]
0x14a7c <Initialize_Stack+12>: sethi %hi(0x25800), %o0
0x14a80 <Initialize_Stack+16>: ld [ %o0 + 0x20 ], %o1 ! 0x25820 <lwp_stackUseEnabled>
0x14a84 <Initialize_Stack+20>: cmp %o1, 0
0x14a88 <Initialize_Stack+24>: be 0x14ae4 <Initialize_Stack+116>
0x14a8c <Initialize_Stack+28>: nop
0x14a90 <Initialize_Stack+32>: clr [ %fp + -12 ]
0x14a94 <Initialize_Stack+36>: ld [ %fp + -12 ], %o0
0x14a98 <Initialize_Stack+40>: ld [ %fp + 0x48 ], %o1
0x14a9c <Initialize_Stack+44>: cmp %o0, %o1
0x14aa0 <Initialize_Stack+48>: bl 0x14ab0 <Initialize_Stack+64>
0x14aa4 <Initialize_Stack+52>: nop
0x14aa8 <Initialize_Stack+56>: b 0x14adc <Initialize_Stack+108>
0x14aac <Initialize_Stack+60>: nop
0x14ab0 <Initialize_Stack+64>: ld [ %fp + 0x44 ], %o0
0x14ab4 <Initialize_Stack+68>: ld [ %fp + -12 ], %o1
0x14ab8 <Initialize_Stack+72>: add %o0, %o1, %o0
0x14abc <Initialize_Stack+76>: ldub [ %fp + -9 ], %o1
0x14ac0 <Initialize_Stack+80>: and %o1, -1, %o2
0x14ac4 <Initialize_Stack+84>: stb %o2, [ %o0 ]
0x14ac8 <Initialize_Stack+88>: ld [ %fp + -12 ], %o0
0x14acc <Initialize_Stack+92>: add %o0, 1, %o1
0x14ad0 <Initialize_Stack+96>: st %o1, [ %fp + -12 ]
0x14ad4 <Initialize_Stack+100>: b 0x14a94 <Initialize_Stack+36>
0x14ad8 <Initialize_Stack+104>: nop
0x14adc <Initialize_Stack+108>: b 0x14af4 <Initialize_Stack+132>
0x14ae0 <Initialize_Stack+112>: nop
0x14ae4 <Initialize_Stack+116>: ld [ %fp + 0x44 ], %o0
0x14ae8 <Initialize_Stack+120>: sethi %hi(0xbadbac00), %o2
0x14aec <Initialize_Stack+124>: or %o2, 0x1ba, %o1 ! 0xbadbadba
0x14af0 <Initialize_Stack+128>: st %o1, [ %o0 ]
0x14af4 <Initialize_Stack+132>: ret
0x14af8 <Initialize_Stack+136>: restore
End of assembler dump.
(gdb) i local
i = 0
# this was done after the segfault. Note that o2 is 0 and o0 is stackbase.
(gdb) i reg
g0 0x0 0
g1 0x100d4eec 269307628
g2 0x0 0
g3 0x0 0
g4 0x0 0
g5 0x0 0
g6 0x0 0
g7 0xffffffff -1
o0 0x45000000 1157627904
o1 0x0 0
o2 0x0 0
o3 0x1000 4096
o4 0x3 3
o5 0x1002 4098
sp 0xeffff580 -268438144
o7 0x25844 153668
l0 0x90400087 -1874853753
l1 0x100c5ed8 269246168
l2 0x100c5edc 269246172
l3 0xfc1 4033
l4 0x1 1
l5 0x1 1
l6 0xf1cb3000 -238342144
l7 0x100f3158 269431128
i0 0x45000000 1157627904
i1 0x1000 4096
i2 0x1 1
i3 0x0 0
i4 0x0 0
i5 0x1000 4096
fp 0xeffff5f0 -268438032
i7 0x13350 78672
y 0x3000 12288
psr 0x90900086 -1869610874
wim 0x0 0
tbr 0x0 0
pc 0x14ac4 84676
npc 0x14ac8 84680
fpsr 0x0 0
cpsr 0x0 0
(gdb) bt
#0 0x14ac4 in Initialize_Stack (stackptr=0x45000000 "", stacksize=4096)
at lwp.c:1111
#1 0x13358 in LWP_CreateProcess (ep=0x10da0 <OtherProcess>, stacksize=4096,
priority=0, parm=0x0, name=0x257f0 "OtherProcess", pid=0xeffff6d0)
at lwp.c:606
#2 0x10e74 in main (argc=1, argv=0xeffff7c4) at testlwp.c:82
#3 0x10a58 in ___start ()
(gdb) print stackptr
$4 = 0x45000000 ""
(gdb) print stackptr[0]
$5 = 0 '\000'
(gdb) set stackptr[0] = 1
(gdb) print stackptr[0]
$6 = 1 '\001'
------- End of Forwarded Message