port-sparc64: -current very unstable on Ultra10

Subject: -current very unstable on Ultra10
To: None <port-sparc64@netbsd.org>
From: Gert Doering <gert@greenie.muc.de>
List: port-sparc64
Date: 10/30/2004 10:24:18
Hi,

For testing wscons/wsfb and XFree86 on a Creator3D (which is not yet
working for me), I recently got myself a new Ultra10 and installed
NetBSD -current on it (bootstrapped from 2.0_RC2).

"uname" reports

   NetBSD zeta.medat.de 2.99.10 NetBSD 2.99.10

/usr/src was checked out with -r HEAD on 2004/10/29

The system "as is" works, but I have not yet been able to build a full
world with X11 (build.sh -x), as the system keeps dropping into ddb or
crashing outright.

Typical problems manifest like this (on the serial console):


pmap_page_protect: pseg empty!
kdb breakpoint at 12816a4
Stopped in pid 24367.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
db> cont

pmap_page_protect: pseg empty!
kdb breakpoint at 12816a4
Stopped in pid 15862.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
db> cont
hme0: status=30001<GOTFRAME,RXTOHOST,NORXD>
pmap_page_protect: pseg empty!
kdb breakpoint at 12816a4
Stopped in pid 15862.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
db> cont
pmap_page_protect: pseg empty!
kdb breakpoint at 12816a4
Stopped in pid 15862.1 (sh) at  netbsd:cpu_Debugger+0x4:        nop
db> cont

When this happens, I can usually get it to continue by typing "cont"
a few times: sometimes once, sometimes 3-4 times.

The other problem is that it tends to crash when "sync" is run:

gert@zeta.medat.de:/usr/src$ sync
data fault: pc=100aac4 addr=1000000
kernel trap 30: data access exception
Stopped in pid 25690.1 (sync) at        netbsd:pseg_get+0x54:   ldxa            [ %o2 + %g0] 20, %o0
db> cont
panic: kernel fault
Begin traceback...
End traceback...
syncing disks... panic: lockmgr: locking against myself
Begin traceback...
End traceback...
Frame pointer is at 0x8c90041
Call traceback:
12767f0(11, 5, 0, 0, 0, 11, 8c90101) fp = 8c90101
11c192c(104, 0, fffe, 137a3e1, 11c2354, 0, 8c901c1) fp = 8c901c1
11a29d8(13979b8, 2, 0, 6fdd400, 8c90890, 1812c00, 8c90281) fp = 8c90281
11f1a6c(6ff72d0, 10, 6ff7210, 11, 8c909b0, 13, 8c90351) fp = 8c90351
11f0e60(6ff7210, 10012, 139c2c8, 0, 6ff7210, a, 8c90431) fp = 8c90431
11e77f0(6ff7210, 10012, 705e140, 0, 700, 0, 8c904f1) fp = 8c904f1
115e364(0, 10012, 0, 706f110, 0, 1813778, 8c905b1) fp = 8c905b1
11eb328(0, 2, 66b0180, 706f110, 1275a04, 11, 8c90681) fp = 8c90681
11e963c(1813400, 0, 0, 0, 8c91088, f, 8c90751) fp = 8c90751
127681c(139b800, 5, 0, 0, 0, 11, 8c90811) fp = 8c90811
11c192c(100, 0, fffe, 137a3e1, 11c2354, f, 8c908d1) fp = 8c908d1
127f2d4(13be8c8, 8c91340, 1000000, 1, 0, 1812c00, 8c90991) fp = 8c90991
100871c(8c91340, 30, 100aac4, 1000000, 1000400, 0, 8c90a91) fp = 8c90a91
127d35c(6fda660, 40900000, 100000001000400, 400, 1000000, 80000000171be130, 8c90c71) fp = 8c90c71
11f472c(75d1a20, 2050000, 1, 1fea800, 8, 0, 8c90d31) fp = 8c90d31
11f1edc(1, 1, 0, 6fdd400, 8c915e0, 11, 8c90f41) fp = 8c90f41
115fbe0(6ff7210, 0, 0, 11, 0, 0, 8c91031) fp = 8c91031
115f790(6ff7210, 10012, 139c2c8, 0, 6ff7210, 0, 8c910f1) fp = 8c910f1
11f1748(8c91ab0, 10012, 705e140, 0, 123e15c, 1000000, 8c91201) fp = 8c91201
115e3dc(6ff7210, 66b0180, 0, 0, 0, 706f110, 8c91301) fp = 8c91301
11eb328(0, 2, 66b0180, 706f110, 0, 0, 8c913d1) fp = 8c913d1
1280148(1813400, 8c91dd0, 8c91dc0, 0, 8c91dd0, 8c91d10, 8c914a1) fp = 8c914a1
1008cb8(8c91ed0, 24, 40430f1c, 8c91dd0, 40430f1c, 40430f20, 8c91621) fp = 8c91621
100c34(ffffffffffffcdb0, 81c06000, 0, 0, 0, 0, ffffffffffffc271) fp = ffffffffffffc271

dumping to dev 12,1 offset 132201
dump starting dump, blkno 132204
cmdide0:0: unable to load xfer table DMA map for drive 0, error=-1
wddump: DMA error
device not ready
rebooting
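
The traceback above only shows raw addresses, which are hard to read
without the matching kernel symbols. Below is a minimal sketch of how
they could be mapped to symbol+offset form. It assumes that the first
hex number on each frame line is a code address, and that a symbol
listing in "nm -n" format ("<address> <type> <name>") from the exact
kernel that crashed is available. The function names frame_pcs,
parse_nm and resolve are made up for this illustration.

```python
import bisect
import re

# A frame line looks like: "12767f0(11, 5, 0, ...) fp = 8c90101"
FRAME_RE = re.compile(r"^([0-9a-f]+)\((.*)\) fp = ([0-9a-f]+)$")


def frame_pcs(log_text):
    """Return the leading hex address of every traceback frame line."""
    pcs = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            pcs.append(int(m.group(1), 16))
    return pcs


def parse_nm(nm_text):
    """Parse "<hex addr> <type> <name>" lines into a sorted (addr, name) list."""
    syms = []
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip undefined symbols and anything unexpected
        try:
            syms.append((int(parts[0], 16), parts[2]))
        except ValueError:
            continue
    syms.sort()
    return syms


def resolve(pc, syms):
    """Map pc to "name+0xoff" using the closest symbol at or below pc."""
    addrs = [addr for addr, _ in syms]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return hex(pc)  # below the first known symbol, leave it raw
    addr, name = syms[i]
    return "%s+0x%x" % (name, pc - addr)
```

To use it, one would save the output of "nm -n" on the kernel image
to a file, pass its contents to parse_nm(), and call resolve() on each
value returned by frame_pcs() for the console log. The result is only
meaningful if the symbol listing comes from the same kernel build.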


I'm a bit surprised at this.  I have a number of other Ultras (2x U5, 1x
U10) running NetBSD 2.0 (not -current), and all of them have survived
at least one "build.sh -x" without crashing.

OTOH, the machine in question has an original Sun 4 GB hard disk, which
seems to be the slowest piece of hardware ever built.  The U5s have
recent IDE disks (non-Sun, larger and faster).  Maybe the slow disk is
triggering something in the IDE subsystem?

My gut feeling right now is that the crashes are related to "heavy 
disk activity", but I am not sure how to proceed now...

gert
-- 
USENET is *not* the non-clickable part of WWW!
                                                      http://www.muc.de/~gert/
Gert Doering - Munich, Germany                             gert@greenie.muc.de
fax: +49-89-35655025                        gert@net.informatik.tu-muenchen.de