Subject: kern/25536: SMP kernel crash (thread related ?)
To: None <gnats-bugs@gnats.NetBSD.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-bugs
Date: 05/11/2004 12:45:51
>Number: 25536
>Category: kern
>Synopsis: SMP kernel crash (thread related ?)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue May 11 10:46:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Manuel Bouyer
>Release: NetBSD 2.0_BETA, 200405050000 build
>Organization:
LIP6, Universite Paris VI.
>Environment:
System: NetBSD antifer.ipv6.lip6.fr 2.0_BETA NetBSD 2.0_BETA (GENERIC.MP) #0: Sat May 8 00:33:21 UTC 2004 autobuild@tgm.netbsd.org:/autobuild/netbsd-2-0/i386/OBJ/autobuild/netbsd-2-0/src/sys/arch/i386/compile/GENERIC.MP i386
Architecture: i386
Machine: i386
NetBSD 2.0_BETA (GENERIC.MP) #0: Sat May 8 00:33:21 UTC 2004
autobuild@tgm.netbsd.org:/autobuild/netbsd-2-0/i386/OBJ/autobuild/netbsd-2-0/src/sys/arch/i386/compile/GENERIC.MP
total memory = 97916 KB
avail memory = 87900 KB
BIOS32 rev. 0 found at 0xe0000
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (COMPAQ Workstation )
cpu0 at mainbus0: apid 1 (boot processor)
cpu0: Intel Pentium Pro (686-class), 199.45 MHz, id 0x617
cpu0: features fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features fbff<PGE,MCA,CMOV>
cpu0: I-cache 8 KB 32b/line 4-way, D-cache 8 KB 32b/line 2-way
cpu0: L2 cache 256 KB 32b/line 4-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: calibrating local timer
cpu0: apic clock running at 66 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 0 (application processor)
cpu1: starting
cpu1: Intel Pentium Pro (686-class), 199.43 MHz, id 0x619
cpu1: features fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features fbff<PGE,MCA,CMOV>
cpu1: I-cache 8 KB 32b/line 4-way, D-cache 8 KB 32b/line 2-way
cpu1: L2 cache 256 KB 32b/line 4-way
cpu1: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu1: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
>Description:
While loading a page with lots of images in firefox, the browser
stopped responding. top showed it wasn't using much CPU and was
waiting in poll(). I then tried to kill it. At first, kill and kill -9
from top didn't do anything. Then I tried a 'kill -9 %' from the
xterm where I started it, and the screen froze. I blindly typed
ctrl-alt-esc then reboot(0x104) and got a core dump.
antifer# ps -axl -M netbsd.0.core
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 0 1208619026 0 -18 0 0 0 schedule RWKs ?? 0:00.00 [swapper]
0 1 1208619026 0 10 0 64 0 wait RWs ?? 0:00.00 init
0 2 1208619026 0 14 0 0 0 crypto_w RWK ?? 0:00.00 [cryptoret]
0 3 1208619026 0 -6 0 0 0 sccomp RWK ?? 0:00.00 [scsibus0]
0 4 1208619026 0 -6 0 0 0 atath RWK ?? 0:00.00 [atabus0]
0 5 1208619026 0 -6 0 0 0 atath RWK ?? 0:00.00 [atabus1]
0 6 1208619026 0 -6 0 0 0 sccomp RWK ?? 0:00.00 [atapibus0]
0 7 1208619026 0 -18 0 0 0 lfswrite RWK ?? 0:00.00 [lfs_writer]
0 8 1208619026 0 -18 0 0 0 pgdaemon RWK ?? 0:00.00 [pagedaemon]
0 9 1208619026 0 18 0 0 0 syncer RWK ?? 0:00.00 [ioflush]
0 10 1208619026 0 -18 0 0 0 aiodoned RWK ?? 0:00.00 [aiodoned]
0 97 1208619026 0 2 0 32648 0 select RWs ?? 0:00.00 /usr/X11R6/bi
0 98 1208619026 1 10 0 400 0 wait RWs ?? 0:01.00 xdm: :0
0 99 1208619026 15 2 0 340 0 select RWs ?? 0:15.00 /usr/sbin/ssh
17 106 1208619026 0 18 0 964 0 pause RWs ?? 0:00.00 sendmail: Que
0 193 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
331 196 1208619026 0 2 0 496 0 select RWs ?? 0:00.00 (fetchmail)
0 208 1208619026 0 2 0 292 0 - RWs ?? 0:00.00 /usr/sbin/sys
0 237 1208619026 0 2 0 148 0 select RWs ?? 0:00.00 /usr/sbin/ypb
0 238 1208619026 0 2 0 324 0 poll RWs ?? 0:00.00 /usr/sbin/rpc
0 254 1208619026 21 2 0 44 0 nfsd RWL ?? 0:21.00 nfsd: server
0 274 1208619026 0 10 0 0 0 nfsidl RWK ?? 0:00.00 [nfsio]
0 275 1208619026 0 2 0 440 0 select RWs ?? 0:00.00 (amd)
0 278 1208619026 0 10 0 0 0 nfsidl RWK ?? 0:00.00 [nfsio]
0 279 1208619026 0 10 0 0 0 nfsidl RWK ?? 0:00.00 [nfsio]
0 284 1208619026 0 10 0 0 0 nfsidl RWK ?? 0:00.00 [nfsio]
0 286 1208619026 0 10 0 220 0 nanoslee RWs ?? 0:00.00 /usr/sbin/cro
0 312 1208619026 21 2 0 44 0 nfsd RWL ?? 0:21.00 (nfsd)
0 313 1208619026 21 2 0 112 0 poll RWs ?? 0:21.00 nfsd: master
0 317 1208619026 21 2 0 44 0 nfsd RWL ?? 0:21.00 nfsd: server
0 332 1208619026 21 2 0 44 0 nfsd RWL ?? 0:21.00 nfsd: server
0 345 1208619026 25 2 0 60 0 kqread RWs ?? 0:25.00 /usr/sbin/ine
0 373 1208619026 21 2 0 116 0 poll RWs ?? 0:21.00 (lpd)
0 495 1208619026 0 18 0 1140 0 pause RWs ?? 0:00.00 (ntpd)
0 539 1208619026 15 2 0 172 0 select RWs ?? 0:15.00 (xdm)
331 1328 1208619026 0 2 0 420 0 poll RW ?? 0:00.00 (xmeter)
331 1463 1208619026 16 -22 0 0 0 - ZW ?? 0:00.00 (xli)
0 1777 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
0 1855 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
0 2047 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
331 2746 1208619026 1 2 0 196 0 poll RW ?? 0:01.00 (xmailbox)
0 2981 1208619026 0 2 0 696 0 - RWs ?? 0:00.00 (xterm)
0 3046 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
331 3048 1208619026 0 2 0 304 0 select RW ?? 0:00.00 (fvwm)
331 3051 1208619026 0 2 0 128 0 poll RW ?? 0:00.00 (oclock)
331 3165 1208619026 2 18 0 212 0 pause RW ?? 0:02.00 (csh)
0 3190 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
331 3250 1208619026 36 -22 0 0 0 - ZW ?? 0:00.00 (csh)
0 3352 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
0 3452 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
0 3579 1208619026 0 2 0 700 0 select RW ?? 0:00.00 (xterm)
331 3636 1208619026 0 2 0 208 0 poll RW ?? 0:00.00 (xload)
331 3300 1208619026 0 2 0 432 0 select RWs+ p0 0:00.00 (ssh)
331 3113 1208619026 9 3 0 884 0 ttyin RWs+ p1 0:09.00 (tcsh)
331 1623 1208619026 0 2 0 256 0 poll RW+ p2 0:00.00 (top)
331 3349 1208619026 0 18 0 984 0 pause RWs p2 0:00.00 (tcsh)
331 1357 1208619026 19 3 0 884 0 ttyin RWs+ p3 0:19.00 (tcsh)
331 3315 1208619026 13 3 0 884 0 ttyin RWs+ p4 0:13.00 (tcsh)
331 1313 1208619026 36 64 0 12924 0 - RWLa p5 0:36.00 (firefox-bin)
331 1679 1208619026 1 29 0 980 0 - RWs+ p5 0:01.00 (tcsh)
331 3537 1208619026 7 3 0 884 0 ttyin RWs+ p6 0:07.00 (tcsh)
331 3019 1208619026 0 2 0 676 0 select RWs+ p7 0:00.00 (ssh)
331 194 1208619026 0 2 0 568 0 select RWs+ p8 0:00.00 (ssh)
331 330 1208619026 0 2 0 492 0 select RW+ p9 0:00.00 (ssh)
331 812 1208619026 0 18 0 888 0 pause RWs p9 0:00.00 (tcsh)
0 660 1208619026 17 3 0 48 0 ttyin RWs+ E0 0:17.00 (getty)
0 160 1208619026 17 3 0 48 0 ttyin RWs+ E1 0:17.00 /usr/libexec/
0 671 1208619026 17 3 0 48 0 ttyin RWs+ E2 0:17.00 /usr/libexec/
0 666 1208619026 17 3 0 48 0 ttyin RWs+ E3 0:17.00 (getty)
(gdb) target kcore netbsd.0.core
#0 0x00000001 in ?? ()
(gdb) where
#0 0x00000001 in ?? ()
#1 0xc0424f0f in cpu_reboot ()
#2 0xc034bd41 in db_reboot_cmd ()
#3 0xc034b887 in db_command ()
#4 0xc034b597 in db_command_loop ()
#5 0xc034e69f in db_trap ()
#6 0xc04224de in kdb_trap ()
#7 0xc0430dc2 in trap ()
#8 0xc010c6bf in calltrap ()
#9 0xc051bc38 in internal_command ()
#10 0xc051bd14 in wskbd_translate ()
#11 0xc051b9f5 in wskbd_cngetc ()
#12 0xc0432155 in cngetc ()
#13 0xc034d07d in db_readline ()
#14 0xc034d126 in db_read_line ()
#15 0xc034b586 in db_command_loop ()
#16 0xc034e69f in db_trap ()
#17 0xc04224de in kdb_trap ()
#18 0xc0430dc2 in trap ()
#19 0xc010c6bf in calltrap ()
#20 0xc04307af in syscall_plain ()
antifer# ps -ax -O paddr -M netbsd.0.core |grep firefox
1313 c54c0334 p5 RWLa 0:36.00 (firefox-bin)
(gdb) proc 0xc54c0334
can not access 0x24, invalid translation (invalid PTE)
can not access 0x24, invalid translation (invalid PTE)
cannot read pcb at 0x24
>How-To-Repeat:
Looks random. I have been using this box for some time now and this
is the first crash. But I upgraded the kernel and userland (base only,
I didn't rebuild the packages) to the 20040505 snapshot yesterday.
Before that it was running:
May 7 18:05:49 antifer /netbsd: NetBSD 2.0_BETA (GENERIC.MP) #0: Tue Mar 30 17:43:13 CEST 2004
May 7 18:05:49 antifer /netbsd: bouyer@pop:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/current/src/sys/arch/i386/compile/GENERIC.MP
>Fix:
unknown.
>Release-Note:
>Audit-Trail:
>Unformatted: