Subject: port-sparc/5641: kernel fault on sun4c machines
To: None <gnats-bugs@gnats.netbsd.org>
From: Brad Spencer <brad@anduin.eldar.org>
List: netbsd-bugs
Date: 06/22/1998 17:00:53
>Number: 5641
>Category: port-sparc
>Synopsis: kernel fault on sun4c machines
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: gnats-admin (GNATS administrator)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jun 22 14:05:00 1998
>Last-Modified:
>Originator: Brad Spencer
>Organization:
Sitting at home
>Release: Mid to late May 1998
>Environment:
NetBSD valinor.eldar.org 1.3E NetBSD 1.3E (VALINOR) #3: Sat May 23 10:30:37 EDT 1998 brad@elrond.eldar.org:/usr/src/sys/arch/sparc/compile/VALINOR sparc
>Description:
I have a SPARCstation 2 that panics with a 'kernel fault' under load.
The machine runs Majordomo and typically has several sendmail daemons
running; it does little else. Prior to the SS2, an IPX did the same
task.
Here is the gdb output from two of the crash dumps:
valinor% gdb /sys/arch/sparc/compile/VALINOR/netbsd.gdb
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-netbsd), Copyright 1996 Free Software Foundation, Inc...
(gdb) target kcore netbsd.3.core
panic: kernel fault
#0 mi_switch () at ../../../../kern/kern_synch.c:631
631 cpu_switch(p);
(gdb) where
#0 mi_switch () at ../../../../kern/kern_synch.c:631
#1 0xf00261f0 in bpendtsleep () at ../../../../kern/kern_synch.c:370
#2 0xf009a098 in uvm_scheduler () at ../../../../uvm/uvm_glue.c:421
#3 0xf00192cc in main () at ../../../../kern/init_main.c:412
(gdb) print *p
$1 = {p_forw = 0x0, p_back = 0x0, p_list = {le_next = 0x0,
le_prev = 0xf01f7208}, p_cred = 0xf010d180, p_fd = 0xf00ff760,
p_stats = 0xf00e434c, p_limit = 0xf010cce0, p_vmspace = 0xf0105380,
p_sigacts = 0xf00e4220, p_flag = 516, p_unused = 0 '\000',
p_stat = 3 '\003', p_pad1 = "\000", p_pid = 0, p_hash = {le_next = 0x0,
le_prev = 0x0}, p_pglist = {le_next = 0xf01f7000, le_prev = 0xf010b458},
p_pptr = 0x0, p_sibling = {le_next = 0x0, le_prev = 0x0}, p_children = {
lh_first = 0xf01f7000}, p_oppid = 0, p_dupfd = 0, p_estcpu = 0,
p_cpticks = 0, p_pctcpu = 0, p_wchan = 0xf0107ff8,
p_wmesg = 0xf0099fa8 "scheduler", p_swtime = 41738, p_slptime = 7,
p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {
tv_sec = 0, tv_usec = 0}}, p_rtime = {tv_sec = 0, tv_usec = 357198},
p_uticks = 0, p_sticks = 35, p_iticks = 0, p_traceflag = 0, p_tracep = 0x0,
p_siglist = 0, p_textvp = 0x0, p_locks = 0, p_simple_locks = 0,
p_holdcnt = 0, p_emul = 0xf00e6ef4, p_spare = {0}, p_sigmask = 0,
p_sigignore = 407404544, p_sigcatch = 0, p_priority = 4 '\004',
p_usrpri = 50 '2', p_nice = 20 '\024',
p_comm = "swapper\000\000\000\000\000\000\000\000\000", p_pgrp = 0xf010b450,
p_thread = 0x0, p_addr = 0xf00e4000, p_md = {md_tf = 0x0, md_fpstate = 0x0,
md_flags = 0}, p_xstat = 0, p_acflag = 0, p_ru = 0x0}
(gdb) quit
... and the second dump:
valinor% gdb /sys/arch/sparc/compile/VALINOR/netbsd.gdb
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-netbsd), Copyright 1996 Free Software Foundation, Inc...
(gdb) target kcore netbsd.4.core
panic: kernel fault
#0 mi_switch () at ../../../../kern/kern_synch.c:631
631 cpu_switch(p);
(gdb) where
#0 mi_switch () at ../../../../kern/kern_synch.c:631
#1 0xf00261f0 in bpendtsleep () at ../../../../kern/kern_synch.c:370
#2 0xf00410bc in biowait (bp=0xf02034d0) at ../../../../kern/vfs_bio.c:811
#3 0xf00a5eb8 in uvm_swap_io (pps=0x3, startslot=-268169216, npages=1,
flags=1048576) at ../../../../uvm/uvm_swap.c:1763
#4 0xf00a5c24 in uvm_swap_get (page=0xf0186b94, swslot=1007, flags=2)
at ../../../../uvm/uvm_swap.c:1630
#5 0xf0097968 in uao_get (uobj=0xf00f06e8, offset=4028132244, pps=0xf00e5ce0,
npagesp=0xf00e5ce0, centeridx=0, access_type=0, advice=1, flags=0)
at ../../../../uvm/uvm_aobj.c:929
#6 0xf00993d8 in uvm_fault (orig_map=0x4, vaddr=4052602880, fault_type=2,
access_type=7) at ../../../../uvm/uvm_fault.c:1281
#7 0xf0099a2c in uvm_fault_wire (map=0xf00f0730, start=4052598784,
end=4052606976) at ../../../../uvm/uvm_fault.c:1692
#8 0xf0099f2c in uvm_swapin (p=0xf0311600) at ../../../../uvm/uvm_glue.c:360
#9 0xf009a0c8 in uvm_scheduler () at ../../../../uvm/uvm_glue.c:438
#10 0xf00192cc in main () at ../../../../kern/init_main.c:412
(gdb) print *p
$1 = {p_forw = 0xf0401a00, p_back = 0xf0105b30, p_list = {le_next = 0x0,
le_prev = 0xf01f7208}, p_cred = 0xf010d180, p_fd = 0xf00ff760,
p_stats = 0xf00e434c, p_limit = 0xf010cce0, p_vmspace = 0xf0105380,
p_sigacts = 0xf00e4220, p_flag = 516, p_unused = 0 '\000',
p_stat = 2 '\002', p_pad1 = "\000", p_pid = 0, p_hash = {le_next = 0x0,
le_prev = 0x0}, p_pglist = {le_next = 0xf01f7000, le_prev = 0xf010b458},
p_pptr = 0x0, p_sibling = {le_next = 0x0, le_prev = 0x0}, p_children = {
lh_first = 0xf01f7000}, p_oppid = 0, p_dupfd = 0, p_estcpu = 0,
p_cpticks = 0, p_pctcpu = 0, p_wchan = 0x0, p_wmesg = 0xf0041068 "biowait",
p_swtime = 24807, p_slptime = 0, p_realtimer = {it_interval = {tv_sec = 0,
tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_rtime = {
tv_sec = 0, tv_usec = 453156}, p_uticks = 0, p_sticks = 62, p_iticks = 0,
p_traceflag = 0, p_tracep = 0x0, p_siglist = 0, p_textvp = 0x0, p_locks = 0,
p_simple_locks = 0, p_holdcnt = 0, p_emul = 0xf00e6ef4, p_spare = {0},
p_sigmask = 0, p_sigignore = 407404544, p_sigcatch = 0,
p_priority = 17 '\021', p_usrpri = 50 '2', p_nice = 20 '\024',
p_comm = "swapper\000\000\000\000\000\000\000\000\000", p_pgrp = 0xf010b450,
p_thread = 0x0, p_addr = 0xf00e4000, p_md = {md_tf = 0x0, md_fpstate = 0x0,
md_flags = 0}, p_xstat = 0, p_acflag = 0, p_ru = 0x0}
(gdb) quit
The machine usually panics once every couple of days, almost always in
the same way. In every dump I have examined, the "swapper" process
(pid 0) appears to have been the one being scheduled to run.
>How-To-Repeat:
This is hard to pin down. I am not sure that load alone is enough to
trigger it, but the machine only seems to panic when it is busy.
>Fix:
Unknown. Kernel configs, kernel dumps, or access to the machine are
available upon request.
>Audit-Trail:
>Unformatted: