Subject: -current kernel broken
To: None <port-sparc@NetBSD.ORG>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
List: port-sparc
Date: 06/12/1996 17:05:09
Well, I finally brought "my" NetBSD/sparc machine up to -current. But
the new kernel croaked. Here's a ten-finger copy:
/dev/rsd0f: 75206 files, 727590 used, 156911 free (19511 frags, 17175 blocks, 2.2% fragmentation)
trap type 0x83: pc=f804fff4 npc=f804fff8 psr=8000c6<S,PS>
panic: flush windows
Stopped at [...]
db> trace
_trap(83, 8000c6, f804fff4, f992abc8, f861c400, 0) at _trap+0x220
slowtrap(10, 30, f85f28c0, f85f28f0, f8622700, 3ff) at slowtrap+0x124
_null_node_alloc(f8604760, f804febc, f8623700, f85e6034, f9929000, f8624c00) at _null_node_alloc+0x244
_null_node_create(0, f8614000, f992ad70, 1, f8609200, f8618c00) at _null_node_create+0xa4
_nullfs_mount(0, f7fffd2a, f7fff898, f992ae00, f861c400, f804f630) at _nullfs_mount+0x9c
_sys_mount(0, f992af28, f992af20, f8047adc, 800084, f992afb0) at _sys_mount+0x43c
_syscall(15, f992afb0, 0, 3, 3fc, 0) at _syscall+0x1f0
syscall(2b68, f7fffd2a, 0, f7fff898, 400086, f992afb0) at syscall+0x120
db>
This proved to be repeatable. I removed the nullfs mount from
/etc/fstab and then could bring the machine up.
Further investigation reveals the true cause. 0xf804fff4 is
_null_lock+0x138, and upon disassembling the code, I find
0xf804ffd0 <null_lock+276>: st %o4, [ %o0 + 0x10 ]
0xf804ffd4 <null_lock+280>: b 0xf804fec8 <null_lock+12>
0xf804ffd8 <null_lock+284>: ld [ %l0 ], %o0
0xf804ffdc <null_lock+288>: ld [ %o0 + 0x134 ], %o0
0xf804ffe0 <null_lock+292>: cmp %o0, 0
0xf804ffe4 <null_lock+296>: be,a 0xf804fff0 <null_lock+308>
0xf804ffe8 <null_lock+300>: mov -1, %o0
0xf804ffec <null_lock+304>: ld [ %o0 + 0x30 ], %o0
0xf804fff0 <null_lock+308>: st %o0, [ %i0 + 0x14 ]
0xf804fff4 <null_lock+312>: ta 3
0xf804fff8 <null_lock+316>: st %i7, [ %i0 + 0x18 ]
Correlating the disassembly with the source, this has to be from the
segment
#ifdef DIAGNOSTIC
	if (curproc)
		nn->null_pid = curproc->p_pid;
	else
		nn->null_pid = -1;
	nn->null_lockpc = RETURN_PC(0);
	nn->null_lockpc2 = RETURN_PC(1);
#endif
and in null.h, I find that the relevant definition for RETURN_PC is
#define RETURN_PC(frameno) __builtin_return_address(frameno)
which gcc is turning into something involving a "ta 3". But the trap
handler isn't prepared to deal with flush-windows traps from within the
kernel.
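For illustration only, here is a small userland sketch of the same
bookkeeping, restricted to RETURN_PC(0). It is not the kernel code: the
demo_* names are hypothetical stand-ins for struct proc, curproc and the
null node, and leaving out RETURN_PC(1) rests on the assumption (not
verified here) that the "ta 3" comes from the frame-1 lookup, since the
frame-0 value in the listing above is stored straight from %i7.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as the one quoted from null.h. */
#define RETURN_PC(frameno) __builtin_return_address(frameno)

/* Hypothetical stand-ins for the kernel's struct proc and null node. */
struct demo_proc {
	int p_pid;
};

struct demo_node {
	int   null_pid;     /* pid of the locker, or -1 if none */
	void *null_lockpc;  /* return address of the locking call */
};

/* Stand-in for curproc; NULL means "no current process". */
static struct demo_proc *demo_curproc = NULL;

/*
 * Record who took the lock. Only frame 0 is requested, so the compiler
 * needs nothing beyond this function's own return address. noinline
 * keeps a real call (and thus a real return address) in place.
 */
__attribute__((noinline))
static void demo_lock(struct demo_node *nn)
{
	assert(nn != NULL);
	if (demo_curproc)
		nn->null_pid = demo_curproc->p_pid;
	else
		nn->null_pid = -1;
	nn->null_lockpc = RETURN_PC(0);
}
```

Calling demo_lock() on a node fills in the pid (or -1) and a non-null
caller address. Whether dropping the second return address is an
acceptable change for the real null_lock() is a separate question that
this sketch does not answer.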
The source tree was supped June 10th AM; examining the sources from
this morning's sup (i.e., June 12th) I see no indication that any of the
pieces leading to this panic have changed: gcc still appears prepared
to generate a flush-windows trap when __builtin_return_address is used,
null.h still defines RETURN_PC to use it, null_lock() still uses
RETURN_PC ifdef DIAGNOSTIC, and the trap code still looks prepared to
panic if a flush-windows trap strikes from within the kernel.
On an unrelated note, I have RASTERCONSOLE, RASTERCONS_SMALLFONT, and
RASTERCONS_FULLSCREEN, and my screen showed up 80x34. I'm still
looking into this one; I mention it in case it reminds anyone of
anything.
der Mouse
mouse@collatz.mcrcim.mcgill.edu