Subject: port-mips/25942: setcontext() causes kernel panic on MIPS
To: gnats-bugs@gnats.NetBSD.org
From: wileyc@rezrov.net
List: netbsd-bugs
Date: 06/16/2004 15:36:38
>Number: 25942
>Category: port-mips
>Synopsis: setcontext() causes kernel panic on MIPS
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-mips-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jun 16 06:38:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Christopher SEKIYA
>Release: NetBSD 2.0_BETA
>Organization:
>Environment:
System: NetBSD indigo.rezrov.net 2.0_BETA NetBSD 2.0_BETA (GENERIC32_IP2x) #0: Sun Jun 13 20:36:50 JST 2004 wileyc@izu:/usr/builder/sgimips-2.0/obj/sys/arch/sgimips/compile/GENERIC32_IP2x sgimips
Architecture: mipseb
Machine: sgimips
>Description:
On 23 March 2004, a change was introduced into lib/libc/arch/mips/gen that
used setcontext() to implement longjmp(), rather than the sigreturn() scheme
used in 1.6.
Unfortunately, this change causes kernel panics under various circumstances;
invoking csh as a login shell is the most visible case. This happens under
both -current (as of 16 June) and 2.0_BETA. On sgimips, the panic is:
Apr 22 13:28:07 mod80 login: ROOT LOGIN (root) ON console
bus error: cpu_stat 00000203 addr 0887fe40, gio_stat 00000000 addr 1fbc4003
panic: cache error @ EPC 0x882b0010 ErrCtl 0x3 CacheErr 0xa01934f3
panic: cache error @ EPC 0x882b579c ErrCtl 0x3 CacheErr 0xa03b5519
panic: cache error @ EPC 0x882b579c ErrCtl 0x3 CacheErr 0xa03b5519
Stopped in pid 305.1 (csh) at 0x882b1b18: jr ra
bdslot: nop
db>
This is Not Good(tm).
Possible causes:
* longjmp() is not restoring registers properly.
* The rtld isn't doing fixups properly. Invoking a statically linked csh
does not produce the panic.
* Cache botch. Apparently, an Alchemy Pb1500 doesn't have any problems
at all -- but it's the only MIPS CPU supported by NetBSD that has
a fully coherent cache.
* pmap botch. Suggested by Christos.
* toolchain miscompilation. Suggested by Nishimura-san.
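The first hypothesis can be probed from userland without involving csh. The
following is a minimal sketch, assuming only standard C; check_longjmp() and
jump_back() are hypothetical names, and the test has not been confirmed to
fail (or to panic) on an affected system. It checks that longjmp() delivers
the requested value to the setjmp() site and that a volatile local written
before the jump keeps its value. It does not inspect individual MIPS
registers.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: jump back to the setjmp() site from a deeper frame. */
static void
jump_back(int val)
{
	longjmp(env, val);
}

/*
 * Returns 0 if longjmp() behaved as the C standard requires,
 * or a small nonzero code identifying what went wrong.
 */
int
check_longjmp(void)
{
	volatile int counter = 0;

	switch (setjmp(env)) {
	case 0:
		/* Direct return from setjmp(): change state, then jump. */
		counter = 41;
		jump_back(7);
		return 1;	/* not reached: longjmp() does not return */
	case 7:
		/* Arrived via longjmp(): the volatile local must survive. */
		return (counter == 41) ? 0 : 3;
	default:
		/* setjmp() returned a value other than the one requested. */
		return 2;
	}
}
```

A driver can be as small as `int main(void) { return check_longjmp(); }`.
Building it both dynamically and statically would also bear on the rtld
hypothesis.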
>How-To-Repeat:
Invoke csh as a login shell.
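A narrower test case than a full csh login may be useful. The sketch below is
an assumption about what might exercise the same path, not a confirmed
reproducer: it repeatedly saves the signal mask with sigsetjmp(), changes the
mask, and jumps back with siglongjmp(), then checks that the saved mask was
restored. stress_siglongjmp() and its rounds parameter are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf sigenv;

/*
 * Performs `rounds` sigsetjmp()/siglongjmp() round trips.
 * Returns 0 if every round trip restored the signal mask, otherwise the
 * 1-based number of the first round that left SIGUSR1 blocked.
 */
int
stress_siglongjmp(int rounds)
{
	volatile int i;
	sigset_t block, cur;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);

	/* Start from a known state: SIGUSR1 unblocked. */
	sigprocmask(SIG_UNBLOCK, &block, NULL);

	for (i = 0; i < rounds; i++) {
		if (sigsetjmp(sigenv, 1) == 0) {
			/* Block SIGUSR1, then jump back to the saved context. */
			sigprocmask(SIG_BLOCK, &block, NULL);
			siglongjmp(sigenv, 1);
		}
		/* After the jump, the mask saved by sigsetjmp() should be back. */
		sigprocmask(SIG_SETMASK, NULL, &cur);
		if (sigismember(&cur, SIGUSR1))
			return i + 1;
	}
	return 0;
}
```

A driver such as `int main(void) { return stress_siglongjmp(1000) != 0; }`
is enough to run it; the round count is arbitrary.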
>Fix:
I don't know exactly what's wrong with setcontext(), but backing out the
change (and adding COMPAT_16 to the kernel config, since the restored
__longjmp14 uses the compat_16 __sigreturn14 syscall) prevents the panics.
The patch I've been using to back out the change is:
Index: lib/libc/arch/mips/gen/Makefile.inc
===================================================================
RCS file: /cvsroot/src/lib/libc/arch/mips/gen/Makefile.inc,v
retrieving revision 1.24
diff -u -r1.24 Makefile.inc
--- lib/libc/arch/mips/gen/Makefile.inc 23 Mar 2004 12:31:52 -0000 1.24
+++ lib/libc/arch/mips/gen/Makefile.inc 13 Jun 2004 10:23:44 -0000
@@ -15,7 +15,7 @@
SRCS+= flt_rounds.c fpgetmask.c fpgetround.c fpgetsticky.c fpsetmask.c \
fpsetround.c fpsetsticky.c
-SRCS+= setjmp.S __setjmp14.S __longjmp14.c
+SRCS+= setjmp.S __setjmp14.S
SRCS+= _setjmp.S
SRCS+= sigsetjmp.S __sigsetjmp14.S
SRCS+= byte_swap_2.S byte_swap_4.S bswap64.c
Index: lib/libc/arch/mips/gen/__setjmp14.S
===================================================================
RCS file: /cvsroot/src/lib/libc/arch/mips/gen/__setjmp14.S,v
retrieving revision 1.10
diff -u -r1.10 __setjmp14.S
--- lib/libc/arch/mips/gen/__setjmp14.S 23 Mar 2004 02:21:49 -0000 1.10
+++ lib/libc/arch/mips/gen/__setjmp14.S 13 Jun 2004 10:23:45 -0000
@@ -130,6 +130,23 @@
move v0, zero
j ra
REG_EPILOGUE
+END(__setjmp14)
+
+LEAF(__longjmp14)
+#ifdef __ABICALLS__
+ .set noreorder
+ .cpload t9
+ .set reorder
+ subu sp, sp, 32
+ .cprestore 16
+#endif
+ REG_PROLOGUE
+ /* save return value in sc_regs[_R_V0] */
+ REG_S a1,(_OFFSETOF_SC_REGS + _R_V0 * SZREG)(a0)
+ REG_EPILOGUE
+ li v0, SYS_compat_16___sigreturn14
+ syscall
botch:
+ jal _C_LABEL(longjmperror)
jal _C_LABEL(abort)
-END(__setjmp14)
+END(__longjmp14)
>Release-Note:
>Audit-Trail:
>Unformatted: