Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)

To: port-evbmips-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,gson%gson.org@localhost (Andreas Gustafsson)
Subject: Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)
From: "Rin Okuyama via gnats" <gnats-admin%NetBSD.org@localhost>
Date: Sat, 19 Apr 2025 06:10:02 +0000 (UTC)
The following reply was made to PR port-evbmips/59236; it has been noted by GNATS.

From: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost, port-evbmips-maintainer%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, riastradh%NetBSD.org@localhost,
 Andreas Gustafsson <gson%gson.org@localhost>, Martin Husemann <martin%duskware.de@localhost>
Cc: 
Subject: Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)
Date: Sat, 19 Apr 2025 15:06:09 +0900

 On 2025/04/19 4:04, riastradh%NetBSD.org@localhost wrote:
 > Synopsis: Multiple segfaults in erlite3 boot
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: riastradh%NetBSD.org@localhost
 > State-Changed-When: Fri, 18 Apr 2025 19:04:59 +0000
 > State-Changed-Why:
 > This is probably the same CN50xx bug that we have been puzzling
 > over in PR port-mips/59064: jemalloc switch to 5.3 broke userland
 > <https://gnats.NetBSD.org/59064>.
 > 
 > Can you try the patch at the bottom of this message?
 > 
 > https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html
 
 Thank you very much for working on this problem!
 
 Unfortunately, even with your patch, erlite3 still cannot boot
 into multiuser mode, with either n64 or n32 userland:
 https://gist.github.com/rokuyama/7bbe1619e55e8e3aba5bf3b112a23725
 
 On the other hand, a MIPSSIM64 kernel on QEMU boots into multiuser
 mode successfully.
 
 For the log linked above, I enabled the debug printf in trap():
 ```
 diff --git a/sys/arch/mips/mips/trap.c b/sys/arch/mips/mips/trap.c
 index 58caf19e2d2..a079dec91dd 100644
 --- a/sys/arch/mips/mips/trap.c
 +++ b/sys/arch/mips/mips/trap.c
 @@ -448,8 +448,8 @@ trap(uint32_t status, uint32_t cause, vaddr_t vaddr, vaddr_t pc,
   		rv = uvm_fault(map, va, ftype);
   		pcb->pcb_onfault = onfault;
 
 -#if defined(VMFAULT_TRACE)
 -		if (!KERNLAND_P(va))
 +#if defined(VMFAULT_TRACE) || 1
 +		if (!KERNLAND_P(va) && rv != 0)
   			printf(
   			    "uvm_fault(%p (pmap %p), %#"PRIxVADDR
   			    " (%"PRIxVADDR"), %d) -> %d at pc %#"PRIxVADDR"\n",
 ```
 
 You can see that the SEGVs are caused by read accesses to address 0 (NULL):
 ```
 [  13.3599689] uvm_fault(0x980000041f9c0c00 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff83b1db4
 [1]   Segmentation fault (core dumped) /sbin/ifconfig lo0 inet6 >/dev/null 2>&1
 ...
 [  19.5399661] uvm_fault(0x980000041f20c800 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff8391db4
 [1]   Segmentation fault (core dumped) awk "/^sendmail[ \t]/{print\$2}" /etc/mailer.conf
 ```
 
 As you pointed out earlier, SEGVs can be avoided by replacing
 `user_reserved_insn` with `user_gen_exception`, i.e.:
 https://gist.github.com/rokuyama/c7a50b8e7a62dc25f3f536f1434eea9b
 
 Grepping through the Linux source, I found that they check the TLB
 entry for the PC before fetching from it:
 https://github.com/torvalds/linux/commit/5b10496b6e65#diff-bbe4c1a54ce7bd13e6109d887383993c3b5276a1362f84092e9ef31dc84064d9R390
 
 and our `user_gen_exception` path uses copyin(9), of course.
 
 That said, I know almost nothing about mips, and much more destructive
 results may be possible in this "double-fault scenario"...
 
 Thanks,
 rin
 
 > If you open one of the core dumps in gdb (if you are able to do that
 > from another machine where everything isn't segfaulting all the time,
 > e.g. if the core dump is written to nfs) and do `x/i $pc' and `bt', I
 > bet you will find it in malloc_default (via some stack trace through
 > jemalloc) at this instruction:
 > 
 > 00008a58 <malloc_default>:
 > malloc_default():
 > /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
 >      8a58:       27bdff70        addiu   sp,sp,-144
 >      8a5c:       ffbc0078        sd      gp,120(sp)
 >      8a60:       3c1c0000        lui     gp,0x0
 >                          8a60: R_MIPS_GPREL16    malloc_default
 >                          8a60: R_MIPS_SUB        *ABS*
 >                          8a60: R_MIPS_HI16       *ABS*
 >      8a64:       0399e021        addu    gp,gp,t9
 >      8a68:       279c0000        addiu   gp,gp,0
 >                          8a68: R_MIPS_GPREL16    malloc_default
 >                          8a68: R_MIPS_SUB        *ABS*
 >                          8a68: R_MIPS_LO16       *ABS*
 > tsd_fetch_impl():
 > /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
 >      8a6c:       8f820000        lw      v0,0(gp)
 >                          8a6c: R_MIPS_TLS_GOTTPREL       je_tsd_tls
 >      8a70:       7c03e83b        0x7c03e83b
 > malloc_default():
 > /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
 >      8a74:       ffb10040        sd      s1,64(sp)
 >      8a78:       ffb00038        sd      s0,56(sp)
 > tsd_fetch_impl():
 > /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
 >      8a7c:       00433021        addu    a2,v0,v1
 > malloc_default():
 > /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
 >      8a80:       ffbf0088        sd      ra,136(sp)
 >      8a84:       ffbe0080        sd      s8,128(sp)
 >      8a88:       ffb70070        sd      s7,112(sp)
 >      8a8c:       ffb60068        sd      s6,104(sp)
 >      8a90:       ffb50060        sd      s5,96(sp)
 >      8a94:       ffb40058        sd      s4,88(sp)
 >      8a98:       ffb30050        sd      s3,80(sp)
 >      8a9c:       ffb20048        sd      s2,72(sp)
 > tsd_fetch_impl():
 > /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:422
 >   => 8aa0:       90c30258        lbu     v1,600(a2)
 > 
 > And I bet you will find that $v0 holds the address malloc_default+0x18,
 > i.e., the pc of this instruction:
 > 
 > tsd_fetch_impl():
 > /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
 >      8a6c:       8f820000        lw      v0,0(gp)
 >                          8a6c: R_MIPS_TLS_GOTTPREL       je_tsd_tls
 >   => 8a70:       7c03e83b        0x7c03e83b
 > 
 > The instruction 0x7c03e83b is sometimes also written
 > 
 > 	rdhwr	$3,$29
 > 
 > or
 > 
 > 	rdhwr	v1,ulr
 > 
 > but it is architecturally undefined so it traps to the kernel to
 > emulate, and the kernel is supposed to return the thread's tcb pointer
 > in v1.
 > 
 > But as a side effect, the emulation clobbers the register v0 with the
 > address of the excepting instruction, rather than leaving it as the
 > value it found at -1234(gp) (or whatever; written as 0(gp) above, but
 > the linker will replace it by some probably-nonzero number; you can use
 > `objdump --disassemble=malloc_default libc.so' to find it), which is
 > decidedly not the instruction address malloc_default+0x18 but rather
 > some tls offset that is reasonable to add to the tcb pointer.
 > 
 > Now, the emulation routine
 > https://nxr.netbsd.org/xref/src/sys/arch/mips/mips/mipsX_subr.S?r=1.115#1297
 > is not _supposed_ to clobber v0 -- it goes out of its way to save v0 on
 > the kernel stack and restore it before returning from the exception:
 > 
 >     1312 	/* Need two working registers */
 >     1313 	REG_S	AT, CALLFRAME_SIZ+TF_REG_AST(k0)
 >     1314 	REG_S	v0, CALLFRAME_SIZ+TF_REG_V0(k0)
 > ...
 >     1349 	REG_L	AT, CALLFRAME_SIZ+TF_REG_AST(k0)# restore reg
 >     1350 	REG_L	v0, CALLFRAME_SIZ+TF_REG_V0(k0) # restore reg
 >     1351 	eret
 > 
 > But, in all my trials, it has been consistently corrupted in the same
 > way.  The best theory we have for why it is corrupted is cn50xx CPUs --
 > found in erlite3 (but not er4) -- have some kind of register-writeback
 > bug (which passes through some register renaming unchanged) provoked by
 > the particular combination of reading MIPS_COP_0_EXC_PC and eret so
 > that after the eret, the exception pc gets written back to v0 even
 > though we just restored v0 from the kernel stack.
 > 
 > So, all that said, here is a summary of the science we did on my
 > erlite3, together with a patch that seems to address the issue and --
 > under the theory that it is the register that we move MIPS_COP_0_EXC_PC
 > into -- will only corrupt a temporary register k0 which is not
 > accessible to userland and treated as garbage on any kernel entry
 > points, so it's safe:
 > 
 > https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html
 > 
 > 
 >