NetBSD-Bugs archive
Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)
The following reply was made to PR port-evbmips/59236; it has been noted by GNATS.
From: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost, port-evbmips-maintainer%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, riastradh%NetBSD.org@localhost,
Andreas Gustafsson <gson%gson.org@localhost>, Martin Husemann <martin%duskware.de@localhost>
Cc:
Subject: Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)
Date: Sat, 19 Apr 2025 15:06:09 +0900
On 2025/04/19 4:04, riastradh%NetBSD.org@localhost wrote:
> Synopsis: Multiple segfaults in erlite3 boot
>
> State-Changed-From-To: open->feedback
> State-Changed-By: riastradh%NetBSD.org@localhost
> State-Changed-When: Fri, 18 Apr 2025 19:04:59 +0000
> State-Changed-Why:
> This is probably the same CN50xx bug that we have been puzzling
> over in PR port-mips/59064: jemalloc switch to 5.3 broke userland
> <https://gnats.NetBSD.org/59064>.
>
> Can you try the patch at the bottom of this message?
>
> https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html
Thank you very much for working on this problem!
Unfortunately, even with your patch, erlite3 still cannot boot
into multiuser mode, with either n64 or n32 userland:
https://gist.github.com/rokuyama/7bbe1619e55e8e3aba5bf3b112a23725
On the other hand, a MIPSSIM64 kernel on QEMU boots into multiuser
mode successfully.
The log above was taken with the debug printf in trap() enabled:
```
diff --git a/sys/arch/mips/mips/trap.c b/sys/arch/mips/mips/trap.c
index 58caf19e2d2..a079dec91dd 100644
--- a/sys/arch/mips/mips/trap.c
+++ b/sys/arch/mips/mips/trap.c
@@ -448,8 +448,8 @@ trap(uint32_t status, uint32_t cause, vaddr_t vaddr, vaddr_t pc,
rv = uvm_fault(map, va, ftype);
pcb->pcb_onfault = onfault;
-#if defined(VMFAULT_TRACE)
- if (!KERNLAND_P(va))
+#if defined(VMFAULT_TRACE) || 1
+ if (!KERNLAND_P(va) && rv != 0)
printf(
"uvm_fault(%p (pmap %p), %#"PRIxVADDR
" (%"PRIxVADDR"), %d) -> %d at pc %#"PRIxVADDR"\n",
```
You can see that the SEGVs are caused by read accesses to NULL:
```
[ 13.3599689] uvm_fault(0x980000041f9c0c00 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff83b1db4
[1] Segmentation fault (core dumped) /sbin/ifconfig lo0 inet6 >/dev/null 2>&1
...
[ 19.5399661] uvm_fault(0x980000041f20c800 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff8391db4
[1] Segmentation fault (core dumped) awk "/^sendmail[ \t]/{print\$2}" /etc/mailer.conf
```
As you pointed out earlier, SEGVs can be avoided by replacing
`user_reserved_insn` with `user_gen_exception`, i.e.:
https://gist.github.com/rokuyama/c7a50b8e7a62dc25f3f536f1434eea9b
Grepping through the Linux code, I found that it checks for a TLB
entry for the PC before fetching the instruction there:
https://github.com/torvalds/linux/commit/5b10496b6e65#diff-bbe4c1a54ce7bd13e6109d887383993c3b5276a1362f84092e9ef31dc84064d9R390
and our `user_gen_exception` path uses copyin(9), of course.
I know almost nothing about mips, though, and this "double-fault
scenario" may well lead to much more destructive results...
Thanks,
rin
> If you open one of the core dumps in gdb (if you are able to do that
> from another machine where everything isn't segfaulting all the time,
> e.g. if the core dump is written to nfs) and do `x/i $pc' and `bt', I
> bet you will find it in malloc_default (via some stack trace through
> jemalloc) at this instruction:
>
> 00008a58 <malloc_default>:
> malloc_default():
> /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
> 8a58: 27bdff70 addiu sp,sp,-144
> 8a5c: ffbc0078 sd gp,120(sp)
> 8a60: 3c1c0000 lui gp,0x0
> 8a60: R_MIPS_GPREL16 malloc_default
> 8a60: R_MIPS_SUB *ABS*
> 8a60: R_MIPS_HI16 *ABS*
> 8a64: 0399e021 addu gp,gp,t9
> 8a68: 279c0000 addiu gp,gp,0
> 8a68: R_MIPS_GPREL16 malloc_default
> 8a68: R_MIPS_SUB *ABS*
> 8a68: R_MIPS_LO16 *ABS*
> tsd_fetch_impl():
> /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
> 8a6c: 8f820000 lw v0,0(gp)
> 8a6c: R_MIPS_TLS_GOTTPREL je_tsd_tls
> 8a70: 7c03e83b 0x7c03e83b
> malloc_default():
> /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
> 8a74: ffb10040 sd s1,64(sp)
> 8a78: ffb00038 sd s0,56(sp)
> tsd_fetch_impl():
> /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
> 8a7c: 00433021 addu a2,v0,v1
> malloc_default():
> /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
> 8a80: ffbf0088 sd ra,136(sp)
> 8a84: ffbe0080 sd s8,128(sp)
> 8a88: ffb70070 sd s7,112(sp)
> 8a8c: ffb60068 sd s6,104(sp)
> 8a90: ffb50060 sd s5,96(sp)
> 8a94: ffb40058 sd s4,88(sp)
> 8a98: ffb30050 sd s3,80(sp)
> 8a9c: ffb20048 sd s2,72(sp)
> tsd_fetch_impl():
> /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:422
> => 8aa0: 90c30258 lbu v1,600(a2)
>
> And I bet you will find that $v0 holds the address malloc_default+0x18,
> i.e., the pc of this instruction:
>
> tsd_fetch_impl():
> /home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
> 8a6c: 8f820000 lw v0,0(gp)
> 8a6c: R_MIPS_TLS_GOTTPREL je_tsd_tls
> => 8a70: 7c03e83b 0x7c03e83b
>
> The instruction 0x7c03e83b is sometimes also written
>
> rdhwr $3,$29
>
> or
>
> rdhwr v1,ulr
>
> but it is architecturally undefined so it traps to the kernel to
> emulate, and the kernel is supposed to return the thread's tcb pointer
> in v1.
>
> But as a side effect, the emulation clobbers the register v0 with the
> address of the excepting instruction, rather than leaving it as the
> value it found at -1234(gp) (or whatever; written as 0(gp) above, but
> the linker will replace it by some probably-nonzero number; you can use
> `objdump --disassemble=malloc_default libc.so' to find it), which is
> decidedly not the instruction address malloc_default+0x18 but rather
> some tls offset that is reasonable to add to the tcb pointer.
>
> Now, the emulation routine
> https://nxr.netbsd.org/xref/src/sys/arch/mips/mips/mipsX_subr.S?r=1.115#1297
> is not _supposed_ to clobber v0 -- it goes out of its way to save v0 on
> the kernel stack and restore it before returning from the exception:
>
> 1312 /* Need two working registers */
> 1313 REG_S AT, CALLFRAME_SIZ+TF_REG_AST(k0)
> 1314 REG_S v0, CALLFRAME_SIZ+TF_REG_V0(k0)
> ...
> 1349 REG_L AT, CALLFRAME_SIZ+TF_REG_AST(k0)# restore reg
> 1350 REG_L v0, CALLFRAME_SIZ+TF_REG_V0(k0) # restore reg
> 1351 eret
>
> But, in all my trials, it has been consistently corrupted in the same
> way. The best theory we have for why it is corrupted is cn50xx CPUs --
> found in erlite3 (but not er4) -- have some kind of register-writeback
> bug (which passes through some register renaming unchanged) provoked by
> the particular combination of reading MIPS_COP_0_EXC_PC and eret so
> that after the eret, the exception pc gets written back to v0 even
> though we just restored v0 from the kernel stack.
>
> So, all that said, here is a summary of the science we did on my
> erlite3, together with a patch that seems to address the issue and --
> under the theory that it is the register that we move MIPS_COP_0_EXC_PC
> into -- will only corrupt a temporary register k0 which is not
> accessible to userland and treated as garbage on any kernel entry
> points, so it's safe:
>
> https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html
>
>
>
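The instruction word 0x7c03e83b in the quoted analysis above can be checked by hand. Below is a minimal C sketch of that decoding, assuming the MIPS32/64 Release 2 encoding of RDHWR (major opcode SPECIAL3 = 0x1f, function field 0x3b). The helper name `decode_rdhwr` and the macro names are made up for illustration; they are not NetBSD identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field values assumed from the MIPS32/64 Release 2 encoding of RDHWR. */
#define MIPS_OP_SPECIAL3  0x1fu  /* major opcode, bits 31..26 */
#define MIPS_FUNCT_RDHWR  0x3bu  /* function field, bits 5..0 */
#define MIPS_HWR_ULR      29u    /* hardware register 29 (UserLocal) */

/*
 * Hypothetical helper: return true if insn has the RDHWR opcode and
 * function fields, storing the destination GPR number in *rt and the
 * hardware register number in *rd.  The remaining fields (bits 25..21
 * and 10..6) are not checked in this sketch.
 */
static bool
decode_rdhwr(uint32_t insn, unsigned *rt, unsigned *rd)
{
	unsigned opcode = (insn >> 26) & 0x3f;
	unsigned funct = insn & 0x3f;

	if (opcode != MIPS_OP_SPECIAL3 || funct != MIPS_FUNCT_RDHWR)
		return false;
	*rt = (insn >> 16) & 0x1f;	/* destination general register */
	*rd = (insn >> 11) & 0x1f;	/* hardware register selector */
	return true;
}
```

Calling `decode_rdhwr(0x7c03e83b, &rt, &rd)` gives rt = 3 ($3, v1) and rd = 29 ($29, ulr), which matches `rdhwr $3,$29`. In the disassembly, that instruction sits at 0x8a70, i.e. malloc_default (0x8a58) + 0x18.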