NetBSD-Bugs archive


Re: port-mips/59064 (jemalloc switch to 5.3 broke userland)



The following reply was made to PR port-mips/59064; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
Cc: Martin Husemann <martin%duskware.de@localhost>, gnats-bugs%netbsd.org@localhost,
	port-mips-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
	netbsd-bugs%netbsd.org@localhost, martin%NetBSD.org@localhost
Subject: Re: port-mips/59064 (jemalloc switch to 5.3 broke userland)
Date: Sun, 13 Apr 2025 12:42:29 +0000

 > Date: Sun, 13 Apr 2025 13:39:54 +0900
 > From: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
 >
 > On 2025/04/12 23:51, Taylor R Campbell wrote:
 > > Can you try the attached patch?  Will require a clean build of
 > > anything that uses bsd.lib.mk.  (Will also need something to wash the
 > > embarrassment off my face if this turns out to be the culprit!)
 >
 > Thank you very much for finding it out!
 
 The clue that tipped me off was the t_tls_static test failure in the
 pmax releng testbed, which started after a few changes to bsd.*.mk and
 to make(1):
 
 https://releng.NetBSD.org/b5reports/pmax/commits-2025.01.html#build-2025.01.14.16.46.38
 
 > Statically-linked binaries (specifically, /rescue/*) from the
 > n{64,32} userland work just fine on ERLite-3, if the "initial-exec"
 > attribute is removed at the same time.
 >
 > Also, the libc/tls and ld.elf_so tests work again
 > (except for t_rtld_r_debug).
 
 Nice!  I added some extra diagnostics to t_rtld_r_debug -- maybe they
 will help to figure out what's going on.
 
 > I forgot to mention, but userland works even with the "initial-exec"
 > TLS model on QEMU and GXemul for mips somehow. Emulation may not be
 > precise enough, or our TLS handling relies on some undefined
 > H/W behaviors?
 
 Is this for emulating the  RDHWR $3,$29  instruction, 0x7c03e83b?
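 
 As a quick sanity check on that encoding, here is a minimal sketch that
 splits the word 0x7c03e83b into its fields.  The names mips_decode and
 is_rdhwr_userlocal are made up for illustration and do not exist in the
 NetBSD tree; the field layout is the standard MIPS32r2 one (opcode
 SPECIAL3 = 0x1f, function RDHWR = 0x3b, hardware register 29 =
 UserLocal).

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a MIPS register-format instruction word (hypothetical helper). */
struct mips_fields {
	unsigned opcode;	/* bits 31..26 */
	unsigned rs;		/* bits 25..21 */
	unsigned rt;		/* bits 20..16: destination GPR for RDHWR */
	unsigned rd;		/* bits 15..11: hardware register number */
	unsigned funct;		/* bits 5..0 */
};

struct mips_fields
mips_decode(uint32_t insn)
{
	struct mips_fields f = {
		.opcode = (insn >> 26) & 0x3f,
		.rs = (insn >> 21) & 0x1f,
		.rt = (insn >> 16) & 0x1f,
		.rd = (insn >> 11) & 0x1f,
		.funct = insn & 0x3f,
	};

	return f;
}

/*
 * True if insn looks like RDHWR reading hardware register 29
 * (UserLocal).  Bits 10..6 are not examined in this sketch.
 */
bool
is_rdhwr_userlocal(uint32_t insn)
{
	struct mips_fields f = mips_decode(insn);

	return f.opcode == 0x1f && f.rs == 0 && f.funct == 0x3b &&
	    f.rd == 29;
}
```

 With this, is_rdhwr_userlocal(0x7c03e83b) holds and the decoded rt
 field is 3, i.e. the target register is $3, matching  RDHWR $3,$29.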
 
 There's a funny comment in sys/arch/mips/include/lwp_private.h (which
 was originally added by matt@ to sys/arch/mips/include/mcontext.h
 rev. 1.21 back in 2015):
 
      57 		// For some reason the syscall is much faster than
      58 		// emulating rdhwr $3,$29 on a CN50xx
 
 https://nxr.NetBSD.org/xref/src/sys/arch/mips/include/lwp_private.h?r=1.1#57
 
 I wonder if that's related -- gcc emits the RDHWR instruction itself,
 rather than going through the __lwp_gettcb_fast function.
 
 > By examining `VMFAULT_TRACE` codes of mips/trap.c, __BIT(40) is
 > turned on for fault addresses, e.g., 0x1fff0a25050 (for most cases?).
 > This is odd as our user address space is only 40-bit for mips64.
 >
 > I've not figured out what is going on for ERLite-3...
 
 Curious...  Is it different on other MIPS?  Does the other information
 in the print confirm that this is supposed to be a user address?  Can
 you find what userland was doing to provoke this?
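 
 For reference, the bit arithmetic behind the quoted observation can be
 sketched as follows.  BIT() here is a local stand-in for the kernel's
 __BIT(), and USER_VA_BITS, va_exceeds_user_space and va_clear_bit40 are
 names invented for this example; the 40-bit limit is taken from the
 quoted message, not checked against the pmap code.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)		(UINT64_C(1) << (n))	/* stand-in for __BIT() */
#define USER_VA_BITS	40	/* assumption: 40-bit user VA, per the report */

/* True if va has any bit set at or above bit USER_VA_BITS. */
bool
va_exceeds_user_space(uint64_t va)
{
	return (va >> USER_VA_BITS) != 0;
}

/* Return va with bit 40 cleared, to see what address was likely meant. */
uint64_t
va_clear_bit40(uint64_t va)
{
	return va & ~BIT(40);
}
```

 For the reported fault address 0x1fff0a25050, va_exceeds_user_space()
 is true, and clearing bit 40 leaves 0xfff0a25050, which fits within 40
 bits.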
 
 > PS
 > Also, your patch fixes recent ATF regressions for arm:
 >
 > - lib/libc/tls/t_tls_static:t_tls_static
 > - usr.bin/c++/t_cxxruntime:cxxruntime_static
 > - usr.bin/c++/t_static_destructor:static_destructor_static
 >=20
 > I've just noticed that these tests abort by calling libc stub of
 > _tls_get_addr(), in a similar manner to mips.
 
 Excellent!  I have committed the fix (and added a note to UPDATING
 that libraries require a clean build).
 

