On Wed, 15 May 2024 at 20:43, Havard Eidnes <he%netbsd.org@localhost> wrote:
>
> Hi,
>
> as you may know, I have been trying to keep the rust pkgsrc
> package up-to-date.
>
> As part of that effort, I do testing of the resulting compilers,
> and for "full success" I require that rust self-host on the
> target CPU architecture, and for amd64 and i386 I should be able
> to build and run firefox using the new compiler.
>
> However, lately I've been seeing failures in my testing which is
> a bit worrying. I've mostly assumed that "if there is something
> CPU-specific about an issue, it must show up and be tested for
> and fixed for one of the other platforms", but lately it appears
> that is not so much the case.
>
> Briefly, the test results for the recent versions of rust on
> NetBSD can be summarized as follows:
>
>                 1.75.0  1.76.0  1.77.1  1.78.0
>
> aarch64           x       x       f       f
> aarch64eb         f       f       f       f
> amd64             x       x       x       x
> armv6             x       x       o       ?
> armv7             x       x       f       f
> i386              x       x       x       x
> mipsel            f       f       f       f
> ppc               x       x       x       x
> riscv64           x       x       x       x
> sparc64/10.0      x       x       x       f
> sparc64/9.0       x       x       x       x
>
> x = tests ok
> f = failed one or more tests
> o = ongoing, unknown
>
>
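
As a rough aid for reading the matrix above, here is a minimal Python sketch that picks out the platforms which passed in an earlier release and failed in a later one. The data is copied from the table; the names VERSIONS, RESULTS, first_regression and regressions are hypothetical, and treating "?" (which the legend does not define) as "unknown" is an assumption.

```python
# Result matrix copied from the quoted table, one character per release.
# Assumption: "?" (armv6, 1.78.0) is not in the legend and is treated as
# unknown, the same way as "o".
VERSIONS = ["1.75.0", "1.76.0", "1.77.1", "1.78.0"]
RESULTS = {
    "aarch64":      "xxff",
    "aarch64eb":    "ffff",
    "amd64":        "xxxx",
    "armv6":        "xxo?",
    "armv7":        "xxff",
    "i386":         "xxxx",
    "mipsel":       "ffff",
    "ppc":          "xxxx",
    "riscv64":      "xxxx",
    "sparc64/10.0": "xxxf",
    "sparc64/9.0":  "xxxx",
}


def first_regression(marks):
    """Return the index of the first 'f' that follows an earlier 'x', or None."""
    seen_ok = False
    for i, mark in enumerate(marks):
        if mark == "x":
            seen_ok = True
        elif mark == "f" and seen_ok:
            return i
    return None


def regressions(results=RESULTS, versions=VERSIONS):
    """Map each regressed platform to the first release where it failed."""
    out = {}
    for platform, marks in results.items():
        idx = first_regression(marks)
        if idx is not None:
            out[platform] = versions[idx]
    return out


if __name__ == "__main__":
    for platform, version in regressions().items():
        print(f"{platform}: first failure in {version}")
```

Platforms that never passed (aarch64eb, mipsel) are deliberately not counted as regressions here, and armv6 is left out because its later results are unknown rather than failed.
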
> We can perhaps disregard aarch64eb (where I just recently was
> able to spin up a qemu instance and do some actual testing), and
> mipsel (which is o32 mips32, which has never been able to
> self-host due to address space limitations, and rust-bin for this
> platform appears to have broken with a bump of the embedded LLVM,
> ref. https://github.com/rust-lang/rust/issues/118978).
>
>
> However, the rest is a bit worrying, here's a walkthrough:
>
> armv7: This is reported upstream in
> https://github.com/rust-lang/rust/issues/123549
>
> My more detailed test log shows:
>
> f NetBSD 9 earmv7hf-el / rust-bin dua-cli (cargo crashes early)
> f NetBSD 9 earmv7hf-el / llvm-16 native-build
> f NetBSD 9 earmv7hf-el / internal-llvm native-build (segv null pointer deref)
>
> The first one is trying to use the cross-built rust compiler via
> the rust177-bin package to build dua-cli.
>
> For the second and the third, I end up with
>
> rustc exited with signal: 11 (SIGSEGV) (core dumped)
>
> after nearly 40 and 60 hours wallclock time, respectively, using
> a 4-CPU qemu instance with MAKE_JOBS?=3 (on an Intel host).
>
>
> armv6 most probably doesn't fare much better. To complete the
> last successful native build, I had to expand the MAXDSIZ kernel
> parameter to a value slightly above the default, i.e. run a
> custom kernel to complete the build. This one appears to run up
> against
>
> https://github.com/rust-lang/rust/issues/116758
>
> ...and now that I check, the wip/rust177 build had gotten stuck
> and had gone un-noticed for a while, now re-started.
>
>
> aarch64: this is perhaps more worrying, since this one "is
> supposed to work", is "a current platform" and isn't so resource-
> constrained as armv6 or armv7. This one reports a stack
> overflow in our case:
>
> thread 'rustc' has overflowed its stack
> fatal runtime error: stack overflow
>
> after around 33 hours wallclock time building on my virtual 4-CPU
> arm64 qemu instance (with MAKE_JOBS?=3), building with the
> bundled LLVM, during the "stage 2" build.
>
> This is reported upstream as
> https://github.com/rust-lang/rust/issues/123551
> but so far it appears nobody else is seeing a similar failure.
>
>
> riscv64: this one completes (knock on wood!), also built using a
> qemu instance, but when I run the dua-cli application via "dua i"
> I get "dua: text relocations" on the console (visible after I
> exit the curses-like program).
>
>
> sparc64: this one is a little weird. Upstream report
>
> https://github.com/rust-lang/rust/issues/117231
>
> Mostly rust builds OK on my NetBSD 9.2 host, using the C++
> compiler there (which is 7.5.0), including rust 1.78.0. However,
> on NetBSD 10.0, it looked like the bundled GCC (10.5.0)
> mis-compiles parts of LLVM(?) in 1.77.x, and there I've had to
> bump the required GCC to 12, but now that isn't working in 1.78.0
> anymore -- I get
>
> rustc exited with signal: 4 (SIGILL) (core dumped)
> error: could not compile `proc-macro-test-impl` (lib)
>
> after some 141 hours wallclock time.
>
> Earlier I've had trouble with the combination of "NetBSD/sparc64
> 10.0" and "external LLVM", but I'm re-trying that at the moment.
> We'll see. My sparc64 hosts are not fast...
>
>
> So... It would be really nice to get some assistance to figure
> out some of these failures. Perhaps aarch64 as a first priority,
> then the 32-bit arm platforms as second.

I've started building 1.78, using llvm 18.1.5 (I usually use the
built-in llvm, but decided to test first with the new llvm). This is on

$ uname -a
NetBSD narvi 10.99.10 NetBSD 10.99.10 (GENERIC64) #0: Tue May 14 01:56:21 BST 2024 sysbuild%ymir.lorien.lan@localhost:/dumps/sysbuild/evbarm64/obj/home/sysbuild/src/sys/arch/evbarm/compile/GENERIC64 evbarm

BTW I'm hardly the type to plug Big Red, but their free-tier OCI
offering is rather usable for software development: you can get 4
cores / 24 GB plus up to 4 disks of about 60 GB each, entirely for
free (well, there is a limit, but the above configuration is just
below it). There was an earlier message on the list regarding the
installation. It is trivial: basically one builds a generic Ubuntu
guest, uploads the NetBSD live system, boots Ubuntu in single-user
mode and overwrites its disk with the live system, all from the
console interface one gets on OCI. I can attach the dmesg, if it is of
interest; it works rather well.

BTW, along the same lines, I have a problem with wip/lldb 18.1.5: it
fails on aarch64, both when built from wip and when built manually
from whatever one gets these days from git. I sent a message about
it to the list earlier.

Chavdar
>
>
> Best regards,
>
> - Håvard

And eventually it failed, using the built-in llvm, while creating the
second-stage compiler, with a stack overflow error. I increased the
stack size to the maximum of 64m, to no effect.
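
For anyone repeating this, a minimal Python sketch for checking which stack limits a process actually inherits, which may help confirm that a raised limit really reached the build. It only uses the standard `resource` module; the helper names describe and stack_limits are hypothetical, and it assumes the check is run from the same shell session that starts the build.

```python
import resource


def describe(limit):
    """Render an rlimit value in MiB, or 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit / (1024 * 1024):.0f} MiB"


def stack_limits():
    """Return the (soft, hard) stack size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print("stack soft limit:", describe(soft))
    print("stack hard limit:", describe(hard))
```

This reports the limits of the Python process itself, which are inherited from the invoking shell; it says nothing about how rustc sizes the stacks of its own threads.
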