pkgsrc-Users archive


Re: mongodb4 on aarch64: ssl instruction probing woes, net/unifi with other mongo?



Jonathan Perkin <jperkin%mnx.io@localhost> writes:

> * On 2024-11-23 at 18:22 GMT, Greg Troxel wrote:
>
>>I built mongodb4, but running it gets me "Illegal instruction" and a
>>core.
>>
>>Before I try harder, any wisdom?
>
> Usual first principles, backtrace on the core and see where it's being
> generated from, then see if there are build options that avoid that
> section of code, or fix the assembly to use supported instructions, or
> if it turns out not to be assembly then see if it's using some -march
> flags that aren't supported by your CPU.

I am 90%+ sure that what's going on is that openssl has a scheme to try
instructions that only work on some cpus, figure out what works, and
then use different asm for efficiency. Beyond that, I am wildly guessing
that somehow mongodb4 is also playing with signal handlers.

This is based on previous experience seeing very similar backtraces in
openssl-land when debugging, with the program otherwise running fine, or
then failing on some other fault, which one can get to in gdb with
enough "c" commands.

Program received signal SIGILL, Illegal instruction.
0x0000f25d2afb8e28 in _armv8_pmull_probe () from /usr/lib/libcrypto.so.15
(gdb) bt
#0  0x0000f25d2afb8e28 in _armv8_pmull_probe () from /usr/lib/libcrypto.so.15
#1  0x0000f25d2afb924c in OPENSSL_cpuid_setup () from /usr/lib/libcrypto.so.15
#2  0x0000fffff3686628 in _rtld_call_init_function () from /usr/libexec/ld.elf_so
#3  0x0000fffff3686934 in _rtld_call_init_functions () from /usr/libexec/ld.elf_so
#4  0x0000fffff368720c in _rtld () from /usr/libexec/ld.elf_so
#5  0x0000fffff3680b10 in _rtld_start () from /usr/libexec/ld.elf_so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

However, I kept continuing and it seems perhaps the real culprit is:

Program received signal SIGILL, Illegal instruction.
0x00000000030d41a0 in __static_initialization_and_destruction_0(int, int) [clone .constprop.0] ()
(gdb) bt
#0  0x00000000030d41a0 in __static_initialization_and_destruction_0(int, int) [clone .constprop.0] ()
#1  0x00000000023c55d8 in ___start ()
#2  0x0000fffff17d0b10 in _rtld_start () from /usr/libexec/ld.elf_so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I'll see if I can understand that by rebuilding with -g/-O0 etc.


On this system I have a bunch of other things built, all ok.

