tech-kern archive
Kernel locks when configuring motherboard ethernet
Hi, all,
On several amd64 systems that boot via UEFI, the first attempt to make
changes to the built-in Realtek ethernet port locks up the kernel.
I observed this issue a while ago, but only on old (2014) AMD AM1
motherboards, so I thought it was a quirk of older UEFI. I noticed that the
issue didn't happen if I enabled the network stack in the BIOS settings,
even though I don't turn on PXE for either IPv4 or IPv6. I figured
NetBSD wasn't initializing the re* interface properly when booting via
UEFI but the BIOS was, so enabling the BIOS network stack circumvented
the problem.
However, I've recently set up some newer systems that've been showing the
same issue. They are Ryzen AM4 and AM5 systems with BIOSes that've been
updated within the last month or two.
If the network is very busy when the system boots, the kernel locks up
most of the time. If the network is idle, the kernel doesn't lock up.
Until now, I'd never been able to get into the kernel debugger, since
the lockup prevents all keyboard activity, but I have a colocated system
with a serial console where I can drop into the debugger.
Here's what I got from a lockup after trying to configure re0:
[ 30.8543400] fatal breakpoint trap in supervisor mode
[ 30.8543400] trap type 1 code 0 rip 0xffffffff80235385 cs 0x8 rflags 0x202 cr2 0x7f7ede210ff8 ilevel 0x8 rsp 0xffffc48839ae4be8
[ 30.8543400] curlwp 0xffff8fdc02285100 pid 0.11 lowest kstack 0xffffc48839ae02c0
Stopped in pid 0.11 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x86d
intr_kdtrace_wrapper() at netbsd:intr_kdtrace_wrapper+0x26
Xhandle_ioapic_edge1() at netbsd:Xhandle_ioapic_edge1+0x75
--- interrupt ---
bus_space_read_2() at netbsd:bus_space_read_2+0xb
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x37
Xhandle_ioapic_edge18() at netbsd:Xhandle_ioapic_edge18+0x75
--- interrupt ---
_kernel_lock() at netbsd:_kernel_lock+0xca
if_link_state_change_work() at netbsd:if_link_state_change_work+0x1e
workqueue_worker() at netbsd:workqueue_worker+0xd9
ds 5c20
es 4c18
fs 524
gs 5230
rdi ffffffff81845120 x86_io
rsi 3f8
rbp ffffc48839ae4be8
rbx ffffc4803df8c006
rdx 1
rcx 100
rax 7f
r8 0
r9 0
r10 ffffc4884f9f8eec
r11 ffffc4884f9f8ee8
r12 ffff8fd5096a1790
r13 7fd
r14 c6
r15 ffff8fd5096a16c0
rip ffffffff80235385 breakpoint+0x5
cs 8
rflags 202
rsp ffffc48839ae4be8
ss 10
netbsd:breakpoint+0x5: leave
Also, I tried a kernel with LOCKDEBUG, and it panicked before finishing boot:
Configuring network interfaces: re0[ 22.9855318] cpu0[564 sh]: hogging kernel lock
[ 22.9855318] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x56
[ 22.9855318] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x70
[ 22.9855318] x86_ipi_handler() at netbsd:x86_ipi_handler+0x6f
[ 22.9855318] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 22.9855318] --- interrupt ---
[ 22.9855318] Xspllower() at netbsd:Xspllower+0xe
[ 22.9855318] Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1e
[ 22.9855318] --- interrupt ---
[ 22.9855318] bus_space_read_2() at netbsd:bus_space_read_2+0xb
[ 22.9855318] intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x37
[ 22.9855318] Kernel lock error: _kernel_lock,266: spinout
[ 23.7255334] lock address : netbsd:kernel_lock
[ 23.7755332] type : spin
[ 23.8155330] initialized : netbsd:main+0x31
[ 23.8655316] shared holds : 0 exclusive: 1
[ 23.9555319] shares wanted: 0 exclusive: 1
[ 24.0355338] relevant cpu : 1 last held: 0
[ 24.1155337] relevant lwp : 0xfffff8a0cd91e680 last held: 0xfffff8a0d003fac0
[ 24.2055344] last locked* : netbsd:intr_biglock_wrapper+0x15
[ 24.2655342] unlocked : netbsd:softint_dispatch+0x186
[ 24.3355336] curcpu holds : 0 wanted by: 0xfffff8a0cd91e680
[ 24.4155346] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,266: spinout
[ 24.4955330] cpu1: Begin traceback...
[ 24.5455334] vpanic() at netbsd:vpanic+0x183
[ 24.5955350] panic() at netbsd:panic+0x3c
[ 24.6455370] lockdebug_abort1() at netbsd:lockdebug_abort1+0xe6
[ 24.7055350] _kernel_lock() at netbsd:_kernel_lock+0x2a7
[ 24.7755337] softint_dispatch() at netbsd:softint_dispatch+0x16d
[ 24.8455340] DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffff9a0839ba30f0
[ 24.9255357] Xsoftintr() at netbsd:Xsoftintr+0x4c
[ 24.9855339] --- interrupt ---
[ 25.0155356] 0:
[ 25.0355354] cpu1: End traceback...
[ 25.0855358] dumping to dev 168,2 (offset=68267703, size=8243720):
[ 25.1555346] dump 1262 1261 1260 1259 1258 1257 1256 1255 1254 1253 1252 1251 1250 1249 1248 1247 1246 1245 1244 1243 1242 1241
...
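A note on the "spinout" above: the report shows cpu1 waiting for kernel_lock in softint_dispatch while cpu0 held it ("last locked" at intr_biglock_wrapper), and cpu0's "hogging kernel lock" backtrace sits in bus_space_read_2 under intr_biglock_wrapper, i.e. inside a device interrupt handler. LOCKDEBUG aborts with "spinout" when a CPU has spun on a lock for too long without getting it. The following Python sketch models only that bounded-spin idea, in a single thread; the spin budget, the function names, and the use of threading.Lock are illustrative assumptions, not NetBSD's _kernel_lock code, whose threshold and backoff differ.

```python
import threading

# Hypothetical spin budget; the real LOCKDEBUG threshold is different.
SPIN_LIMIT = 100_000


def acquire_with_spinout(lock: threading.Lock, limit: int = SPIN_LIMIT) -> bool:
    """Spin on a non-blocking acquire. Return True on success, or False
    ("spinout") if the holder never releases the lock within `limit` tries."""
    for _ in range(limit):
        if lock.acquire(blocking=False):
            return True
    return False


def demo() -> list:
    """Single-threaded model of the report: one holder that never lets go,
    then a second contender that spins out."""
    kernel_lock = threading.Lock()  # stand-in for kernel_lock (hypothetical)
    results = []
    # "cpu0": an interrupt handler takes the big lock and doesn't release it.
    results.append("acquired" if acquire_with_spinout(kernel_lock) else "spinout")
    # "cpu1": a soft interrupt wants the same lock and gives up after the budget.
    results.append("acquired" if acquire_with_spinout(kernel_lock) else "spinout")
    # Once the holder releases the lock, the next attempt succeeds.
    kernel_lock.release()
    results.append("acquired" if acquire_with_spinout(kernel_lock) else "spinout")
    return results


if __name__ == "__main__":
    print(demo())
```

Run as a script, it prints ['acquired', 'spinout', 'acquired']: the second attempt stands in for cpu1 and gives up only because the first holder never released the lock, which is why the interesting question is what cpu0's interrupt handler was doing.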
Since I started using the LOCKDEBUG kernel, this system can't use the
network at all without locking up, even after a hardware reset. It's
colocated, so although I can have someone physically power cycle the
machine, I figured I'd leave it as it is in case more information can be
gained from it.
The serial console can be accessed from another system with cu, and that
other system can also do a hardware reset. The system obviously can't talk
to the Internet, but it has netbsd-10 sources and can compile a kernel for
itself.
The previous kernel, which had been running for a couple of weeks, locked
up twice. I don't know whether that's directly related to this, because
neither lockup had anything to do with configuring network ports.
Interestingly, I've seen the same lockups on the machine this one replaced
(an 8 GB Raspberry Pi 4 running netbsd-10). These machines are public-facing
and route parts of a class C over tinc tunnels.
Here's one lockup:
[ 495715.4076245] fatal breakpoint trap in supervisor mode
[ 495715.4076245] trap type 1 code 0 rip 0xffffffff80235385 cs 0x8 rflags 0x202 cr2 0x76f4a2074000 ilevel 0x8 rsp 0xffffa80839aac8c8
[ 495715.4076245] curlwp 0xffffa0ed91107480 pid 0.3 lowest kstack 0xffffa80839aa82c0
Stopped in pid 0.3 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x7e0
intr_kdtrace_wrapper() at netbsd:intr_kdtrace_wrapper+0x26
Xhandle_ioapic_edge1() at netbsd:Xhandle_ioapic_edge1+0x75
--- interrupt ---
npf_tcpsaw() at netbsd:npf_tcpsaw+0x1d
npf_conn_inspect() at netbsd:npf_conn_inspect+0x86
npfk_packet_handler() at netbsd:npfk_packet_handler+0x18e
pfil_run_hooks() at netbsd:pfil_run_hooks+0x128
ip_output() at netbsd:ip_output+0x4c0
ip_forward() at netbsd:ip_forward+0x138
ipintr() at netbsd:ipintr+0xa80
softint_dispatch() at netbsd:softint_dispatch+0x95
DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffffa80839aad0f0
Xsoftintr() at netbsd:Xsoftintr+0x4c
--- interrupt ---
b31c059c10208e97:
ds c9a0
es ddb3
fs 1
gs e8d9
rdi ffffffff81845120 x86_io
rsi 800
rbp ffffa80839aac8c8
rbx ffffa8003df8c01c
rdx 7f
rcx 22
rax 1
r8 ffffa80839aaca94
r9 0
r10 5ed7b6ca02a0
r11 ffffa8003df91008
r12 ffffa0e6944a1790
r13 800
r14 cc
r15 ffffa0e6944a16c0
rip ffffffff80235385 breakpoint+0x5
cs 8
rflags 202
rsp ffffa80839aac8c8
ss 0
netbsd:breakpoint+0x5: leave
Does anyone have any suggestions about what to try next? Does anyone want
to have a look around themselves?
Thanks,
John Klos