tech-net archive


Looking to address two networking issues with NetBSD 10



Hi,

We still have two large networking issues with NetBSD 10. The first is that dhcpcd exits and then has to be restarted, either manually or programmatically. This issue was discussed here:

https://mail-index.netbsd.org/tech-net/2024/03/18/msg008729.html

The exits seem to coincide with dhcpcd getting a new lease or IP address; it doesn't exit on networks where the address never, or only rarely, changes.
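For anyone who needs the "programmatically" option in the meantime, a minimal watchdog sketch follows. It is only a workaround, not a fix. It assumes dhcpcd is started from rc.d so that `service dhcpcd restart` applies, and that `pgrep -x dhcpcd` is a good enough liveness check. The function names, the 30-second interval, and the `max_checks` parameter are illustrative choices, not anything taken from dhcpcd itself.

```python
import subprocess
import time


def is_running(name, run=subprocess.run):
    """Return True if pgrep finds a process whose name is exactly `name`."""
    result = run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def watch(name="dhcpcd", restart_cmd=("service", "dhcpcd", "restart"),
          interval=30, run=subprocess.run, sleep=time.sleep, max_checks=None):
    """Poll for `name` and run `restart_cmd` whenever it is missing.

    Loops forever when max_checks is None; otherwise stops after that
    many polls and returns the number of restarts that were issued.
    """
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_running(name, run):
            # Assumption: restarting through rc.d is enough to recover.
            run(list(restart_cmd))
            restarts += 1
        checks += 1
        sleep(interval)
    return restarts
```

Calling `watch()` as root with the defaults polls every 30 seconds until interrupted. The `run` and `sleep` parameters exist only so the loop can be exercised without touching a real dhcpcd.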

The second is that I can reliably panic or lock up aarch64 and amd64 machines that run npf while routing a /24, even with the most trivial configuration. This was discussed here:

https://mail-index.netbsd.org/tech-net/2023/10/12/msg008636.html

Aside from comments, this is the entirety of npf.conf, and /etc/npf_block has 189 entries:

$upstream_if		= "wm0"
$upstream_ip		= { 173.198.243.126 }
$upstream_lan		= { 173.198.243.124/30 }
table <block> type lpm file "/etc/npf_block"
procedure "log" {
	log: npflog0
}
group "external" on $upstream_if {
	ruleset "blocklistd"
	block in final from <block>
}
group default {
	pass final on lo0 all
	pass in final all
	pass out final all
}

The issue first appeared on a Raspberry Pi 4. I then moved to an amd64 system, initially using the motherboard's re0 interface, but I wanted to rule out any issues related to this:

https://mail-index.netbsd.org/tech-kern/2024/01/27/msg029463.html

So I added an Intel wm0 interface via PCIe. The problems continued.

Taylor's improvements made the problems less frequent, but they still happen regularly (after around two weeks).

After the last lockup (July 2024) with the configuration above, I disabled npf, and the system has been fine for the last three months.

Some panics, in approximately chronological order:

https://www.klos.com/~john/panics/1.txt

With LOCKDEBUG, on amd64:

https://www.klos.com/~john/panics/2.txt
https://www.klos.com/~john/panics/3.txt
https://www.klos.com/~john/panics/4.txt

https://www.klos.com/~john/panics/5.txt
https://www.klos.com/~john/panics/6.txt
https://www.klos.com/~john/panics/7.txt

Also LOCKDEBUG:

https://www.klos.com/~john/panics/8.txt

After running tcpdump:

https://www.klos.com/~john/panics/9.txt

After switching to wm0:

https://www.klos.com/~john/panics/10.txt
https://www.klos.com/~john/panics/11.txt
https://www.klos.com/~john/panics/12.txt
https://www.klos.com/~john/panics/13.txt

After this, I set npf=NO and haven't had any issues since.

What can we do to address this? I've offered to make the machine available via serial console when it's in the frozen state, because I'm not sure what else I should do.

Thanks,
John Klos

