tech-net archive


Re: relevant panic() when combining lagg(4), vlan(4) and bridge(4)



Hi Jean-Yves,

It looks like bug #57650 is relevant here. I had the same panic and ended up using the following configuration successfully:

# cat /etc/ifconfig.lagg0
!ifconfig wm0 up
!ifconfig wm1 up
create
laggproto lacp laggport wm0 laggport wm1
192.168.2.1 netmask 255.255.255.0
mtu 1500
up

Regards,
Sverre

> On Jan 6, 2025, at 12:26, Jean-Yves Migeon <jym%helkyn.org@localhost> wrote:
> 
> Happy new year everyone!
> 
> In case this rings a bell for someone: I found nothing in the PR database.
> 
> I recently updated one of my hosts to 10.1 (from 10.0) and took the opportunity to move from agr(4) to lagg(4). However, this change resulted in panic()s shortly after boot, caused by a locking error:
> 
> [ 26.1386375] current lwp : 0xfffffe747738d488
> [ 26.1386375] owner field : 0xfffffe747738d480 wait/spin: 8/8
> 
> [ 26.1386375] panic: lock error: Mutex: mutex_vector_enter,548: locking against myself: lock 0xfffffe7478379888 cpu 0 lwp 0xfffffe747730d480
> [ 26.1386375] cpu0: Begin traceback...
> [ 26.1386375] vpanic() at netbsd:vpanic+0x183
> [ 26.1486372] panic() at netbsd:panic+0x3c
> [ 26.1586367] lockdebug_abort() at netbsd:lockdebug_abort+0x114
> [ 26.1586367] mutex_vector_enter() at netbsd:mutex_vector_enter+0x32b
> [ 26.1686365] bridge_input() at netbsd:bridge_input+0x946
> [ 26.1786362] vlan_input() at netbsd:vlan_input+0x143
> [ 26.1786362] ether_input() at netbsd:ether_input+0x4c2
> [ 26.1886361] bridge_input() at netbsd:bridge_input+0xa18
> [ 26.1986358] lagg_input_ethernet() at netbsd:lagg_input_ethernet+0x2ab
> [ 26.2086358] if_percpuq_softint() at netbsd:if_percpuq_softint+0x8d
> [ 26.2086358] softint_dispatch() at netbsd:softint_dispatch+0x95
> [ 26.2186353] cpu0: End traceback...
> [ 26.2186353] dumping to dev 18,17 (offset=8, size=8359657):
> [ 26.2186353] dump device bad
> 
> [ 26.2186353] rebooting...
> 
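> I am not knowledgeable enough about the network stack to figure out yet which locking mistake is at play, but FWIW it seems to stem from an interaction between lagg(4), bridge(4) and vlan(4): the traceback shows bridge_input() being re-entered via ether_input() and vlan_input() before the panic in mutex_vector_enter().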
> 
> The configuration looks like so:
> 
>    - lagg0 is an aggregate of two PHYs (wm0 + wm1). It is a trunk carrying two networks: native (untagged) and tagged VLAN ID 16;
>    - lagg0 is a member of bridge0, along with many tap(4) interfaces that provide native-network connectivity to VMs running on the host;
>    - a vlan(4) (vlan16) is bound to lagg0;
>    - vlan16 is attached to a separate bridge16, which contains a single tap(4) providing connectivity to that VLAN for one specific VM.
> 
> # cat /etc/ifconfig.lagg0
> create
> !ifconfig wm0 up
> !ifconfig wm1 up
> laggproto lacp laggport wm0 laggport wm1
> 
> inet 192.168.1.2/24
> inet 192.168.1.3/24 alias
> 
> # VM #0 (native network)
> # cat ifconfig.tap0
> create
> !brconfig bridge0 add $int
> up
> 
> # VM #7 (isolated network)
> # cat ifconfig.tap7
> create
> !brconfig bridge16 add $int
> up
> 
> # cat ifconfig.vlan16
> create
> vlan 16 vlanif lagg0
> !brconfig bridge16 add $int
> !brconfig bridge16 -learn $int
> up
> 
> As this is a production host, I cannot reproduce it at will, but given its configuration I think it can easily be triggered on a test bed (more to come).
> 
> Thanks,
> 
> -- 
> jym@
> 


