tech-net archive


relevant panic() when combining lagg(4), vlan(4) and bridge(4)



Happy new year everyone!

In case this rings a bell for someone -- I found nothing in the PR database.

I recently updated one of my hosts to 10.1 (from 10.0) and took the opportunity to move from agr(4) to lagg(4). However, this change led to panic()s shortly after boot, due to a locking error:

[ 26.1386375] current lwp : 0xfffffe747738d488
[ 26.1386375] owner field : 0xfffffe747738d480 wait/spin: 8/8

[ 26.1386375] panic: lock error: Mutex: mutex_vector_enter,548: locking against myself: lock 0xfffffe7478379888 cpu 0 lwp 0xfffffe747730d480
[ 26.1386375] cpu0: Begin traceback...
[ 26.1386375] vpanic() at netbsd:vpanic+0x183
[ 26.1486372] panic() at netbsd:panic+0x3c
[ 26.1586367] lockdebug_abort() at netbsd:lockdebug_abort+0x114
[ 26.1586367] mutex_vector_enter() at netbsd:mutex_vector_enter+0x32b
[ 26.1686365] bridge_input() at netbsd:bridge_input+0x946
[ 26.1786362] vlan_input() at netbsd:vlan_input+0x143
[ 26.1786362] ether_input() at netbsd:ether_input+0x4c2
[ 26.1886361] bridge_input() at netbsd:bridge_input+0xa18
[ 26.1986358] lagg_input_ethernet() at netbsd:lagg_input_ethernet+0x2ab
[ 26.2086358] if_percpuq_softint() at netbsd:if_percpuq_softint+0x8d
[ 26.2086358] softint_dispatch() at netbsd:softint_dispatch+0x95
[ 26.2186353] cpu0: End traceback...
[ 26.2186353] dumping to dev 18,17 (offset=8, size=8359657):
[ 26.2186353] dump device bad

[ 26.2186353] rebooting...

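I am not yet knowledgeable enough in the netstack to figure out which locking mistake is at play, but FWIW the traceback shows bridge_input() being entered twice in the same call chain (once from lagg_input_ethernet(), then again via vlan_input()), so the problem seems to lie in the interaction between lagg(4), bridge(4) and vlan(4).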

The configuration looks like so:

- lagg0 is an aggregate of two PHYs (wm0 + wm1). It is a trunk carrying two "networks" (native and tagged ID 16);
- lagg0 is part of bridge0, together with many tap(4) interfaces that provide native connectivity to the VMs running on the host;
- a vlan(4) (vlan16) is bound to lagg0;
- vlan16 is attached to a separate bridge16, where a single tap provides connectivity to that VLAN for one specific VM.

# cat /etc/ifconfig.lagg0
create
!ifconfig wm0 up
!ifconfig wm1 up
laggproto lacp laggport wm0 laggport wm1

inet 192.168.1.2/24
inet 192.168.1.3/24 alias

# VM #0 (native network)
# cat ifconfig.tap0
create
!brconfig bridge0 add $int
up

# VM #7 (isolated network)
# cat ifconfig.tap7
create
!brconfig bridge16 add $int
up

# cat ifconfig.vlan16
create
vlan 16 vlanif lagg0
!brconfig bridge16 add $int
!brconfig bridge16 -learn $int
up

As it is a production host, I cannot reproduce the panic "at will", but looking at its configuration I think it can be triggered easily on a test bed (incoming).
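
For anyone who wants to try this on a test bed, the topology described above could probably be set up by hand roughly as follows. This is an untested sketch, not a confirmed reproducer. The run() wrapper and the RUN variable are hypothetical helpers, added only so that the script prints the commands by default. How lagg0 gets attached to bridge0 is not shown in the files above, so that brconfig line is an assumption. The taps are left out because they do not appear in the traceback. Presumably a tagged (ID 16) frame also has to arrive on the lagg(4) ports for the nested bridge_input() path to be hit.

```shell
#!/bin/sh
# Sketch only: dry-run by default, so it just prints the commands.
# RUN=1 is a hypothetical switch to execute them (as root, on a NetBSD test host).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_topology() {
    # lagg0: LACP aggregate of wm0 + wm1, as in ifconfig.lagg0
    run ifconfig wm0 up
    run ifconfig wm1 up
    run ifconfig lagg0 create
    run ifconfig lagg0 laggproto lacp laggport wm0 laggport wm1
    run ifconfig lagg0 up

    # bridge0: native network. Adding lagg0 this way is an assumption;
    # the report does not show how bridge0 itself is configured.
    run ifconfig bridge0 create
    run brconfig bridge0 add lagg0 up

    # vlan16 on top of lagg0, attached to its own bridge16
    run ifconfig bridge16 create
    run ifconfig vlan16 create
    run ifconfig vlan16 vlan 16 vlanif lagg0
    run brconfig bridge16 add vlan16 up
    run brconfig bridge16 -learn vlan16
    run ifconfig vlan16 up
}

setup_topology
```

Run as-is, it only lists the commands; with RUN=1 it would execute them, so that should only be done on a disposable machine.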

Thanks,

--
jym@


