Port-xen archive


Re: NetBSD/xen goes off the network - reproduceable



On Fri, Feb 17, 2012 at 10:26:42PM -0500, Brian Marcotte wrote:
> > I guess most receive buffers end up in the socket, but there should
> > still be one available to make progress. I guess there's a bug
> > somewhere and this one is not reused.
> > Can you see what happens in xennet_rx_mbuf_free, especially for the
> > sc->sc_free_rxreql and SC_NLIVEREQ(sc) numbers?
> 
> For this test, I'm printing those values right at the start of
> xennet_rx_mbuf_free. Also, in xennet_handler I'm printing the values of
> "i" and sc->sc_free_rxreql when it enters the code where it is about to
> do a copy.
> 
> A complete console log is available here:
> 
>       http://www.panix.com/~marcotte/consolelog.txt
> 
> Here is a summary. The most interesting part is probably at the bottom
> where the network stops completely.
> 
> Thanks.
> 
> ------------
> 
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=4 SC_NLIVEREQ(sc)=251
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=47 SC_NLIVEREQ(sc)=206
> xennet_rx_mbuf_free: sc->sc_free_rxreql=48 SC_NLIVEREQ(sc)=206
> xennet_rx_mbuf_free: sc->sc_free_rxreql=49 SC_NLIVEREQ(sc)=206
> # mount /
> # /etc/rc.d/network start
> xennet_rx_mbuf_free: sc->sc_free_rxreql=50 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=51 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=52 SC_NLIVEREQ(sc)=188
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=65 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=66 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=67 SC_NLIVEREQ(sc)=188
> Starting network.
> Hostname: mail3.panix.com
> IPv6 mode: host
> xennet_rx_mbuf_free: sc->sc_free_rxreql=68 SC_NLIVEREQ(sc)=187
> Configuring network interfaces: xennet0.
> Adding interface aliases:.
> add net default: gateway 166.84.1.65
> xennet_rx_mbuf_free: sc->sc_free_rxreql=69 SC_NLIVEREQ(sc)=186
> xennet_rx_mbuf_free: sc->sc_free_rxreql=70 SC_NLIVEREQ(sc)=185
> xennet_rx_mbuf_free: sc->sc_free_rxreql=71 SC_NLIVEREQ(sc)=181
> ...
> # /etc/rc.d/sshd start
> [ log in remotely and start test with telnet]
> xennet_rx_mbuf_free: sc->sc_free_rxreql=42 SC_NLIVEREQ(sc)=213
> xennet_rx_mbuf_free: sc->sc_free_rxreql=43 SC_NLIVEREQ(sc)=212
> xennet_rx_mbuf_free: sc->sc_free_rxreql=44 SC_NLIVEREQ(sc)=211
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=110 SC_NLIVEREQ(sc)=134
> xennet_rx_mbuf_free: sc->sc_free_rxreql=111 SC_NLIVEREQ(sc)=132
> xennet_rx_mbuf_free: sc->sc_free_rxreql=112 SC_NLIVEREQ(sc)=132
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=124 SC_NLIVEREQ(sc)=131
> xennet_rx_mbuf_free: sc->sc_free_rxreql=125 SC_NLIVEREQ(sc)=130
> xennet_rx_mbuf_free: sc->sc_free_rxreql=126 SC_NLIVEREQ(sc)=128
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=252
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=249
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=248
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=37 SC_NLIVEREQ(sc)=113
> xennet_rx_mbuf_free: sc->sc_free_rxreql=38 SC_NLIVEREQ(sc)=108
> xennet_rx_mbuf_free: sc->sc_free_rxreql=39 SC_NLIVEREQ(sc)=103
> xennet_rx_mbuf_free: sc->sc_free_rxreql=40 SC_NLIVEREQ(sc)=97
> xennet_rx_mbuf_free: sc->sc_free_rxreql=41 SC_NLIVEREQ(sc)=96
> xennet_rx_mbuf_free: sc->sc_free_rxreql=42 SC_NLIVEREQ(sc)=96
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=135
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=130
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=128
> # netstat -f inet -n
> Active Internet connections
> Proto Recv-Q Send-Q  Local Address          Foreign Address        State
> tcp     6800      0  166.84.1.74.65534      166.84.1.3.23          ESTABLISHED
> tcp        0      0  166.84.1.74.22         166.84.1.253.607       ESTABLISHED
> [ Recv-Q starting to fill up ]
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=126
> xennet_rx_mbuf_free: sc->sc_free_rxreql=4 SC_NLIVEREQ(sc)=126
> xennet_rx_mbuf_free: sc->sc_free_rxreql=5 SC_NLIVEREQ(sc)=123
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=10 SC_NLIVEREQ(sc)=14
> xennet_rx_mbuf_free: sc->sc_free_rxreql=11 SC_NLIVEREQ(sc)=14
> xennet_rx_mbuf_free: sc->sc_free_rxreql=12 SC_NLIVEREQ(sc)=10
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=20
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=16
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=13
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=4 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=5 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=6 SC_NLIVEREQ(sc)=6
> 
> Active Internet connections
> Proto Recv-Q Send-Q  Local Address          Foreign Address        State
> tcp    13220      0  166.84.1.74.65534      166.84.1.3.23          ESTABLISHED
> tcp        0      0  166.84.1.74.22         166.84.1.253.607       ESTABLISHED
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=10
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=6
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=3
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=2
> xennet_handler: copying packet: i=901 free_rxreql=1
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=0

At this point, there's no space in the ring to receive new packets

> xennet_handler: copying packet: i=902 free_rxreql=0
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=0
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=1

but there is again one slot in the ring; this should be enough to make
limited progress.
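
To illustrate what I mean by limited progress, here is a toy model of the
accounting. It is only a sketch and not the driver code: struct rx_pool and
the rx_pool_* functions are made-up names, and the refill rule (re-post the
free buffers once there are at least as many free as live) is an assumption
that merely resembles what the log above seems to show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the fields only loosely mirror the driver's counters. */
struct rx_pool {
	int nfree;	/* idle buffers on the free list (cf. sc_free_rxreql) */
	int nlive;	/* buffers posted to the ring (cf. SC_NLIVEREQ(sc)) */
	int nheld;	/* buffers loaned to the stack, e.g. queued in a socket */
};

/* Post every free buffer back to the ring. */
static void
rx_pool_refill(struct rx_pool *p)
{
	p->nlive += p->nfree;
	p->nfree = 0;
}

/*
 * A packet arrives. It needs one posted buffer, which is then loaned to
 * the stack. Returns false when nothing is posted and the packet is lost.
 */
static bool
rx_pool_receive(struct rx_pool *p)
{
	if (p->nlive == 0)
		return false;
	p->nlive--;
	p->nheld++;
	return true;
}

/* The stack gives a buffer back (the role of xennet_rx_mbuf_free). */
static void
rx_pool_release(struct rx_pool *p)
{
	assert(p->nheld > 0);
	p->nheld--;
	p->nfree++;
	/* Assumed refill rule: with nlive == 0 any release re-posts at once. */
	if (p->nfree >= p->nlive)
		rx_pool_refill(p);
}
```

In this model, a pool that is down to a single buffer still receives one
packet per release, as long as the released buffer really goes back to the
ring. If it does not, receive fails forever, which would match the stall.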

> #
> [ XXXXX network has stopped completely XXXXX ]
> # 
> # 
> # 
> # 
> # 
> # 
> # 
> # 
> # pkill telnet
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=2
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=2
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=8
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=30 SC_NLIVEREQ(sc)=32
> xennet_rx_mbuf_free: sc->sc_free_rxreql=31 SC_NLIVEREQ(sc)=32
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=64
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=64
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=60 SC_NLIVEREQ(sc)=195
> xennet_rx_mbuf_free: sc->sc_free_rxreql=61 SC_NLIVEREQ(sc)=194

And at this point, the network has not restarted?

When this happens, can you check the flags of the corresponding
xvif interface in the backend?
Can you give details about your dom0?

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

