Port-xen archive

Re: NetBSD/xen goes off the network - reproduceable



Hi Brian,

On Feb 18, 2012, at 04:26 , Brian Marcotte wrote:

>> I guess most receive buffers end up in the socket, but there should still be
>> one available to make progress. I guess there's a bug somewhere and this
>> one is not reused.
>> Can you see what happens in xennet_rx_mbuf_free especially for the
>> sc->sc_free_rxreql and SC_NLIVEREQ(sc) numbers ?
> 
> For this test, I'm printing those values right at the start of
> xennet_rx_mbuf_free. Also, in xennet_handler I'm printing the values of
> "i" and sc->sc_free_rxreql when it enters the code where it is about to
> do a copy.
> 
> A complete console log is available here:
> 
>       http://www.panix.com/~marcotte/consolelog.txt
> 
> Here is a summary. The most interesting part is probably at the bottom
> where the network stops completely.

Have you noticed any messages about "mclpool limit reached: increase 
NMBCLUSTERS"?

I have had the same problem (NetBSD/Xen servers that completely stop 
communicating over the network) on four different DOM0 servers now. In most 
(but not all) cases it is accompanied by the message above. I should also 
note that in my case I lose the console as well, and only a power cycle 
helps.

I've addressed this the obvious way by increasing NMBCLUSTERS to 16384 
(from 2048), and so far that seems to do the trick.
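
For anyone who wants to check for the same symptom, here is a minimal 
sketch. The helper name and the file paths are hypothetical, not taken from 
this thread. It only counts occurrences of the warning in a saved log; the 
commented commands are the usual NetBSD ways to look at mbuf cluster usage 
and the current limit. The limit itself is normally set at kernel build time 
with "options NMBCLUSTERS=16384" in the kernel config file.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): count how many times
# the kernel logged the cluster-pool warning in a saved log file.
count_mclpool_warnings() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status at 0.
    grep -c -F "mclpool limit reached" "$1" || true
}

# Example use on a NetBSD host (paths are assumptions; adjust as needed):
#   dmesg > /tmp/dmesg.txt && count_mclpool_warnings /tmp/dmesg.txt
#   netstat -m                      # current mbuf and cluster usage
#   sysctl kern.mbuf.nmbclusters    # current cluster limit
```

A non-zero count only shows that the pool was exhausted at some point; it 
does not prove that this is the cause of the hang described in this thread.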

I don't know whether we're having the same problem, but it seems close enough 
that I at least wanted to say "me too" ;-)

Regards,

Johan

PS. Oh, BTW, in my case all the servers are running 5.1_STABLE. This is a 
concern in itself, because the first time I saw this was in September; before 
that I'd been running 5.1_STABLE and 5.0_STABLE for a very long time with no 
issue like this.

> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=251
> xennet_rx_mbuf_free: sc->sc_free_rxreql=4 SC_NLIVEREQ(sc)=251
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=47 SC_NLIVEREQ(sc)=206
> xennet_rx_mbuf_free: sc->sc_free_rxreql=48 SC_NLIVEREQ(sc)=206
> xennet_rx_mbuf_free: sc->sc_free_rxreql=49 SC_NLIVEREQ(sc)=206
> # mount /
> # /etc/rc.d/network start
> xennet_rx_mbuf_free: sc->sc_free_rxreql=50 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=51 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=52 SC_NLIVEREQ(sc)=188
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=65 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=66 SC_NLIVEREQ(sc)=188
> xennet_rx_mbuf_free: sc->sc_free_rxreql=67 SC_NLIVEREQ(sc)=188
> Starting network.
> Hostname: mail3.panix.com
> IPv6 mode: host
> xennet_rx_mbuf_free: sc->sc_free_rxreql=68 SC_NLIVEREQ(sc)=187
> Configuring network interfaces: xennet0.
> Adding interface aliases:.
> add net default: gateway 166.84.1.65
> xennet_rx_mbuf_free: sc->sc_free_rxreql=69 SC_NLIVEREQ(sc)=186
> xennet_rx_mbuf_free: sc->sc_free_rxreql=70 SC_NLIVEREQ(sc)=185
> xennet_rx_mbuf_free: sc->sc_free_rxreql=71 SC_NLIVEREQ(sc)=181
> ...
> # /etc/rc.d/sshd start
> [ log in remotely and start test with telnet]
> xennet_rx_mbuf_free: sc->sc_free_rxreql=42 SC_NLIVEREQ(sc)=213
> xennet_rx_mbuf_free: sc->sc_free_rxreql=43 SC_NLIVEREQ(sc)=212
> xennet_rx_mbuf_free: sc->sc_free_rxreql=44 SC_NLIVEREQ(sc)=211
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=110 SC_NLIVEREQ(sc)=134
> xennet_rx_mbuf_free: sc->sc_free_rxreql=111 SC_NLIVEREQ(sc)=132
> xennet_rx_mbuf_free: sc->sc_free_rxreql=112 SC_NLIVEREQ(sc)=132
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=124 SC_NLIVEREQ(sc)=131
> xennet_rx_mbuf_free: sc->sc_free_rxreql=125 SC_NLIVEREQ(sc)=130
> xennet_rx_mbuf_free: sc->sc_free_rxreql=126 SC_NLIVEREQ(sc)=128
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=252
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=249
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=248
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=37 SC_NLIVEREQ(sc)=113
> xennet_rx_mbuf_free: sc->sc_free_rxreql=38 SC_NLIVEREQ(sc)=108
> xennet_rx_mbuf_free: sc->sc_free_rxreql=39 SC_NLIVEREQ(sc)=103
> xennet_rx_mbuf_free: sc->sc_free_rxreql=40 SC_NLIVEREQ(sc)=97
> xennet_rx_mbuf_free: sc->sc_free_rxreql=41 SC_NLIVEREQ(sc)=96
> xennet_rx_mbuf_free: sc->sc_free_rxreql=42 SC_NLIVEREQ(sc)=96
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=135
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=130
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=128
> # netstat -f inet -n
> Active Internet connections
> Proto Recv-Q Send-Q  Local Address          Foreign Address        State
> tcp     6800      0  166.84.1.74.65534      166.84.1.3.23          ESTABLISHED
> tcp        0      0  166.84.1.74.22         166.84.1.253.607       ESTABLISHED
> [ Recv-Q starting to fill up ]
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=126
> xennet_rx_mbuf_free: sc->sc_free_rxreql=4 SC_NLIVEREQ(sc)=126
> xennet_rx_mbuf_free: sc->sc_free_rxreql=5 SC_NLIVEREQ(sc)=123
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=10 SC_NLIVEREQ(sc)=14
> xennet_rx_mbuf_free: sc->sc_free_rxreql=11 SC_NLIVEREQ(sc)=14
> xennet_rx_mbuf_free: sc->sc_free_rxreql=12 SC_NLIVEREQ(sc)=10
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=20
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=16
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=13
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=4 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=5 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=6 SC_NLIVEREQ(sc)=6
> 
> Active Internet connections
> Proto Recv-Q Send-Q  Local Address          Foreign Address        State
> tcp    13220      0  166.84.1.74.65534      166.84.1.3.23          ESTABLISHED
> tcp        0      0  166.84.1.74.22         166.84.1.253.607       ESTABLISHED
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=10
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=6
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=3
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=2
> xennet_handler: copying packet: i=901 free_rxreql=1
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=0
> xennet_handler: copying packet: i=902 free_rxreql=0
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=0
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=1
> #
> [ XXXXX network has stopped completely XXXXX ]
> # 
> # 
> # 
> # 
> # 
> # 
> # 
> # 
> # pkill telnet
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=2
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=2
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=4
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=2 SC_NLIVEREQ(sc)=8
> xennet_rx_mbuf_free: sc->sc_free_rxreql=3 SC_NLIVEREQ(sc)=8
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=30 SC_NLIVEREQ(sc)=32
> xennet_rx_mbuf_free: sc->sc_free_rxreql=31 SC_NLIVEREQ(sc)=32
> xennet_rx_mbuf_free: sc->sc_free_rxreql=0 SC_NLIVEREQ(sc)=64
> xennet_rx_mbuf_free: sc->sc_free_rxreql=1 SC_NLIVEREQ(sc)=64
> ...
> xennet_rx_mbuf_free: sc->sc_free_rxreql=60 SC_NLIVEREQ(sc)=195
> xennet_rx_mbuf_free: sc->sc_free_rxreql=61 SC_NLIVEREQ(sc)=194
> Feb 17 22:13:39 poweroff: powered off by root
> xennet_rx_mbuf_free: sc->sc_free_rxreql=62 SC_NLIVEREQ(sc)=193
> ...
> syncing disks... done
> unmounting file systems...
> unmounting / (/dev/xbd0a)... done
> 


