Port-xen archive
Re: NetBSD/xen goes off the network - reproducible
On Tue, Feb 14, 2012 at 11:14:35PM -0500, Brian Marcotte wrote:
> > For some time, my machines have had very occasional network problems
> > which I have not been able to diagnose or reproduce.
>
> I've been trying to debug this by adding some debugging code to
> if_xennet_xenbus.c. I think I found some useful information in the
> xennet_handler function:
>
> m->m_pkthdr.rcvif = ifp;
> #ifdef MYDEBUG
> printf("xennet: ...req_prod_pvt=%u, ...rsp_prod=%u\n",
>     sc->sc_rx_ring.req_prod_pvt, sc->sc_rx_ring.sring->rsp_prod);
> #endif
> if (__predict_true(sc->sc_rx_ring.req_prod_pvt !=
>     sc->sc_rx_ring.sring->rsp_prod)) {
> 	m->m_len = m->m_pkthdr.len = rx->status;
> 	MEXTADD(m, pktp, rx->status,
> 	    M_DEVBUF, xennet_rx_mbuf_free, req);
> 	m->m_flags |= M_EXT_RW; /* we own the buffer */
> 	req->rxreq_gntref = GRANT_STACK_REF;
> } else {
>
> During normal operation, the kernel prints:
>
> xennet: ...req_prod_pvt=2716, ...rsp_prod=2589
> xennet: ...req_prod_pvt=2716, ...rsp_prod=2589
> xennet: ...req_prod_pvt=2843, ...rsp_prod=2592
> xennet: ...req_prod_pvt=2843, ...rsp_prod=2592
>
> When the network problem is happening, it looks like this:
>
> xennet: ...req_prod_pvt=2843, ...rsp_prod=2843
> xennet: ...req_prod_pvt=2844, ...rsp_prod=2844
> xennet: ...req_prod_pvt=2845, ...rsp_prod=2845
> xennet: ...req_prod_pvt=2846, ...rsp_prod=2846
>
> That is, the two numbers differ during normal operation and are equal
> while the network problem is occurring.
>
> ----------------
>
> So, what is going on? It looks like the code avoids copying packets by
> attaching the receive buffer itself to the mbuf when possible. If that
> was the last receive buffer posted to the ring, the code instead copies
> the packet and gives the receive buffer back to Xen.
>
> If I change the code to ALWAYS copy, my network problem never occurs,
> though presumably it is less efficient.
>
> I provoke this by sending small packets to an application which cannot
> receive them. The recv-q on the socket becomes full and then my network
> problem begins.
I guess most receive buffers end up in the socket, but there should still be
one available to make progress. I guess there's a bug somewhere and this
one is not reused.
Can you check what happens in xennet_rx_mbuf_free, especially to the
sc->sc_free_rxreql and SC_NLIVEREQ(sc) values?
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference