Port-xen archive
Re: Dom0 xvif mbuf issues
On Thu, 27 Sep 2018 13:13:27 +0200
Manuel Bouyer <bouyer%antioche.eu.org@localhost> wrote:
> On Wed, Sep 26, 2018 at 01:14:40PM -0700, Harry Waddell wrote:
> >
> > I have a server where Dom0 started becoming unusable a few months ago,
> > although it previously ran for years with few issues.
> >
> > netbsd-7 branch, never more than a month behind.
> > BRIDGE_IPF is enabled and these options set in sysctl.conf:
> >
> > kern.sbmax=1048576
> > net.inet.tcp.recvbuf_max=1048576
> > net.inet.tcp.sendbuf_max=1048576
> > kern.mbuf.nmbclusters=300000
> > kern.maxfiles=3000
> >
> > Xen 4.8.3 similarly updated.
> >
> > One of the xvif devices "could not allocate a new mbuf". I enabled MBUF debugging
> > and netstat didn't seem to point to a leak on any of the devices.
>
> Looks like temporary memory shortage in the dom0 (this is a MGETHDR failing,
> not MCLGET, so the nmbclusters limit is not relevant).
> How many mbufs were allocated ?
>
At the time of the hang, I have no idea.
It's around 512 whenever I check.
[root@xen-09:conf]> netstat -m
515 mbufs in use:
513 mbufs allocated to data
2 mbufs allocated to packet headers
0 calls to protocol drain routines
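Since the count at the time of the hang is unknown, one option would be to log it periodically so the last samples before a hang survive. A minimal sketch, assuming the `netstat -m` summary line keeps the form shown above ("515 mbufs in use:"); the function name and the log path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -m` output on stdin and print the
# number from the "N mbufs in use:" summary line.
mbufs_in_use() {
    awk '/mbufs in use/ { print $1; exit }'
}

# Example sampling loop, left commented out. The log path and the
# 60-second interval are assumptions, not values from this thread.
# while :; do
#     printf '%s %s\n' "$(date +%s)" "$(netstat -m | mbufs_in_use)" \
#         >> /var/log/mbufs.log
#     sleep 60
# done
```

Logging to a local file only helps if the dom0 stays responsive enough to write it; sending the samples to a remote syslog host may be more robust.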
>
> > It hung again, but with a new
> > error scrolling on the console. "xennetback: got only 63 new mcl pages"
>
> This would point to a memory shortage in the hypervisor itself.
> Do you have enough free memory (xl info) ?
>
total_memory : 131037
free_memory : 26601
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
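For reference, the two figures above work out to roughly 20% of hypervisor memory free. A small sketch that computes this from `xl info` output, assuming the `total_memory` and `free_memory` lines look like the ones quoted here; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `xl info` output on stdin and print free
# hypervisor memory as a percentage of total, to one decimal place.
xen_free_pct() {
    awk -F: '
        /^total_memory/ { total = $2 + 0 }
        /^free_memory/  { free  = $2 + 0 }
        END { if (total > 0) printf "%.1f\n", 100 * free / total }
    '
}

# Usage in the dom0:
#   xl info | xen_free_pct
```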
> >
> > My suspicion is that either one of the guests started doing a lot more NFS activity OR
> > that a VM I created, which uses a fuse filesystem to move large dump files to Azure blob
> > storage, may be what pushed this previously working system over the edge.
> >
> > I'm moving the azure fuse system to another server and plan to disable ipf on the bridge.
>
> ipf shouldn't be a problem, I'm using it extensively on bridges here.
>
Good. Just grasping at straws.
> >
> > Beyond that, any suggestions? Should I just upgrade to netbsd 8 and/or xen 4.11?
> > (even if it's just to make debugging easier, since this is where current work is taking place?)
>
> I'm not sure it would change anything.
>
> >
> > This is a production system with about 30 guests. I just want it to work like it used to.
>
> How many vifs are there in the dom0?
>
I expect this is not an ideal way to do this but ...
(for i in `xl list | awk '{print $1}'`;do xl network-list $i | grep vif ;done) | wc -l
57
Several of the systems are part of a cluster where hosts are multihomed on 2 of 4 networks
to test a customer setup. Most of my systems have < 30, except for one other with 42.
The others don't hang like this one does.
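A possibly simpler way to get the same count is to look at the backend interfaces in the dom0 directly. This is only a sketch: it assumes the backend interfaces are named `xvif<domid>i<n>`, as they normally are on a NetBSD dom0, and that `ifconfig -l` prints the interface names on one line. The function name is hypothetical. (The one-liner above also passes the `xl list` header line to `xl network-list`; that should only produce an error on stderr, so the count is not affected.)

```shell
#!/bin/sh
# Hypothetical helper: read a whitespace-separated list of interface
# names on stdin and print how many of them start with "xvif".
count_xvifs() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^xvif/) n++ }
         END { print n + 0 }'
}

# Usage in the dom0:
#   ifconfig -l | count_xvifs
```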
> --
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> NetBSD: 26 ans d'experience feront toujours la difference
> --
Thanks for the followup. Answers inline above.
HW