NetBSD-Bugs archive
bin/50439: rpcbind follies with nis down
>Number: 50439
>Category: bin
>Synopsis: rpcbind follies with nis down
>Confidential: no
>Severity: critical
>Priority: low
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Nov 17 09:40:00 +0000 2015
>Originator: David A. Holland
>Release: NetBSD 7.99.20 (20150727)
>Organization:
>Environment:
System: NetBSD macaran 7.99.20 NetBSD 7.99.20 (MACARAN) #30: Mon Jul 27 20:25:15 EDT 2015 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:
Now that ypbind has been fixed to not explode the world when
the network goes down, it seems that rpcbind takes over
responsibility.
When the NIS server goes down, the libc NIS code contacts
rpcbind, producing this message:
Nov 13 19:00:00 macaran rpcbind: connect from 127.0.0.1 to getport/addr(ypbind)
Each time this happens it seems to produce another fork of
rpcbind. In the course of a ~1h30 network downtime a couple
of days ago, process accounting logged 1449403 rpcbind
processes exiting, which works out to roughly 270 per second.
This (and/or possibly related phenomena occurring in the libc
NIS code) was sufficient to run through 12G of RAM and swap
and then OOM. This took out the X server, of course, and thus
I don't have as much information as I'd like about what
actually happened.
>How-To-Repeat:
Be using NIS; disconnect the network with a lot of stuff
running.
>Fix:
rpcbind apparently forks every time it wants to log a message.
This is silly; it shouldn't need to fork more than once
overall.
However, I think the real problem lies in the libc NIS code; I
think it is probably doing something stupid that leads it to
blast rpcbind unnecessarily. I had a fair amount of stuff
running when the network went plop, but not 1.4 million
processes or even 14,000.
Unfortunately, nuking NIS from orbit isn't an option.
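
If the libc NIS code really is retrying in a tight loop (which
is only a guess; I have not confirmed what it does), one common
remedy is a capped exponential backoff between attempts. A
minimal sketch in C follows. The function name retry_delay_ms
and the constants are hypothetical and not taken from libc.

```c
/* Hypothetical tuning values, chosen only for illustration. */
#define RETRY_BASE_MS	100u	/* delay before the first retry */
#define RETRY_MAX_MS	30000u	/* never wait longer than this */

/*
 * Delay to wait before retry number `attempt` (0-based): the base
 * delay doubled once per earlier attempt, capped at RETRY_MAX_MS.
 */
unsigned
retry_delay_ms(unsigned attempt)
{
	unsigned delay = RETRY_BASE_MS;

	/* Stop doubling once the cap is reached, so this cannot overflow. */
	while (attempt-- > 0 && delay < RETRY_MAX_MS)
		delay *= 2;
	return delay < RETRY_MAX_MS ? delay : RETRY_MAX_MS;
}
```

A caller would sleep for retry_delay_ms(n) after the nth failed
lookup and reset n to zero after a success, so that a long
outage produces a retry every 30 seconds per process instead of
a continuous stream of requests to rpcbind.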