bin/50439: rpcbind follies with nis down

To: gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: bin/50439: rpcbind follies with nis down
From: dholland%netbsd.org@localhost
Date: Tue, 17 Nov 2015 09:40:00 +0000 (UTC)

>Number:         50439
>Category:       bin
>Synopsis:       rpcbind follies with nis down
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 17 09:40:00 +0000 2015
>Originator:     David A. Holland
>Release:        NetBSD 7.99.20 (20150727)
>Organization:
>Environment:
System: NetBSD macaran 7.99.20 NetBSD 7.99.20 (MACARAN) #30: Mon Jul 27 20:25:15 EDT 2015  dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:

	Now that ypbind has been fixed to not explode the world when
	the network goes down, it seems that rpcbind takes over
	responsibility.

	When the NIS server goes down, the libc NIS code contacts
	rpcbind, producing this message:

Nov 13 19:00:00 macaran rpcbind: connect from 127.0.0.1 to getport/addr(ypbind)

	Each time this happens it seems to produce another fork of
	rpcbind. In the course of a ~1h30 network downtime a couple
	of days ago, process accounting logged 1449403 rpcbind
	processes exiting (roughly 270 per second). This (and/or
	possibly related phenomena occurring in the libc NIS code) was
	sufficient to run through 12G of RAM and swap and then OOM.
	This took out the X server of course, and thus I don't have as
	much information as I'd like about what actually happened.

>How-To-Repeat:

	Run with NIS enabled and a lot of stuff running, then
	disconnect the network so the NIS server becomes unreachable.

>Fix:

	rpcbind apparently forks every time it wants to log a message.
	This is silly; it shouldn't need to fork more than once
	overall.

	However, I think the real problem lies in the libc NIS code; I
	think it is probably doing something stupid that leads it to
	blast rpcbind unnecessarily. I had a fair amount of stuff
	running when the network went plop, but not 1.4 million
	processes or even 14,000.
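
	If the libc NIS code really is retrying in a tight loop (a
	guess; I have not confirmed it), a capped exponential backoff
	between attempts is one common way to keep a client from
	hammering a local daemon. The sketch below only computes the
	delay; backoff_ms and its parameters are hypothetical and are
	not part of the libc NIS code.

```c
#include <assert.h>

/*
 * Hypothetical helper: delay in milliseconds before retry number
 * `attempt` (0-based). Starts at base_ms, doubles on each attempt,
 * and never exceeds max_ms.
 */
static unsigned backoff_ms(unsigned attempt, unsigned base_ms, unsigned max_ms)
{
    unsigned delay = base_ms;

    assert(base_ms > 0 && base_ms <= max_ms);
    while (attempt-- > 0) {
        if (delay > max_ms / 2)
            return max_ms;      /* doubling would pass the cap */
        delay *= 2;
    }
    return delay;
}
```

	For example, with base_ms = 100 and max_ms = 5000 the delays
	run 100, 200, 400, 800, 1600, 3200 and then stay at 5000, so
	a process stuck waiting for the NIS server would settle at
	one getport request every five seconds.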

	Unfortunately, nuking NIS from orbit isn't an option.
