Subject: lib/6379: RPC "backlog" isn't big enough
To: None <gnats-bugs@gnats.netbsd.org>
From: C Kane <ckane@best.com>
List: netbsd-bugs
Date: 10/30/1998 00:20:02
>Number: 6379
>Category: lib
>Synopsis: RPC "backlog" isn't big enough
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people (Library Bug People)
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Fri Oct 30 00:35:01 1998
>Last-Modified:
>Originator: C Kane
>Organization:
>Release: NetBSD-current, last update occurred Thu Oct 29 05:04:20 1998
>Environment:
System: NetBSD ckane5 1.3H NetBSD 1.3H (ckane5) #2: Mon Oct 19 21:52:45 PDT 1998 root@ckane5:/usr/netbsd-current/src/sys/arch/i386/compile/ckane5 i386
>Description:
While stress-testing the NIS system, I get errors like this:
ypcat: no such map group.byname. Reason: RPC failure
>How-To-Repeat:
We have a large environment where many systems might simultaneously
be attempting to do "ypcat group" (as when someone logs in or
when cron kicks off jobs on multiple systems at the same time).
To simulate this load, I've tried running a test like this:
for i in 1 2 3 4 5 6 7 8 9 10
do
ypcat group | wc -l &
done
We get failures with as few as 20 simultaneous jobs (the loop above,
with a longer list).
The group map is large: `ypcat group | wc` reports 376 lines,
376 words, and 67173 bytes.
The problem can be traced to the RPC library routines that ypcat uses.
When the library routine attempts to connect() a socket to portmap,
the call fails with errno ECONNREFUSED.
I believe the failure occurs because portmap uses the standard libc
RPC routines to open its listening socket, and those routines pass a
backlog of only two to listen().
I edited /usr/src/lib/libc/rpc/svc_tcp.c, line 168, from:
(listen(sock, 2) != 0)) {
to
(listen(sock, 25) != 0)) {
With this change the failures go away in my tests.
Not all of the jobs run simultaneously, because some finish before
the rest have started, but I've tried starting up to 140 jobs
at once and gotten no failures.
Why is this value set to 2, and what problems might arise from
setting it higher? For real production work in my environment,
I think I'd want to change the '25' to something even higher,
like '256'. I'd prefer no ECONNREFUSED errors at all, within
reason.
>Fix:
A possible fix is given above: raise the backlog passed to listen()
in svc_tcp.c.
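For illustration only, here is a minimal sketch of how a listener
could take its backlog as a parameter instead of hard-coding it. This
is not the libc code: clamp_backlog() and open_listener() are
hypothetical helpers invented for this example, and the loopback
address and ephemeral port are assumptions made so the sketch can run
on its own. The only detail borrowed from the system is SOMAXCONN,
the <sys/socket.h> constant for the maximum listen() backlog, which
is used here as an upper bound on the requested value.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: clamp a requested backlog to [1, SOMAXCONN]. */
static int
clamp_backlog(int requested)
{
	if (requested < 1)
		return 1;
	if (requested > SOMAXCONN)
		return SOMAXCONN;
	return requested;
}

/*
 * Hypothetical helper, not part of libc: create a TCP socket bound to
 * an ephemeral loopback port and listen on it with the given backlog.
 * Returns the socket descriptor, or -1 on failure.
 */
static int
open_listener(int requested_backlog)
{
	struct sockaddr_in addr;
	int backlog, sock;

	backlog = clamp_backlog(requested_backlog);
	assert(backlog >= 1 && backlog <= SOMAXCONN);

	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
	    listen(sock, backlog) != 0) {
		close(sock);
		return -1;
	}
	return sock;
}
```

A caller would use open_listener(25) or open_listener(256) in place
of a fixed listen(sock, 2). Whether the kernel honors the full
requested depth is system-dependent, so the clamp is only a sanity
bound, not a guarantee of queue length.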
>Audit-Trail:
>Unformatted: