NetBSD-Bugs archive
kern/48586: Kernel complains that the proc table is full even when it is not
>Number: 48586
>Category: kern
>Synopsis: The kernel complains that the proc table is full even when it is not
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Feb 10 16:05:00 +0000 2014
>Originator: Tero Kivinen
>Release: NetBSD 6.1_STABLE
>Organization:
>Environment:
System: NetBSD 8 12:56:43 EET 2014 root@haste.i.kivinen.iki.fi:/usr/obj/sys/arch/amd64/compile/HASTE amd64
Architecture: x86_64
Machine: amd64
The HASTE kernel is GENERIC with a larger MAXDSIZ:
include "arch/amd64/conf/GENERIC"
options MAXDSIZ=34359738368
The problem occurred with the GENERIC kernel too, so that change should
not be the cause.
>Description:
I am creating Garmin maps using the Java tools (osmosis,
splitter, mkgmap). The Java tools use sun-jre7-7.0.45 from
pkgsrc. After 15 or so hours of continuously running scripts,
the scripts start to complain:
...
Map: -110.0..-150.0 -30..0 1300 sa-w-c-ele.img South America West Central
/m/smbkivinen/garmin/bin/gen-el-map.sh: Cannot fork
/m/smbkivinen/garmin/bin/gen-el-map.sh: Cannot fork
/m/smbkivinen/garmin/bin/gen-el-map.sh: Cannot fork
/m/smbkivinen/garmin/bin/gen-el-map.sh: Cannot fork
...
and when I check the syslog, there are messages saying:
haste (17:32) /m/smbkivinen/garmin>tail /var/log/messages
...
Feb 10 17:24:37 haste /netbsd: proc: table is full - increase kern.maxproc or NPROC
Feb 10 17:25:17 haste /netbsd: proc: table is full - increase kern.maxproc or NPROC
Feb 10 17:27:19 haste /netbsd: proc: table is full - increase kern.maxproc or NPROC
Then when I check how many processes are running, ps reports:
haste (17:41) /m/smbkivinen/garmin>ps agxu | wc
52 581 4056
The system is configured with kern.maxproc of 8000, but it
is complaining that the proc table is full, even though it only
has 52 processes running:
haste (17:42) /m/smbkivinen/garmin>sysctl -a | fgrep maxproc
kern.maxproc = 8000
proc.curproc.rlimit.maxproc.soft = 160
proc.curproc.rlimit.maxproc.hard = 1044
I can still run a few processes, but if I try to run more than
a few processes, fork fails:
haste (17:42) /m/smbkivinen/garmin>(sleep 5 & sleep 5 & sleep 5 & sleep
5 & sleep 5 & sleep 5 & sleep 5 & sleep 5 & sleep 5 & sleep 5 & sleep 5 & sleep
5 & sleep 5 & sleep 5 & sleep 5 &)
zsh: fork failed: resource temporarily unavailable
haste (17:42) /m/smbkivinen/garmin>ps agxu | wc
52 581 4056
haste (17:42) /m/smbkivinen/garmin>
There does not seem to be any way to recover from this
situation other than a reboot. My guess is that something is
wrong with the Linux emulation in the kernel and it leaks
entries in the proc table. During the 16 hours since the last
reboot, I have run osmosis (the Java program) around 6500
times, and Java has crashed around 200 times. The Linux
emulation Java seems to crash randomly quite often (usually
with an out-of-memory error or similar), and rerunning the
program usually works after a few tries. Those Java crashes
might be related to the fact that I am using quite large Java
memory limits: 4 GB, 11 GB, or 18 GB depending on the program.
Osmosis uses 4 GB, splitter uses 18 GB, and mkgmap uses 11 GB.
The splitter was not able to process my maps with the default
maximum data size limit (8 GB), which is why I had to compile a
special kernel.
Looking at my kern.maxproc (8000) and the number of times I
have run those Linux emulation Java programs (around 6500),
it may be that every single Linux emulation Java run leaks
roughly one kernel proc table entry; 6500 leaked entries plus
the normally running processes would bring the table close to
the 8000 limit.
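When the failure happens again, a small diagnostic sketch like the
following (standard tools only, the same ones used above; the script
name is made up) would capture the relevant counters in one place:
#!/bin/sh
# capture-proc-state.sh (hypothetical name): dump the counters that
# should explain a "Cannot fork" failure, for attaching to this PR.
date
echo "processes visible to ps: $(ps agxu | wc -l)"
sysctl kern.maxproc
sysctl proc.curproc.rlimit.maxproc
# the kernel messages that match the failure
dmesg | grep 'proc: table is full' | tail -n 5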
>How-To-Repeat:
Try running osmosis using sun-jre7-7.0.45 in a loop and see
whether it uses up the proc table. I have not tried this
myself, but with my current workload the problem repeats
daily, so it is quite fast for me to test fixes. If you set
kern.maxproc to a much lower value, the problem will most
likely repeat much faster.
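A minimal reproduction sketch along those lines (untested; the
osmosis invocation below is only a placeholder for any convenient
Linux-emulation Java command):
#!/bin/sh
# Run a Linux-emulation Java program repeatedly and watch how many
# processes ps can see; if the proc table is leaking, fork eventually
# fails even though the ps count stays small.
i=0
while [ "$i" -lt 10000 ]; do
    osmosis > /dev/null 2>&1    # placeholder: any Linux-emulation Java run will do
    echo "run $i: $(ps agxu | wc -l) processes visible to ps"
    i=$((i + 1))
done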
>Fix:
Not known.
Raising kern.maxproc to a much larger value (65k or so) would
most likely only push the limit further away rather than fix
the leak.
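As a stopgap until the leak itself is found, the limit can be raised
at run time and made persistent across reboots (65000 below is just
an example value):
# raise the limit immediately (as root)
sysctl -w kern.maxproc=65000
# keep it across reboots
echo 'kern.maxproc=65000' >> /etc/sysctl.conf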