Subject: Re: Swap overcommit (was Re: Replacement for grep(1) (part 2))
To: Charles M. Hannum <root@ihack.net>
From: Matthew Dillon <dillon@apollo.backplane.com>
List: tech-userlevel
Date: 07/13/1999 18:32:36
:
:> Has your simulation ever been kicked by the kernel due to lack of
:> swap space?
:
:I already said so. Please at least pretend to read what I wrote
:before replying.
:
:There is a big difference here between a piddling web server and a
:1000-hour simulation. If the web server goes down, you reboot it,
:maybe a few users are inconvenienced meanwhile, and maybe you lose
:some advertising revenue. If the simulation has to be restarted,
:you've lost *valuable* computing time that is not easy to replace.
:
:There are many environments where even the possibility of the
:simulation crashing due to external influence is unacceptable. I find
:it sad that you resist making FreeBSD robust against such problems,
:but that's your concern.

Sigh. If the simulation is so important to you and your system does
not have sufficient swap, maybe you should consider fixing your system
rather than blaming the people who wrote it. Or perhaps you should
consider checkpointing the code if you aren't willing to look for
easy solutions to the problem. Unless all the users on the system are
working against you, no single user with a runaway process should be
able to run a properly configured system out of swap by accident. If
your users are doing it on purpose then maybe you should find a
different machine to work on, eh?
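
To make the checkpointing suggestion concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the simulation under discussion: the function names, the checkpoint path, and the toy "state" dictionary are hypothetical. The idea is only that a long job periodically writes its state to disk atomically and resumes from the last saved state after being killed.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, path)     # atomic swap; old checkpoint stays valid until here
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, every=100):
    """Toy long-running job (hypothetical) that resumes from its checkpoint."""
    state = load_checkpoint(path, {"step": 0, "value": 0.0})
    while state["step"] < total_steps:
        state["value"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the process is killed partway through, calling `run` again with the same path loses at most the work done since the last checkpoint, not the whole run.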

In a cooperative environment it is extremely easy to prevent accidental
runaways from eating up a system's swap, and still fairly easy to reduce
the damage done by purposeful attacks. In fact, at BEST we set soft
limits for most of the system resources to values reasonable enough that
users don't need to change them, and that has protected 25 machines and
30,000 users for several years.
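
As an illustration of what a soft limit looks like from a process's point of view, here is a small sketch using Python's standard `resource` module, which wraps getrlimit(2)/setrlimit(2). The helper name `cap_soft_limit` and any particular cap value are assumptions made for the example; this is not the configuration used at BEST, which the message does not describe.

```python
import resource


def cap_soft_limit(which, soft_cap):
    """Lower the soft limit for one resource; the hard limit is untouched.

    The new soft limit is the smallest of the requested cap, the current
    soft limit, and the hard limit, so this never raises a limit.
    """
    soft, hard = resource.getrlimit(which)
    candidates = [soft_cap]
    for current in (soft, hard):
        if current != resource.RLIM_INFINITY:
            candidates.append(current)
    resource.setrlimit(which, (min(candidates), hard))
    return resource.getrlimit(which)
```

A job wrapper might call `cap_soft_limit(resource.RLIMIT_DATA, 512 * 1024 * 1024)` before starting user code. A runaway allocation then fails inside that one process instead of eating the machine's swap, while a user who really needs more can still raise the soft limit up to the hard limit.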

If you want help in fixing your system, we can talk over private email.
If you are looking for a magical overcommit solution you are going to
be looking for a long time. It isn't going to happen, because I doubt
it would even come close to fixing your problems even if it were
available.

If you are looking to blame overcommits for your problems, then lay out
how your system is set up. But I'll bet you the problem is something
less severe -- like a simple misconfiguration, or perhaps insufficient
swap. How much swap is on this system, by the way?

-Matt
Matthew Dillon
<dillon@backplane.com>