Subject: Re: vm.bufmem_hiwater not honored (Re: failing to keep a process
To: Daniel Carosone <dan@geek.com.au>
From: Arto Selonen <arto@selonen.org>
List: tech-kern
Date: 11/17/2004 11:23:11
Hi!
On Wed, 17 Nov 2004, Daniel Carosone wrote:
> Please try the attached diff, which I have been using since we last
> looked at all this issue earlier this year.
> I would be very interested to see how it affects your issue.
I am almost certain that it will improve my situation. However, I don't
think it is a *solution* for it.
As Thor Lancelot Simon said, "buffer cache growth algorithm is _extremely_
conservative". I have now run 'systat bufcache -w 1' for about one full
day, and have made the following observations:
1) bufmem seems to jump way over bufmem_hiwater once a day
(I suspect midnight/logroll/squid, and will try to confirm ASAP)
2) during the day, there is an obvious trend for bufmem to shrink
- every time page daemon scans, the number of metadata
buffers decreases, and so bufmem shrinks too
- this is about 2MB (~500 pages) per hour for bufmem
and a bit less than 1000 buffers per hour
- over a period of 10 hours, the numbers dropped:
- bufmem: 42MB -> 21MB
- buffers: 12,000 -> 3,500
3) within the shrinking trend, bufmem fluctuates: it both *grows*
and shrinks in size, though the number of buffers remains the
same (this is while free>freetarg); this happens even when
bufmem>bufmem_hiwater
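As a quick consistency check of the rates in 2), the 10-hour averages can be computed directly. This is a minimal sketch. The helper names are hypothetical, and it assumes that MB means 2^20 bytes and that pages are 4 KiB; the message states neither.

```c
/* Average decrease per hour between two samples taken 'hours' apart. */
static double hourly_drop(double start, double end, double hours)
{
    return (start - end) / hours;
}

/* Convert bytes to pages, assuming 4 KiB pages (an assumption: the
 * page size is machine dependent and not stated in the message). */
static double bytes_to_pages(double bytes)
{
    return bytes / 4096.0;
}
```

hourly_drop(42, 21, 10) gives 2.1 MB per hour, which bytes_to_pages(2.1 * 1048576) turns into roughly 538 pages. hourly_drop(12000, 3500, 10) gives 850 buffers per hour. Both agree with the "~500 pages" and "a bit less than 1000 buffers" figures above.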
This morning, buffer cache usage was again:
vm.bufmem = 45648896
vm.bufmem_lowater = 4194304
vm.bufmem_hiwater = 33554432
I saw vm.bufmem go over 60,000,000, but it somehow managed to
drop to that 45M while I was writing this. I agree with Thor that bufmem
should NOT be able to get so far above the hiwater mark (or at least
that is how I interpreted his comments). I consider that to be the main
problem in my case.
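To put the sysctl values above in proportion, the overshoot can be computed from them. These helpers are illustrative only, and their names are hypothetical. They use unsigned long long so that the percentage calculation cannot overflow where long is 32 bits.

```c
/* Bytes by which bufmem exceeds bufmem_hiwater (0 when it does not). */
static unsigned long long over_hiwater(unsigned long long bufmem,
                                       unsigned long long hiwater)
{
    return bufmem > hiwater ? bufmem - hiwater : 0;
}

/* bufmem as a whole-number percentage of bufmem_hiwater (truncated). */
static unsigned long long pct_of_hiwater(unsigned long long bufmem,
                                         unsigned long long hiwater)
{
    return hiwater ? bufmem * 100 / hiwater : 0;
}
```

With this morning's values, over_hiwater(45648896, 33554432) is 12094464 bytes (about 11.5 MiB) and pct_of_hiwater gives 136. The 60,000,000 peak is 178% of the hiwater mark.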
Your patch will probably help because it will change the allocbuf
behaviour when a buffer is resized (which I'm assuming causes the
fluctuations in bufmem usage, and which happens a lot). These resizes take
place when free>freetarg, and without your patch buf_canrelease returns
0 in that case (if the AGE list is empty), so allocbuf does not react to
bufmem already being over the hiwater mark. With the patch, buf_canrelease
will almost always return non-zero, so buffer usage will be trimmed through
buf_trim(), and every resize while bufmem>bufmem_hiwater could indeed
reduce the buffer cache size, making it shrink a lot faster than by only
relying on the page daemon to reduce it a bit.
Anyway, once I get some confidence in understanding why/how bufmem is
currently behaving, then I'll try your buf_canrelease patch.
---- IF YOU ARE BUSY, STOP HERE; "EXTREME" PROGRAMMING FOLLOWS -----
As for the patch itself, here is my reasoning for buf_canrelease()
(take it as both proof-of-concept code and an explanation of how I see this):
- The comment for the function says
"Return estimate of bytes we think need to be
released to help resolve low memory conditions."
I partly disagree, as there may not be any low memory condition,
but there may still be a need to release some buffer cache
bytes. Of course this all depends on why the function
exists, and as I don't know the real reason, I've made up
my own: "Return the number of bytes one could ask buffer cache
to release, if there was a need to reduce buffer cache size".
NOTE: this changes the current semantics
NOTE: this leaves the size decision to caller
Currently, buf_canrelease seems to be used only by allocbuf()
when a resize would lead bufmem to be over bufmem_hiwater.
That call is "unconditional" in the sense that it leaves
the size decision to buf_canrelease; I think allocbuf should only
reduce the buffer cache size as long as bufmem>bufmem_hiwater.
buf_canrelease cannot know *why* the caller would like to
reduce the buffer cache size (and there are at least two
different cases: a resize exceeding hiwater, and the page daemon
asking the buffer cache to participate in freeing memory), thus
it cannot make that decision. It can only give a suggestion as
to how much *it* would like the buffer cache size to be reduced, if
somebody wanted to do that.
- Now that I've defined the reason for buf_canrelease to exist,
I can define how I think it might function. It should satisfy
the following conditions:
- take as an argument 'request' (bytes the caller wants)
- never say bufmem could go below the lowater mark
- always offer enough to get bufmem down to the hiwater mark
- never offer more than was requested
(requesting 0 means no preference)
- always offer at least as much as AGE list has
- always offer at least freemin (could be freetarg)
- offer to shrink 1/16 of current usage
- try not to offer more than two NMEMPOOLS worth
The above would lead to something like this:
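(the actual implementation is left to the reader)
    if (request <= 0)                   /* 0 means "no preference" */
        request = bufmem - bufmem_lowater;
    return MAX(0,
        MIN(bufmem - bufmem_lowater,    /* never below lowater */
        MAX(bufmem - bufmem_hiwater,    /* always enough to reach hiwater */
        MIN(request,                    /* otherwise at most the request */
        MAX(bufqueues[BQ_AGE].bq_bytes, /* at least the AGE list */
        MAX(freemin * PAGE_SIZE,        /* at least freemin pages */
        MIN((bufmem - bufmem_lowater) / 16, /* 1/16 of usage above lowater, */
            2 * MAXBSIZE)))))));        /* capped at 2 * MAXBSIZE */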
I don't know about efficiency, but it looks fairly clean
and simple. It does not need to be very exact either, as it
is only a suggestion (and the caller will need to make the
final decision anyway).
NOTE: it will almost never return 0
NOTE: it will suggest more than request if over hiwater
- Currently, the only user is allocbuf, which could be modified
for this approach quite easily (reusing variables, copying
buf_drain; I guess one could write two versions of buf_drain:
one that takes the locks and another that does not, so either could be
called depending on whether the locks are already held):
    if ((bufmem += delta) > bufmem_hiwater) {
        int target, got;

        /* ask how much could be released; at least the overshoot */
        target = buf_canrelease(bufmem - bufmem_hiwater);
        got = 0;
        while (got < target) {
            delta = buf_trim();
            if (delta == 0)
                break;          /* nothing left to trim */
            got += delta;
        }
    }
- With the above, one could take advantage of buf_canrelease
also in the page daemon (again, efficiency might be a concern):
buf_drain (buf_canrelease(bufcnt));
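The three proposals above can be exercised together in a small userland model: the new buf_canrelease, its use in allocbuf, and a buf_drain-style loop that the page daemon could also call. This is only a sketch under stated assumptions. Every name in it (struct bufstate, canrelease_model, trim_model, drain_model, allocbuf_model) is hypothetical. age_bytes stands in for bufqueues[BQ_AGE].bq_bytes and is not updated by trimming. trim_model merely pops sizes off an array, and there is no locking. The 4096-byte page and 65536-byte MAXBSIZE are illustrative values, not the kernel's. Signed 64-bit arithmetic lets bufmem - bufmem_lowater go negative instead of wrapping.

```c
/* Hypothetical userland model; none of these names exist in the kernel.
 * MODEL_PAGE_SIZE and MODEL_MAXBSIZE are illustrative values only. */
#define MODEL_PAGE_SIZE 4096LL
#define MODEL_MAXBSIZE  65536LL
#define MODEL_NBUF      16

struct bufstate {
    long long bufmem;                /* current buffer cache usage, bytes */
    long long bufmem_lowater;
    long long bufmem_hiwater;
    long long age_bytes;             /* stands in for bufqueues[BQ_AGE].bq_bytes */
    long long freemin;               /* pages */
    long long trimmable[MODEL_NBUF]; /* sizes of releasable buffers */
    int ntrimmable;
};

static long long lmax(long long a, long long b) { return a > b ? a : b; }
static long long lmin(long long a, long long b) { return a < b ? a : b; }

/* Proposed buf_canrelease(): only a suggestion; the caller decides. */
static long long canrelease_model(const struct bufstate *s, long long request)
{
    long long above_low = s->bufmem - s->bufmem_lowater;

    if (request <= 0)                       /* 0 means "no preference" */
        request = above_low;
    return lmax(0,
        lmin(above_low,                     /* never below lowater */
        lmax(s->bufmem - s->bufmem_hiwater, /* always enough to reach hiwater */
        lmin(request,                       /* otherwise at most the request */
        lmax(s->age_bytes,                  /* at least the AGE list */
        lmax(s->freemin * MODEL_PAGE_SIZE,  /* at least freemin pages */
        lmin(above_low / 16,                /* 1/16 of usage above lowater, */
             2 * MODEL_MAXBSIZE)))))));     /* capped at 2 * MAXBSIZE */
}

/* Stand-in for buf_trim(): release the last listed buffer, or return 0. */
static long long trim_model(struct bufstate *s)
{
    long long size;

    if (s->ntrimmable == 0)
        return 0;
    size = s->trimmable[--s->ntrimmable];
    s->bufmem -= size;
    return size;
}

/* Stand-in for buf_drain(): trim until 'target' bytes are released or
 * nothing is left. Returns the bytes actually released. */
static long long drain_model(struct bufstate *s, long long target)
{
    long long got = 0, delta;

    while (got < target) {
        if ((delta = trim_model(s)) == 0)
            break;
        got += delta;
    }
    return got;
}

/* The proposed allocbuf() accounting for a resize of 'delta' bytes. */
static long long allocbuf_model(struct bufstate *s, long long delta)
{
    if ((s->bufmem += delta) > s->bufmem_hiwater)
        return drain_model(s,
            canrelease_model(s, s->bufmem - s->bufmem_hiwater));
    return 0;
}
```

Take this morning's figures (bufmem 45648896, lowater 4194304, hiwater 33554432) with an empty AGE list and freemin of 64 pages. canrelease_model(&s, bufmem - hiwater) then suggests exactly the 12094464-byte overshoot. With bufmem at 20000000 and no preference it still offers 262144 bytes (freemin), which illustrates the "almost never return 0" note. The page daemon case would be drain_model(&s, canrelease_model(&s, bufcnt)). Small helper functions replace the MAX/MIN macros so that the deeply nested expression does not evaluate its arguments many times over.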
This may well break all sorts of conditions that I'm not aware
of. There may be timing issues and what not. I may be messing with
a critical path, where you really don't want this sort of thing.
Artsi
--
#######======------ http://www.selonen.org/arto/ --------========########
Everstinkuja 5 B 35 Don't mind doing it.
FIN-02600 Espoo arto@selonen.org Don't mind not doing it.
Finland tel +358 50 560 4826 Don't know anything about it.