Subject: Re: IO throttle VOP
To: Frank van der Linden <fvdl@wasabisystems.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/17/2001 18:32:44
On Sun, 16 Dec 2001, Frank van der Linden wrote:

> On Sun, Dec 16, 2001 at 09:46:27PM -0000, David Laight wrote:
> > Don't think that helps?
>
> It does actually, I did a special-cased sample implementation
> as a proof of concept, and it works for the softdep case.

Could we see that?

> > What would you do with the layered filesystems?
>
> Not sure what you mean.. those would usually just pass down the
> VOP to the lower layer.

Well, they'll have to transform the vnodes to the lower ones.

I'd recommend against the array of vnodes, mainly because that would be a
new kind of thing to pass around in our vnode interface. Right now each vop
can pass in up to 4 vnodes, and one vpp*.

So as a minor nit, I'd vote to make the call just pass in one or two
vnodes. I don't think we ever should have more than two vnodes locked at
once (you enter namei() with no locks, and can return with at most two),
so that should be fine.
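
Roughly what I have in mind for the call shape, as a sketch only. The names
here (io_throttle_sketch, the stub struct vnode and its v_pending field) are
made up for illustration and are not an existing interface; the point is just
that two plain vnode arguments, the second optional, cover the cases without
introducing an array type:

```c
#include <stddef.h>

/* Stand-in for the kernel's struct vnode; only here so the sketch compiles. */
struct vnode {
	int v_pending;	/* hypothetical: outstanding dependencies on this vnode */
};

/*
 * Hypothetical throttle entry point taking at most two vnodes.
 * vp2 may be NULL for operations that involve only one vnode.
 * Returns the number of distinct vnodes the call was handed.
 */
static int
io_throttle_sketch(struct vnode *vp1, struct vnode *vp2)
{
	int n = 0;

	if (vp1 != NULL)
		n++;
	if (vp2 != NULL && vp2 != vp1)	/* same vnode twice counts once */
		n++;
	return n;
}
```

A layered filesystem would then only have to map vp1 and vp2 to their lower
vnodes and pass the call down, the same as it does for other vops.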

> > - some method of 'callback' from the vm system into certain drivers
> >   (eg softdeps) to request that memory be freed if possible.
> > - noting the 'resource allocation rate' of processes and reducing the
> >   priority of those where it is high.
> > - making kernel code that is very likely to need memory request it
> >   before locking too many structures - maybe under some 'busy'
> >   conditions you get a call to 'unwind' part of the request until
> >   the resource is available.
> > (if you can ever guess which one it is!)
>
> These all seem sane suggestions, but the softdep case doesn't really
> allow for these. Let me explain.
>
> The concept of softdeps is that, in order to speed things up, you
> don't synchronously write metadata, but instead you keep a list
> of dependencies in memory, which can be used to write it in a
> delayed fashion. This should give you a consistent on-disk state.
>
> Dependencies are allocated in the process' create/remove/mkdir/etc
> path. They are deallocated after they have eventually been pushed
> to disk. Pushing this metadata out to disk is done, just as for
> 'plain' data, by the syncer process.
>
> So far, so good. Now, under heavy metadata usage (like simultaneous
> rm -rf's on a few large source trees, such as pkgsrc), dependencies
> may be allocated at such a pace that the syncer can't keep up,
> and memory starts filling up with dependency structures. This
> must be avoided. Currently, some heuristic limits are used for
> memory usage, above which the syncer is pushed into action more
> actively. But there is no guarantee that the syncer will outrun
> the process which produces the metadata changes. Yes, you can
> make the process of lower priority. Wouldn't work on SMP systems
> though, where the process could run on another CPU, and still
> overtake the syncer.
>
> Callbacks to free resources are also troublesome, since pushing out
> softdeps may mean having to take some locks, possibly vnode locks.
> Also, resource usage may temporarily increase when pushing them
> out. So, you must do it from a controlled environment, in which
> you know you can't get into deadlock trouble. The syncer process
> is such an environment. Others (like the pagedaemon, or even
> from any other process as part of a callback) will likely lead
> to disaster.
>
> The only way to enforce an upper limit for softdeps is to
> make (user) processes wait until resources are available *before*
> they engage in metadata activity. The same likely applies
> to other resources.

So what would the use pattern be like?
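
To make the question concrete, here is the kind of pattern I imagine. It is a
userspace model using pthreads, not kernel code, and every name in it
(dep_budget, dep_reserve, dep_release and so on) is invented for illustration
rather than taken from your proof of concept. The process reserves room
before it takes any vnode locks, and the syncer gives room back once the
dependencies have reached disk:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a bounded pool of dependency structures. */
struct dep_budget {
	pthread_mutex_t lock;
	pthread_cond_t  avail;
	int in_use;	/* dependency structures currently allocated */
	int limit;	/* upper bound to enforce */
};

static void
dep_budget_init(struct dep_budget *b, int limit)
{
	pthread_mutex_init(&b->lock, NULL);
	pthread_cond_init(&b->avail, NULL);
	b->in_use = 0;
	b->limit = limit;
}

/* Non-blocking variant: returns true if n dependencies were reserved. */
static bool
dep_try_reserve(struct dep_budget *b, int n)
{
	bool ok;

	pthread_mutex_lock(&b->lock);
	ok = (b->in_use + n <= b->limit);
	if (ok)
		b->in_use += n;
	pthread_mutex_unlock(&b->lock);
	return ok;
}

/*
 * Blocking variant, meant to be called before any vnode lock is held.
 * Note: a request with n > limit would wait forever in this sketch.
 */
static void
dep_reserve(struct dep_budget *b, int n)
{
	pthread_mutex_lock(&b->lock);
	while (b->in_use + n > b->limit)
		pthread_cond_wait(&b->avail, &b->lock);
	b->in_use += n;
	pthread_mutex_unlock(&b->lock);
}

/* Called by the syncer once n dependencies have been pushed to disk. */
static void
dep_release(struct dep_budget *b, int n)
{
	pthread_mutex_lock(&b->lock);
	b->in_use -= n;
	pthread_cond_broadcast(&b->avail);
	pthread_mutex_unlock(&b->lock);
}
```

So a create/remove/mkdir path would call dep_reserve() first, then lock and
do its work, and only the syncer would ever call dep_release(). Is that
about right, or does the caller need to say more about what it is going to do?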

Take care,

Bill