Subject: Re: [RFC] Interface to hardware-assisted data movers
To: None <cgd@broadcom.com>
From: Darren Reed <darrenr@reed.wattle.id.au>
List: tech-kern
Date: 06/22/2002 12:04:02
In some email I received from cgd@broadcom.com, sie wrote:
>
> * the load balancing algorithm, etc., seems a bit ad-hoc.
> additionally, static assignments of sessions to back-ends for all
> times also seems limiting. why restrict by describing it that way?
>
> random thought that popped into my head: if you have some kind of HW
> assist module which gets removed from the system (!!), in current
> scheme all dmover clients who happened to have their sessions
> assigned to that module will need to squish and create sessions
> anew.
>
> requirement that hw be used first is kinda lame... what if your xor
> engine is maxed out but you've got a dual-processor system that's
> idle waiting on xors to finish?
I've skipped some parts of this discussion, but perhaps I can add
a few comments here...
...it would seem, from this point, that operations being registered
for by dmover/xform back ends should only be allowed for operations
that are already supported by the kernel in a hardware unassisted
manner. That way the kernel can do a small request (say) while a
hardware thing is busy doing a big request without being penalised.
Although it didn't get mentioned, it'd otherwise seem possible to
compile a kernel without DES/3-DES but then issue requests to a
xform backend. hmmm, would that be considered a "useful" feature?
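
To make the "only register what the kernel can already do itself" rule
concrete, here is a rough sketch in plain C. Everything in it is made up
for illustration (the sw_ops table, sw_op_supported() and
hw_backend_register_op() are hypothetical names, not part of any existing
dmover or xform interface); it only shows the shape of the check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of operations the kernel implements in software. */
static const char *const sw_ops[] = { "zero", "copy", "xor2" };

/* Return true if the kernel has a software version of the named op. */
static bool
sw_op_supported(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(sw_ops) / sizeof(sw_ops[0]); i++)
		if (strcmp(sw_ops[i], name) == 0)
			return true;
	return false;
}

/*
 * Hypothetical registration hook for a hardware back end.
 * Returns 0 if the op may be registered, -1 if there is no
 * software fallback and the registration is refused.
 */
static int
hw_backend_register_op(const char *name)
{
	return sw_op_supported(name) ? 0 : -1;
}
```

With this rule a card offering "xor2" registers fine, while one offering
an operation missing from the software table (say "3des" on a kernel built
without it) is turned away, which is exactly the case questioned above.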
...maybe during autoconfiguration, information about how much work
each dmover/xform "backend" can do is stored somewhere. At bootup
the kernel would try to measure how fast its own native dmover/xform
operations are, whereas cards would have some sort of table with this
information in it. To use the bid idea you mentioned, Chris, maybe this
is a seed for calculating the value of a "bid"?
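
As a sketch of how a stored rate could seed a bid: assume each back end
records a throughput figure (measured at boot for the CPU, taken from a
table for a card) plus how much work it already has queued. The bid is
then the estimated time to finish a new request, and the lowest bid wins.
The struct, the field names and the queued-bytes idea are my assumptions
for illustration, not anything from the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-back-end record filled in at autoconfiguration. */
struct backend {
	const char *name;
	uint64_t bytes_per_sec;	/* measured (CPU) or from a device table */
	uint64_t queued_bytes;	/* work already outstanding on this back end */
};

/* Bid = estimated microseconds until a request of len bytes completes. */
static uint64_t
backend_bid_usec(const struct backend *b, uint64_t len)
{
	assert(b != NULL && b->bytes_per_sec > 0);
	return ((b->queued_bytes + len) * 1000000ULL) / b->bytes_per_sec;
}

/* Pick the back end with the lowest bid; NULL if the list is empty. */
static const struct backend *
pick_backend(const struct backend *v, size_t n, uint64_t len)
{
	const struct backend *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (best == NULL ||
		    backend_bid_usec(&v[i], len) < backend_bid_usec(best, len))
			best = &v[i];
	return best;
}
```

Because the queue depth is part of the bid, a fast engine that is maxed
out loses to an idle CPU for a small request, and wins again once its
queue drains.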
Furthermore, there may be times when it is faster to use the CPU in
a system than try and program a device to do some particular work.
There was a paper at last year's Usenix Security symposium about what
a particular engineer had to do in order to get a particular crypto
card to do better than a few kb/sec of crypto. Although this may be
device specific, if setup times for hardware to do particular ops
are larger than for CPU based ones, why use the hardware?