tech-crypto archive
Re: GSoc2010 project suggestion: swcryptX
On Tue, Feb 23, 2010 at 12:36:18AM +0100, Hubert Feyrer wrote:
> From my understanding of the code, opencrypto(9) spawns 1 kernel thread
> which then handles the actual crypto requests (crypto.c, crypto_init0()
> and cryptoret()). If a second opencrypto(9) call arrives while the first
> one is being handled, it is queued, and processed later
> (crypto_dispatch()).
You need to carefully follow a request down from userspace, into
cryptosoft, and back up. It doesn't work the way you seem to think it
does.
One big hint of that would be that the only kernel thread involved in
the whole business at all is called "cryptoret".
The queues you're looking at are used for result return, not request
dispatch. Requests are dispatched by invoking the driver's processing
methods via function pointer in crypto_invoke().
In the case of cryptosoft, this ends up running on the same CPU that
originally invoked the opencrypto machinery, unless it's been switched
away from. Because all the entities involved are marked MPSAFE, this
means as many LWPs as you like can be running in cryptosoft at the same
time. Your explanation of why this is not so is just wrong.
Here, look:
I have two CPU cores:
# cpuctl list
Num  HwId  Unbound LWPs  Interrupts  Last change
---- ----  ------------  ----------  ------------------------
0    0     online        intr        Wed Jan 6 17:10:01 2010
1    1     online        intr        Wed Jan 6 17:10:01 2010
Here's how fast one core is purely in userspace (openssl speed reports
throughput in thousands of bytes per second):
# openssl speed -elapsed -evp des-ede3-cbc
[...]
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des-ede3-cbc     12629.10k    13063.08k    13136.54k    13128.63k    13191.85k
Here are those two cores, fed via 4 userspace processes:
# openssl speed -elapsed -evp des-ede3-cbc -multi 4
[...]
evp 23044.85k 27947.13k 26671.29k 30792.88k 30500.91k
Here is how fast it runs with one core, using cryptosoft via /dev/crypto:
# sysctl -w kern.cryptodevallowsoft=-1
[careful to undo this before doing any more software crypto tests...!]
kern.cryptodevallowsoft: 1 -> -1
# openssl speed -elapsed -evp des-ede3-cbc -engine cryptodev
des-ede3-cbc 5176.52k 8921.87k 11116.44k 12024.35k 12258.06k
And here are both cores, again using cryptosoft via /dev/crypto:
evp 5637.00k 18884.52k 19146.77k 27567.69k 28165.78k
The huge difference in speed for small requests is due to the syscall
overhead of getting the requests into and out of the kernel. The small
difference in speed for large requests is because the DES implementation
in OpenSSL is better than the one in cryptosoft: it has asm, including asm
for CBC mode. But it's clear that cryptosoft is in fact using both cores.
Of course, this won't let you use multiple cores to offload crypto
processing from IPsec, which I suspect is what you want to do. That is
because our networking code is not MP-safe: while requests are being
processed in cryptosoft, the rest of the network stack, which invoked
cryptosoft, can't run.
But this has nothing to do with threads nor request submission queues
because there aren't any of either in cryptosoft.
"Fixing" this would mean pretty fundamentally rewriting the cryptosoft
driver to make it queue requests internally, possibly maintain its own
sleepable entities, etc. And that would probably harm its performance
for the cases where it works well now. The hardware drivers _have to_
do these things because they have hardware resources to manage; cryptosoft
does not.
Perhaps we could provide an alternate cryptosoft implementation which
queues requests, to speed up IPsec on multi-CPU machines. Attaching
multiple instances of _that_ might do what you want.
Another approach would be to look at the FAST_IPSEC code, which already
goes to great pains to be able to wait for requests when opencrypto does
queue them, and see if it could arrange to let other CPUs do packet
processing at those times. I think that architecturally this is a better
solution, but someone who really understands the networking stack and is
not afraid of the FAST_IPSEC code would have a more useful opinion here
(Jonathan? Arnaud? Matt?).
Thor