tech-crypto archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

openssh unaligned access in swap_bytes (on alpha)



Hi,

I've got a core dump from NetBSD's version of openssh with debugging
enabled (running on an alpha), and here's the stack trace:

Core was generated by `ssh'.
Program terminated with signal 10, Bus error.
#0  0x12002736c in swap_bytes (src=0x1202e8007 "", dst_=0x12036d831 "", n=140)
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/cipher.c:131
131                     *dst++ = t.i;
(gdb) bt
#0  0x12002736c in swap_bytes (src=0x1202e8007 "", dst_=0x12036d831 "", n=140)
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/cipher.c:131
#1  0x1200281a8 in cipher_encrypt (context=0x120261d34, dest=0x12036d831 "", 
    src=0x1202e8003 "", len=1120)
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/cipher.c:409
#2  0x120022c48 in packet_encrypt (cc=0x120261d34, dest=0x12036d831, 
    src=0x1202e8003, bytes=1120)
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/packet.c:329
#3  0x120023760 in packet_send1 ()
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/packet.c:528
#4  0x120023ed0 in packet_send ()
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/packet.c:674
#5  0x1200065f0 in client_make_packets_from_stdin_data ()
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/clientloop.c:316
#6  0x1200083ac in client_loop (have_pty=0, escape_char_arg=-1, ssh2_chan_id=0)
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/clientloop.c:873
#7  0x120003260 in ssh_session ()
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/ssh.c:912
#8  0x120002520 in main (ac=3, av=0x1fffff670)
    at 
/home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/ssh.c:705

So the arguments obviously aren't 8 (or even 4) byte aligned, contrary
to the comment before swap_bytes() in cipher.c.

I think we should switch to something like:

Index: cipher.c
===================================================================
RCS file: /cvsroot/basesrc/crypto/dist/ssh/cipher.c,v
retrieving revision 1.1.1.1
diff -d -p -u -r1.1.1.1 cipher.c
--- cipher.c    2000/09/28 22:09:53     1.1.1.1
+++ cipher.c    2000/10/22 15:47:33
@@ -113,28 +113,21 @@ SSH_3CBC_DECRYPT(des_key_schedule ks1,
  * and after encryption/decryption. Thus the swap_bytes stuff (yuk).
  */
 static void
-swap_bytes(const unsigned char *src, unsigned char *dst_, int n)
+swap_bytes(const unsigned char *src, unsigned char *dst, int n)
 {
-       /* dst must be properly aligned. */
-       u_int32_t *dst = (u_int32_t *) dst_;
-       union {
-               u_int32_t i;
-               char c[4];
-       } t;
+       char c[4];
 
-       /* Process 8 bytes every lap. */
-       for (n = n / 8; n > 0; n--) {
-               t.c[3] = *src++;
-               t.c[2] = *src++;
-               t.c[1] = *src++;
-               t.c[0] = *src++;
-               *dst++ = t.i;
+       /* Process 4 bytes every lap. */
+       for (n = n / 4; n > 0; n--) {
+               c[3] = *src++;
+               c[2] = *src++;
+               c[1] = *src++;
+               c[0] = *src++;
 
-               t.c[3] = *src++;
-               t.c[2] = *src++;
-               t.c[1] = *src++;
-               t.c[0] = *src++;
-               *dst++ = t.i;
+               *dst++ = c[0];
+               *dst++ = c[1];
+               *dst++ = c[2];
+               *dst++ = c[3];
        }
 }
 

Originally I dropped the processing loop down from 8 bytes at a time to
4 at a time.  Here's some timing tests for the original algorithm, the
above patch, and a modified version of the above patch but processing
eight bytes per loop.  This is for 100000 iters over two 1024 byte
arrays:

        CPU                       orig            new(4)          new(8)

        21264A  (500MHz)         0.8662           2.4425          2.9943
        486     (25MHz)         83.8325         118.1313        112.5127
        PIII    (700MHz)         0.9577           1.1502          1.2051
        R4400   (60MHz)          4.9682           5.1774          4.7491
        SA-110  (233MHz)         1.6488           1.8607          1.6500
        sun4m   (40MHz)          9.3255          10.0279          9.6763

I'm tempted to leave it at 4 byte loops.  The code is clearer and
overall it doesn't seem to make that much off a difference either way.
I've tested this change actually in ssh on alpha and sparc (just to
check that I didn't botch any endian-type stuff).

I also think that such a change should also be pulled up to 1.5...

Simon.
--
Simon Burge                            <simonb%wasabisystems.com@localhost>
NetBSD Sales, Support and Service:  http://www.wasabisystems.com/



Home | Main Index | Thread Index | Old Index