Subject: openssh unaligned access in swap_bytes (on alpha)
To: None <port-alpha@netbsd.org, tech-crypto@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-crypto
Date: 10/23/2000 03:46:07
Hi,

I've got a core dump from NetBSD's version of openssh with debugging
enabled (running on an alpha), and here's the stack trace:

Core was generated by `ssh'.
Program terminated with signal 10, Bus error.
#0  0x12002736c in swap_bytes (src=0x1202e8007 "", dst_=0x12036d831 "", n=140)
    at /home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/cipher.c:131
131                     *dst++ = t.i;
(gdb) bt
#0  0x12002736c in swap_bytes (src=0x1202e8007 "", dst_=0x12036d831 "", n=140)
    at /home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/cipher.c:131
#1  0x1200281a8 in cipher_encrypt (context=0x120261d34, dest=0x12036d831 "", 
    src=0x1202e8003 "", len=1120)
    at /home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/cipher.c:409
#2  0x120022c48 in packet_encrypt (cc=0x120261d34, dest=0x12036d831, 
    src=0x1202e8003, bytes=1120)
    at /home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/packet.c:329
#3  0x120023760 in packet_send1 ()
    at /home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/packet.c:528
#4  0x120023ed0 in packet_send ()
    at /home/simonb/src/ssh-test/usr.bin/ssh/libssh/../../../crypto/dist/ssh/packet.c:674
#5  0x1200065f0 in client_make_packets_from_stdin_data ()
    at /home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/clientloop.c:316
#6  0x1200083ac in client_loop (have_pty=0, escape_char_arg=-1, ssh2_chan_id=0)
    at /home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/clientloop.c:873
#7  0x120003260 in ssh_session ()
    at /home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/ssh.c:912
#8  0x120002520 in main (ac=3, av=0x1fffff670)
    at /home/simonb/src/ssh-test/usr.bin/ssh/ssh/../../../crypto/dist/ssh/ssh.c:705

So the arguments obviously aren't 8 (or even 4) byte aligned, contrary
to the comment before swap_bytes() in cipher.c.

I think we should switch to something like:

Index: cipher.c
===================================================================
RCS file: /cvsroot/basesrc/crypto/dist/ssh/cipher.c,v
retrieving revision 1.1.1.1
diff -d -p -u -r1.1.1.1 cipher.c
--- cipher.c	2000/09/28 22:09:53	1.1.1.1
+++ cipher.c	2000/10/22 15:47:33
@@ -113,28 +113,21 @@ SSH_3CBC_DECRYPT(des_key_schedule ks1,
  * and after encryption/decryption. Thus the swap_bytes stuff (yuk).
  */
 static void
-swap_bytes(const unsigned char *src, unsigned char *dst_, int n)
+swap_bytes(const unsigned char *src, unsigned char *dst, int n)
 {
-	/* dst must be properly aligned. */
-	u_int32_t *dst = (u_int32_t *) dst_;
-	union {
-		u_int32_t i;
-		char c[4];
-	} t;
+	char c[4];
 
-	/* Process 8 bytes every lap. */
-	for (n = n / 8; n > 0; n--) {
-		t.c[3] = *src++;
-		t.c[2] = *src++;
-		t.c[1] = *src++;
-		t.c[0] = *src++;
-		*dst++ = t.i;
+	/* Process 4 bytes every lap. */
+	for (n = n / 4; n > 0; n--) {
+		c[3] = *src++;
+		c[2] = *src++;
+		c[1] = *src++;
+		c[0] = *src++;
 
-		t.c[3] = *src++;
-		t.c[2] = *src++;
-		t.c[1] = *src++;
-		t.c[0] = *src++;
-		*dst++ = t.i;
+		*dst++ = c[0];
+		*dst++ = c[1];
+		*dst++ = c[2];
+		*dst++ = c[3];
 	}
 }
 

Originally I dropped the processing loop down from 8 bytes at a time to
4 at a time.  Here's some timing tests for the original algorithm, the
above patch, and a modified version of the above patch but processing
eight bytes per loop.  This is for 100000 iters over two 1024 byte
arrays:

	CPU			  orig		  new(4)	  new(8)

	21264A	(500MHz)	 0.8662		  2.4425	  2.9943
	486	(25MHz)		83.8325		118.1313	112.5127
	PIII	(700MHz)	 0.9577		  1.1502	  1.2051
	R4400	(60MHz)		 4.9682		  5.1774	  4.7491
	SA-110	(233MHz)	 1.6488		  1.8607	  1.6500
	sun4m	(40MHz)		 9.3255		 10.0279	  9.6763

I'm tempted to leave it at 4 byte loops.  The code is clearer and
overall it doesn't seem to make that much off a difference either way.
I've tested this change actually in ssh on alpha and sparc (just to
check that I didn't botch any endian-type stuff).

I also think that such a change should also be pulled up to 1.5...

Simon.
--
Simon Burge                            <simonb@wasabisystems.com>
NetBSD Sales, Support and Service:  http://www.wasabisystems.com/