Subject: Re: Intermediate void* casts
To: <>
From: David Laight <david@l8s.co.uk>
List: tech-misc
Date: 08/11/2003 11:46:53
> Following is how gcc 3.4 20030806 (experimental) with -O2 makes it:
>
> 00000000 <bswap64>:
> 0: 55 push %ebp
> 1: 89 e5 mov %esp,%ebp
> 3: 8b 45 08 mov 0x8(%ebp),%eax
> 6: 53 push %ebx
> 7: 8b 55 0c mov 0xc(%ebp),%edx
> a: 89 c1 mov %eax,%ecx
> c: 66 c1 c9 08 ror $0x8,%cx
> 10: c1 c9 10 ror $0x10,%ecx
> 13: 66 c1 c9 08 ror $0x8,%cx
> 17: 89 d3 mov %edx,%ebx
> 19: 66 c1 cb 08 ror $0x8,%bx
> 1d: c1 cb 10 ror $0x10,%ebx
> 20: 66 c1 cb 08 ror $0x8,%bx
> 24: 31 c0 xor %eax,%eax
> 26: 89 ca mov %ecx,%edx
> 28: 89 d9 mov %ebx,%ecx
> 2a: 31 db xor %ebx,%ebx
> 2c: 09 da or %ebx,%edx
> 2e: 09 c8 or %ecx,%eax
> 30: 5b pop %ebx
> 31: c9 leave
> 32: c3 ret
Or you could hand-code:
0: pop %ecx
1: pop %edx
2: pop %eax
3: ror $8, %ax
7: ror $0x10, %eax
a: ror $8, %ax
e: ror $8, %dx
12: ror $0x10, %edx
15: ror $8, %dx
19: jmp *%ecx
1b:
If I've got the registers in the right order (etc).
Probably runs faster with a different ordering of the 'ror' instructions.
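For anyone who wants to convince themselves that the three rotates really do amount to a byte swap, here is a rough C model of the sequence. It is only a sketch: the function names are made up for illustration, and the helpers imitate what 'ror $8' on a 16-bit register and 'ror $0x10' on a 32-bit register do, rather than being anything gcc emits.

```c
#include <stdint.h>

/* Models "ror $8, %ax": swap the two low bytes, leave the top half alone. */
uint32_t ror8_low16(uint32_t x)
{
    return (x & 0xffff0000u) | ((x & 0x000000ffu) << 8) | ((x >> 8) & 0x000000ffu);
}

/* Models "ror $0x10, %eax": exchange the two 16-bit halves. */
uint32_t ror16(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* The three-rotate sequence applied to one 32-bit word. */
uint32_t bswap32_ror(uint32_t x)
{
    return ror8_low16(ror16(ror8_low16(x)));
}

/* Swap each half separately, then exchange the halves. */
uint64_t bswap64_ror(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    return ((uint64_t)bswap32_ror(lo) << 32) | bswap32_ror(hi);
}
```

Calling bswap64_ror(0x0102030405060708ull) should give 0x0807060504030201ull. The swapped low word ends up as the high word of the result, which is why the register order matters.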
One thing I did notice when changing some similar code to be:
i64 = (uint64_t)tv.tv_sec << 32 | tv.tv_usec * 4294u;
is that the current gcc managed to sign extend tv.tv_sec and then throw
away the top word. Doing:
i64 = tv.tv_sec * 0x100000000ull | tv.tv_usec * 4294u;
stopped this happening.
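To compare the two forms side by side, something like the following could be fed to the compiler. It is a minimal sketch under a couple of assumptions: the function names are hypothetical, the two fields are passed as plain 'long' parameters rather than a real struct timeval, and the low word is assumed to come from tv_usec (4294 being roughly 2^32 / 10^6). Both functions should return the same value; the interesting part is the code gcc generates for each.

```c
#include <stdint.h>

/* Cast-and-shift form: widen tv_sec to 64 bits, then move it to the top word. */
uint64_t pack_shift(long tv_sec, long tv_usec)
{
    return (uint64_t)tv_sec << 32 | tv_usec * 4294u;
}

/* Multiply form: the 64-bit constant forces the widening of tv_sec. */
uint64_t pack_mul(long tv_sec, long tv_usec)
{
    return tv_sec * 0x100000000ull | tv_usec * 4294u;
}
```

Compiling this with 'gcc -O2 -S' and reading the output for the two functions is probably the quickest way to see whether a given gcc still does the redundant sign extension.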
However in both cases it looks as though gcc's 'peephole optimiser' isn't
very good. Something ought to be killing the register-register moves and
instructions whose result is never used.
David
--
David Laight: david@l8s.co.uk