Subject: Re: bswap{16,32,64} in libutil ?
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Eduardo E. Horvath <eeh@one-o.com>
List: tech-userlevel
Date: 03/05/1998 08:41:04
On Wed, 4 Mar 1998, Manuel Bouyer wrote:
> On Mar 4, Eduardo E. Horvath wrote
> > Is it possible to break these functions down into "store as little-endian"
> > and "store as big endian"? There are architectures where this is a big
> > performance issue. Byte swapping on SPARC V9 machines is a complicated
> > process involving lots of shifts and masking, but storing to a particular
> > endianness is practically a NOP. Machines that really do swaps can
> > have one set of macros as NOPs and the others do the swap. (Or do I just
> > assume that the only time bswap*() is called on a big endian machine is to
> > access little-endian data and vice-versa?)
> >
>
> If they also can do 'read as little-endian' and 'read as big endian', it will
> be possible. If only store instructions are available, it will be harder,
> as there are places where conversions are done 'on the fly', such as
> var = ufs_bswap32(ufs_bswap32(var) + 1); /* formerly was var++ */

It does reads as well. For those of you who really want to know the
nitty-gritty details:

There is a bit in the PSTATE register that tells the machine to run in
little-endian mode. There is another bit that tells the machine to take
traps in little-endian mode. The MMU's TTE has a bit to indicate that a
particular page has its endianness inverted. There are also ASIs (address
space identifiers) that specify whether to access a particular location as
little-endian or big-endian. All of these act on memory accesses rather
than on register contents, which makes the generation of a bswap macro a
bit difficult.

I suppose the most efficient way of doing this would be to put a #pragma
on the datatype to indicate the endianness and let the compiler deal with
it. Otherwise the above code would require something on the order of 2
loads and 2 stores, e.g.:
	ldwa	[var] ASI_PRIMARY_LITTLE, %o0	; ufs_bswap32(var)
	inc	%o0				; var = var + 1
	stw	%o0, [tmp]			; tmp = var
	ldwa	[tmp] ASI_PRIMARY_LITTLE, %o0	; ufs_bswap32(tmp)
	stw	%o0, [var]			; var = tmp
An optimized version should be:
	ldwa	[var] ASI_PRIMARY_LITTLE, %o0	; ufs_bswap32(var)
	inc	%o0				; var = var + 1
	stwa	%o0, [var] ASI_PRIMARY_LITTLE	; var = ufs_bswap32(var)
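In C terms, the single load and single store above amount to something like
the following sketch. The names le32_load, le32_store and le32_increment are
made up for illustration and are not existing libutil functions. The bodies
assemble the value byte by byte so the sketch behaves the same on any host;
on V9 each helper could presumably collapse to one alternate-space access.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from memory. */
static uint32_t le32_load(const void *addr)
{
	const unsigned char *p = addr;

	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: write a 32-bit value to memory as little-endian. */
static void le32_store(void *addr, uint32_t v)
{
	unsigned char *p = addr;

	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* One load, one increment, one store; no temporary in memory. */
static void le32_increment(void *addr)
{
	le32_store(addr, le32_load(addr) + 1);
}
```

The point of the shape is that the caller never sees a swapped value in a
register; the endianness lives entirely in the load and the store.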

Since there are already gcc macros for lda and sta (as well as ldha and
stha for 16-bit, and ldxa and stxa for 64-bit accesses), the macros could
be designed something like this:
#define load32_little(addr)	lda((addr), ASI_PRIMARY_LITTLE)
#define load32_big(addr)	lda((addr), ASI_PRIMARY)
#define load16_little(addr)	ldha((addr), ASI_PRIMARY_LITTLE)
#define load16_big(addr)	ldha((addr), ASI_PRIMARY)
#define load64_little(addr)	ldxa((addr), ASI_PRIMARY_LITTLE)
#define load64_big(addr)	ldxa((addr), ASI_PRIMARY)

#define store32_little(addr,v)	sta((addr), (v), ASI_PRIMARY_LITTLE)
#define store32_big(addr,v)	sta((addr), (v), ASI_PRIMARY)
#define store16_little(addr,v)	stha((addr), (v), ASI_PRIMARY_LITTLE)
#define store16_big(addr,v)	stha((addr), (v), ASI_PRIMARY)
#define store64_little(addr,v)	stxa((addr), (v), ASI_PRIMARY_LITTLE)
#define store64_big(addr,v)	stxa((addr), (v), ASI_PRIMARY)
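
On machines without alternate-space accesses, the same interface could be
provided with one set of macros as plain accesses and the other doing the
swap. The following is only a sketch of the 32-bit case under a few
assumptions: the compiler provides __builtin_bswap32 and the __BYTE_ORDER__
predefines (GCC and Clang do), and load32_native and store32_native are
hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: plain host-order access, safe for any alignment. */
static uint32_t load32_native(const void *addr)
{
	uint32_t v;

	memcpy(&v, addr, sizeof v);
	return v;
}

static void store32_native(void *addr, uint32_t v)
{
	memcpy(addr, &v, sizeof v);
}

/* The host's own byte order is the "NOP" set; the other set swaps. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define load32_little(addr)	load32_native(addr)
#define load32_big(addr)	__builtin_bswap32(load32_native(addr))
#define store32_little(addr,v)	store32_native((addr), (v))
#define store32_big(addr,v)	store32_native((addr), __builtin_bswap32(v))
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define load32_little(addr)	__builtin_bswap32(load32_native(addr))
#define load32_big(addr)	load32_native(addr)
#define store32_little(addr,v)	store32_native((addr), __builtin_bswap32(v))
#define store32_big(addr,v)	store32_native((addr), (v))
#else
#error "host byte order unknown; this sketch needs __BYTE_ORDER__"
#endif
```

With this in place the earlier example would read
store32_little(&var, load32_little(&var) + 1) on every port, and each port
picks whichever implementation is cheapest for it.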
Unfortunately, mapping this to an in-register bswap is non-trivial. I'll
need to take another close look at the V9 spec and see if there's some
efficient way of doing this inside a register.
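
For comparison, the generic in-register swap that such a machine falls back
on today is the usual shift-and-mask sequence, sketched here in C. The name
swap32_shift is made up for illustration; this is not a V9-specific trick.

```c
#include <stdint.h>

/* Reverse the four bytes of v using only shifts, masks and ORs. */
static uint32_t swap32_shift(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |	/* byte 0 -> byte 3 */
	    ((v & 0x0000ff00u) << 8) |		/* byte 1 -> byte 2 */
	    ((v & 0x00ff0000u) >> 8) |		/* byte 2 -> byte 1 */
	    ((v & 0xff000000u) >> 24);		/* byte 3 -> byte 0 */
}
```

As written that is four masks, four shifts and three ORs for a single
32-bit value, which is the cost the load/store approach avoids.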
=========================================================================
Eduardo Horvath eeh@one-o.com
"Cliffs are for climbing. That's why God invented grappling hooks."
- Benton Frasier