Subject: Re: Port of NetBSD to XScale
To: Charles M. Hannum <root@ihack.net>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm32
Date: 03/29/2001 11:08:13
>
> On Thu, Mar 29, 2001 at 09:17:52AM +0100, Chris Gilbert wrote:
> >
> > Branching looks to be worse than ever at 4 cycles for a branch miss, or 0 if
> > it's predicted by the branch prediction buffer. It doesn't see the standard
> > MOV PC, LR return method; I suspect that doing B LR will help it there.
>
> Er, are you saying `mov pc, lr' always causes a 4-cycle stall? That
> would be an *amazing* f*ck*p. Wow.
>
According to the documentation I have, XScale only predicts B and BL
instructions, both of which have only invariant pc-relative offsets. Any
mis-predicted (or unpredicted) branch takes at least 5 cycles to issue (8
if the value has to come from memory). [XScale Developers Manual, Table
14-4]

So I don't think there are any coding tricks to speed this up, other than
to avoid code like

	ldr	pc, [addr]

when it can be reasonably split into

	ldr	reg, [addr]
	<other instructions>
	mov	pc, reg

which can save a couple of cycles.
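
To make the split a bit more concrete, here is a rough sketch of a jump
through a table of code addresses. The register assignments (r4 as the
table base, r0 as the index, ip as scratch) and the filler instructions
are made up for illustration. How much is actually saved depends on how
much independent work can be moved between the load and the branch.

```
	@ Before: the branch target is loaded straight into pc
	ldr	pc, [r4, r0, lsl #2]	@ r4 = table base, r0 = index (hypothetical)

	@ After: start the load early, do other work, then branch
	ldr	ip, [r4, r0, lsl #2]	@ fetch the target into a scratch register
	add	r1, r1, #1		@ independent work that does not use ip
	str	r1, [sp, #4]
	mov	pc, ip			@ target is already in a register here
```

The final branch is still unpredicted either way, so this only hides the
load, it does not remove the branch penalty itself.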