Source-Changes archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: 4xx copyin/copyout [Was: CVS commit: src/sys/arch/powerpc/ibm4xx]



On Wed, Nov 28, 2007 at 02:22:13AM +1100, Simon Burge wrote:
> Juergen Hannken-Illjes wrote:
> 
> > On Thu, Nov 22, 2007 at 01:33:08PM +0000, Herb Peyerl wrote:
> > > 
> > > Module Name:      src
> > > Committed By:     hpeyerl
> > > Date:             Thu Nov 22 13:33:08 UTC 2007
> > > 
> > > Modified Files:
> > >   src/sys/arch/powerpc/ibm4xx: trap.c
> > > 
> > > Log Message:
> > > Optimize copyin/copyout to transfer as many words as possible before doing
> > > residual bytes. This improves small transfers. As a result, we can avoid
> > > doing bigcopyin/bigcopyout until len>1024 instead of len>256.
> > > 
> > > Reviewed by: simonb.
> > > 
> > > (everybody run, Herb's in the kernel again).
> > 
> > This breaks with alignment trap on the 1st copyout for evbppc/explora:
> > 
> >     kaddr=0x2fda88 udaddr=0xfffebff5 len=11
> > 
> > Trap is at this line:
> > 
> >     "   stw %[tmp],0(%[udaddr]);"       /* Store user word */
> 
> The 405 core manual says that the only intructions that can issue
> alignment traps are dcbz, dcread, lwarx and stwcx.  I found a manual
> for the 403CGX and it says "All data operands referenced by the Storage
> Reference instructions (loads/stores) must be aligned on a corresponding
> operand-size boundary." so that explains why I did see any issues when I
> tested Herb's patches on a 405 Walnut.
> 
> I've just tried using lswi (Load String Word Immediate) in place if
> lwz (and similar for the stores) to fix this without any significant
> rewriting, and on a Walnut it gets the same performance as just using
> lwz/stw.  Does the trailing patch work for you on the Explora?
> 
> If this works, we can possibly also look at unrolling the word loop a
> bit since the load/store string instructions can do up to 32-bytes per
> instruction.  Does anyone know how to request a number of consecutive
> registers with gcc asm constraints?  If not, it might be easier to break
> these two assembly fragments out to their own .S file...
> 
> If this patch doesn't fix the Explora, we can just add a alignment check
> (if on a 403) and skip the word-at-a-time loop, although the trailing
> loop will need to be updated to not just use "len % 4" bytes.

Patch works fine.  Explora comes up multi-user without problems.

-- 
Juergen Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig 
(Germany)



Home | Main Index | Thread Index | Old Index