Subject: bcopy, bzero, copypage, and zeropage
To: None <port-m68k@NetBSD.ORG, tech-kern@NetBSD.ORG>
From: J.T. Conklin <jtc@cygnus.com>
List: tech-kern
Date: 12/09/1996 16:46:30
Those of you who read port-m68k know I've been fooling around with
improved implementations of block memory operation functions. Several
people were kind enough to run some benchmarks on systems I don't have
access to. As a result, I have an implementation that performs quite
nicely on all m68k family parts.
The reason I started down this path was that profiles on my Sun3
showed that bcopy and bzero together took more than 10% of the time.
This clearly indicated that there was lots of room for beneficial
microoptimizations.
However, most of the time for bcopy and bzero came from the calls in
pmap_copy_page and pmap_zero_page. Clearly functions that copy/zero
only page-sized objects can be made faster than general-purpose
copy/zero functions. When I dug a little deeper, I discovered that
several m68k ports have copypage functions that do exactly that.
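To illustrate why a page-only routine can beat a general-purpose one, here
is a minimal C sketch. It is not the code from any port's locore.s; the
names sketch_zeropage and sketch_copypage, the (src, dst) argument order,
and the 4096-byte SKETCH_PAGE_SIZE are assumptions made for illustration.
Because the length is a compile-time constant and the buffers are assumed
to be longword-aligned, the loops need no alignment fix-up, no tail
handling, and no overlap check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; a real port takes it from its headers. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of 4-longword groups in one page. */
#define SKETCH_GROUPS (SKETCH_PAGE_SIZE / (4u * sizeof(uint32_t)))

/* Zero exactly one page. Assumes `page` is at least longword-aligned. */
void sketch_zeropage(void *page)
{
    uint32_t *p = page;
    size_t n = SKETCH_GROUPS;

    assert(((uintptr_t)page % sizeof(uint32_t)) == 0);

    /* Fixed trip count, four stores per iteration, counting down to zero. */
    do {
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
    } while (--n != 0);
}

/* Copy exactly one page from src to dst. Assumes the pages do not overlap. */
void sketch_copypage(const void *src, void *dst)
{
    const uint32_t *s = src;
    uint32_t *d = dst;
    size_t n = SKETCH_GROUPS;

    assert(((uintptr_t)src % sizeof(uint32_t)) == 0);
    assert(((uintptr_t)dst % sizeof(uint32_t)) == 0);

    do {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        s += 4;
        d += 4;
    } while (--n != 0);
}
```

A caller would pass page-aligned buffers, for example
sketch_copypage(src_page, dst_page) followed by sketch_zeropage(src_page).
The count-down do/while form is the loop shape a decrement-and-branch
instruction fits, though what a given compiler emits is not guaranteed.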
Some suggestions:
* create a corresponding zeropage function --- in my
profiles, zeroing was done more than copying.
* change all ports that currently call bcopy and bzero
to call copypage and zeropage.
* possibly move copypage and zeropage from each port's
locore.s to m68k/m68k/copy.s.
* once the versions are consolidated, clean up and optimize
the versions. For example, use dbf instead of subq/jne;
don't use back-to-back move16 insns (on '040s); only provide
an implementation for a specific CPU variant when configured
as such.
* etc.
Thoughts?
--jtc