Subject: Re: ARM9 cache routines updated
To: None <port-arm@netbsd.org>
From: Hiroyuki Bessho <bsh@grotto.jp>
List: port-arm
Date: 02/09/2004 18:46:17
Richard Earnshaw <rearnsha@arm.com> writes:
>
> If anyone has access to the various Samsung ARM920-based boards I'd be
> interested to hear how this affects performance.
>
Here are lmbench results from an SMDK2410.
L M B E N C H 1 . 9 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS             Mhz null null      open selct  sig  sig fork exec   sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
2410-a    NetBSD 1.6ZI   200  3.1  19.   74 1650 0.51K 10.8   26 7.1K  31K  77K
2410-a    NetBSD 1.6ZI   200  3.1  19.   75 1654 0.52K 10.9   26 7.1K  31K  77K
2410-b    NetBSD 1.6ZI   200  1.8  10.   41 1566 0.29K  6.4   16 6.2K  27K  67K
2410-b    NetBSD 1.6ZI   200  1.8  10.   41 1595 0.29K  6.4   16 6.2K  27K  67K
2410-c    NetBSD 1.6ZI   200  3.1  19.   74 1527 0.51K 10.7   26 7.2K  29K  72K
2410-c    NetBSD 1.6ZI   200  3.1  19.   73 1599 0.50K 10.6   26 7.2K  29K  72K
2410-d    NetBSD 1.6ZI   200  1.5  9.5   36 1431 0.26K  5.3   13 5.8K  24K  59K
2410-d    NetBSD 1.6ZI   200  1.5  9.5   36 1478 0.26K  5.4   13 5.8K  24K  59K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
2410-a NetBSD 1.6ZI 267 631 627 634
2410-a NetBSD 1.6ZI 267 626 624 630
2410-b NetBSD 1.6ZI 265 635 631 654
2410-b NetBSD 1.6ZI 268 633 631 659
2410-c NetBSD 1.6ZI 320 687 680 695
2410-c NetBSD 1.6ZI 320 685 679 691
2410-d NetBSD 1.6ZI 287 663 656 657
2410-d NetBSD 1.6ZI 289 661 656 669
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
2410-a NetBSD 1.6ZI 267 586 794
2410-a NetBSD 1.6ZI 267 586 798
2410-b NetBSD 1.6ZI 265 564 770
2410-b NetBSD 1.6ZI 268 563 771
2410-c NetBSD 1.6ZI 320 689 881
2410-c NetBSD 1.6ZI 320 693 881
2410-d NetBSD 1.6ZI 287 596 787
2410-d NetBSD 1.6ZI 289 596 787
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page
Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
2410-a NetBSD 1.6ZI 465 337 3333 321 2027 9.9K
2410-a NetBSD 1.6ZI 458 312 3333 326 2091 9.9K
2410-b NetBSD 1.6ZI 442 292 3225 302 1568 9.7K
2410-b NetBSD 1.6ZI 442 302 3225 314 1582 9.7K
2410-c NetBSD 1.6ZI 442 290 3225 310 2215 9.9K
2410-c NetBSD 1.6ZI 436 301 3225 313 2105 9.8K
2410-d NetBSD 1.6ZI 411 273 3125 286 1515 9.5K
2410-d NetBSD 1.6ZI 409 277 3125 294 1500 9.5K
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe   AF  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
2410-a    NetBSD 1.6ZI    28    4   -1     14     45     41     40   45   189
2410-a    NetBSD 1.6ZI    27    4   -1     14     45     41     40   45   189
2410-b    NetBSD 1.6ZI    28    4   -1     15     54     42     42   54   192
2410-b    NetBSD 1.6ZI    28    4   -1     15     54     42     42   54   192
2410-c    NetBSD 1.6ZI    24    4   -1     14     45     41     40   45   189
2410-c    NetBSD 1.6ZI    24    4   -1     14     45     41     40   45   189
2410-d    NetBSD 1.6ZI    27    4   -1     16     54     42     42   54   193
2410-d    NetBSD 1.6ZI    27    4   -1     16     54     42     42   54   193
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
---------------------------------------------------
Host      OS            Mhz L1 $ L2 $ Main mem Guesses
--------- ------------- --- ---- ---- -------- -------
2410-a    NetBSD 1.6ZI  200   20  483      495 No L2 cache?
2410-a    NetBSD 1.6ZI  200   20  484      495 No L2 cache?
2410-b    NetBSD 1.6ZI  200   10  483      495 No L2 cache?
2410-b    NetBSD 1.6ZI  200   10  483      495 No L2 cache?
2410-c    NetBSD 1.6ZI  200   20  483      495 No L2 cache?
2410-c    NetBSD 1.6ZI  200   20  483      495 No L2 cache?
2410-d    NetBSD 1.6ZI  200   10  483      495 No L2 cache?
2410-d    NetBSD 1.6ZI  200   10  483      495 No L2 cache?
The kernels were built from -current source as of 2004-Feb-04, with
the following changes:
2410-a: backed out both write-back dcache change and clocking-mode
bits fix in arm9_setup().
(using sys/arm/include/cpufunc.h:1.29, sys/arm/arm/cpufunc.c:1.65,
sys/arm/arm/cpufunc_asm_arm9.S:1.2)
2410-b: with clocking-mode bits fix in arm9_setup(), and without
write-back d-cache.
2410-c: with write-back d-cache changes, and without clocking-mode
bits fix.
2410-d: both write-back d-cache changes and clocking-mode bits fix.
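The results show that the clocking-mode bits fix improved results on
almost all tests (context-switch times were about the same), while the
write-back d-cache changes lowered performance on some tests: for
example, the 2p/0K context switch took 320 us on 2410-c versus 267 us
on 2410-a.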
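
For anyone who wants to put numbers on that comparison, here is a rough
sketch in Python that computes the percent change of each kernel against
2410-a. It is only an illustration: the values are copied from the first
run of each kernel in the tables above, and the names percent_change,
null_call and ctxsw_2p_0k are made up for this example (they are not part
of lmbench).

```python
# Values copied from the lmbench tables in this message (first run of each
# kernel, in microseconds). Kernel labels follow the message:
#   a = neither change, b = clocking-mode bits fix only,
#   c = write-back d-cache only, d = both changes.
null_call = {"2410-a": 3.1, "2410-b": 1.8, "2410-c": 3.1, "2410-d": 1.5}
ctxsw_2p_0k = {"2410-a": 267, "2410-b": 265, "2410-c": 320, "2410-d": 287}


def percent_change(results, baseline="2410-a"):
    """Return {host: percent change vs. baseline}; negative means faster."""
    base = results[baseline]
    return {host: round(100.0 * (value - base) / base, 1)
            for host, value in results.items()}


if __name__ == "__main__":
    for name, table in (("null call", null_call),
                        ("2p/0K ctxsw", ctxsw_2p_0k)):
        print(name, percent_change(table))
```

With these inputs the null call time drops by roughly 42% on 2410-b and
52% on 2410-d, while the 2p/0K context switch on 2410-c is roughly 20%
slower than on 2410-a.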
--
bsh.