Subject: Re: mips kernel profiling
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Simon Burge <simonb@netbsd.org>
List: port-mips
Date: 04/18/2000 13:40:24
Jonathan Stone wrote:
> Wow, apologies for the delay. Seems like I'm not getting port-mips
> mail these days?!?...
>
>
> >_splset is a LEAF function, so it calls MCOUNT. From what I understand,
> >we shouldn't profile the profiling support :)
>
> Yep. Or your stack runneth over. ;).
>
>
> When _splset() was introduced, we should've created non-profiled
> entrypoints, say __splhigh() and __splx(), and changed
> mips/include/profile.h to do
>
> #define MCOUNT_ENTER s = __splhigh();
> #define MCOUNT_EXIT __splx(s);
>
>
> (or _splhigh_()/_splx_(), whatever works best with ANSI namespace
> rules). _KERNEL_MCOUNT_DECL should change to match.
>
> One way to do this is to use XLEAF() to add alias entrypoints after
> the profiling goop emitted by the LEAF() macros. That's what the
> locore code used to do with splhigh/_splhigh, once upon a time.
I'm currently running with inline-assembly versions of
MCOUNT_{ENTER,EXIT} to save the function call overhead. Earlier I had
non-profiled _spl*() routines. Which would you say is better?
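
In C terms, the assembly MCOUNT_{ENTER,EXIT} in the diff below are meant
to do roughly the following. This is only a sketch to show the intent:
sim_status is a plain variable standing in for the CP0 status register,
and the sim_* names are made up for illustration; none of this is in
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt-enable bit: bit 0 of the status register. */
#define SIM_SR_IE 0x1u

/* Hypothetical stand-in for the CP0 status register ($12). */
uint32_t sim_status = SIM_SR_IE;

/*
 * Model of MCOUNT_ENTER: remember whether interrupts were enabled,
 * and disable them if they were.
 */
uint32_t
sim_mcount_enter(void)
{
	uint32_t s = sim_status & SIM_SR_IE;

	if (s != 0)
		sim_status &= ~SIM_SR_IE;	/* same mask as "li $8,-2" + "and" */
	return s;
}

/*
 * Model of MCOUNT_EXIT: re-enable interrupts only if they were
 * enabled when sim_mcount_enter() was called.
 */
void
sim_mcount_exit(uint32_t s)
{
	if (s != 0)
		sim_status |= SIM_SR_IE;
}

/* Usage: both the "interrupts on" and "interrupts off" cases. */
void
sim_mcount_selfcheck(void)
{
	uint32_t s;

	sim_status = 0xff01;		/* IE set, other bits arbitrary */
	s = sim_mcount_enter();
	assert(s == 1 && sim_status == 0xff00);
	sim_mcount_exit(s);
	assert(sim_status == 0xff01);	/* fully restored */

	sim_status = 0xff00;		/* IE already clear */
	s = sim_mcount_enter();
	assert(s == 0 && sim_status == 0xff00);
	sim_mcount_exit(s);
	assert(sim_status == 0xff00);	/* stays disabled */
}
```

The point of saving only the IE bit is that the exit path never turns
interrupts on for a caller that entered with them already off.
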
Index: profile.h
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/mips/include/profile.h,v
retrieving revision 1.13
diff -p -u -r1.13 profile.h
--- profile.h 2000/03/28 02:58:46 1.13
+++ profile.h 2000/04/18 03:29:46
@@ -42,28 +42,13 @@
#define _MIPS_PROFILE_H_
#ifdef _KERNEL
- /*
- * Declare non-profiled _splhigh() /_splx() entrypoints for _mcount.
- * see MCOUNT_ENTER and MCOUNT_EXIT.
- */
-#define _KERNEL_MCOUNT_DECL \
- int _splhigh __P((void)); \
- int _splx __P((int));
-#else /* !_KERNEL */
-/* Make __mcount static. */
-#define _KERNEL_MCOUNT_DECL static
-#endif /* !_KERNEL */
-
-#ifdef _KERNEL
# define _PROF_CPLOAD ""
#else
# define _PROF_CPLOAD ".cpload $25;"
#endif
-
#define _MCOUNT_DECL \
- _KERNEL_MCOUNT_DECL \
- void __attribute__((unused)) __mcount
+ static void __attribute__((unused)) __mcount
#define MCOUNT \
__asm__(".globl _mcount;" \
@@ -72,6 +57,7 @@
".set noreorder;" \
".set noat;" \
_PROF_CPLOAD \
+ "subu $29,$29,16;" \
"sw $4,8($29);" \
"sw $5,12($29);" \
"sw $6,16($29);" \
@@ -87,7 +73,7 @@
"lw $7,20($29);" \
"lw $31,4($29);" \
"lw $1,0($29);" \
- "addu $29,$29,8;" \
+ "addu $29,$29,24;" \
"j $31;" \
"move $31,$1;" \
".set reorder;" \
@@ -95,14 +81,38 @@
#ifdef _KERNEL
/*
- * The following two macros do splhigh and splx respectively.
- * They have to be defined this way because these are real
- * functions on the MIPS, and we do not want to invoke mcount
- * recursively.
+ * Block interrupts during mcount so that those interrupts can also be
+ * counted (as soon as we get done with the current counting).
*/
-#define MCOUNT_ENTER s = _splhigh()
-#define MCOUNT_EXIT _splx(s)
-#endif /* _KERNEL */
+/* $1 is at, $8 is t0, $12 is MIPS_COP_0_STATUS */
+#define MCOUNT_ENTER __asm__( \
+ ".set noat;" \
+ ".set noreorder;" \
+ "mfc0 $1,$12;" \
+ "nop;" \
+ "andi %0,$1,1;" \
+ "beq $1,$0,1f;" \
+ "li $8,-2;" \
+ "and $1,$1,$8;" \
+ "mtc0 $1,$12;" \
+ "nop;" \
+ "1:;" \
+ ".set at;" \
+ ".set reorder" : "=g" (s) :: "t0", "at");
+
+#define MCOUNT_EXIT __asm__( \
+ ".set noat;" \
+ ".set noreorder;" \
+ "beq %0,$0,1f;" \
+ "mfc0 $1,$12;" \
+ "nop;" \
+ "ori $1,$1,1;" \
+ "mtc0 $1,$12;" \
+ "nop;" \
+ "1:;" \
+ ".set at;" \
+ ".set reorder" :: "g" (s) : "at");
+#endif /* _KERNEL */
#endif /* _MIPS_PROFILE_H_ */
> *Sigh*. It's a real shame kernel profiling keeps getting busted. That
> suggests that kernel changes aren't being adequately profiled
> before they get committed. NetBSD/pmax used to be enough faster than
> the alternatives that some large campuses switched servers just for
> the performance improvement. I wonder if that's still true.
>
> Simon -- can you run lmbench binaries on both Ultrix and NetBSD,
> on a 60MHz r4400?
Overall not too bad. Process exec time is probably NetBSD's worst
result (37K vs. 14K microseconds in the "exec proc" column). In this
case, the Ultrix box had no local filesystems, so pretty much ignore
the file benchmarks.
L M B E N C H 1 . 9 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS             Mhz null null      open selct  sig  sig fork exec   sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
mips-dec- ULTRIX 4.5     117  3.7  28.   80   99 0.39K 13.8   41 5.6K  14K  30K
pmax-netb NetBSD 1.4X    118  3.5  17.  105  124 0.31K  8.2   26 5.0K  37K  62K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host      OS            2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
mips-dec- ULTRIX 4.5       46    356    963    251   1493     296    1738
pmax-netb NetBSD 1.4X      18    339    754    284   1198     308    1714
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
mips-dec- ULTRIX 4.5 46 101 146 404 302 1678
pmax-netb NetBSD 1.4X 18 131 123 383 458 1882
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page
Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
mips-dec- ULTRIX 4.5 189 61 1265 128 0
pmax-netb NetBSD 1.4X 2941 1136 5555 3030 158314 6.6K
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe   AF  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
mips-dec- ULTRIX 4.5      16   10   -1      1      0     10      9   23    18
pmax-netb NetBSD 1.4X     10   12    7      9     24     10     10   24    18
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
---------------------------------------------------
Host      OS            Mhz L1 $ L2 $ Main mem Guesses
--------- ------------- --- ---- ---- -------- -------
mips-dec- ULTRIX 4.5    117   23  281     1269
pmax-netb NetBSD 1.4X   118   25  291     1251
Simon.