Subject: Re: common irq handler code
To: Steve Woodford <scw@netbsd.org>
From: Chris Gilbert <chris@dokein.co.uk>
List: port-arm
Date: 04/11/2007 01:02:01
Steve Woodford wrote:
> On Monday 09 April 2007 08:01, Toru Nishimura wrote:
>> Chris Gilbert says;
>>
>> [ ... proposal to add "common irq code path" ... ]
>>
>>> So instead I've taken the footbridge irqhandler code, and made it a
>>> common file. (The footbridge version was based on Jason's work for
>>> xscale platforms)
>> I agree with that, providing "the common" is a better idea, but it
>> should be emphasised that it has limitations which might be inadequate
>> for some cases.
>
> I'll echo that. While the common code may be useful during early bring-up
> of a new port, there is no substitute for an implementation targeted
> directly at the flavour of interrupt controller. This code is critical
> to good performance. In particular, the spl(9) implementation must be as
> small as possible. I have a few comments...
>
> 1. Consider the proposed arm_intr_splx() implementation. Right at the
> start there are a whole bunch of 'extern foo' declarations. Each one of
> those variables will generate, on ARM, a PC-relative load just to find
> the address of the variable. This is all inlined code, so even small
> functions like this contrived example:
>
> {
> 	extern volatile int x;
> 	int s = splbio();
> 	x = 0;
> 	splx(s);
> }
>
> when compiled will be very much larger than you'd expect. Consider moving
> some state into cpu_info.
I guess the common parts could be moved into cpu_info, e.g.
current_spl_level, which would save one load/store for each variable.
> 2. As you noted, the 'mask' approach whereby pending interrupts are
> recorded in a mask which mirrors the hardware's mask registers does not
> scale well with more than 32 individual interrupt sources. It also
> becomes extremely unwieldy in splx(9) where you'd have to logically AND
> multiple 32-bit values to determine if an interrupt is pending. Instead,
> on CPUs with an ffs(2)-like instruction ("clz" for example) you can use
> a single 32-bit variable to record pending interrupts per IPL_*. Then
> splx(9) can simply use "clz" to determine if it needs to invoke the
> 'dispatch pending' heavy lifter. You also greatly reduce the number of
> external variable references in splx(9).
Certainly that sounds like a good idea. In fact, using the ipl levels as
a bitmask means the test becomes a simple unsigned compare: shift 1 left
by the new ipl, and if the pending-ipls value is >= the shifted value, we
have something to handle:
void
splx(int ipl)
{
	extern volatile uint32_t ipls_pending;
	extern volatile int current_spl_level;
	uint32_t iplvalue = (1 << ipl);

	/* Don't let the compiler re-order this code with preceding code */
	__insn_barrier();
	current_spl_level = ipl;
	if (ipls_pending >= iplvalue)
		splx_heavy_lifter(ipl);
}
would be good enough, without doing the ffs/clz thing? In fact ARM can
use its barrel shifter to great effect, and the compare ends up as:
mov r2, #1
cmp r1, r2, asl r0
movcc pc, lr
b splx_heavy_lifter
where r1 is ipls_pending and r0 is ipl. Add in the double load (address,
then value) for ipls_pending and that's 5 instructions, plus another 2
for storing current_spl_level, so 7 instructions is probably the
smallest I'll get it.
It also allows the spl routines to remain common, while the heavy lifter
varies depending on the target hardware.
> Using the above two suggestions in tandem, the spl(9) implementation will
> be much smaller, the code will scale to multiple hardware masks much
> more easily, and you may find some more opportunities to factor out
> common/shared code.
You're right, I'll look into combining things. I think current_spl_level
and ipls_pending would make sense in cpu_info (well, for now; SMP for
ARM11 would need something different :)
A quick experiment shows that all of the above combined makes:
void test_routine2()
{
	extern volatile int b;
	int oldstate = arm_intr_splraise(10);

	b = 7;
	arm_intr_splx2(oldstate);
}
come out as 13 instructions with no branching:
test_routine2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r2, .L6
mov r3, #10
ldr r0, [r2, #4]
@ lr needed for prologue
str r3, [r2, #4]
ldr r3, .L6+4
mov r1, #7
str r1, [r3, #0]
str r0, [r2, #4]
ldr r1, [r2, #0]
mov r3, #1
cmp r1, r3, asl r0
movcc pc, lr
b splx_heavy_lifter
.L7:
.align 2
.L6:
.word cpu_info
.word b
.size test_routine2, .-test_routine2
For this simple test case, a basic b = 7 takes 4 instructions, so that's
an overhead of 9 instructions for the spl()/splx() pair, which I think
is probably as tight as I'll get it.
Previously the shortest path was 25 instructions :)
Attached is my test .c file, and .s file. Real code will probably be
very different.
Thanks,
Chris
--------------060301010207060303060203
Content-Type: text/plain;
name="arm_intr_test.s"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="arm_intr_test.s"
.file "arm_intr_test.c"
.text
.align 2
.global test_routine_raw
.type test_routine_raw, %function
test_routine_raw:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, .L3
mov r2, #7
@ lr needed for prologue
str r2, [r3, #0]
mov pc, lr
.L4:
.align 2
.L3:
.word b
.size test_routine_raw, .-test_routine_raw
.align 2
.global test_routine2
.type test_routine2, %function
test_routine2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r2, .L10
mov r3, #10
ldr r0, [r2, #4]
@ lr needed for prologue
str r3, [r2, #4]
ldr r3, .L10+4
mov r1, #7
str r1, [r3, #0]
str r0, [r2, #4]
ldr r1, [r2, #0]
mov r3, #1
cmp r1, r3, asl r0
movcc pc, lr
b splx_heavy_lifter
.L11:
.align 2
.L10:
.word cpu_info
.word b
.size test_routine2, .-test_routine2
.align 2
.global test_routine
.type test_routine, %function
test_routine:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
ldr r2, .L20
mov ip, sp
mov r3, #10
stmfd sp!, {r4, r5, fp, ip, lr, pc}
ldr r5, [r2, #4]
sub fp, ip, #4
str r3, [r2, #4]
ldr r3, .L20+4
mov r2, #7
str r2, [r3, #0]
ldr r3, .L20+8
ldr r2, .L20+12
ldr r1, .L20+16
str r5, [r3, #0]
ldr r0, [r2, #0]
ldr r3, [r1, r5, asl #2]
mov ip, #128
bics lr, r0, r3
bne .L18
.L13:
ldr r3, .L20+20
ldr r2, .L20+24
ldr r1, [r3, #0]
ldr r3, [r2, r5, asl #2]
bics r3, r1, r3
bne .L19
ldmfd sp, {r4, r5, fp, sp, pc}
.L19:
ldmfd sp, {r4, r5, fp, sp, lr}
b arm_do_pending_soft_intr
.L18:
mrs r4, cpsr
bic r3, r4, ip
eor r3, r3, ip
msr cpsr_c, r3
ldr r1, .L20+28
ldr r2, .L20+32
ldr r3, [r1, #0]
ldr r0, [r2, #0]
orr r3, r3, lr
str r3, [r1, #0]
ldr r2, [r1, #0]
mvn r0, r0
and r0, r0, r2
bl arm_hardware_set_irq_mask
and r4, r4, #192
mov r3, #192
mrs r1, cpsr
bic r2, r1, r3
eor r2, r2, r4
msr cpsr_c, r2
b .L13
.L21:
.align 2
.L20:
.word cpu_info
.word b
.word current_spl_level
.word arm_intr_pending
.word arm_imask
.word arm_soft_intr_pending
.word arm_simask
.word intr_enabled
.word intr_soft_disabled
.size test_routine, .-test_routine
.ident "GCC: (GNU) 4.1.2 20070110 (prerelease) (NetBSD nb1 20070110)"
--------------060301010207060303060203
Content-Type: text/plain;
name="arm_intr_test.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="arm_intr_test.c"
/* $NetBSD: footbridge_intr.h,v 1.10 2007/03/09 06:45:20 thorpej Exp $ */
/*
* Copyright (c) 2001, 2002 Wasabi Systems, Inc.
* All rights reserved.
*
* Written by Jason R. Thorpe for Wasabi Systems, Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed for the NetBSD Project by
* Wasabi Systems, Inc.
* 4. The name of Wasabi Systems, Inc. may not be used to endorse
* or promote products derived from this software without specific prior
* written permission.
*
* THIS SOFTWARE IS PROVIDED BY WASABI SYSTEMS, INC. ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WASABI SYSTEMS, INC
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
#include <stdint.h>
#define NIPL 14
#define I32_bit (1 << 7) /* IRQ disable */
#define F32_bit (1 << 6) /* FIQ disable */
static __inline uint32_t __set_cpsr_c(uint32_t bic, uint32_t eor) __attribute__((__unused__));
static __inline uint32_t
__set_cpsr_c(uint32_t bic, uint32_t eor)
{
	uint32_t tmp, ret;

	__asm volatile(
		"mrs %0, cpsr\n"	/* Get the CPSR */
		"bic %1, %0, %2\n"	/* Clear bits */
		"eor %1, %1, %3\n"	/* XOR bits */
		"msr cpsr_c, %1\n"	/* Set the control field of CPSR */
		: "=&r" (ret), "=&r" (tmp)
		: "r" (bic), "r" (eor) : "memory");
	return ret;
}

#define disable_interrupts(mask) \
	(__set_cpsr_c((mask) & (I32_bit | F32_bit), \
	    (mask) & (I32_bit | F32_bit)))

#define enable_interrupts(mask) \
	(__set_cpsr_c((mask) & (I32_bit | F32_bit), 0))

#define restore_interrupts(old_cpsr) \
	(__set_cpsr_c((I32_bit | F32_bit), (old_cpsr) & (I32_bit | F32_bit)))
static inline void __attribute__((__unused__))
arm_intr_splx(int newspl)
{
	extern volatile uint32_t intr_enabled;
	extern uint32_t intr_soft_disabled;
	extern volatile int current_spl_level;
	extern volatile uint32_t arm_intr_pending;
	extern void arm_do_pending_soft_intr(void);
	extern void arm_hardware_set_irq_mask(uint32_t);
	extern uint32_t arm_imask[NIPL];
	extern uint32_t arm_simask[NIPL];
	extern volatile uint32_t arm_soft_intr_pending;
	int oldirqstate, hwpend;

	/* Don't let the compiler re-order this code with preceding code */
	__insn_barrier();
	current_spl_level = newspl;

	/* un-mask any interrupts that we can now handle */
	hwpend = (arm_intr_pending & ~arm_imask[newspl]);
	if (hwpend != 0) {
		oldirqstate = disable_interrupts(I32_bit);
		intr_enabled |= hwpend;
		arm_hardware_set_irq_mask(intr_enabled & ~intr_soft_disabled);
		restore_interrupts(oldirqstate);
	}

	if (arm_soft_intr_pending & ~arm_simask[newspl])
		arm_do_pending_soft_intr();
}
struct test_cpu_info {
	volatile unsigned int ipls_pending;
	volatile int current_spl_level;
};
extern struct test_cpu_info cpu_info;

static inline void __attribute__((__unused__))
arm_intr_splx2(int spl)
{
	extern void splx_heavy_lifter(void);
	unsigned int iplvalue = (1 << spl);

	/* Don't let the compiler re-order this code with preceding code */
	__insn_barrier();
	cpu_info.current_spl_level = spl;
	if (cpu_info.ipls_pending >= iplvalue)
		splx_heavy_lifter();
}
static inline int __attribute__((__unused__))
arm_intr_splraise(int ipl)
{
	int old;

	old = cpu_info.current_spl_level;
	cpu_info.current_spl_level = ipl;

	/* Don't let the compiler re-order this code with subsequent code */
	__insn_barrier();
	return (old);
}

static inline int __attribute__((__unused__))
arm_intr_spllower(int ipl)
{
	int old = cpu_info.current_spl_level;

	arm_intr_splx(ipl);
	return (old);
}
void
test_routine(void)
{
	extern volatile int b;
	int oldstate = arm_intr_splraise(10);

	b = 7;
	arm_intr_splx(oldstate);
}

void
test_routine2(void)
{
	extern volatile int b;
	int oldstate = arm_intr_splraise(10);

	b = 7;
	arm_intr_splx2(oldstate);
}

void
test_routine_raw(void)
{
	extern volatile int b;

	b = 7;
}
--------------060301010207060303060203--