Subject: Re: port-shark/22355 [was: Help needed to fix NetBSD/shark]
To: Chris Gilbert <chris@dokein.co.uk>
From: Julio M. Merino Vidal <jmmv84@gmail.com>
List: port-arm
Date: 08/04/2007 13:20:54
On 04/08/2007, at 12:50, Chris Gilbert wrote:
> Julio M. Merino Vidal wrote:
>> Hi,
>>
>> Based on my limited understanding of ARM assembly (as in "just learned
>> the basics yesterday") and after countless hours of crappy debugging, I
>> think I have found THE^Wa bug in the isa_irq.S file. With my change
>> (see below) the machine seems to work fine, but I have also made it work
>> in so many different and flawed ways (see the beginning of this thread
>> or the contents of the PR) that I'm unsure if this is correct or not.
>>
>> The thing is that the file contains this loop:
>>
>> Lfind_highest_ipl:
>>         ldr     r2, [r7, r9, lsl #2]
>>         tst     r8, r2
>>         subeq   r9, r9, #1
>>         beq     Lfind_highest_ipl
>
> I think what you're missing is that this code looks for the first
> IPL/SPL where an interrupt is enabled, so it starts at the top and works
> downwards. So the clock, which is masked at SPL_CLOCK will have the
> interrupt line clear in SPL_CLOCK and above. Only when the code reaches
> SPL_AUDIO will tst not find it masked, and so r9 will be SPL_AUDIO on
> exit from that code.
So, in order to prevent the code after the loop from accessing
spl_masks[_SPL_LEVELS], spl_masks[_SPL_LEVELS - 1] has to be 0 so that
the tst always sets the Z bit, right? Otherwise it would not do the sub,
and re-incrementing r9 later on could make the code access an invalid
array position.
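
A rough C model of the loop may make the bounds question concrete. It is
only a sketch under assumptions: the function names and the "levels"
parameter are hypothetical, r9 is assumed to start at _SPL_LEVELS - 1, and
the code after the loop is assumed to do nothing more than re-increment
r9. None of this is taken from isa_irq.S itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough C model of the Lfind_highest_ipl loop (not the actual kernel code).
 * spl_masks[l] is assumed to have a bit set for every IRQ still enabled at
 * level l; irqs holds the pending IRQ bits (r8 in the assembly).
 */
static int
find_highest_enabled_level(const uint32_t *spl_masks, int levels, uint32_t irqs)
{
	int level = levels - 1;	/* assumption: r9 starts at the top level */

	assert(levels > 0);
	/* tst r8, r2 ; subeq r9, r9, #1 ; beq Lfind_highest_ipl */
	while (level >= 0 && (irqs & spl_masks[level]) == 0)
		level--;
	/* The level >= 0 guard is an addition; the assembly has no such check. */
	return level;
}

/*
 * Level the handler would run at, assuming the code after the loop simply
 * re-increments r9. The result is a valid index only if the loop stepped
 * down at least once, i.e. if spl_masks[levels - 1] did not match.
 */
static int
handler_level(const uint32_t *spl_masks, int levels, uint32_t irqs)
{
	return find_highest_enabled_level(spl_masks, levels, irqs) + 1;
}
```

With a made-up 4-entry table { 0xF, 0x7, 0x3, 0x0 }, IRQ bit 0x4 is found
at level 1 and handled at level 2. If the top entry had that IRQ's bit
set, handler_level() would return 4, which is one past the end of the
table.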
> Your change means that the interrupt is masked (due to it being added to
> disabled_mask) but the spl isn't at the correct level, eg IPL_BIO stuff
> will be running at IPL_NONE :)
Too good to be true :P
>> AIUI, this locates the highest IPL at which the received IRQs have to be
>> served. After the beq, r9 contains the number of this IPL, and r2
>> contains the spl_mask for that level.
>
> When it hangs, are you able to print the contents of:
> i8259_mask
> spl_mask
> current_mask
> disabled_mask
> current_spl_level
> current_intr_depth
>
> As I think they might help track down what's masked out and where. My
> feeling is that the clock interrupt is being left masked out by some
> code path somewhere, and not being re-enabled.
>
> Given the speed at which it happens, inserting a printf call at exit
> from the handler with the current_spl_level may reveal if it's exiting
> with the spl_level correctly reset.
I was able to print the values of, e.g., current_spl_level at all
places where it is modified, and I'm fairly sure this (the current
code) is correct. When the machine locks up, the last SPL value I
see is 0, so everything should be enabled and working...

I can try again to get the values of all these variables and not just
the SPL level.
Thanks,
--
Julio M. Merino Vidal <jmmv84@gmail.com>