Subject: Re: The dreaded thread bug [was Re: Stable again?]
To: None <port-sparc64@NetBSD.org>
From: Geoff Adams <gsa-netbsd@alldestroying.com>
List: port-sparc64
Date: 10/27/2006 20:02:25
Thanks a lot for the great pointers.
I'm still working through the details, and I've got more digging to
do before I start asking the really interesting questions, but I do
have a couple:
- trap.c (in both sparc and sparc64) mentions mem_access_fault(), which
MMU-related traps are said to go through instead of trap(). But this
function doesn't seem to exist. Is that a vestige of a former design?
I guess its role is probably filled by text_access_fault() and
data_access_fault() now.
And, of course, the big question looming in my mind:
- If the bug is related to trap handling, why does it happen only when
executing threaded processes? Surely we take a similar number and
type of traps during execution of threaded and non-threaded
processes, and non-threaded processes can run for years, literally.
Window overflows that occur as a result of a trap (on pre-v9) must
already be handled properly, since they come up fairly routinely in
normal (non-threaded) execution, no?
I'm still reading nathanw's paper on Scheduler Activations in NetBSD,
but I suspect that the real difference here comes in how we pass
things up through an upcall. Or do we use more software traps to
maintain the various bits of thread state, and something's going
wrong somewhere in there?
I'll agree that the evidence does point to a window restore problem.
Still perusing locore.s and letting things percolate in my mind...
- Geoff