Subject: Re: Intel Atlantis motherboard cache woes
To: Kevin M. Lahey <kml@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: port-i386
Date: 06/22/1996 11:17:22
[ Kevin came by my office earlier and asked me about this, and I had
only speculations. But, I'm not an i386 guru either, so I thought
he should post to the list. I will, however, share my speculations
here with the group. So, the disclaimer here is: "I don't know much
about the x86 cache architecture _at all_, so don't laugh too loudly
if I look like an idiot." :-) --thorpej ]
On Fri, 21 Jun 1996 17:09:38 -0700
"Kevin M. Lahey" <kml@nas.nasa.gov> wrote:
> When I run with the 512KB pipeline burst cache COAST module,
> I get pretty horrible cache access results. In fact, I get the
> same results whether or not I have the COAST module installed.
> I've tried several different COAST modules with no success.
When the COAST is installed, does the BIOS recognize the cache? If the
results are the same, it sounds like it never gets enabled.
> When I run with a 256KB pipeline burst cache COAST module,
> the kernel panics with a page fault in supervisor mode,
> usually after core dumping on the compiles that start up
> lmbench. It is a little more robust when I run it at 133 MHz
> rather than 166 MHz, but it still seems to panic eventually.
Do you see messages like "data modified on free list", and the like? For
quick reference, see:
http://www.netbsd.org/cgi-bin/query-full-pr?1416
...I filed that last April after having some problems with caches on a
Pentium system.
> Any clues? Am I missing something obvious?
Well, here's my one speculation...this assumes that the cache is a
write-back type (which Kevin and I honestly don't know, since neither the
BIOS nor manual seemed willing to tell us...)
From my experiences hacking on the SPARC port, I can't help but wonder if
address translation is using stale copies of the page tables.
This sort of thing happened to me on my SS10 (when working with Paul and
Aaron on some of the latter stages of getting the sun4m stuff ready for the
tree). The only ``solution'' (it's a hack, really) we could find was to
cache-inhibit the page, segment, and region table pages (since the magic
to tell the MMU to check the cache first didn't seem to be working on my
somewhat quirky SS10).
Now, another bit of SPARC experience, from getting DMA working on the
sun4 `si' driver, tells me that one needs to be careful to flush the
write-back cache (like the one on the sun4/200) before doing DMA, though
the sun4 cache is virtually tagged, so handling it is going to be
somewhat different (since the DMA hardware actually uses the address
translation facilities of the sun MMU).
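The flush-before-DMA point can be illustrated with a toy model in C. This is only a sketch under stated assumptions: the names (ram_buf, cached_buf, cpu_fill, cache_writeback_range, dma_read) are made up for illustration, the cache is modelled per byte rather than per line, and the virtual tagging of the real sun4 cache is ignored entirely. It shows nothing more than a device reading main memory while the CPU's stores are still sitting in a write-back cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only; every name below is hypothetical. */
#define BUF_SIZE 16

static uint8_t ram_buf[BUF_SIZE];     /* buffer contents in main memory */
static uint8_t cached_buf[BUF_SIZE];  /* CPU's write-back cached view */
static uint8_t dirty[BUF_SIZE];       /* nonzero: byte not yet written back */

/* CPU fills the buffer; the stores stay in the cache. */
static void cpu_fill(const uint8_t *src, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(cached_buf, src, len);
    memset(dirty, 1, len);
}

/* Write back the dirty bytes in [off, off + len) to main memory. */
static void cache_writeback_range(size_t off, size_t len)
{
    assert(off <= BUF_SIZE && len <= BUF_SIZE - off);
    for (size_t i = off; i < off + len; i++) {
        if (dirty[i]) {
            ram_buf[i] = cached_buf[i];
            dirty[i] = 0;
        }
    }
}

/* Hypothetical device: DMA reads main memory, never the CPU cache. */
static void dma_read(uint8_t *dst, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(dst, ram_buf, len);
}
```

In this model, calling cpu_fill() and then dma_read() hands the device the old contents of memory; calling cache_writeback_range() over the buffer in between is what makes the device see the new data.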
So, I guess my point is that we could be seeing a scenario like this:
	- process forks
	- new mappings for child get set up (we think we updated the
	  page tables properly, but the updates are in the w/b cache,
	  and haven't made it out to memory yet)
	- access occurs at the address where we think we have a valid
	  mapping
	- MMU attempts to translate that address, sees no valid mapping
	  cuz the page tables in memory haven't been updated
	- *poof*
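To make the scenario concrete, here is a minimal toy model in C. It is a sketch, not a description of real hardware: all names (ram_pte, cache_pte, cpu_write_pte, cache_flush, mmu_translate, stale_pte_demo) are invented for illustration, and it assumes, exactly as the speculation does, that the MMU's table walk reads main memory without consulting the write-back cache. Whether the x86 board in question behaves this way is the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; every name below is hypothetical. */
#define NPTE      4
#define PTE_VALID 0x1u

static uint32_t ram_pte[NPTE];      /* page table as it exists in main memory */
static uint32_t cache_pte[NPTE];    /* write-back cache copy of those entries */
static bool     cache_dirty[NPTE];  /* true: entry not yet written to memory */

/* CPU store to a page table entry: lands in the write-back cache only. */
static void cpu_write_pte(unsigned idx, uint32_t pte)
{
    assert(idx < NPTE);
    cache_pte[idx] = pte;
    cache_dirty[idx] = true;
}

/* Write every dirty entry back to main memory. */
static void cache_flush(void)
{
    for (unsigned i = 0; i < NPTE; i++) {
        if (cache_dirty[i]) {
            ram_pte[i] = cache_pte[i];
            cache_dirty[i] = false;
        }
    }
}

/* Assumed table walk: bypasses the cache and reads main memory directly. */
static bool mmu_translate(unsigned idx, uint32_t *frame)
{
    assert(idx < NPTE);
    uint32_t pte = ram_pte[idx];
    if (!(pte & PTE_VALID))
        return false;               /* no valid mapping: page fault */
    *frame = pte >> 12;
    return true;
}

/* Replays the fork scenario; returns 0 if the model behaves as described. */
static int stale_pte_demo(void)
{
    uint32_t frame = 0;

    cpu_write_pte(2, (0x1234u << 12) | PTE_VALID);  /* set up child mapping */
    if (mmu_translate(2, &frame))
        return 1;                   /* walk should still see the stale entry */
    cache_flush();                  /* updates finally reach memory */
    if (!mmu_translate(2, &frame))
        return 2;
    return frame == 0x1234u ? 0 : 3;
}
```

In this model the first translation faults even though the CPU "knows" the mapping is valid, which is the *poof* step; the fault goes away only once the dirty entries are written back (or, as with the SS10 hack, if the page table pages are never cached in the first place).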
Like I said earlier, I could really be showing my ignorance of how the
cache and memory management hardware interact (if at all :-) on the x86,
so take this with a grain of salt. I'm just trying to provide food for
thought...
Say, Kevin... If you have DDB in that kernel, type "trace" at the db>
prompt, and jot it down... that could be helpful.
Ciao.
-- save the ancient forests - http://www.bayarea.net/~thorpej/forest/ --
Jason R. Thorpe thorpej@nas.nasa.gov
NASA Ames Research Center Home: 408.866.1912
NAS: M/S 258-6 Work: 415.604.0935
Moffett Field, CA 94035 Pager: 415.428.6939