NetBSD-Bugs archive
Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
The following reply was made to PR port-i386/38935; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-i386-maintainer%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
netbsd-bugs%NetBSD.org@localhost
Subject: Re: port-i386/38935: x86 bus_dmamap_sync() should use memory barrier
Date: Thu, 12 Jun 2008 13:48:50 +0200
On Tue, Jun 10, 2008 at 09:20:02PM +0000, Andrew Doran wrote:
> On Tue, Jun 10, 2008 at 06:30:00PM +0000, Manuel Bouyer wrote:
>
> > the x86 implementation of bus_dmamap_sync() turns into a NOP if
> > we're not using bounce buffers (which is the common case).
> > Yet some DMA devices rely on read or write operations
> > happening in order (e.g. when the driver can add DMA descriptors to
> > a queue while the device may also be looking at the queue).
> > Typical examples are the uhci and ehci drivers (but they are not
> > the only examples in our tree). Without an appropriate memory barrier,
> > the CPU's write buffers can reorder writes in a way that leaves
> > the DMA descriptors in an inconsistent state from the device's view.
>
> Writes are never reordered by the CPU unless the target memory region has
> been set as one of the weakly ordered types using an MTRR or PAE equivalent.
> For write-combining regions it could happen but that should never be the
> case for memory mapped for a device. In theory it can also happen with a
> write-back (Intel ordering) region if you use the SSE instructions, but we
> only use those in kernel for zeroing free pages.
>
> The tech docs from Intel are really confused about this issue, but the
> memory ordering rules are (in short):
>
> - loads can be reordered around other loads in any direction
> - a load after a store in program order can be reordered to occur before
> the store
> - a load before a store in program order _cannot_ be reordered to occur
> after the store
> - a store is strongly ordered with respect to all other stores
> - locked transactions prevent all reordering and flush store buffers
>
> amd64 is (was?) pretty much the same, except loads are never reordered
> around stores.
I've been rereading chapter 7, "Memory System", of the "AMD64 Architecture
Programmer's Manual Volume 2" (rev 3.13). Actually, loads can be reordered
before stores from the memory point of view (which is what matters for a
bus-master device), because of the write buffers: a load can fetch its data
directly from the write buffer, before the data is actually in the cache or
in memory. An mfence or a locked instruction is needed to prevent that.
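To make that concrete, here is a purely illustrative sketch of the kind of
sequence I'm worried about (the structure, field and function names are made
up, this is not the actual uhci/ehci code): the driver stores into a shared
DMA descriptor and then loads a device-updated word, and the load can be
satisfied while the stores still sit in the write buffer.

/*
 * Purely illustrative sketch: names and layout are hypothetical.
 */
#include <sys/types.h>
#include <sys/endian.h>
#include <machine/bus.h>

#define DESC_OWNED_BY_DEVICE    0x80000000U

struct hw_desc {                /* descriptor shared with the device */
    uint32_t buf_addr;          /* DMA address of the data buffer */
    uint32_t len;               /* transfer length */
    uint32_t status;            /* ownership bit lives here */
};

void
example_queue(bus_dma_tag_t dmat, bus_dmamap_t map,
    struct hw_desc *d, struct hw_desc *prev,
    uint32_t buf, uint32_t len)
{
    d->buf_addr = htole32(buf);         /* stores building the new descriptor */
    d->len = htole32(len);
    d->status = htole32(DESC_OWNED_BY_DEVICE);

    /*
     * The sync has to drain the write buffer; otherwise the load
     * below can be satisfied while the stores above are still
     * pending, so from the bus-master's point of view the CPU read
     * prev->status before the new descriptor became visible.
     */
    bus_dmamap_sync(dmat, map, 0, 2 * sizeof(struct hw_desc),
        BUS_DMASYNC_PREWRITE);

    if ((le32toh(prev->status) & DESC_OWNED_BY_DEVICE) == 0) {
        /* the device already gave back the previous descriptor */
    }
}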
There is also a sentence about lfence and mfence in the "AMD64 Architecture
Programmer's Manual Volume 3" (rev 3.13) that I don't really understand:
"The MFENCE instruction is weakly-ordered with respect to data and instruction
prefetches.
Speculative loads initiated by the processor, or specified explicitly using
cache-prefetch instructions, can be reordered around an MFENCE."
and
"Speculative loads initiated by the processor, or specified explicitly using
cache-prefetch instructions, can be reordered around an LFENCE."
So I'm wondering whether mfence/lfence is really what we need here, or
whether we'd need a locked instruction.
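If a locked instruction is what's needed, something along these lines would
do; this is only a sketch, not existing NetBSD code. The usual trick is a
locked read-modify-write of a dummy stack location, which acts as a full
barrier:

/*
 * Sketch only (not existing NetBSD code): a full barrier built from a
 * locked instruction, for i386.  A locked read-modify-write of a dummy
 * stack location drains the store buffer and is not reordered with the
 * surrounding loads and stores.
 */
static __inline void
x86_locked_barrier(void)
{
    __asm volatile("lock; addl $0,0(%%esp)" ::: "memory");
}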
> > @@ -844,6 +854,24 @@
> > printf("unknown buffer type %d\n", cookie->id_buftype);
> > panic("_bus_dmamap_sync");
> > }
> > +end:
> > + if ((ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) &&
> > + (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD))) {
> > + /* both read and write, we need a full barrier */
> > + x86_mfence();
> > + } else if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
> > + /*
> > + * all past writes should have completed before this point,
> > + * and future writes should not have started yet.
> > + */
> > + x86_sfence();
> > + } else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
> > + /*
> > + * all past reads should have completed before this point,
> > + * and future reads should not have started yet.
> > + */
> > + x86_lfence();
> > + }
>
> I think x86_lfence() without any if() would be enough. Does it solve the
> problem for you?
As a load can occur before a store hits memory, I think we in fact need
x86_mfence() for all write cases. That is, assuming mfence and lfence
really do prevent speculative loads from being reordered.
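Under that reading, the barrier selection in the patch above would collapse
to something like this (a sketch of the idea, not a tested diff):

    /* Sketch of the barrier selection under that assumption (untested). */
    if (ops & (BUS_DMASYNC_PREWRITE|BUS_DMASYNC_POSTWRITE)) {
        /* any write op: drain the write buffer and order loads as well */
        x86_mfence();
    } else if (ops & (BUS_DMASYNC_PREREAD|BUS_DMASYNC_POSTREAD)) {
        /* read-only ops: ordering the loads is enough */
        x86_lfence();
    }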
--
Manuel Bouyer, LIP6, Universite Paris VI.
Manuel.Bouyer%lip6.fr@localhost
NetBSD: 26 years of experience will always make the difference