Source-Changes archive
CVS commit: src/sys/arch/x86/x86
Module Name: src
Committed By: riastradh
Date: Fri Aug 12 13:44:12 UTC 2022
Modified Files:
src/sys/arch/x86/x86: bus_dma.c
Log Message:
x86: Adjust fences issued in bus_dmamap_sync after bouncing.
And expand the comment on the lfence for POSTREAD before bouncing.
Net change:

op                    before bounce   after bounce
                                      old      new
PREREAD               nop             lfence   sfence
PREWRITE              nop             mfence   sfence
PREREAD|PREWRITE      nop             mfence   sfence
POSTREAD              lfence          lfence   nop[*]
POSTWRITE             nop             mfence   nop
POSTREAD|POSTWRITE    lfence          mfence   nop[*]
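
The table can also be read as a small decision function. The following is a minimal sketch in C, not the code in bus_dma.c: the enum, the function names, and the OP_* flag values are stand-ins invented for illustration, and only the mapping from op to fence is taken from the table.

```c
/* Illustrative stand-ins for the BUS_DMASYNC_* flags; the values are assumptions. */
#define OP_PREREAD	0x01
#define OP_POSTREAD	0x02
#define OP_PREWRITE	0x04
#define OP_POSTWRITE	0x08

enum fence { FENCE_NOP, FENCE_LFENCE, FENCE_SFENCE, FENCE_MFENCE };

/* Fence issued before bouncing (the single "before bounce" column). */
static enum fence
fence_before_bounce(int ops)
{
	return (ops & OP_POSTREAD) ? FENCE_LFENCE : FENCE_NOP;
}

/* Fence issued after bouncing, per the "old" column. */
static enum fence
fence_after_bounce_old(int ops)
{
	if (ops & (OP_PREWRITE | OP_POSTWRITE))
		return FENCE_MFENCE;
	if (ops & (OP_PREREAD | OP_POSTREAD))
		return FENCE_LFENCE;
	return FENCE_NOP;
}

/* Fence issued after bouncing, per the "new" column. */
static enum fence
fence_after_bounce_new(int ops)
{
	return (ops & (OP_PREREAD | OP_PREWRITE)) ? FENCE_SFENCE : FENCE_NOP;
}
```

Under these assumptions, the new column depends only on whether a PRE op is requested: every PRE combination gets sfence and every POST combination gets nothing after the bounce.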
The case of PREREAD is as follows:
1. loads and stores from before the buffer was allocated for the DMA
2. bus_dmamap_sync(BUS_DMASYNC_PREREAD)
3. store to register or DMA descriptor to trigger DMA
The register or DMA descriptor may be in any type of memory (or I/O).
lfence at (2) is _not enough_ to ensure stores at (1) have completed
before the store in (3) in case the register or DMA descriptor lives
in wc/wc+ memory, or the store to it is non-temporal: in that case,
it may be executed early before all the stores in (1) have completed.
On the other hand, lfence at (2) is _not needed_ to ensure loads in
(1) have completed before the store in (3), because x86 never
reorders load;store to store;load. So we may need to enforce
store/store ordering, but not any other ordering, hence sfence.
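
As a rough illustration of the PREREAD sequence, here is a userland sketch in C. It assumes GCC or Clang inline assembly; struct rx_desc, store_fence, and post_rx_buffer are hypothetical names, and store_fence only stands in for what bus_dmamap_sync(BUS_DMASYNC_PREREAD) does after this change.

```c
#include <stdint.h>

/* Store fence: sfence on x86, a plain compiler barrier elsewhere. */
static inline void
store_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("sfence" ::: "memory");
#else
	__asm__ volatile("" ::: "memory");
#endif
}

/* Hypothetical receive descriptor; a real one might live in wc/wc+ memory. */
struct rx_desc {
	volatile uint64_t addr;		/* address the device will DMA into */
	volatile uint32_t ready;	/* nonzero hands the descriptor to the device */
};

static void
post_rx_buffer(struct rx_desc *d, uint8_t *buf, uint32_t len)
{
	uint32_t i;

	/* (1) Stores to the buffer before it is handed to the device. */
	for (i = 0; i < len; i++)
		buf[i] = 0;

	/* (2) Stand-in for bus_dmamap_sync(BUS_DMASYNC_PREREAD). */
	store_fence();

	/* (3) Stores that trigger the DMA. */
	d->addr = (uint64_t)(uintptr_t)buf;
	d->ready = 1;
}
```

The fence sits between the stores in (1) and the trigger stores in (3), which is the only ordering the PREREAD case has to enforce.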
The case of PREWRITE is as follows:
1. stores to DMA buffer (and loads from it, from before it was allocated)
2. bus_dmamap_sync(BUS_DMASYNC_PREWRITE)
3. store to register or DMA descriptor to trigger DMA
Ensuring prior loads have completed is not necessary because x86
never reorders load;store to store;load (and in any case, the device
isn't changing the DMA buffer, so it's safe to read over and over
again). But we must ensure the stores in (1) have completed before
the store in (3). So we need sfence, in case either the DMA buffer
or the register or the DMA descriptor is in wc/wc+ memory or either
store is non-temporal. But we don't need mfence.
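
The PREWRITE case with a bounce buffer might be sketched as follows. This is an assumption-laden userland model, not the kernel code: struct tx_map, sync_prewrite, and start_tx are hypothetical, and the sketch assumes GCC or Clang inline assembly.

```c
#include <stdint.h>
#include <string.h>

/* Store fence: sfence on x86, a plain compiler barrier elsewhere. */
static inline void
store_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("sfence" ::: "memory");
#else
	__asm__ volatile("" ::: "memory");
#endif
}

/* Hypothetical transmit mapping: caller's buffer plus optional bounce buffer. */
struct tx_map {
	const uint8_t *userbuf;
	uint8_t *bouncebuf;	/* NULL when no bouncing is needed */
	size_t len;
};

/* Stand-in for bus_dmamap_sync(BUS_DMASYNC_PREWRITE). */
static void
sync_prewrite(struct tx_map *m)
{
	/* Bounce: copy the caller's data to where the device can reach it. */
	if (m->bouncebuf != NULL)
		memcpy(m->bouncebuf, m->userbuf, m->len);

	/* After bouncing: order the buffer stores before the trigger store. */
	store_fence();
}

static void
start_tx(struct tx_map *m, volatile uint32_t *doorbell)
{
	sync_prewrite(m);	/* (2) */
	*doorbell = 1;		/* (3) store that triggers the DMA */
}
```

Placing the fence after the copy means the stores made by the bounce itself are also ordered before the doorbell store.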
The case of POSTREAD is as follows:
1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTREAD)
(a) lfence [*]
(b) if bouncing, memcpy(userbuf, bouncebuf, ...)
(c) ???
3. loads from DMA buffer to use data, and stores to reuse buffer
This certainly needs an lfence to prevent the loads at (3) from being
executed early -- but bus_dmamap_sync already issues lfence in that
case at 2(a), before it conditionally loads from the bounce buffer
into the user's buffer. So we don't need any _additional_ fence
_after_ bouncing at 2(c).
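
The POSTREAD steps above can be modelled in userland C roughly like this. The names struct rx_map, load_fence, sync_postread, and rx_complete are hypothetical, and the sketch assumes GCC or Clang inline assembly; it only mirrors the order of 2(a), 2(b), and 2(c).

```c
#include <stdint.h>
#include <string.h>

/* Load fence: lfence on x86, a plain compiler barrier elsewhere. */
static inline void
load_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("lfence" ::: "memory");
#else
	__asm__ volatile("" ::: "memory");
#endif
}

/* Hypothetical receive mapping: caller's buffer plus optional bounce buffer. */
struct rx_map {
	uint8_t *userbuf;
	const uint8_t *bouncebuf;	/* NULL when no bouncing is needed */
	size_t len;
};

/* Stand-in for bus_dmamap_sync(BUS_DMASYNC_POSTREAD). */
static void
sync_postread(struct rx_map *m)
{
	/* 2(a): keep later loads from running ahead of the completion load. */
	load_fence();

	/* 2(b): if bouncing, copy the DMA'd data to the caller's buffer. */
	if (m->bouncebuf != NULL)
		memcpy(m->userbuf, m->bouncebuf, m->len);

	/* 2(c): no additional fence here. */
}

/* Returns 1 once the data is ready in userbuf, 0 if DMA has not completed. */
static int
rx_complete(struct rx_map *m, const volatile uint32_t *status)
{
	if (*status == 0)	/* (1) load notifying DMA completion */
		return 0;
	sync_postread(m);	/* (2) */
	return 1;		/* (3) caller may now load from userbuf */
}
```

The single lfence at 2(a) covers both the loads done by the bounce copy and the caller's later loads at (3).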
The case of POSTWRITE is as follows:
1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTWRITE)
3. loads and stores to reuse buffer
Stores at (3) will never be executed early because x86 never reorders
load;store to store;load for any memory types. Loads at (3) are
harmless because the device isn't changing the buffer -- it's
supposed to be fixed from the time of PREWRITE to the time of
POSTWRITE as far as the CPU can witness.
Proposed on port-amd64 last month:
https://mail-index.netbsd.org/port-amd64/2022/07/16/msg003593.html
Reference:
AMD64 Architecture Programmer's Manual, Volume 2: System Programming,
24593--Rev. 3.38--November 2021, Sec. 7.4.2 Memory Barrier Interaction
with Memory Types, Table 7-3, p. 196.
https://www.amd.com/system/files/TechDocs/24593.pdf
To generate a diff of this commit:
cvs rdiff -u -r1.85 -r1.86 src/sys/arch/x86/x86/bus_dma.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.