Subject: Re: On the subject of bus_dma(9)
To: Jason R Thorpe <thorpej@zembu.com>
From: Matthew Jacob <mjacob@feral.com>
List: tech-kern
Date: 03/07/2001 08:55:33
On Tue, 6 Mar 2001, Jason R Thorpe wrote:
> On Tue, Mar 06, 2001 at 02:55:53PM -0800, Matthew Jacob wrote:
>
> > "syncing" doesn't work here. It's mapping that does the right thing. And
> > bounce buffers for a major platorm isn't the right answer. The assumption
> > that a post-mapping 'sync' operation exists is a flawed assumption.
>
> I don't buy that -- if nothing else, you could invalidate the IOMMU PTEs
> and re-validate them!
You can't for things less than IOMMU page sized entities, and that's exactly
what this memory is for.
> Like I said, bus_dmamap_sync() MUST be what a driver
> uses to ensure "coherency" -- this has not changed since the very first
> bus_dma design document (which you even provided input on -- and this didn't
> come up as an issue then, and the UltraSPARC systems in question certainly
> existed at that time).
I provided input on this, and there were many things you didn't agree with.
One of them was the Solaris DDI DMA model differences which, btw, handles this
stuff fine.
> I think part of the problem here is that no one has been clear *at all*
> as to what the particular problems with the sparc64 even *are*.
>
> Is there a cache between the memory and the device? (The Sun 4/400 has
> such an "I/O cache", as well.) And you're saying that there is no way
> to invalidate this cache once the IOMMU mappings have been setup?
I'd like us to stick to architecture rather than implementation- but in this
case the issue is whether you map a page in the IOMMU for 'streaming' or 'byte
coherent' access. It's the mapping that establishes the attribute- not any
later 'flush' operation.
> > What I'm going to do is to parse your mail again and update the man page with
> > mention of the ordering dependency and bus_dmamap_load_raw usages- after all,
>
> Please let me make any manual page updates which may be necessary.
> Note, I'm saying that you *should not* be using bus_dmamap_load_raw().
Okay. Like I said- *I* didn't add this to isp_sbus.c- that was done by
somebody else.
>..
> Also, why won't this "fix" the sparc64? bus_dmamap_load() has to query
> the pmap for the physical address of the page to stuff into the IOMMU
> PTEs, and while it's there, it could certainly look for the TLB's "cache
> inhibit" bits, and set its own if they are set.
It's conceivable that in this way the implementation could follow the
architecture- if the architecture is clear and clearly documented.
If the ordering is as you said:
> bus_dmamap_create(...);
> bus_dmamem_alloc(...);
> bus_dmamem_map(...);
> bus_dmamap_load(...);
Then the requirement could conceivably met in this case for the implementation
in sparc64- that bus_dmamem_map has to be called in order to establish that
this memory must then be also byte-coherent in IOMMU mappings.
From an efficiency point of view just for this implementation, I'd want to
leave it that bus_dmamem_alloc'd memory *must* be BUS_DMA_COHERENT.
From an architectural point of view, there isn't sufficient hint as to what
*kind* of memory to allocate if you don't get the hint until *after* you
allocate the memory. You may have a platform where in order to provide byte
coherent access (and *not* require bounce, which is ridiculous, when DMA works
just fine if you program things sensibly) you have to allocate from a certain
pool. It seems wrong to not have the hint when you need it.
>...
> > So, the assumption here is that you have to have CPU
> > (BUS_DMA_COHERENT) mapping in order to infer you want
> > an byte-coherent IOMMU mapping. This doesn't handle the
> > case of two non-cpu devices sharing memory, but that is,
> > I would agree, a bit of an edge case.
>
> In fact, bus_dma(9) does not currently address peer-peer DMA at all. This
> is something I'm working on in my "idle loop" (along with other changes I
> want to make to the bus_dma(9) interface... and since I'd like to change
> it as few times as possible, I want to try and group them all together).
Fair enough.
>
> > So.. follow the logic here....
> >
> > The inference of all of this is that
> >
> > 1. Prior to a bus_dmamap_load, a bus_dmamem_map has to be done to give the
> > 'hint' that BUS_DMA_COHERENT is to be used.
>
> bus_dmamap_load(), as is described in the manual, requires a buffer mapped
> into either kernel virtual address space (proc argument == NULL) or a user
> process virtual address space (proc argument non-NULL). The *only* loading
> routine which does not require the buffer to be mapped into someone's address
> space is bus_dmamamp_load_raw(), and that routine is specifically designed
> to load pages that aren't in *anyone's* address space.
Okay- lacking the architectural hole about peer-peer, this is a valid
argument. The man page really needs to be clarified about this. There are
several engineers here who have lots of experience who really didn't follow
this. A good architecture is easy for reasonable engineers to follow.
>
> > 2. But real memory is allocated (in the order above) prior to the call to
> > bus_dmamem_map.
> >
> >
> > 3. Therefore, bus_dmamem_alloc'd memory must be assumed to be for
> > BUS_DMA_COHERENT purposes because there may be no way to 'change'
> > the identity of the memory alloc'd in bus_dmamem_alloc (it might be
> > from a pool that *can't* be made byte-coherent).
>
> bus_dmamem_alloc() for a particular bus_dma_tag_t *should never* allocate
> from memory which can not be made to DTRT for a particular bus. All this
> does it point to a bug in the sparc64 bus_dmamem_alloc() routines.
No - this is inflammatory, and you should be ashamed for saying it.
It's not a 'bug' in that there has been genuine disagreement and uncertainty
about how this is actually supposed to work.
I would put it as I've always put it throughout this discussion as I've done
above. And this should be documented as an explicit requirement because the
logic of the *architecture* (not the games you could play with
implementation) require it.
And, btw, just because you haven't handled peer-peer- you might at least
document the issue.
So- please reconsider letting me update the man page with some clarifications.
-matt