Re: NetBSD 5.1 TCP performance issue (lots of ACK)
On Wed, Nov 23, 2011 at 12:12:05PM +0100, Manuel Bouyer wrote:
> On Tue, Nov 22, 2011 at 03:10:52PM -0800, Dennis Ferguson wrote:
> > [...]
> > You are assuming the above somehow applied to Intel CPUs which existed
> > in 2004, but that assumption is incorrect. There were no Intel (or AMD)
> > CPUs which worked like that in 2004, since post-2007 manuals document the
> > ordering behavior of all x86 models from the 386 forward, and explicitly
> > say that none of them have reordered reads, so the above could only be a
> > statement of what they expected future CPUs might do and not what
> > they actually did.
>
> This is clearly not my experience. I can say for sure that without lfence
> instructions, the xen front/back drivers are not working properly
> (and I'm not the only one saying this).
Are the xen front-/back-end drivers otherwise correct? I.e., using
volatile where they ought to? wm(4) definitely does *not* use volatile
everywhere it ought to, and I've just found out that that explains this
bug.
I've just tried the same experiment on the netbsd-5 branch. The
compiler generates different assembly for wm_rxintr() before and after.
The before-assembly definitely loads wrx_len before wrx_status, which is
wrong; the after-assembly loads wrx_status first. So we can explain
the wm(4) bug with re-ordering of reads by the compiler, not the CPU.
(BTW, in -current, when I added volatile to the rx descriptor
members and recompiled, the compiler generated the same assembly for
wm_rxintr(). Makes me wonder: does the newer GCC in -current mask a
lot of bugs?)
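For illustration, here is roughly what the hazard looks like (a simplified
sketch, not the actual wm(4) source; only the wrx_status/wrx_len names come
from the driver, the layout and the bit name are made up):

#include <sys/types.h>

/* Simplified rx descriptor: the device writes it, the CPU only reads it. */
struct rxdesc {
	volatile uint16_t	wrx_len;	/* valid only once DD is set */
	volatile uint8_t	wrx_status;	/* status bits, including DD */
};

#define	RXD_STAT_DD	0x01			/* "descriptor done" */

/*
 * Without the volatile qualifiers the compiler may hoist the wrx_len
 * load above the wrx_status load, so we can read a stale length even
 * though the DD check passes; with volatile the two loads stay in
 * program order.
 */
int
rxdesc_poll(struct rxdesc *rxd, uint16_t *lenp)
{
	if ((rxd->wrx_status & RXD_STAT_DD) == 0)
		return 0;		/* device not done with this slot */
	*lenp = rxd->wrx_len;		/* safe only after DD is observed */
	return 1;
}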
> > This is clear in the post-2007 revision I have, where the section you quote
> > above now says:
>
> It also says that we should not rely on this behavior and that, for
> compatibility with future processors, programmers should use memory barrier
> instructions where appropriate.
Agreed.
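In NetBSD terms that advice amounts to something like the following (a
sketch only; the ring layout here is invented, and membar_consumer() comes
from <sys/atomic.h>):

#include <sys/types.h>
#include <sys/atomic.h>			/* membar_consumer() */

/* Hypothetical shared ring: the peer fills slot[] and then advances prod. */
struct ring {
	volatile uint32_t	prod;		/* written by the peer */
	uint32_t		cons;		/* private to the consumer */
	volatile uint32_t	slot[256];
};

/* Return 1 and store an entry in *valp if one was available, else 0. */
int
ring_consume(struct ring *r, uint32_t *valp)
{
	if (r->cons == r->prod)
		return 0;			/* nothing new */
	/*
	 * Make sure the slot[] load is not performed before the prod
	 * load that told us the entry exists.
	 */
	membar_consumer();
	*valp = r->slot[r->cons & 255];
	r->cons++;
	return 1;
}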
> Anyway, what prompted this discussion is the added bus_dmamap_sync()
> in the wm driver. It's needed because:
> - we may be using bounce buffering, and we don't know in which order
> the copy to bounce buffer is done
> - all the world is not x86.
I agree strongly with your bullet points, and I think that by the same
rationale, we need one more bus_dmamap_sync(). :-)
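To make the shape of what I have in mind concrete, here is a rough sketch of
the usual bus_dma(9) rx-ring pattern; "struct sc" and the WM_*/WRX_* names
stand in for the driver's own softc, ring macros, and status bit, and this is
not a claim about exactly where the extra sync should go:

/*
 * Sketch only, not the literal wm(4) source.
 */
static void
rxintr_sketch(struct sc *sc)
{
	uint16_t len;
	uint8_t status;
	int i;

	for (i = sc->sc_rxptr;; i = WM_NEXTRX(i)) {
		/* Pick up the device's writes to this descriptor. */
		bus_dmamap_sync(sc->sc_dmat, sc->sc_cddmamap, WM_CDRXOFF(i),
		    sizeof(wiseman_rxdesc_t),
		    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);

		status = sc->sc_rxdescs[i].wrx_status;
		if ((status & WRX_ST_DD) == 0) {
			/*
			 * Not done yet: hand the descriptor back to the
			 * device before we stop looking at it.
			 */
			bus_dmamap_sync(sc->sc_dmat, sc->sc_cddmamap,
			    WM_CDRXOFF(i), sizeof(wiseman_rxdesc_t),
			    BUS_DMASYNC_PREREAD);
			break;
		}

		len = sc->sc_rxdescs[i].wrx_len;
		/* ... hand the packet (len bytes) up, refill the slot ... */
	}
	sc->sc_rxptr = i;
}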
Maybe I do not remember correctly, but I thought that the previous
discussion of how many _sync()s to use, where they should go, and why,
left off with me asking, "what do you think?" I really do want to know!
Dave
--
David Young
dyoung%pobox.com@localhost Urbana, IL (217) 721-9981