Manuel Bouyer <bouyer%antioche.eu.org@localhost> writes:

> On Wed, Oct 26, 2011 at 08:15:44PM -0400, Greg Troxel wrote:
> Yes, between 40 and 50MB/s

ok, that matches what I see in the trace.

>> What is between these two devices?  Is this just a gigabit switch, or
>> anything more complicated?
>
> they're all (the 2 NetBSD and the linux host) connected to a cisco 3750
> gigabit switch. I also tested with a single crossover cable, this doesn't
> change anything.

OK - I've just seen enough things that are supposed to be transparent
and aren't.

> that's easy. And yes, I get better performance: 77MB/s instead of < 50.

And does gluster then match ttcp, as in both 77?

> So it looks like we have something wrong with TSO.
> The traces are still at ftp://asim.lip6.fr/outgoing/bouyer/
> (netbsd-{client,server}-notso.pcap.gz).
>
> Did you see the reordering in the ttcp trace too ?

There was some, but it doesn't seem big enough to cause real problems.
As long as TCP does fast recovery and doesn't go into timeout, things
work well enough that it's really hard to notice.

> But, that still doesn't explain why I get good performance when one
> of the hosts is linux. NetBSD used tso as well, and it didn't seem to
> cause problems for linux ...

Sure, but TCP performance is subtle and there are all sorts of ways
things can line up to provoke or not provoke latent bugs.  It seems
likely that whatever bad behavior the tso option is causing either
doesn't bother the linux receiver in terms of the acks it sends, or the
congestion window doesn't get big enough to trigger the tso bugs, or
something else like that.  You can't conclude much from linux/netbsd
working well other than that things are mostly ok.

> BTW, how is TSO working ? does the adapter get a single data block of
> a full window size ? if so, maybe the transmit ring just isn't big
> enough ...

I have no idea.  Also, is there receive offload?  The receiver has
packets arriving all together whereas they show up more spread out at
the transmitter.  It may be that the reordering happens in the
controller, or it may be that it happens at the receiver when the
packets are regenerated from the large buffer (and then injected out of
order).

One thing to keep in mind is that the tcpdump timestamps are not when
the packet arrives on the wire.  They are the system time when the bpf
call is made, which in many drivers is when the packet's pointers are
loaded into the transmit ring.

>> thrashing.  What happens if you change gluster to have smaller buffers

I would do this experiment; that may avoid the problem.  I'm not
suggesting that you run this way forever, but it will help us
understand what's wrong.

>> (I don't understand why it's ok to have the FS change the tcp socket
>> buffer options from system default)?
>
> Because it knows the size of its packets, or its internal receive buffers ?

This is TCP, so gluster can have a large buffer in user space
independently of what the TCP socket buffer is.  People set TCP socket
buffers to control the advertised window and to balance throughput on
long fat pipes against memory usage.  In your case the RTT is only a
few ms even under load, so it wouldn't seem that huge buffers are
necessary.  Do you have actual problems if gluster doesn't force the
buffer to be large?
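To be concrete about what "forcing the buffer" means here: all gluster
can really be doing is setting the standard socket-buffer options,
along the lines of the sketch below.  The 1MB figure and the rest of it
are made up for illustration - I haven't looked at the gluster source.

    /*
     * Sketch of an application overriding the TCP socket buffer size
     * away from the system default.  The size is illustrative only.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <stdio.h>

    int
    main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        int bufsz = 1024 * 1024;    /* hypothetical 1MB */

        if (s == -1) {
            perror("socket");
            return 1;
        }
        /*
         * These set the kernel socket buffers (and hence the advertised
         * window); they are independent of any buffering the
         * application does in user space.
         */
        if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufsz, sizeof(bufsz)) == -1)
            perror("SO_SNDBUF");
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsz, sizeof(bufsz)) == -1)
            perror("SO_RCVBUF");
        return 0;
    }

Shrinking (or simply not making) those calls is essentially the
experiment I'm suggesting above.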
(That said, having buffers large enough to allow streaming is generally
good.  But if you need that, it's not really about one user of TCP.  I
have been turning on

  net.inet.tcp.recvbuf_auto = 1
  net.inet.tcp.sendbuf_auto = 1
  net.inet6.tcp6.recvbuf_auto = 1
  net.inet6.tcp6.sendbuf_auto = 1

to let buffers get bigger when TCP would otherwise be blocked by the
socket buffer.  In 5.1, that seems to lead to running out of mbuf
clusters rather than reclaiming them (when there are lots of
connections), but I'm hoping this is better in -current (or rather I'm
deferring looking into it until I jump to current).)

If you can get ttcp to show the same performance problems (by setting
buffer sizes, perhaps), then we can debug this without gluster, which
would help.

Also, it would be nice to have a third machine on the switch, run
tcpdump there (without any funky offload behavior), and see what the
packets on the wire really look like.  With the tso behavior I am not
confident that either trace is exactly what's on the wire.

Have you seen: http://gnats.netbsd.org/42323
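On the autotuning sysctls above: if it's ever more convenient to check
them from a program than from sysctl(8), something like the following
should do it.  Just a sketch, assuming sysctlbyname(3); the variable
names are the ones I listed.

    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    /* Print the current values of the TCP buffer autotuning knobs. */
    int
    main(void)
    {
        const char *names[] = {
            "net.inet.tcp.recvbuf_auto",
            "net.inet.tcp.sendbuf_auto",
            "net.inet6.tcp6.recvbuf_auto",
            "net.inet6.tcp6.sendbuf_auto",
        };
        for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
            int val;
            size_t len = sizeof(val);

            if (sysctlbyname(names[i], &val, &len, NULL, 0) == -1)
                perror(names[i]);
            else
                printf("%s = %d\n", names[i], val);
        }
        return 0;
    }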