> So, I'm wondering: this makes 15.6 ms the minimum value that can be
> represented, given the odd <= 0 test you pointed out elsewhere in the
> code (the math doesn't appear to be able to cope with a "0" RTT
> anyway, even if we could arrange for it never to go negative -- how
> does it go negative? -- while still allowing 0 values). Even regional
> networks (for example, Columbia to MIT) yield RTTs of about 7 ms --
> half the smallest value we can represent. That's funny; BBN to MIT is
> 80 ms, but that's because MIT routes to the west coast on an academic
> testbed. Before, it was more like 20 ms, because it went to NY and
> back. But I see your point. This suggests to me that with this
> representation, even with the bugs fixed, in many, many cases of
> interest the RTT will have no effect at all on our network stack.
> Still, more precision is required.

I had similar thoughts, but you are missing two points which cause this not to really matter:

1. The minimum RTO is longish (I think 1 s) regardless of the estimates.
2. Currently, we are measuring RTT in slow ticks.

Given that, 15.6 ms is plenty. The real problem now is that RTT measurements that should be often 0 and sometimes 1 slow ticks are being computed as often 1 and sometimes 2; fixing that makes a real difference. A minor problem is that the lowest the EWMA will get to following a 1-tick sample, even with all-0 samples afterward, is storage-rep 8, which is 124 ms, because the decay term's right shift by 3 truncates to 0 for small stored values. With the rounding fix, that becomes storage-rep 4 and 62 ms. (Toy sketches of both problems are at the end of this message.)

Estimating RTT really well in super-low-delay environments probably doesn't matter much, because timeouts will be few (due to delays more than drops). And since any drops would be due to big congestion spikes, and buffering is huge compared to the unloaded RTT, the odds of a delayed packet arriving before srtt + 4*rttvar expires seem high. The point of Van's 1988 paper referenced in RFC 2988 is that it's very important that retransmissions not happen for segments that weren't actually lost, and, given that, that they be as prompt as possible.

Also, we wrote a monitoring framework which has TCP put internal state (srtt values, cwnd, etc., plus notes about why it's doing things) into a bpf buffer, and we modified xplot to display it. So I think really understanding what's going on with RTT estimation is in order before making big changes; right now I'm just fixing old bugs. But our monitoring framework was key to finding them and to understanding their impact.
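To make the measurement off-by-one concrete, here is a toy sketch. It is not our actual code; it just assumes BSD-style timing, where a per-connection tick counter is seeded to 1 when timing starts (because 0 means "not timing") and bumped on every slow tick, so every measurement reads one tick high:

    #include <stdio.h>

    static int t_rtt;   /* ticks since we started timing a segment */

    /* 0 means "not timing", so the counter is seeded to 1: this is
     * the systematic +1 that turns 0-tick RTTs into 1 and 1 into 2. */
    static void start_timing(void) { t_rtt = 1; }

    /* Called from the slow timer while a segment is being timed. */
    static void slow_tick(void) { if (t_rtt) t_rtt++; }

    int main(void)
    {
        start_timing();
        /* ACK arrives within the same slow tick: true RTT is 0 ticks. */
        printf("sub-tick RTT measures as %d (want 0)\n", t_rtt);

        slow_tick();
        /* ACK arrives after one slow tick: true RTT is 1 tick. */
        printf("one-tick RTT measures as %d (want 1)\n", t_rtt);
        return 0;
    }

Seeding the counter to 0 and tracking "timing in progress" in a separate flag (or subtracting 1 when the measurement is taken) removes the bias.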
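And a toy of the EWMA floor, assuming the classic scaled-by-8, gain-1/8 smoother. With this exact form the floors come out at 7 and 3 rather than the 8 and 4 quoted above, since our real update differs in detail, but the truncation mechanism is the same:

    #include <stdio.h>

    /* Truncating decay on a 0-tick sample: srtt -= srtt / 8, done
     * with a right shift, so it stalls once srtt >> 3 truncates to 0. */
    static int decay_trunc(int srtt) { return srtt - (srtt >> 3); }

    /* Rounded decay: add half the divisor before shifting, so small
     * values keep falling instead of getting stuck. */
    static int decay_round(int srtt) { return srtt - ((srtt + 4) >> 3); }

    int main(void)
    {
        /* A single 1-tick sample seeds the scaled srtt to 1 << 3 == 8;
         * then feed a long run of 0-tick samples. */
        int a = 1 << 3, b = 1 << 3;
        for (int i = 0; i < 64; i++) {
            a = decay_trunc(a);
            b = decay_round(b);
        }
        printf("truncating floor: %d\n", a);   /* 7: stuck, 7 >> 3 == 0 */
        printf("rounded floor:    %d\n", b);   /* 3: (3 + 4) >> 3 == 0 */
        return 0;
    }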