Subject: Re: Networking question MTU on non-local nets
To: None <port-macppc@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-macppc
Date: 06/15/2003 01:07:21
> This is wandering seriously off-topic, but still very
> interesting.....
Well, off-topic for port-macppc, maybe. Maybe we should move it to
tech-net? (I'm already there, so feel free to just change the
to-address in your reply if you agree.)
>>> 22:50:05.864796 192.168.0.38.49815 > mercy.icompute.com.http: S 1026952240:1026952240(0) win 32768 <mss 1460,nop,wscale 0,nop,nop,timestamp 707979176 0> (DF)
>>> 22:50:06.123198 mercy.icompute.com.http > 192.168.0.38.49815: S 2847775046:2847775046(0) ack 1026952241 win 16384 <mss 1414,nop,wscale 0,nop,nop,timestamp 17981721 707979176>
>>> 22:50:06.123306 192.168.0.38.49815 > mercy.icompute.com.http: . ack 1 win 33648 <nop,nop,timestamp 707979176 17981721> (DF)
>>> 22:50:06.127597 192.168.0.38.49815 > mercy.icompute.com.http: P 1:225(224) ack 1 win 33648 <nop,nop,timestamp 707979176 17981721> (DF)
>>> 22:50:06.397694 mercy.icompute.com.http > 192.168.0.38.49815: . 1:993(992) ack 225 win 17520 <nop,nop,timestamp 17981722 707979176> (frag 8006:1024@0+)
>>> 22:50:06.397704 mercy.icompute.com > 192.168.0.38: (frag 8006:422@1024)
>>> 22:50:06.444745 192.168.0.38.49815 > mercy.icompute.com.http: . ack 1415 win 33648 <nop,nop,timestamp 707979177 17981722> (DF)
>>> 22:50:06.705762 mercy.icompute.com.http > 192.168.0.38.49815: . 1415:2407(992) ack 225 win 17520 <nop,nop,timestamp 17981722 707979176> (frag 8007:1024@0+)
>>> 22:50:06.718277 mercy.icompute.com.http > 192.168.0.38.49815: . 2829:3821(992) ack 225 win 17520 <nop,nop,timestamp 17981722 707979176> (frag 8008:1024@0+)
>>> 22:50:07.761468 mercy.icompute.com.http > 192.168.0.38.49815: . 1415:2407(992) ack 225 win 17520 <nop,nop,timestamp 17981724 707979176> (frag 8009:1024@0+)
>>> 22:50:10.761691 mercy.icompute.com.http > 192.168.0.38.49815: . 1415:2407(992) ack 225 win 17520 <nop,nop,timestamp 17981730 707979176> (frag 8010:1024@0+)
[quoted in full to have the information at hand in the future]
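(An aside on reading the trace: tcpdump's "(frag 8006:1024@0+)" means
IP id 8006, 1024 octets of fragment payload, at byte offset 0, with
the trailing "+" flagging more fragments to come. A throwaway Python
sketch of that decoding - the helper name is hypothetical:)

    import re

    # decode tcpdump's "(frag id:size@offset+)" suffix; the "+" is the
    # more-fragments flag, absent on the final fragment
    FRAG = re.compile(r'\(frag (\d+):(\d+)@(\d+)(\+?)\)')

    def parse_frag(line):
        m = FRAG.search(line)
        if m is None:
            return None
        ipid, size, offset, more = m.groups()
        return {'id': int(ipid), 'payload': int(size),
                'offset': int(offset), 'more_frags': more == '+'}
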
> [traceroute from client side]
>>>>>traceroute to www.qdea.com (209.46.8.67), 30 hops max, 40 byte packets
>>>>> 1 192.168.0.1 (192.168.0.1) 0.795 ms 0.57 ms 0.527 ms
>>>>> 2 * * *
[27 more lines of "* * *" snipped]
>>>>> 30 * * *
> Useful, huh?
Immensely. :-þ
> The fact that the traceroute works (from my side) is a good sign if I
> am hoping to get PMTU-D working.
Indeed it is. Below, I present further evidence that if it doesn't
work, the problem is on your side, where you can in principle fix it.
>> [...back-to-back....]
>>> 22:50:06.397694 [...] (frag 8006:1024@0+)
>>> 22:50:06.397704 [...] (frag 8006:422@1024)
>> ...interesting. [...350µs at 10MBit...35µs at 100...only 10µs
>> apart...]
> I seriously doubt that the fragmentation is happening on this
> continent. I'm betting that it's happening somewhere in Japan.
I'd guess that even without seeing the timings: whenever I've seen such
problems, the low-MTU link has been close to the client.
(Comparatively more clients than servers are behind PPPoE, VPNs, and
suchlike MTU-lowering things.)
But I agree with you; the fragmentation point is almost certainly in
Japan in this case.
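(For concreteness, the quoted timing numbers work out: serializing the
422-octet second fragment plus its 20-octet IP header takes about
354µs at 10Mbit and 35µs at 100, which is what makes the mere 10µs
spacing noteworthy. A throwaway check in Python:)

    # wire time for the second fragment (422 payload + 20 IP header);
    # link-layer framing would add a little more
    octets = 422 + 20
    for mbit in (10, 100):
        print("%d Mbit/s: %.0f us" % (mbit, octets * 8 / (mbit * 1e6) * 1e6))
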
If your traceroute supports -P (basically, this makes traceroute do its
own PMTU-D), you might try that. On the theory that the low-MTU link
probably is close to the client, I did a traceroute -P from my own
machine to the name you quote yourself as tracerouting to, and...
[Sparkle] 864> traceroute -P pddf654.tkyoac00.ap.so-net.ne.jp
traceroute to pddf654.tkyoac00.ap.so-net.ne.jp (218.221.246.84), 30 hops max, 17914 byte packets
message too big, trying new MTU = 1500
1 Stone (216.46.5.9) 7.167 ms 5.946 ms *
2 core-04.openface.ca (216.46.14.121) 54.841 ms 52.423 ms 52.056 ms
3 bob.openface.ca (216.46.1.1) 51.988 ms 51.767 ms 51.595 ms
4 doug.openface.ca (216.46.1.16) 52.112 ms 52.349 ms 57.553 ms
5 border-peer1.openface.ca (216.46.0.245) 154.647 ms 54.558 ms 57.028 ms
6 openface-gw.peer1.net (65.39.144.129) 53.590 ms 54.071 ms 53.243 ms
7 Gig4-0.mtl-gsr-a.peer1.net (216.187.90.229) 54.545 ms 68.725 ms 54.515 ms
8 OC48POS0-0.nyc-gsr-b.peer1.net (216.187.123.234) 62.799 ms 77.577 ms 63.227 ms
9 GIG1-0.wdc-gsr-a.peer1.net (216.187.123.226) 67.580 ms 68.829 ms 68.616 ms
10 ge-2-3-0.r02.asbnva01.us.bb.verio.net (206.223.115.112) 68.804 ms 67.953 ms 70.125 ms
11 p16-0-1-2.r21.asbnva01.us.bb.verio.net (129.250.2.62) 74.460 ms 69.896 ms 69.524 ms
12 p16-5-0-0.r01.mclnva02.us.bb.verio.net (129.250.2.180) 69.997 ms 70.349 ms 72.354 ms
13 p16-7-0-0.r02.mclnva02.us.bb.verio.net (129.250.5.10) 71.260 ms 71.407 ms 70.120 ms
14 p16-0-1-2.r20.plalca01.us.bb.verio.net (129.250.2.192) 127.981 ms 128.502 ms 129.232 ms
15 xe-0-2-0.r21.plalca01.us.bb.verio.net (129.250.4.231) 128.421 ms 127.746 ms 144.905 ms
16 p64-0-0-0.r21.snjsca01.us.bb.verio.net (129.250.5.49) 128.569 ms 137.866 ms 128.656 ms
17 p16-1-1-0.r82.mlpsca01.us.bb.verio.net (129.250.3.195) 128.506 ms 129.103 ms 128.873 ms
18 p16-0-2-0.r21.tokyjp01.jp.bb.verio.net (129.250.4.158) 243.949 ms 245.087 ms 244.665 ms
19 xe-1-1-0.r20.tokyjp01.jp.bb.verio.net (129.250.3.233) 243.093 ms 242.310 ms 242.633 ms
20 ge-3-0-0.a10.tokyjp01.jp.ra.verio.net (61.213.162.76) 230.151 ms 229.910 ms 230.652 ms
21 61.120.146.230 (61.120.146.230) 230.672 ms ge-3-0-0.a10.tokyjp01.jp.ra.verio.net (61.213.162.76) 243.620 ms 241.961 ms
22 61.120.146.230 (61.120.146.230) 242.440 ms 242.022 ms note-13Gi0-0-0.net.so-net.ne.jp (61.211.63.133) 230.177 ms
23 61.211.63.247 (61.211.63.247) 232.482 ms 232.314 ms 234.189 ms
24 61.211.63.247 (61.211.63.247) 232.197 ms 231.845 ms 233.382 ms
25 61.211.63.247 (61.211.63.247) 234.610 ms
fragmentation required and DF set, next hop MTU = 1454
25 pddf654.tkyoac00.ap.so-net.ne.jp (218.221.246.84) 269.161 ms 269.644 ms 269.784 ms
[Sparkle] 865>
If this is to be believed, the low-MTU link is the very last hop. I
really wonder what's with hops 21/22 and 23/24; the way different
gateways respond at hops 21 and 22 makes it look as though some kind
of variant routing is going on - loadsharing, maybe.
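(For the curious, the probing that -P does can be sketched briefly:
force DF on a socket, send probes, and shrink to whatever MTU the
kernel learns from the returning ICMP errors. A rough Python sketch
using Linux socket-option values - the numeric constants and the port
are assumptions, and a real prober does rather more bookkeeping:)

    import errno, socket, time

    IP_MTU_DISCOVER = 10   # Linux: per-socket PMTU-D setting
    IP_PMTUDISC_DO = 2     # Linux: always set DF; oversized sends fail
    IP_MTU = 14            # Linux: read the path MTU the kernel cached

    def probe_pmtu(host):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        s.connect((host, 33434))         # traceroute-style UDP port
        mtu = 1500                       # Ethernet-sized first guess
        for _ in range(10):              # a few rounds settle quickly
            try:
                s.send(b'\0' * (mtu - 28))  # 20 IP + 8 UDP header octets
            except OSError as e:
                if e.errno != errno.EMSGSIZE:
                    raise
            time.sleep(1)    # ICMP "frag needed" arrives asynchronously
            mtu = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
        return mtu
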
>> A more detailed description of the client-side network might help:
>> where is the low-MTU link, what hardware is on each end of it, where
>> is the NAT being done, what speeds are the various pieces running
>> at, that sort of thing.
> As you can see above, getting that sort of data would be non-trivial.
> It's tough enough getting the gentleman in Japan to send us the bits
> of data he has.
:-( I misunderstood; I thought you actually had control over both ends
of the test connection.
Because my traceroute -P worked, I feel confident that the ICMP
unreachables necessary to drive PMTU-D are making it out from the
Japanese end of things. But it does look to me as though something
is broken on the client side: some but not all of the second frags
making it through, while all the first frags arrive, practically
guarantees that something is wrong between the fragmentation point
and the endpoint. Since the fragmentation point is right next to the
endpoint per my traceroute above, that means it's on that end.
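(The ICMP error in question is destination-unreachable, type 3 code 4,
"fragmentation needed and DF set"; per RFC 1191 the reporting router
puts its next-hop MTU in the last two octets of the ICMP header -
that's the "next hop MTU = 1454" traceroute printed above. A minimal
decoding sketch, assuming icmp holds the raw ICMP message:)

    import struct

    def next_hop_mtu(icmp):
        # type 3 code 4 per RFC 792; RFC 1191 reuses half of the
        # "unused" field to carry the next-hop MTU
        itype, code, cksum, unused, mtu = struct.unpack('!BBHHH', icmp[:8])
        if itype == 3 and code == 4:
            return mtu
        return None
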
I notice something else weird:
22:50:06.397694 mercy.icompute.com.http > 192.168.0.38.49815: . 1:993(992) ack 225 win 17520 <nop,nop,timestamp 17981722 707979176> (frag 8006:1024@0+)
22:50:06.397704 mercy.icompute.com > 192.168.0.38: (frag 8006:422@1024)
The packet was fragmented into 1024-octet and 422-octet pieces;
however, according to traceroute -P, the path MTU is 1454. Normally,
fragmentation puts as much as possible in the first frag, which in
this case would put 1432 octets of payload in the first fragment (the
largest multiple of 8 that fits under the MTU) and leave only about
14 octets of data for the second. I could also see a fragmentation
module that tried to produce roughly equal fragment sizes, but that
wasn't done either. I don't know the provenance of the stack that did
the fragmentation, but it's behaving rather unusually.
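(The arithmetic behind that, as a sketch - assuming the fragmenting
hop's MTU really is the 1454 that traceroute -P reported:)

    # expected split of the 1466-octet datagram (20 IP header + 32 TCP
    # header + 1414 data) when fragmented at an MTU of 1454
    MTU, IPHDR = 1454, 20
    payload = 1024 + 422             # total payload seen in the two frags
    first = (MTU - IPHDR) // 8 * 8   # offsets count 8-octet units -> 1432
    second = payload - first         # -> 14, nothing like the observed 422
    print(first, second)
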
If it would help you, I can set up a machine deliberately behind a
low-MTU link that we can run experiments with. (If you want to take
me up on that, off-list is probably best.)
/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B