Subject: Re: bridge(4) and silent data corruption :-(
To: None <jonathan@dsg.stanford.edu>
From: Sean Doran <smd@ab.use.net>
List: tech-net
Date: 05/02/2002 12:15:46
| Agreed that it's interesting. I would still like to see the output of
| ifconfig (check to see if outboard TCP/IP acceleration is enabled)
| and
| netstat -s -p tcp
| netstat -s -p ip
|
| on the machines involved. ssh sessions to or from the bridge itself
| would also be interesting.
A proper answer will have to wait until Sunday evening European time,
when I can move wires around to put the bridge in front of a machine,
but basically, iirc there was not terribly much interesting in the protocol
summary outputs (I was checking this myself). There were no problems
whatsoever doing large ssh2 transfers in or out of the bridge,
either to/from the local LAN (either side of the bridge), or to/from
the "world". Again, this was one reason I was trying to walk through
what bridge(4) is really doing, since it's weird that the only visible
symptom is corruption experienced by machines on the far side of the
router, when those machines are transferring data to/from the world.
Do you have any particular requests for things I should provide
to help diagnose this? (Someone might suggest "send-pr" :-) )
FWIW, the apples and NetBSD (February i386 SMP kernel) boxes
that are station-X are all running with hardware checksumming
enabled on their interfaces. I turned the bridge's hw checksumming
on and off as you suggested in earlier email, and it made
no difference. (I didn't think about turning off the stations'
hw checksumming).
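For reference, the checksum that the IP4CSUM/TCP4CSUM/UDP4CSUM offload
computes is the 16-bit ones' complement sum described in RFC 1071. A
minimal Python sketch follows; the function name internet_checksum is
made up for illustration, and it checksums a bare byte string rather
than a real TCP segment with its pseudo-header. It only illustrates
what this kind of checksum can and cannot catch: a single flipped bit
changes the result, while two swapped 16-bit words do not.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones' complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample bytes taken from the worked example in RFC 1071.
original = bytes.fromhex("0001f203f4f5f6f7")
flipped = bytes.fromhex("0001f203f4f5f6f6")  # lowest bit of last byte flipped
swapped = bytes.fromhex("f2030001f4f5f6f7")  # first two 16-bit words swapped

print(hex(internet_checksum(original)))                            # 0x220d
print(internet_checksum(flipped) != internet_checksum(original))   # True
print(internet_checksum(swapped) == internet_checksum(original))   # True
```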
Sean.
ps - the bridge is up and running, but there's nothing on the far side
of the bridge (my laptoy is with me on the road, and i didn't want
my other boxes suffering from network data corruption)
ex0 is the side of the bridge closest to the router
putting something on ex1 results in data corruption
swapping ex0 and ex1, and putting something on ex0 results
in data corruption
ex0: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        address: 00:04:76:de:ba:da
        media: Ethernet 10baseT
        status: active
        inet6 ex0 prefixlen 64 scopeid 0x1
ex1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        address: 00:04:75:80:72:a2
        media: Ethernet 10baseT
        status: no carrier
        inet xxx.xxx.xxx.xxx netmask 0xfffffff8 broadcast xxx.xxx.xxx.xxx
        inet6 ex1 prefixlen 64 scopeid 0x2
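The check requested in the quoted text (is checksum offload enabled?)
amounts to comparing the enabled= line with the capabilities= line for
each interface. A rough Python sketch, assuming ifconfig output shaped
like the dump above; the helper name offload_flags and the regular
expressions are illustrative and not part of any tool.

```python
import re


def offload_flags(ifconfig_text: str) -> dict:
    """Map interface name -> {"capabilities": set, "enabled": set}."""
    result = {}
    current = None
    for line in ifconfig_text.splitlines():
        line = line.strip()
        # An interface header looks like "ex0: flags=8b63<...> mtu 1500".
        header = re.match(r"^(\w+): flags=", line)
        if header:
            current = header.group(1)
            result[current] = {"capabilities": set(), "enabled": set()}
            continue
        # Flag lines look like "enabled=7<IP4CSUM,TCP4CSUM,UDP4CSUM>".
        field = re.match(r"^(capabilities|enabled)=[0-9a-fA-F]+<([^>]*)>", line)
        if field and current is not None:
            names = {n for n in field.group(2).split(",") if n}
            result[current][field.group(1)] = names
    return result


sample = """\
ex0: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
inet6 ex0 prefixlen 64 scopeid 0x1
"""

for name, flags in offload_flags(sample).items():
    print(name, sorted(flags["enabled"]))  # ex0 ['IP4CSUM', 'TCP4CSUM', 'UDP4CSUM']
```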
These netstats are from the bridge.
I don't think netstats from the clients would be meaningful at
this point, since they haven't been behind the bridge for a while.
ip:
        11514641 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with length > max ip packet size
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 malformed fragments dropped
        0 fragments dropped after timeout
        0 packets reassembled ok
        17468 packets for this host
        0 packets for unknown/unsupported protocol
        11481771 packets forwarded (0 packets fast forwarded)
        2225 packets not forwardable
        0 redirects sent
        21234 packets sent from this host
        4 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        99 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        47 datagrams with bad address in header
tcp:
        1714 packets sent
                1610 data packets (126274 bytes)
                13 data packets (15618 bytes) retransmitted
                76 ack-only packets (1130 delayed)
                0 URG only packets
                0 window probe packets
                11 window update packets
                4 control packets
                0 send attempts resulted in self-quench
        2144 packets received
                1108 acks (for 125920 bytes)
                48 duplicate acks
                0 acks for unsent data
                1134 packets (66356 bytes) received in-sequence
                24 completely duplicate packets (416 bytes)
                0 old duplicate packets
                0 packets with some dup. data (0 bytes duped)
                0 out-of-order packets (0 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                0 window update packets
                0 packets received after close
        0 connection requests
        11 connection accepts
        11 connections established (including accepts)
        9 connections closed (including 6 drops)
        0 embryonic connections dropped
        1097 segments updated rtt (of 1087 attempts)
        4 retransmit timeouts
                0 connections dropped by rexmit timeout
        0 persist timeouts (resulting in 0 dropped connections)
        9 keepalive timeouts
                8 keepalive probes sent
                1 connection dropped by keepalive
        0 correct ACK header predictions
        676 correct data packet header predictions
        586 PCB hash misses
        282 dropped due to no socket
        0 connections drained due to memory shortage
        0 bad connection attempts
        11 SYN cache entries added
                0 hash collisions
                11 completed
                0 aborted (no space to build PCB)
                0 timed out
                0 dropped due to overflow
                0 dropped due to bucket overflow
                0 dropped due to RST
                0 dropped due to ICMP unreachable
        0 SYN,ACKs retransmitted
        0 duplicate SYNs received for entries already in the cache
        0 SYNs dropped (no route or no space)
the only interesting netstat output is from the laptoy, which
has also been travelling around to other places in the network
since the last boot, so i don't know how much is related to
the bridge and how much is related to weird connectivity on the road.
better figures sunday...
ip:
        299731 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        56 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        28 packets reassembled ok
        282751 packets for this host
        31 packets for unknown/unsupported protocol
        0 packets forwarded (0 packets fast forwarded)
        16919 packets not forwardable
        2 packets received for unknown multicast group
        0 redirects sent
        212085 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        15 output packets discarded due to no route
        188 output datagrams fragmented
        376 fragments created
        0 datagrams that can't be fragmented
tcp:
        177489 packets sent
                36984 data packets (21810409 bytes)
                499 data packets (461816 bytes) retransmitted
                0 resends initiated by MTU discovery
                48179 ack-only packets (20549 delayed)
                0 URG only packets
                0 window probe packets
                89283 window update packets
                2555 control packets
        251183 packets received
                31985 acks (for 21393360 bytes)
                2464 duplicate acks
                0 acks for unsent data
                211208 packets (274736280 bytes) received in-sequence
                644 completely duplicate packets (888856 bytes)
                3 old duplicate packets
                12 packets with some dup. data (9648 bytes duped)
                24501 out-of-order packets (34816254 bytes)
                2 packets (2 bytes) of data after window
                2 window probes
                150 window update packets
                29 packets received after close
                342 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        986 connection requests
        910 connection accepts
        0 bad connection attempts
        0 listen queue overflows
        1883 connections established (including accepts)
        2806 connections closed (including 1450 drops)
                44 connections updated cached RTT on close
                44 connections updated cached RTT variance on close
                22 connections updated cached ssthresh on close
        7 embryonic connections dropped
        31985 segments updated rtt (of 29266 attempts)
        281 retransmit timeouts
                10 connections dropped by rexmit timeout
        0 persist timeouts
                0 connections dropped by persist timeout
        11 keepalive timeouts
                0 keepalive probes sent
                3 connections dropped by keepalive
        2163 correct ACK header predictions
        188277 correct data packet header predictions
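
To put the laptop's TCP counters in proportion, the raw counts can be
turned into ratios. The Python sketch below is only an illustration:
parse_counters and count_for are hypothetical helpers, and the sample
text is a handful of lines copied from the dump above. Since the laptop
has also been on other networks since its last boot, these ratios are
indicative at best and say nothing certain about the bridge.

```python
import re


def parse_counters(text: str) -> list:
    """Return (count, description) pairs for lines that start with a number."""
    rows = []
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+)\s+(.+?)\s*$", line)
        if m:
            rows.append((int(m.group(1)), m.group(2)))
    return rows


def count_for(rows: list, phrase: str) -> int:
    """Return the count of the first counter whose description contains phrase."""
    for count, desc in rows:
        if phrase in desc:
            return count
    raise KeyError(phrase)


sample = """\
tcp:
177489 packets sent
36984 data packets (21810409 bytes)
499 data packets (461816 bytes) retransmitted
251183 packets received
24501 out-of-order packets (34816254 bytes)
342 discarded for bad checksums
"""

rows = parse_counters(sample)
received = count_for(rows, "packets received")
data_sent = count_for(rows, "data packets")

# Express each counter as a percentage of its parent counter.
print(f"bad checksums: {100 * count_for(rows, 'bad checksums') / received:.2f}% of received")
print(f"out-of-order:  {100 * count_for(rows, 'out-of-order') / received:.2f}% of received")
print(f"retransmitted: {100 * count_for(rows, 'retransmitted') / data_sent:.2f}% of data packets sent")
```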