tech-net archive
Re: No buffer space available
Network hangs are insidious. [old fart story time]
The headscratcher for me was the one in the 1990s at apple.com (when apple.com
was a DEC VAX-8650 running 4.3BSD) that led me to discover TCP SYN attacks and
report that to the CERT two years before panix.com was attacked in the same
way. Problem: far too limited initial TCP SYN queue length (5!), and when the
short queue was full, any new TCP connection attempts to that port failed with
"connection timed out" (inbound SYN packet dropped because the queue for that
port was full), despite ping (ICMP) working fine.
Imagine:
"telnet localhost 25" gives "connection timed out" (wait, what? How is that
possible?)
kill sendmail (yeah, we used sendmail back then)
telnet localhost 25 gives "connection refused" (OK, as expected)
restart sendmail
telnet localhost 25 gives "connection timed out" (WTF?!!)
Rebooting the VAX didn't clear the problem either - same behavior afterwards.
That's when I went looking to our routers to see if anything was wrong with the
rest of our connections to the Internet.
The source of my problem was warring "default" routes in a pair of our
exterior-facing Cisco routers (round & round a class of outbound packets went
until TTL exceeded), but because the routers carried about 2/3rds of the full
"default-free" Internet routing table at the time, we didn't immediately notice
that we couldn't talk to 1/3rd of the Internet. Of course, they could all still
send packets to us ... which is how the TCP SYN queue got full: our SYN-ACKs
weren't getting out to that 1/3rd, and with the SYN queue full (and a
two-minute timeout), suddenly SMTP stopped accepting any other connection
attempts.
Once I found the default route loop, I fixed it, and then watched the load on
apple.com shoot up as the Internet started actually being able to speak to our
SMTP server again.
My report to the CERT (then at CMU SEI) came out of first "how did this
happen?", followed by, "wow, I could send five or six packets every two minutes
with totally random non-responsive (non-existent!) IP source addresses to any
particular host/TCP port combination and stop that host from being able to
respond on that port! I could shut down E-mail at AOL! Moo hah hah! Oh, and,
yeah, just try to trace & stop me, I dare you." [the CERT did nothing with my
report, alas. I quietly provided it to friends at SGI and a few other places]
I also sent a somewhat oblique message to the IETF mailing list, asserting that
a class of one-way (bidirectional communication not required) attacks existed,
and that ISP ingress filtering of customer IP source addresses was the only way
we'd be able to both forestall them, and trace them. That's a BCP now, but Phil
Karn flamed me at the time for wanting to break one mode of mobile-IP. I wasn't
graphic or explicit because that list was public, and I didn't want to provide
a recipe for any would-be attackers until both the ingress filtering was
deployed, and the OS companies had fixed their TCP implementations.
This all got fixed a few years later after Panix.com was attacked (though
nowhere near as elegantly - they were really massively flooded) with the TCP
SYN queue system we now have in NetBSD and all other responsible OSes.
The Internet is a pretty hostile network.
[/old fart story time]
How this relates: as noted in PR/7285, we have a semantic problem with our
errors from the networking code: ENOBUFS (55) is returned for BOTH mbuf
exhaustion, AND for "network interface queue full" (see the IFQ_MAXLEN,
IFQ_SET_MAXLEN(), IF_QFULL() macros in /usr/include/net/if.h, and then in the
particular network interface driver you use).
TCP is well-behaved: it just backs off and retransmits when it hits a condition
like that, and your application probably never hears about it - though it may
experience the condition as a performance degradation as TCP backs off.
UDP, not so much.
If your UDP-based applications are reporting that error, they're probably not
doing anything active/adaptive about it. Some human is expected to analyze the
situation and "deal with it" somehow. Lucky you, human. It might be time for
you to recapitulate the TCP congestion measurement and backoff algorithms in
your UDP application (good luck with that well-trod path to tears). Or just
convert to TCP. Or ... fix your network (stack? interface? media? switches?),
if you can figure out what's actually wrong.
The bad part is that without a distinct error message for "queue full", I can't
tell you whether you really are running out of mbufs or whether you're
overrunning the network interface output queue limit, whatever that is.
(netstat -m will tell you if you've ever hit the mbuf limit, and netstat -s
will tell you about some queues on a per-protocol basis, but I don't see
counters for network interface queues in there, as there probably should be.)
In both cases, your application should take such an error as a message to back
off and retransmit "later" (like TCP does).
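
A minimal sketch of that "back off and retry" advice for a UDP sender, in
Python. The function name send_with_backoff and its parameters (max_tries,
base_delay, max_delay, sleep) are made up for illustration; nothing here comes
from PR/7285 or from any particular application. It only retries on ENOBUFS,
and it cannot tell which of the two causes produced the error.

```python
import errno
import socket
import time


def send_with_backoff(sock, payload, addr, max_tries=6, base_delay=0.01,
                      max_delay=1.0, sleep=time.sleep):
    """Send one UDP datagram, retrying with exponential backoff on ENOBUFS.

    Returns the number of attempts used. Any other error is re-raised
    immediately; ENOBUFS is re-raised once max_tries attempts have failed.
    """
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            sock.sendto(payload, addr)
            return attempt
        except OSError as exc:
            # Only ENOBUFS is treated as "try again later".
            if exc.errno != errno.ENOBUFS or attempt == max_tries:
                raise
            sleep(delay)
            # Double the wait each time, up to a ceiling.
            delay = min(delay * 2, max_delay)
```

You would call it with an ordinary socket.socket(socket.AF_INET,
socket.SOCK_DGRAM) in place of a bare sendto(). The delays are guesses, not
measurements; this is nowhere near real congestion control, just enough to stop
hammering a full queue.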
The trouble with a network interface output queue full error is that it could
be that your application is just plain transmitting faster than the network
interface can physically go (and good luck finding that datum from the Unix
networking API), or your interface has been flow-controlled due to congestion
(modern gigabit Ethernet switches do that now), or, worse, the driver really is
hanging in some odd state "for a while" (missed interrupt, perhaps? other
hardware hiccup?) and the packets are piling up until the queue is full.
You seem to think it's that last, and it could well be - but I think you're
going to have to instrument some code to catch it in the act to be able to
really figure this out and be sure of your analysis.
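
One cheap way to start instrumenting from the application side, as a sketch
only: record how long each run of consecutive ENOBUFS failures lasts. The class
name EnobufsEpisodes and its record() method are hypothetical, and this assumes
your send path can report the errno (or None on success) after every attempt.
It won't show what the driver is doing, but it gives you timestamps to
correlate against whatever you log elsewhere.

```python
import errno
import time


class EnobufsEpisodes:
    """Track runs of consecutive ENOBUFS failures on a send path."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.start = None     # time of the first failure in the current run
        self.failures = 0     # ENOBUFS failures seen in the current run
        self.episodes = []    # finished runs as (duration_seconds, failures)

    def record(self, err):
        """Call after every send attempt: err is None on success, else errno."""
        now = self.clock()
        if err == errno.ENOBUFS:
            if self.start is None:
                self.start = now
            self.failures += 1
        elif self.start is not None:
            # The first attempt that does not fail with ENOBUFS ends the run.
            self.episodes.append((now - self.start, self.failures))
            self.start = None
            self.failures = 0
```

My guess, and it is only a guess, is that runs lasting whole seconds on an
otherwise quiet link point more towards a stalled driver than towards simply
outrunning the interface, but you'd want kernel-side evidence before believing
that.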
We really should fix PR/7285 properly with the required API change: a new error
code allocated at least amongst the BSDs, though we ought to get Linux on
board, too (I haven't looked, but I bet they have the same problem).
An aside: one of my favorite network heartbeat monitoring tools is Network Time
Protocol (NTP), because it (politely) polls its peers, and keeps very careful
track of both packet losses, and transit times. Just looking at an "NTP
billboard" (ntpq -p) can tell you quite a lot about the health of your network,
depending upon which peers/servers you configure.
I hope this is of some use towards solving your problem,
Erik <fair%netbsd.org@localhost>