Subject: kern/3508: [dM] ipforward_rt cache broken
To: None <gnats-bugs@gnats.netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: netbsd-bugs
Date: 04/17/1997 15:52:57
>Number: 3508
>Category: kern
>Synopsis: [dM] ipforward_rt cache broken
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Apr 17 13:05:02 1997
>Last-Modified:
>Originator: der Mouse
>Organization:
Dis-
>Release: 1.2_BETA (code inspection implies also in -current)
>Environment:
Any (noticed on a SPARC IPC)
>Description:
ipforward_rt, which appears to be a size-one cache for IP
packet forwarding, can produce broken routing.
In this particular case, a NetBSD/sparc machine is on a local
Ethernet; let's say its address there is 123.45.6.7, and its
default route is to 123.45.6.1. A PPP user exists, whose
remote address is always (say) 123.45.9.1. For reasons not
relevant here, the machine is always advertising routes to
123.45.9.* even when the PPP link is down - when it's down,
packets for 123.45.9.1 just bounce back and forth between
123.45.6.7 and 123.45.6.1 until their TTL expires.
When the PPP user dials in and causes ppp0 to come up, a route
is (correctly) installed, pointing 123.45.9.1 down ppp0. The
problem is, if ipforward_rt happens to hold 123.45.9.1's route
out the local ethernet to 123.45.6.1, packets for 123.45.9.1
will still take that route even though that is not the current
route. Having the machine attempt to forward a packet to any
other address promptly cures the problem.
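The failure mode described above can be reproduced in a small user-space model. This is only a sketch: the toy_* names, the flat route array and the two-interface enum are invented for illustration and do not come from the kernel source. The only property it shares with the real code is that the forwarding path consults the table only when the destination differs from the cached one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; none of these names come from the kernel source. */
enum toy_if { TOY_IF_LE0, TOY_IF_PPP0 };

struct toy_route {
    uint32_t dst;      /* destination host address */
    enum toy_if ifp;   /* interface packets for dst leave on */
};

#define TOY_MAXROUTES 8
static struct toy_route toy_table[TOY_MAXROUTES];
static size_t toy_nroutes;

/* The size-one forwarding cache: the last destination and its route. */
static struct toy_route toy_cache;
static int toy_cache_valid;

/* Install or replace a host route.  Note: the cache is NOT touched. */
void toy_route_add(uint32_t dst, enum toy_if ifp)
{
    size_t i;

    for (i = 0; i < toy_nroutes; i++) {
        if (toy_table[i].dst == dst) {
            toy_table[i].ifp = ifp;
            return;
        }
    }
    assert(toy_nroutes < TOY_MAXROUTES);
    toy_table[toy_nroutes].dst = dst;
    toy_table[toy_nroutes].ifp = ifp;
    toy_nroutes++;
}

/* Full lookup: host route if present, else the default route via le0. */
static enum toy_if toy_lookup(uint32_t dst)
{
    size_t i;

    for (i = 0; i < toy_nroutes; i++)
        if (toy_table[i].dst == dst)
            return toy_table[i].ifp;
    return TOY_IF_LE0;
}

/* Forward one packet: the table is consulted only when dst changes. */
enum toy_if toy_forward(uint32_t dst)
{
    if (!toy_cache_valid || toy_cache.dst != dst) {
        toy_cache.dst = dst;
        toy_cache.ifp = toy_lookup(dst);
        toy_cache_valid = 1;
    }
    return toy_cache.ifp;
}
```

In this model, forwarding to 123.45.9.1 (0x7b2d0901) first caches the default route; calling toy_route_add() for that address with TOY_IF_PPP0 does not change what toy_forward() returns for it until a packet for some other destination has displaced the cache entry.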
>How-To-Repeat:
See above. It's not hard to provoke this deliberately - find a
machine with no other forwarding traffic, ping a host through
(not from) it, change the routing table in a way that affects
that host's route, ping/traceroute again, and notice that the
old route is still used. Cause the machine to forward a packet
for any other address and retry, and notice it's magically
fixed itself.
>Fix:
Not sure what the right fix is. On a non-busy machine the load
from routing lookups is low, and on a busy machine it seems
reasonably likely that a cache as small as one entry will miss
more often than not, so I'd be tempted to remove the if
entirely and always do the lookup. Alternatively I'd add a
heartbeat timer that clears that cache reasonably often wrt the
sort of timescale on which routes appear and disappear, but
seldom wrt inter-packet arrival times - from 1 to 0.1 Hz seems
reasonable to me.
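A rough user-space sketch of the heartbeat idea follows. All hb_* names are hypothetical, and hb_table_ifindex merely stands in for whatever a full routing lookup would return; in a kernel, hb_tick() would presumably be driven by an existing periodic timer, which is an assumption and not something checked against the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size-one cache: last destination and its interface index. */
struct hb_cache {
    uint32_t dst;
    int ifindex;
    int valid;
};

static struct hb_cache hb_fwd_cache;

/* Stand-in for the routing table: the answer a full lookup would give. */
static int hb_table_ifindex;

/* Forwarding path: do the "lookup" only on a cache miss. */
int hb_forward(uint32_t dst)
{
    if (!hb_fwd_cache.valid || hb_fwd_cache.dst != dst) {
        hb_fwd_cache.dst = dst;
        hb_fwd_cache.ifindex = hb_table_ifindex;
        hb_fwd_cache.valid = 1;
    }
    assert(hb_fwd_cache.valid);
    return hb_fwd_cache.ifindex;
}

/* Heartbeat: called at roughly 0.1 to 1 Hz, simply drops the cache. */
void hb_tick(void)
{
    hb_fwd_cache.valid = 0;
}
```

With this, a stale entry survives at most one heartbeat interval: after the table changes, hb_forward() keeps returning the old answer only until the next hb_tick().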
The _right_ fix would probably be to explicitly clear that
cache every time the routing tables get changed, or perhaps
keep a routing-table generation count and have ip_forward
ignore the cache if the routing table generation has changed.
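The generation-count variant might look something like the following user-space sketch. The gen_* names and the flat table are made up for illustration; the point is only that every table change bumps a counter and the forwarding path treats a cache entry stamped with an older counter value as a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model; these names are not from the kernel source. */
enum gen_if { GEN_IF_LE0, GEN_IF_PPP0 };

struct gen_route {
    uint32_t dst;
    enum gen_if ifp;
};

#define GEN_MAXROUTES 8
static struct gen_route gen_table[GEN_MAXROUTES];
static size_t gen_nroutes;
static unsigned long gen_generation;   /* bumped on every table change */
static unsigned long gen_lookups;      /* number of full lookups done */

static struct {
    uint32_t dst;
    enum gen_if ifp;
    unsigned long generation;          /* table generation when cached */
    int valid;
} gen_cache;

/* Install or replace a host route and bump the generation count. */
void gen_route_add(uint32_t dst, enum gen_if ifp)
{
    size_t i;

    for (i = 0; i < gen_nroutes && gen_table[i].dst != dst; i++)
        continue;
    if (i == gen_nroutes) {
        assert(gen_nroutes < GEN_MAXROUTES);
        gen_nroutes++;
    }
    gen_table[i].dst = dst;
    gen_table[i].ifp = ifp;
    gen_generation++;
}

/* Full lookup: host route if present, else the default via le0. */
static enum gen_if gen_lookup(uint32_t dst)
{
    size_t i;

    gen_lookups++;
    for (i = 0; i < gen_nroutes; i++)
        if (gen_table[i].dst == dst)
            return gen_table[i].ifp;
    return GEN_IF_LE0;
}

/* Forwarding path: the cache is used only if its generation is current. */
enum gen_if gen_forward(uint32_t dst)
{
    if (!gen_cache.valid || gen_cache.dst != dst ||
        gen_cache.generation != gen_generation) {
        gen_cache.ifp = gen_lookup(dst);
        gen_cache.dst = dst;
        gen_cache.generation = gen_generation;
        gen_cache.valid = 1;
    }
    return gen_cache.ifp;
}
```

Repeated packets to one destination still cost a single lookup, but the first packet after any gen_route_add() goes back to the table. The price is that an unrelated route change also invalidates the cache, which seems acceptable given how rarely routes change compared to packet arrivals.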
For this machine I'll prolly just toss the cache entirely and
always do the lookup. Does anyone have stats on the hit rate
of that cache? I didn't see any instrumentation in the code.
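For what it's worth, the sort of instrumentation that would answer the hit-rate question is tiny. The sketch below is a user-space model with invented st_* names, not a patch against the real forwarding code; it only counts whether each forwarded destination matched the cached one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size-one cache keyed on destination only. */
struct st_cache {
    uint32_t dst;
    int valid;
};

static struct st_cache st_fwd_cache;
static unsigned long st_hits;     /* packets that matched the cache */
static unsigned long st_misses;   /* packets that needed a lookup */

/* Forwarding path reduced to its cache check, with counters added. */
void st_forward(uint32_t dst)
{
    if (st_fwd_cache.valid && st_fwd_cache.dst == dst) {
        st_hits++;
    } else {
        st_misses++;
        st_fwd_cache.dst = dst;
        st_fwd_cache.valid = 1;
    }
}

/* Hit rate as a whole percentage; 0 if nothing has been forwarded. */
unsigned st_hit_percent(void)
{
    unsigned long total = st_hits + st_misses;

    assert(total >= st_hits);     /* guard against counter wraparound */
    return total ? (unsigned)(st_hits * 100 / total) : 0;
}
```

Forwarding four packets to destinations 1, 1, 1, 2 in that order gives two hits and two misses, so st_hit_percent() reports 50.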
This code has changed some between 1.2 and -current, but
reading the code makes me think the bug is probably present in
-current as well.
der Mouse
mouse@rodents.montreal.qc.ca
7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
>Audit-Trail:
>Unformatted: