Subject: Recent nbftp panic analysis
To: None <tech-net@netbsd.org>
From: Charles M. Hannum <abuse@spamalicious.com>
List: tech-net
Date: 02/14/2005 09:01:33
The system died with "panic: tcp_output REXMT". I believe this is caused by a
race condition between the callout handler and tcp_input(). Specifically, if
we've just started to run tcp_persist_timer(), the flags for the persist
timer are CALLOUT_FIRED|CALLOUT_INVOKING; therefore, if we also happen to get
interrupted and receive a packet for the same connection, these tests:
else if (TCP_TIMER_ISARMED(tp,
TCPT_PERSIST) == 0)
TCP_TIMER_ARM(tp, TCPT_REXMT,
tp->t_rxtcur);
} else if (TCP_TIMER_ISARMED(tp, TCPT_PERSIST) == 0)
TCP_TIMER_ARM(tp, TCPT_REXMT, tp->t_rxtcur);
will fail to notice that we're using the persist timer (because we have a 0
window) and will arm the rexmt timer. When we return to tcp_persist_timer(),
it will try to *also* reset the persist timer, and we panic.
The simplest thing to do here is to change TCP_TIMER_ISARMED; to wit:
#define TCP_TIMER_ISARMED(tp, timer)\
- callout_pending(&(tp)->t_timer[(timer)])
+ (callout_pending(&(tp)->t_timer[(timer)]) || \
+ callout_invoking(&(tp)->t_timer[(timer)]))
However, I've thought for a long time that it might make more sense to have a
separate state variable/bit to keep track of which of the two timers we're
using at a given time (really, whether we have a 0 window). The logic would
be much less crufty that way.