Subject: Performance issue in tcp_reass()
To: None <tech-net@netbsd.org>
From: None <ragge@ludd.luth.se>
List: tech-net
Date: 04/19/2004 16:45:45
In tcp_reass() the routine m_cat() is frequently called to append
received mbufs to the end of a received chain. If the chain of
received packets become long, the machine may become unusable for
a long time if packets are lost. For example;
If running at around 3Gbit with a window size of 100Mbyte (RTT 200ms)
and one packet is lost, the machine will lock up for a loong time or
until the sending process is terminated. This is because of that
m_cat() is called at softintr level, and as long as new packets are
received acks are sent and new packets are clocked out.
The attached diff solves this problem by making use of the mlast
element in the ipqent struct, thus keeping a pointer to the last
mbuf in the queue. All type of communication with packet losses
will benefit from this, but it's most visible on "long fat pipes".
Comments?
-- Ragge
Index: tcp_input.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/tcp_input.c,v
retrieving revision 1.193
diff -c -r1.193 tcp_input.c
*** tcp_input.c 17 Apr 2004 23:35:37 -0000 1.193
--- tcp_input.c 19 Apr 2004 14:33:40 -0000
***************
*** 340,345 ****
--- 340,347 ----
__P((const struct ip6_hdr *, const struct tcphdr *));
#endif
+ #define TRAVERSE(x) while ((x)->m_next) (x) = (x)->m_next
+
int
tcp_reass(tp, th, m, tlen)
struct tcpcb *tp;
***************
*** 393,399 ****
if (pkt_seq == p->ipqe_seq + p->ipqe_len) {
p->ipqe_len += pkt_len;
p->ipqe_flags |= pkt_flags;
! m_cat(p->ipqe_m, m);
m = NULL;
tiqe = p;
TAILQ_REMOVE(&tp->timeq, p, ipqe_timeq);
--- 395,402 ----
if (pkt_seq == p->ipqe_seq + p->ipqe_len) {
p->ipqe_len += pkt_len;
p->ipqe_flags |= pkt_flags;
! m_cat(p->ipre_mlast, m);
! TRAVERSE(p->ipre_mlast);
m = NULL;
tiqe = p;
TAILQ_REMOVE(&tp->timeq, p, ipqe_timeq);
***************
*** 424,429 ****
--- 427,434 ----
q->ipqe_flags |= pkt_flags;
m_cat(m, q->ipqe_m);
q->ipqe_m = m;
+ q->ipre_mlast = m; /* last mbuf may have changed */
+ TRAVERSE(q->ipre_mlast);
tiqe = q;
TAILQ_REMOVE(&tp->timeq, q, ipqe_timeq);
TCP_REASS_COUNTER_INCR(&tcp_reass_prependfirst);
***************
*** 455,461 ****
pkt_len += q->ipqe_len;
pkt_flags |= q->ipqe_flags;
pkt_seq = q->ipqe_seq;
! m_cat(q->ipqe_m, m);
m = q->ipqe_m;
TCP_REASS_COUNTER_INCR(&tcp_reass_append);
goto free_ipqe;
--- 460,467 ----
pkt_len += q->ipqe_len;
pkt_flags |= q->ipqe_flags;
pkt_seq = q->ipqe_seq;
! m_cat(q->ipre_mlast, m);
! TRAVERSE(q->ipre_mlast);
m = q->ipqe_m;
TCP_REASS_COUNTER_INCR(&tcp_reass_append);
goto free_ipqe;
***************
*** 519,525 ****
#endif
m_adj(m, overlap);
rcvpartdupbyte += overlap;
! m_cat(q->ipqe_m, m);
m = q->ipqe_m;
pkt_seq = q->ipqe_seq;
pkt_len += q->ipqe_len - overlap;
--- 525,532 ----
#endif
m_adj(m, overlap);
rcvpartdupbyte += overlap;
! m_cat(q->ipre_mlast, m);
! TRAVERSE(q->ipre_mlast);
m = q->ipqe_m;
pkt_seq = q->ipqe_seq;
pkt_len += q->ipqe_len - overlap;
***************
*** 633,638 ****
--- 640,646 ----
* Insert the new fragment queue entry into both queues.
*/
tiqe->ipqe_m = m;
+ tiqe->ipre_mlast = m;
tiqe->ipqe_seq = pkt_seq;
tiqe->ipqe_len = pkt_len;
tiqe->ipqe_flags = pkt_flags;