NetBSD-Bugs archive


kern/47506: tap(4) gets stuck in OACTIVE



>Number:         47506
>Category:       kern
>Synopsis:       tap(4) gets stuck in OACTIVE
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 28 15:50:09 +0000 2013
>Originator:     Valery Ushakov
>Release:        NetBSD 6
>Organization:
>Environment:
NetBSD amd64 6.0_STABLE NetBSD 6.0_STABLE (GENERIC) #0: Sun Nov 18 04:21:07 MSK 2012  uwe@amd64:/home/uwe/work/netbsd/cvs/src-release-6/sys/arch/amd64/compile/GENERIC amd64

>Description:
It seems that under load tap(4) gets stuck in a state where it has the
OACTIVE flag set, but poll(2) on the tap's fd doesn't return POLLIN.
>How-To-Repeat:
I'm playing with the lwIP TCP/IP stack.  It uses tap(4) to talk to the
Ethernet:

# Create tap(4) interface for lwIP
ifconfig tap1 create
ifconfig tap1 up

# Bridge it to the network
ifconfig bridge0 create
brconfig bridge0 add tap1 add wm1
brconfig bridge0 up

The code to read from tap(4) does something along these lines:

for (;;) {
  poll( [{ tapfd, POLLIN }] );
  read(tapfd, packet);
  post packet to tcp/ip thread;
}
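
For reference, one iteration of that loop could look roughly like the
C below.  This is only a sketch, not the actual lwIP port code: the
helper name tap_poll_read() is made up for illustration, and tapfd is
assumed to be an already opened tap(4) descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable, then read from it.
 * Returns the number of bytes read, or -1 on error (errno is set).
 */
ssize_t
tap_poll_read(int fd, void *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int n;

	assert(buf != NULL);

	/* No timeout: in the stuck state this poll(2) never returns. */
	do {
		n = poll(&pfd, 1, -1);
	} while (n == -1 && errno == EINTR);
	if (n == -1)
		return -1;

	/* Each read(2) on a tap(4) fd is expected to return one frame. */
	return read(fd, buf, len);
}
```

The receive thread would call tap_poll_read(tapfd, packet, sizeof(packet))
in a loop and post each frame to the tcp/ip thread.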

If I throw enough incoming traffic load at it (benchmarks/netperf),
the loop above gets stuck.  It sits in poll(2) and never returns.
Meanwhile the tap(4) has the OACTIVE flag set, and bridge(4) just
enqueues new frames and doesn't call if_start (see the very end of the
bridge_enqueue() function).
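
To illustrate why a stale OACTIVE flag stalls delivery, here is a toy
userland model of that enqueue-then-maybe-start pattern.  It is not the
kernel code: struct model_if, MODEL_OACTIVE, model_start() and
model_enqueue() are all invented for this sketch and only mimic the
behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_OACTIVE 0x1	/* stand-in for the OACTIVE interface flag */

/* Hypothetical, heavily simplified stand-in for a network interface. */
struct model_if {
	unsigned flags;		/* MODEL_OACTIVE while output is "active" */
	size_t queued;		/* frames waiting in the send queue */
	size_t start_calls;	/* how many times the start routine ran */
};

/* Stand-in for the driver's start routine: drains the queue. */
void
model_start(struct model_if *ifp)
{
	ifp->start_calls++;
	ifp->queued = 0;
}

/* Enqueue a frame; kick the driver only if it is not marked active. */
void
model_enqueue(struct model_if *ifp)
{
	assert(ifp != NULL);

	ifp->queued++;
	if ((ifp->flags & MODEL_OACTIVE) == 0)
		model_start(ifp);
}
```

In this model, once MODEL_OACTIVE is set and nothing clears it, every
further model_enqueue() only grows the queue.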

Since the poll(2) is redundant here (the read(2) is blocking anyway), I
can work around this problem by just dropping the poll(2).  In that
case read(2) does complete successfully and the loop is not stuck.
However, in a situation where polling was indeed required by the
structure of the code, the bug would be impossible to avoid.
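
The workaround amounts to something like the following sketch.  The
helper name tap_blocking_read() is hypothetical, and the fcntl(2) calls
are only there to make the blocking-mode assumption explicit; real code
would set the mode once, not on every read.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make sure fd is in blocking mode, then read
 * from it without calling poll(2) first.
 * Returns the number of bytes read, or -1 on error (errno is set).
 */
ssize_t
tap_blocking_read(int fd, void *buf, size_t len)
{
	int fl;

	assert(buf != NULL);

	fl = fcntl(fd, F_GETFL);
	if (fl == -1)
		return -1;
	if ((fl & O_NONBLOCK) != 0 &&
	    fcntl(fd, F_SETFL, fl & ~O_NONBLOCK) == -1)
		return -1;

	/* Sleeps in read(2) itself until data is available. */
	return read(fd, buf, len);
}
```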

>Fix:


