NetBSD-Bugs archive
kern/47506: tap(4) gets stuck in OACTIVE
>Number: 47506
>Category: kern
>Synopsis: tap(4) gets stuck in OACTIVE
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jan 28 15:50:09 +0000 2013
>Originator: Valery Ushakov
>Release: NetBSD 6
>Organization:
>Environment:
NetBSD amd64 6.0_STABLE NetBSD 6.0_STABLE (GENERIC) #0: Sun Nov 18 04:21:07 MSK 2012 uwe@amd64:/home/uwe/work/netbsd/cvs/src-release-6/sys/arch/amd64/compile/GENERIC amd64
>Description:
It seems that under load tap(4) gets stuck in a state where it has the
OACTIVE flag set, but poll(2) on the tap's fd doesn't return POLLIN.
>How-To-Repeat:
I'm playing with the lwIP TCP/IP stack. It uses tap(4) to talk to the
Ethernet:
# Create tap(4) interface for lwIP
ifconfig tap1 create
ifconfig tap1 up
# Bridge it to the network
ifconfig bridge0 create
brconfig bridge0 add tap1 add wm1
brconfig bridge0 up
The code to read from tap(4) does something along these lines:
    for (;;) {
        poll( [{ tapfd, POLLIN }] );
        read(tapfd, packet);
        post packet to tcp/ip thread;
    }
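
For concreteness, one iteration of the loop above might look roughly like
the following in C. This is only a sketch, not the actual lwIP port code:
the helper name read_frame_polled and its timeout_ms parameter are
hypothetical. Passing a negative timeout makes poll(2) block indefinitely,
which matches the behaviour described in this report.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd reports POLLIN, then read one
 * frame into buf.  Returns the number of bytes read, 0 if poll(2)
 * timed out, or -1 on error.  A negative timeout_ms blocks forever.
 */
static ssize_t
read_frame_polled(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int n;

	assert(buf != NULL);

	n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return -1;	/* poll failed; errno is set */
	if (n == 0)
		return 0;	/* timed out: nothing reported readable */
	if ((pfd.revents & POLLIN) == 0)
		return -1;	/* POLLERR/POLLHUP/POLLNVAL without data */
	return read(fd, buf, len);
}
```

With tapfd open on the tap device, the reader thread would call
read_frame_polled(tapfd, packet, sizeof(packet), -1) in a loop and post
each frame to the TCP/IP thread.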
If I throw enough incoming traffic at it (benchmarks/netperf), the
loop above gets stuck. It sits in poll(2) and never returns.
Meanwhile tap(4) has the OACTIVE flag set, so bridge(4) just enqueues
new frames and doesn't call if_start (see the very end of the
bridge_enqueue() function).
Since the poll(2) is redundant here (the read(2) is blocking anyway), I
can work around this problem by just dropping the poll(2). In that case
read(2) does complete successfully and the loop is not stuck. However,
in a situation where polling was indeed required by the structure of
the code, the bug would be impossible to avoid.
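
For code that cannot simply drop the poll(2), one possible mitigation is
to poll with a finite timeout and then attempt a non-blocking read(2)
whether or not POLLIN was reported. The sketch below is untested against
tap(4). It assumes that O_NONBLOCK works on the tap fd and that read(2)
still returns queued frames while the interface is stuck, as observed
above. The names set_nonblocking and read_frame_fallback are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: poll with a finite timeout, then try to read
 * regardless of what poll(2) reported.  fd must be non-blocking.
 * Returns the number of bytes read, 0 if no data was available,
 * or -1 on error.
 */
static ssize_t
read_frame_fallback(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	ssize_t n;

	assert(buf != NULL);

	if (poll(&pfd, 1, timeout_ms) < 0 && errno != EINTR)
		return -1;	/* real poll failure; errno is set */

	/* Read even after a timeout, in case a readiness event was missed. */
	n = read(fd, buf, len);
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;	/* genuinely nothing queued */
	return n;
}
```

This trades some latency and wakeups (bounded by timeout_ms) for not
depending solely on the POLLIN notification; it does not address the
underlying kernel problem.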
>Fix: