NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/53562: bridge(4) breaks segmentation / TX checksum offloading



On 2018/09/14 20:17, Masanobu SAITOH wrote:
On 2018/09/12 21:33, Rin Okuyama wrote:
Hi, sorry for the delay in the response.

On 2018/09/10 16:34, Masanobu SAITOH wrote:
...
On 2018/09/08 8:50, Rin Okuyama wrote:
...
  The following would be a simplest
  example where your patch does not work:
  - NetBSD/amd64-currnet with wm0 connected to LAN
  - qemu-3.0.0nb2 installed from pkgsrc
  - setup tap and bridge as follows:
       # ifconfig tap0 create up
       # ifconfig bridge0 create
       # brconfig bridge0 add wm0 add tap0 up
  - run NetBSD/amd64 8.0 on QEMU with tap enabled:
       # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
         -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
  Then, the virtual host can send and receive small packets, e.g.,
  ping works well regardless of whether TSO is enabled or not for wm0.
[quotation added by RO from the previous message of mine]
  However, it cannot receive larger packets when TSO is enabled, e.g.,
  file cannot be retrieved through ftp. If TSO is turned off, everything
  works fine.
[quotation done]

  0)
  Is this "wm0" on the guest or the host?
  If my understanding is correct, writing packet via tap on host is not related
  to TSO...


I realized that this problem is not only for TSO but for all outgoing offload flags.


Sorry for bothering you many times. This "wm0" is for the host. Also,
in this example, the guest fails to receive data from the ftp server
working on the host. I forgot to test servers other than the host. It
worked fine; only connecting from the guest to host fails.

I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
flag to wm0.

  1)
  If you're using wm(4) on the guest, please try vioif(4) and see if the problem
occurs or not.

I tried vioif(4) on the guest but the situation does not change.

  2)
  For checksum oofloading, the following kernel options will help for debugging:
     INET_CSUM_COUNTERS
     TCP_CSUM_COUNTERS
     UDP_CSUM_COUNTERS

  These options add some event counters:
...

  3)
  wm(4) has a lot of event counters, some of them are useful for debugging.
  To use them add "options WM_EVENT_COUNTERS" to your kernel config file.

Thanks! I enabled these options.

  4)
  For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
  I have not-yet-committed patch:

     http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif

% vmstat -ev | grep toomany
wm0 txq00txtoomanyseg                                      0    0 misc

  If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
problem.

I tried if_wm.c r1.587, which already contains this patch. Counter
"wmX txqYYtxtoomanyseg" was not increased.

I examined how packets flow in the example above, from the ftp server
on host (wm0 on host) to the ftp client on guest (tap0 on host ->
vioif0 on guest):

tcp_output()
-> ip_output()
-> ip_if_output()
-> if_output_lock()
-> ifp->if_output = ether_output()
-> bridge_output()
-> bridge_enqueue()

When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
packets larger than MTU. If the destination interface is wm0, i.e.,
dst_ifp == ifp in bridge_output(), there's no problem with your patch
for if_bridge.c; packets are segmented automatically by the HW. However,
in this case, the destination is tap0, and bridge_enqueue() send packets
larger than MTU for tap0. This is, of course, illegal, since there is no
HW by which packets are segmented. The guest on QEMU, therefore, cannot
receive packets. This is the scenario how the failure occurs in my
example of QEMU above.

The similar problems occur when one of other TX offload options, TSO6,
IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
consider the case of IP4CSUM-TX. When this option is enabled for wm0
on the host, checksumming is omitted in ip_output(). Then, when the
destination is tap0, packets with wrong checksum are sent.

This patch disable TX offload when the interface is added to bridge:

http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch

I don't know if it's good to doing in L3...

It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
offload flags when adding a member to a bridge. I also don't know if
this is good solution or not...

  My diff have been committed as if_bridge.c rev. 1.156 now.

Module Name:    src
Committed By:    msaitoh
Date:        Fri Sep 14 11:05:09 UTC 2018

Modified Files:
    src/sys/net: if_bridge.c

Log Message:
 Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
flags. bridge_enqueue() is called from bridge_output() when a packet is
spontaneous. Clear csum_flags before calling brige_enqueue() in
bridge_forward() or bridge_broadcast() instead of in the beginning of
bridge_enqueue().

Note that this change doesn't fix a problem on the following configuration:

    A bridge has two or more interfaces.

    An address is assigned to an bridge member interface and
    some offload flags are set.

    Another interface has no address and has no any offload flag.

XXX pullup-[78]


To generate a diff of this commit:
cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c



With this patch, the guest on QEMU can retrieve files from the host
without problem, even if TX offload options are enabled, which supports
the discussion above. No need to modify if_bridge.c from the original
(r1.156) in this case.

I don't know this patch is preferable; there should be better ways to
handle the problem. Packets need to be segmented or checksummed only
if the destination in bridge_output() is different from the original
interface. I will examine further if needed.

Thanks,
rin



FYI:
	http://www.netbsd.org/~msaitoh/packetflow017.pdf

--
-----------------------------------------------
                SAITOH Masanobu (msaitoh%execsw.org@localhost
                                 msaitoh%netbsd.org@localhost)


Home | Main Index | Thread Index | Old Index