Subject: Re: NetBSD in BSD Router / Firewall Testing
To: Jonathan Stone <jonathan@Pescadero.dsg.stanford.edu>
From: Mike Tancsa <mike@sentex.net>
List: tech-net
Date: 12/01/2006 15:34:21
At 01:49 PM 12/1/2006, Jonathan Stone wrote:
>As sometime principal maintainer of NetBSD's bge(4) driver, and the
>author of many of the changes and chip-variant support subsequently
>folded into OpenBSD's bge(4) by brad@openbsd.org, I'd like to speak
>to a couple of points here.
First off, thanks for the extended insights! This has been a most
interesting exercise for me.
>I believe the UDP packets in Mike's tests are all so small that, even
>with a VLAN tag added, the Ethernet payload (IPv4 header, UDP header,
>10 bytes UDP payload), plus 14-byte Ethernet header, plus 4-byte CRC,
>is still less than the ETHER_MIN_MTU. If so, I don't see how
>frame size is a factor, since the packets will be padded to the minimum
>valid Ethernet payload in any case. OTOH, switch forwarding PPS may
>well show a marginal degradation due to VLAN insertion; but we're
>still 2 or 3 orders of magnitude away from those limits.
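If I follow the arithmetic, that's 20 bytes of IPv4 header + 8 bytes of
UDP header + 10 bytes of payload + 14 bytes of Ethernet header + 4 bytes
of CRC = 56 bytes, or 60 with the VLAN tag, so either way the frames are
under the 64-byte Ethernet minimum and get padded on the wire regardless.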
Unfortunately, my budget is not so high that I can afford a high-end
gigE switch in my test area. I started off with a Linksys, which I
managed to hang under moderately high loads. I had an opportunity to
test the Netgear, and it was a pretty reasonable price (~$650 USD) for
what it claims it's capable of (17Mpps). It certainly hasn't locked up:
I tried putting a bunch of boxes online and having all 8 of them
forward packets as fast as they could, and there didn't seem to be any
ill effects on the switch. Similarly, trunking, although a bit wonky to
configure (I am far more used to Cisco land), at least works and
doesn't seem to degrade overall performance.
>Second point: NetBSD's bge(4) driver includes support for runtime
>manual tuning of interrupt mitigation. I chose the tuning values
>based on empirical measurements of large TCP flows on bcm5700s and bcm5704s.
>
>If my (dimming) memory serves, the default value of 0 yields
>thresholds close to Bill Paul's original FreeBSD driver. A value of
>1 yields a bge interrupt for every two full-sized Ethernet
>frames. Each increment of the sysctl knob will, roughly, halve the
>receive interrupt rate, up to a maximum of 5, which interrupts about
>every 30 to 40 full-sized TCP segments.
I take it this is it:
# sysctl -d hw.bge.rx_lvl
hw.bge.rx_lvl: BGE receive interrupt mitigation level
# sysctl hw.bge.rx_lvl
hw.bge.rx_lvl = 0
#
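For the runs below I just bumped the level at runtime, along the lines of
# sysctl -w hw.bge.rx_lvl=5
and re-ran the same UDP blast at each setting.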
Here are the forwarding rates with ipf enabled and 10 poorly written rules:
rx_lvl      pps
     0  219,181
     1  229,334
     2  280,508
     3  328,896
     4  333,585
     5  346,974
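So going from the default of 0 up to 5 is roughly a 58% improvement in
forwarded pps (346,974 vs 219,181) on this workload, if I am reading my
own numbers right.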
Blasting for 10 seconds with the value set to 5, here are the before
and after netstat -i and netstat -q outputs from running (the netblast
arguments, as I understand them, are destination, UDP port, payload
size in bytes, and duration in seconds):
[4600X2-88-176]# ./netblast 192.168.44.1 500 10 10
start: 1165001022.659075049
finish: 1165001032.659352738
send calls: 5976399
send errors: 0
approx send rate: 597639
approx error rate: 0
[4600X2-88-176]#
# netstat -q
arpintrq:
queue length: 0
maximum queue length: 50
packets dropped: 153
ipintrq:
queue length: 0
maximum queue length: 256
packets dropped: 180561075
ip6intrq:
queue length: 0
maximum queue length: 256
packets dropped: 0
atintrq1:
queue length: 0
maximum queue length: 256
packets dropped: 0
atintrq2:
queue length: 0
maximum queue length: 256
packets dropped: 0
clnlintrq:
queue length: 0
maximum queue length: 256
packets dropped: 0
ppoediscinq:
queue length: 0
maximum queue length: 256
packets dropped: 0
ppoeinq:
queue length: 0
maximum queue length: 256
packets dropped: 0
# netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
nfe0 1500 <Link> 00:13:d4:ae:9b:6b 38392 584 5517 0 0
nfe0 1500 fe80::/64 fe80::213:d4ff:fe 38392 584 5517 0 0
nfe0 1500 192.168.43/24 192.168.43.222 38392 584 5517 0 0
bge0* 1500 <Link> 00:10:18:14:15:12 0 0 0 0 0
bge1 1500 <Link> 00:10:18:14:27:d5 46026021 489390 213541721 0 0
bge1 1500 192.168.44/24 192.168.44.223 46026021 489390 213541721 0 0
bge1 1500 fe80::/64 fe80::210:18ff:fe 46026021 489390 213541721 0 0
bge2 1500 <Link> 00:10:18:14:38:d2 354347890 255587 19537142 0 0
bge2 1500 192.168.88/24 192.168.88.223 354347890 255587 19537142 0 0
bge2 1500 fe80::/64 fe80::210:18ff:fe 354347890 255587 19537142 0 0
wm0 1500 <Link> 00:15:17:0b:70:98 17816154 72 31 0 0
wm0 1500 fe80::/64 fe80::215:17ff:fe 17816154 72 31 0 0
wm1 1500 <Link> 00:15:17:0b:70:99 1528 0 2967696 0 0
wm1 1500 fe80::/64 fe80::215:17ff:fe 1528 0 2967696 0 0
lo0 33192 <Link> 3 0 3 0 0
lo0 33192 127/8 localhost 3 0 3 0 0
lo0 33192 localhost/128 ::1 3 0 3 0 0
lo0 33192 fe80::/64 fe80::1 3 0 3 0 0
# netstat -q
arpintrq:
queue length: 0
maximum queue length: 50
packets dropped: 153
ipintrq:
queue length: 0
maximum queue length: 256
packets dropped: 183066795
ip6intrq:
queue length: 0
maximum queue length: 256
packets dropped: 0
atintrq1:
queue length: 0
maximum queue length: 256
packets dropped: 0
atintrq2:
queue length: 0
maximum queue length: 256
packets dropped: 0
clnlintrq:
queue length: 0
maximum queue length: 256
packets dropped: 0
ppoediscinq:
queue length: 0
maximum queue length: 256
packets dropped: 0
ppoeinq:
queue length: 0
maximum queue length: 256
packets dropped: 0
# netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
nfe0 1500 <Link> 00:13:d4:ae:9b:6b 38497 585 5596 0 0
nfe0 1500 fe80::/64 fe80::213:d4ff:fe 38497 585 5596 0 0
nfe0 1500 192.168.43/24 192.168.43.222 38497 585 5596 0 0
bge0* 1500 <Link> 00:10:18:14:15:12 0 0 0 0 0
bge1 1500 <Link> 00:10:18:14:27:d5 46026057 489390 217012400 0 0
bge1 1500 192.168.44/24 192.168.44.223 46026057 489390 217012400 0 0
bge1 1500 fe80::/64 fe80::210:18ff:fe 46026057 489390 217012400 0 0
bge2 1500 <Link> 00:10:18:14:38:d2 360324326 255587 19537143 0 0
bge2 1500 192.168.88/24 192.168.88.223 360324326 255587 19537143 0 0
bge2 1500 fe80::/64 fe80::210:18ff:fe 360324326 255587 19537143 0 0
wm0 1500 <Link> 00:15:17:0b:70:98 17816195 72 31 0 0
wm0 1500 fe80::/64 fe80::215:17ff:fe 17816195 72 31 0 0
wm1 1500 <Link> 00:15:17:0b:70:99 1528 0 2967696 0 0
wm1 1500 fe80::/64 fe80::215:17ff:fe 1528 0 2967696 0 0
lo0 33192 <Link> 3 0 3 0 0
lo0 33192 127/8 localhost 3 0 3 0 0
lo0 33192 localhost/128 ::1 3 0 3 0 0
lo0 33192 fe80::/64 fe80::1 3 0 3 0 0
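Differencing the two snapshots, if I have it right: bge2 took in
360,324,326 - 354,347,890 = 5,976,436 packets during the blast, bge1
sent out 217,012,400 - 213,541,721 = 3,470,679 (which lines up with the
~347Kpps in the table above), and ipintrq dropped 183,066,795 -
180,561,075 = 2,505,720. Forwarded plus dropped comes to 5,976,399,
which is exactly the number of send calls netblast reported, so the
accounting seems to hold together.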
>I therefore see very, very good grounds to expect that NetBSD would
>show much better performance if you increase bge interrupt mitigation.
Yup, it certainly seems so!
>That said: I see a very strong philosophical design difference between
>FreeBSD's polling machinery, and the interrupt-mitigation approaches
>variously implemented by Jason Thorpe in wm(4) and by myself in
>bge(4). For the workloads I care about, the design-point tradeoffs in
>FreeBSD-4's polling are simply not acceptable. I *want* kernel
>softint processing to pre-empt userspace processes, and even
>kthreads. I acknowledge that my needs are, perhaps, unusual.
There are certainly tradeoffs. I guess for me, in a firewall capacity,
I want to be able to get into the box out of band when it's under
attack. 1Mpps is still considered a medium to heavy attack right now,
but with more and more botnets out there, it's only going to become
more commonplace :( I guess I would like the best of both worlds: a
way to give priority to OOB access, be that the serial console or
another interface... but I don't see a way of doing that right now
with the interrupt-driven approach.
>Even so, I'd be glad to work on improving bge(4) tuning for workloads
>dominated by tinygrams. The same packet rate as ttcp (over
>400kpacket/sec on a 2.4Ghz Opteron) seems like an achievable target
>--- unless there's a whole lot of CPU processing going on inside
>IP-forwarding that I'm wholly unaware of.
The AMD I am testing on is just a 3800 X2, so ~2.0GHz.
>At a receive rate of 123Mbyte/sec per bge interface, I see roughly
>5,000 interrupts per bge per second. What interrupt rates are you
>seeing for each bge device in your tests?
Here is vmstat -i before and after 10 seconds of blasting:
# vmstat -i
interrupt total rate
cpu0 softclock 5142870 98
cpu0 softnet 1288284 24
cpu0 softserial 697 0
cpu0 timer 5197361 100
cpu0 FPU synch IPI 5 0
cpu0 TLB shootdown IPI 373 0
cpu1 timer 5185327 99
cpu1 FPU synch IPI 2 0
cpu1 TLB shootdown IPI 1290 0
ioapic0 pin 14 1659 0
ioapic0 pin 15 30 0
ioapic0 pin 3 44586 0
ioapic0 pin 10 2596838 49
ioapic0 pin 5 11767286 226
ioapic0 pin 7 64269 1
ioapic0 pin 4 697 0
Total 31291574 602
# vmstat -i
interrupt total rate
cpu0 softclock 5145604 98
cpu0 softnet 1288376 24
cpu0 softserial 697 0
cpu0 timer 5201094 100
cpu0 FPU synch IPI 5 0
cpu0 TLB shootdown IPI 373 0
cpu1 timer 5189060 99
cpu1 FPU synch IPI 2 0
cpu1 TLB shootdown IPI 1291 0
ioapic0 pin 14 1659 0
ioapic0 pin 15 30 0
ioapic0 pin 3 44664 0
ioapic0 pin 10 2596865 49
ioapic0 pin 5 11873637 228
ioapic0 pin 7 64294 1
ioapic0 pin 4 697 0
Total 31408348 603
That was with hw.bge.rx_lvl=5.
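To put a number on the interrupt-rate question: the busiest line,
ioapic0 pin 5 (which I assume is the bge taking the blast), went from
11,767,286 to 11,873,637 over the 10-second run, i.e. about 106,000
interrupts, or roughly 10,600/sec at rx_lvl=5. The other pins barely
moved.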
>
>I've never seen that particular bug. I don't believe I have any actual
>5750 chips to try to reproduce it. I do have access to: 5700, 5701,
>5705, 5704, 5721, 5752, 5714, 5715, 5780. (I have one machine with one
>5752; and the 5780 is one dual-port per HT-2000 chip, which means one
>per motherboard. But for most people's purposes, the 5780/5714/5715
>are indistinguishable).
>
>I wonder, does this problem go away if you crank up interrupt mitigation?
It's hard to reproduce, but if I use 2 generators to blast in one
direction, it seems to trigger even with the value at 5:
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd last message repeated 2 times
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd last message repeated 3 times
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:29 r2-netbsd last message repeated 2 times
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 142?
Dec 1 10:21:29 r2-netbsd /netbsd: bge: failed on len 52?
Dec 1 10:21:30 r2-netbsd last message repeated 2365 times
With ipfilter disabled, I am able to get about 680Kpps through the box
using 2 streams in one direction. (As a comparison, RELENG_4 was able
to do 950Kpps, and with a faster CPU, an AMD 4600, about 1.2Mpps.)
Note that with all of these tests, the NetBSD box is essentially locked
up servicing interrupts.
---Mike