NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

bin/58792: npfctl generates a lot of redundant code



>Number:         58792
>Category:       bin
>Synopsis:       npfctl generates a lot of redundant code
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Oct 30 11:45:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9, ...
>Organization:
The NpfBSD Optimization
>Environment:
>Description:
The npf bpf compiler naively takes groups like `from { 1.2.0.0/16, 1.3.0.0/16 }' or `from { fe80::1, fe80::2 }' and compiles tests for each part individually, even if there are many overlapping instructions like reading the first word and ANDing it with 0xffff0000, or comparing the first three words (fe80:0000, 0000:0000, 0000:0000) before comparing the fourth word (0000:0001 vs 0000:0002).  Similarly, the compiler doesn't compact port ranges like { 123, 124, 125 } or { 200-299, 300-399 }.

Obviously there's no end to the amount of optimization a compiler can do, and it's probably not worth the effort to go arbitrarily far down this rabbit hole.  But these are low-hanging fruit that probably affect real-world filter performance and can be handled by modelling an `npf var' not as a list of elements each element being an address range or a port range, but by modelling an `npf var' as a tuple of (compacted address range list, compacted port range list) and generating code for a compacted range list all at once.
>How-To-Repeat:
$ cat npf1.conf
group default {
        pass in final family inet4 from { 1.2.0.0/16, 1.3.0.0/16 }
        block all
}
$ npfctl debug -c npf1.conf
Loading npf1.conf

RULE AT LINE 2
(000) ld       M[0]
(001) jeq      #0x4             jt 2    jf 10
(002) ld       [12]                             ### 1
(003) and      #0xffff0000                      ###
(004) jeq      #0x1020000       jt 9    jf 5
(005) ld       [12]                             ### 2
(006) and      #0xffff0000                      ###
(007) jeq      #0x1030000       jt 9    jf 8
(008) ret      #0
(009) ret      #-1
(010) ret      #0
...

The lines marked ### are duplicated; the second instance could be deleted entirely.  The code generated essentially does

(P[12] & 0xffff0000 == 0x01020000) || (P[12] & 0xffff0000 == 0x01030000)

when it could be

X := P[12] & 0xffff0000, X == 0x01020000 || X == 0x01030000


In the multi-word comparison case:

$ cat npf2.conf
group default {
        pass in final family inet6 from { fe80::1, fe80::2 }
        block all
}
$ npfctl debug -c npf_subopt.conf
Loading npf_subopt.conf

RULE AT LINE 2
(000) ld       M[0]
(001) jeq      #0x6             jt 2    jf 20
(002) ld       [8]                                 ### 1 {
(003) jeq      #0xfe800000      jt 4    jf 10
(004) ld       [12]
(005) jeq      #0x0             jt 6    jf 10
(006) ld       [16]
(007) jeq      #0x0             jt 8    jf 10
(008) ld       [20]                                ### }
(009) jeq      #0x1             jt 19   jf 10
(010) ld       [8]                                 ### 2 {
(011) jeq      #0xfe800000      jt 12   jf 18
(012) ld       [12]
(013) jeq      #0x0             jt 14   jf 18
(014) ld       [16]
(015) jeq      #0x0             jt 16   jf 18
(016) ld       [20]                                ### }
(017) jeq      #0x2             jt 19   jf 18
(018) ret      #0
(019) ret      #-1
(020) ret      #0
...

The sequence marked ### 1 { ... } is redundant with the sequence marked ### 2 { ... }.  The code generated essentially does

(P[8] == 0xfe800000 && P[12] == 0 && P[16] == 0 && P[20] == 1) || (P[8] == 0xfe800000 && P[12] == 0 && P[16] == 0 && P[20] == 2)

when this could be simplified to

P[8] == 0xfe800000 && P[12] == 0 && P[16] == 0 && (X := P[20], X == 1 || X == 2)


For the port range case:

$ cat npf3.conf
group default {
        pass in final from any port { 123, 124, 125, 200-299, 300-399 }
}
$ npfctl debug -c npf3.conf
Loading npf3.conf
  
RULE AT LINE 2
(000) ld       M[2]
(001) jeq      #0x6             jt 5    jf 2
(002) ld       M[2]
(003) jeq      #0x11            jt 5    jf 4
(004) ret      #0
(005) ld       M[0]
(006) jeq      #0x0             jt 24   jf 7
(007) ldx      M[1]
(008) ldh      [x + 0]
(009) jeq      #0x7b            jt 23   jf 10
(010) ldh      [x + 0]
(011) jeq      #0x7c            jt 23   jf 12
(012) ldh      [x + 0]
(013) jeq      #0x7d            jt 23   jf 14
(014) ldh      [x + 0]
(015) jge      #0xc8            jt 16   jf 17
(016) jgt      #0x12b           jt 17   jf 18
(017) ja       18
(018) ldh      [x + 0]
(019) jge      #0x12c           jt 20   jf 21
(020) jgt      #0x18f           jt 21   jf 22
(021) ja       22
(022) ret      #0
(023) ret      #-1
(024) ret      #0
...

Here the generated code reads the port to for 123, then reads the port to check for 124, then reads the port to check for 125; then reads the port to check for 200 <= port <= 299; then reads the port to check for 300 <= port <= 399.  Instead, it could read the port once, and then check for two ranges (123 <= port <= 125 or 200 <= port <= 399).
>Fix:
Yes, please!



Home | Main Index | Thread Index | Old Index