tech-net archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
IPv4 fragmentation and IPsec details
This is a continuation of the email I posted on 10/16/12 about FAST_IPSEC not
sending "fragmentation needed" messages. I have included that email at the end
of this email.
As part of my fix, I now have ip_forward use the mtu returned from ip_output
via the mtu_p pointer. The problem is that breaks KAME_IPSEC. KAME_IPSEC
doesn't update the mtu value, so with my fix outlined below, KAME_IPSEC will
now send a "fragmentation needed" message with a value of 1500 (or whatever the
mtu of the interface is). So I have modified the KAME_IPSEC block in
ip_output.c to parallel what I do in the FAST_IPSEC block. (See below.)
This all works, but there are a couple of troubling details.
1. ipsec_hdrsiz returns the maximum possible number of bytes IPsec may add,
rather than the actual number of bytes that *will* be added to the packet.
With my fix, some packets that could actually fit in the MTU will be bounced
because ipsec_hdrsiz reports a number of bytes that is greater than what is
actually added.
2. ipsec_hdrsiz returns an odd number, so the "fragmentation needed" packet
contains an mtu that is an odd number. (This was true before any of my fixes,
so is a different but related problem. See esp_hdrsiz in netipsec/xform_esp.c
for FAST_IPSEC or netinet6/esp_output.c for KAME_IPSEC.)
For the short term, I am proposing that we use ipsec_hdrsiz as is, but in
ip_forward, reduce destmtu until it is on a 4-byte boundary.
For a longer term fix, I propose having two functions, one that calculates the
actual number of bytes added to a packet, and another that calculates the
maximum possible number of bytes that can be added based on IPsec policy. The
calculation should push the number up to a 4-byte, 8-byte or whatever boundary,
based on what boundary the IPsec policy uses for padding.
Thoughts? Comments?
-Bev
Begin forwarded message:
> From: Beverly Schwartz <bschwart%bbn.com@localhost>
> Subject: FAST_IPSEC fragmentation problem
> Date: October 16, 2012 4:05:11 PM EDT
> To: tech-net%NetBSD.org@localhost
> Cc: Beverly Schwartz <bschwart%bbn.com@localhost>
>
> I have detected a problem in which FAST_IPSEC does not return a fragmentation
> needed icmp message.
>
> Here's the setup:
>
> A=======B--------C
>
> Between A & B is an IPsec esp tunnel. B & C are in the clear.
>
> A initiates a wget from C. After initial TCP handshaking, C sends a packet
> with TCP data to fill the mtu between B & C.
>
> B tries to forward the packet to A after adding a new IP header, ESP header,
> and padding, but the resultant packet is too large.
>
> B does *not* send a fragmentation needed packet back to C. Instead, C times
> out and black hole detection comes into play, C starts sending minimally sized
> packets, and then the data gets through. This delays the (small) transfer by
> about 5 seconds. For large transfers, this has a signifcant impact on
> throughput because of the small-sized packets.
>
> If you look at netipsec/ipsec_output.c, it looks like fragmentation is being
> taken care of, but it is not. This is what happens.
>
> In function ipsec4_process_packet, based on the policy set for setting the DF
> bit on the outer packet (ip4_ipsec_dfbit) and the setting of the bit in the
> inner packet is whether or not the DF bit gets set in the outer packet. This
> code does work as expected.
>
> Continuing further down ipsec4_process_packet, we see a call to do the output
> transform:
> error = (*sav->tdb_xform->xf_output)(m, isr, NULL, i, off);
>
> In my case, we are doing crypto, and this function maps to esp_output.
>
> Looking in netipsec/xform_esp.c, we can see crypto_dispatch called.
> crypto_dispatch will queue the packet to have crypto done, returning
> ENOENT.
>
> We can see this is netinet/ip_output.c after the call to
> ipsec4_process_packet. In ip_output, error is changed to 0 because no
> error has occurred - we're just waiting to be called back when opencrypto
> is done. Control jumps to done, and ip_forward sees no error. ip_forward
> returns.
>
> When crypto has done it's thing, it puts the packet on another queue, and
> calls esp_output_cb. esp_output_cb calls ip_output. On this run through
> ip_output, the DF bit is checked, and we do have an MTU error. However,
> we're no longer returning to ip_forward which has all the context we need
> to call icmp_error. Even if esp_output_cb could detect the failure is an
> mtu problem, all it has is an encrypted packet, and has no way to figure
> out where to send an icmp error message to.
>
> So no fragmentation needed message is sent.
>
> I have been trying different ways to deal with this. What seems easiest
> to me, is in the case where the DF bit is set, add a call before the call
> to ipsec4_process_packet in ip_output which determines how many bytes
> IPsec will add. Then check current packet length plus the extra bytes
> against mtu. If this fails, return EMSGSIZE. Here's the code, placed
> right before the call to ipsec4_process_packet:
>
> /*
> * Check that MTU is sufficient.
> */
> if (ntohs(ip->ip_off) & IP_DF) {
> size_t ipsec_hdrlen = ipsec_hdrsiz(sp);
> if (ip_len + ipsec_hdrlen > mtu) {
> if (flags & IP_RETURNMTU)
> *mtu_p = mtu - ipsec_hdrlen;
> error = EMSGSIZE;
> IP_STATINC(IP_STAT_CANTFRAG);
> splx(s);
> goto bad;
> }
> }
>
> ipsec_hdrsiz is declared a static function, but this is easy to change.
> I prefer calling ipsec_hdrsiz directly rather than using ipsec4_hdrsiz
> because we already have the policy so don't need to get it again.
>
> This does not fully solve our problem, however. In ip_forward, we
> call ipsec4_hdrsiz in order to determine the mtu size to send
> in the icmp error message. Well the calculation doesn't work, and
> we end up with an mtu of 1500. No help there.
>
> Fortunately, this problem is easy to fix. Because we set *mtu_p,
> ip_forward already has the new mtu. So in case EMSGSIZE in ip_forward,
> after setting type and code, we need to add:
> if (destmtu != 0)
> break;
>
> The only problem I'm still struggling with is that ipsec_hdrsiz returns
> a strange (meaning odd number) header length. I think this should be
> decreased until we have a multiple of 4. But that's just my opinion.
>
> BTW, IPv6 doesn't quite run into this because it just applies source
> fragmentation to the new packet. IPv6 should not fragment midstream,
> so this is probably not desired behavior. However, one could argue
> that the encapsulated packet is a new packet, therefore fragmentation
> is allowed. In my opinion, this doesn't ring true to the spirit of the
> IPv6 spec.
>
> -Bev
>
>
Home |
Main Index |
Thread Index |
Old Index