Re: [PATCH net-next v2 2/6] tunnels: PMTU discovery support for directly bridged IP packets

David Ahern Tue, 04 Aug 2020 06:55:05 -0700

On 8/3/20 11:53 PM, Stefano Brivio wrote:
> It's currently possible to bridge Ethernet tunnels carrying IP
> packets directly to external interfaces without assigning them
> addresses and routes on the bridged network itself: this is the case
> for UDP tunnels bridged with a standard bridge or by Open vSwitch.
> 
> PMTU discovery is currently broken with those configurations, because
> the encapsulation effectively decreases the MTU of the link, and
> while we are able to account for this using PMTU discovery on the
> lower layer, we don't have a way to relay ICMP or ICMPv6 messages
> needed by the sender, because we don't have valid routes to it.
> 
> On the other hand, as a tunnel endpoint, we can't fragment packets
> as a general approach: this is for instance clearly forbidden for
> VXLAN by RFC 7348, section 4.3:
> 
>    VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
>    fragment encapsulated VXLAN packets due to the larger frame size.
>    The destination VTEP MAY silently discard such VXLAN fragments.
> 
> The same paragraph recommends that the MTU over the physical network
> accomodates for encapsulations, but this isn't a practical option for
> complex topologies, especially for typical Open vSwitch use cases.
> 
> Further, it states that:
> 
>    Other techniques like Path MTU discovery (see [RFC1191] and
>    [RFC1981]) MAY be used to address this requirement as well.
> 
> Now, PMTU discovery already works for routed interfaces, we get
> route exceptions created by the encapsulation device as they receive
> ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
> we already rebuild those messages with the appropriate MTU and route
> them back to the sender.
> 
> Add the missing bits for bridged cases:
> 
> - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
>   to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
>   RFC 4443 section 2.4 for ICMPv6. This function is already called by
>   UDP tunnels
> 
> - a new function generating those ICMP or ICMPv6 replies. We can't
>   reuse icmp_send() and icmp6_send() as we don't see the sender as a
>   valid destination. This doesn't need to be generic, as we don't
>   cover any other type of ICMP errors given that we only provide an
>   encapsulation function to the sender
> 
> While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
> we might receive GSO buffers here, and the passed headroom already
> includes the inner MAC length, so we don't have to account for it
> a second time (that would imply three MAC headers on the wire, but
> there are just two).
> 
> This issue became visible while bridging IPv6 packets with 4500 bytes
> of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
> bytes of encapsulation headroom, we would advertise MTU as 3950, and
> we would reject fragmented IPv6 datagrams of 3958 bytes size on the
> wire. We're exclusively dealing with network MTU here, though, so we
> could get Ethernet frames up to 3964 octets in that case.
> 
> v2:
> - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
> - split IPv4/IPv6 functions (David Ahern)
> 
> Signed-off-by: Stefano Brivio <sbri...@redhat.com>
> ---
>  drivers/net/bareudp.c     |   5 +-
>  drivers/net/geneve.c      |   5 +-
>  drivers/net/vxlan.c       |   4 +-
>  include/net/dst.h         |  10 --
>  include/net/ip_tunnels.h  |   2 +
>  net/ipv4/ip_tunnel_core.c | 244 ++++++++++++++++++++++++++++++++++++++
>  6 files changed, 254 insertions(+), 16 deletions(-)
>


Much easier to follow

Reviewed-by: David Ahern <dsah...@gmail.com>

Re: [PATCH net-next v2 2/6] tunnels: PMTU discovery support for directly bridged IP packets

Reply via email to