On Sun, 12 Jul 2020 22:07:02 +0200 Florian Westphal <f...@strlen.de> wrote:
> There are existing deployments where a vxlan or geneve interface is part
> of a bridge.
>
> In this case, MTU may look like this:
>
> bridge mtu: 1450
> vxlan (bridge port) mtu: 1450
> other bridge ports: 1450
>
> physical link (used by vxlan) mtu: 1500.
>
> This makes sure that vxlan overhead (50 bytes) doesn't bring packets over the
> 1500 MTU of the physical link.
>
> Unfortunately, in some cases, PMTU updates on the encap socket
> can bring such setups into a non-working state: no traffic will pass
> over the vxlan port (physical link) anymore.
> Because of the bridge-based usage of the vxlan interface, the original
> sender never learns of the change in path mtu and TCP clients will retransmit
> the over-sized packets until timeout.
>
> When this happens, a 'ip route flush cache' in the netns holding
> the vxlan interface resolves the problem, i.e. the network is capable
> of transporting the packets and the PMTU update is bogus.
>
> Another workaround is to enable 'net.ipv4.tcp_mtu_probing'.
>
> This patch series allows to configure vxlan and geneve interfaces
> to ignore path mtu updates.

Regardless of the comments to 1/3, I don't have any problem with this
(didn't review yet) if it's the only way to currently work around the
issue (of course :)).

I think we should eventually fix PMTU discovery for bridged setups, but
perhaps it's more complicated than that.

I wonder, though:

- wouldn't setting /proc/sys/net/ipv4/ip_no_pmtu_disc have the same
  effect?

- does it really make sense to have this configurable for IPv6?

-- 
Stefano