On 7/19/20 3:49 PM, Stefano Brivio wrote:
>>
>> With this test case, the lookup fails:
>>
>> [ 144.689378] vxlan: vxlan_xmit_one: dev vxlan_a 10.0.1.1/57864 ->
>> 10.0.0.0/4789 len 5010 gw 10.0.1.2
>> [ 144.692755] vxlan: skb_tunnel_check_pmtu: dst dev br0 skb dev vxlan_a
>> skb len 5010 encap_mtu 4000 headroom 50
>> [ 144.697682] vxlan: skb_dst_update_pmtu_no_confirm: calling
>> ip_rt_update_pmtu+0x0/0x160/ffffffff825ee850 for dev br0 mtu 3950
>> [ 144.703601] IPv4: __ip_rt_update_pmtu: dev br0 mtu 3950 old_mtu 5000
>> 192.168.2.1 -> 192.168.2.2
>> [ 144.708177] IPv4: __ip_rt_update_pmtu: fib_lookup failed for
>> 192.168.2.1 -> 192.168.2.2
>>
>> Because the lookup fails, __ip_rt_update_pmtu skips creating the exception.
>>
>> This hack gets the lookup to succeed:
>>
>>         fl4->flowi4_oif = dst->dev->ifindex;
>> or
>>         fl4->flowi4_oif = 0;
>
> Oh, I didn't consider that... route. :) Here comes an added twist, which
> currently needs Florian's changes from:
>         https://git.breakpoint.cc/cgit/fw/net-next.git/log/?h=udp_tun_pmtud_12
>
> Test is as follows:
>
> test_pmtu_ipv4_vxlan4_exception_bridge() {
>         test_pmtu_ipvX_over_vxlanY_or_geneveY_exception vxlan 4 4
>
>         ip netns add ns-C
>
>         ip -n ns-C link add veth_c_a type veth peer name veth_a_c
>         ip -n ns-C link set veth_a_c netns ns-A
>
>         ip -n ns-C addr add 192.168.2.100/24 dev veth_c
>
>         ip -n ns-C link set dev veth_c_a mtu 5000
>         ip -n ns-C link set veth_c_a up
>         ip -n ns-A link set dev veth_a_c mtu 5000
>         ip -n ns-A link set veth_c_a up
>
>         ip -n ns-A link add br0 type bridge
>         ip -n ns-A link set br0 up
>         ip -n ns-A link set dev br0 mtu 5000
>         ip -n ns-A link set veth_a_c master br0
>         ip -n ns-A link set vxlan_a master br0
>
>         ip -n ns-A addr del 192.168.2.1/24 dev vxlan_a
>         ip -n ns-A addr add 192.168.2.1/24 dev br0
>
>         ip -n ns-C exec ping -c 1 -w 2 -M want -s 5000 192.168.2.2
> }
>
> I didn't check the test itself recently, I'm just copying from some
> local changes I was trying last week, some commands might be wrong.

I fixed the exec typo, but yes even with my flowi4_oif hack it fails.

>
> The idea is: what if we now have another host (here, it's ns-C) sending
> traffic to that bridge? Then the exception on a local interface isn't
> enough, we actually need to send Fragmentation Needed back to where the
> packet came from, and the bridge won't do it for us (with routing, it
> already works).
>
> I haven't tried your hack, but I guess it would have the same problem.
>

What I saw in my tests and debug statements is that vxlan xmit does
compensate for the tunnel overhead (e.g., skb_tunnel_check_pmtu in
vxlan_xmit_one). It still feels like there are some minor details that
are wrong - like the fib_lookup failing when called from the
vxlan_xmit_one path. Does finding and fixing those make it work vs
adding another config item? I can send my debug diff if it helps.
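
For reference, here is a rough sketch of the arithmetic behind the debug lines quoted above (len 5010, encap_mtu 4000, headroom 50, resulting mtu 3950). The function names are made up for illustration, and the "larger than encap_mtu minus headroom" comparison is my assumption about what skb_tunnel_check_pmtu does, not a copy of the kernel code:

```shell
#!/bin/sh
# Illustrative only: mimics the numbers in the debug log, not kernel code.
# tunnel_pmtu and needs_pmtu_update are hypothetical helper names.

# MTU left for the inner packet: encap route MTU minus tunnel headroom.
tunnel_pmtu() {
	encap_mtu=$1
	headroom=$2
	echo $((encap_mtu - headroom))
}

# Exit status 0 when the inner packet exceeds that MTU, i.e. when a
# PMTU update would be attempted (assumed comparison, see note above).
needs_pmtu_update() {
	skb_len=$1
	encap_mtu=$2
	headroom=$3
	[ "$skb_len" -gt "$(tunnel_pmtu "$encap_mtu" "$headroom")" ]
}

# Values taken from the log: len 5010, encap_mtu 4000, headroom 50.
if needs_pmtu_update 5010 4000 50; then
	echo "update pmtu to $(tunnel_pmtu 4000 50)"
fi
```

With the logged values this prints "update pmtu to 3950", which matches the mtu handed to ip_rt_update_pmtu. So the overhead compensation looks right; it is the fib_lookup after it that fails.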