On 7/18/20 11:58 AM, Stefano Brivio wrote:
> On Sat, 18 Jul 2020 11:02:46 -0600
> David Ahern <[email protected]> wrote:
> 
>> On 7/18/20 12:56 AM, Stefano Brivio wrote:
>>> On Fri, 17 Jul 2020 09:04:51 -0600
>>> David Ahern <[email protected]> wrote:
>>>   
>>>> On 7/17/20 6:27 AM, Stefano Brivio wrote:  
>>>>>>    
>>>>>>> Note that this doesn't work as it is because of a number of reasons
>>>>>>> (skb doesn't have a dst, pkt_type is not PACKET_HOST), and perhaps we
>>>>>>> shouldn't be using icmp_send(), but at a glance that looks simpler.     
>>>>>>>  
>>>>>>
>>>>>> Yes, it also requires that the bridge has IP connectivity
>>>>>> to reach the inner ip, which might not be the case.    
>>>>>
>>>>> If the VXLAN endpoint is a port of the bridge, that needs to be the
>>>>> case, right? Otherwise the VXLAN endpoint can't be reached.
>>>>>     
>>>>>>> Another slight preference I have towards this idea is that the only
>>>>>>> known way we can break PMTU discovery right now is by using a bridge,
>>>>>>> so fixing the problem there looks more future-proof than addressing any
>>>>>>> kind of tunnel with this problem. I think FoU and GUE would hit the
>>>>>>> same problem, I don't know about IP tunnels, sticking that selftest
>>>>>>> snippet to whatever other test in pmtu.sh should tell.      
>>>>>>
>>>>>> Every type of bridge port that needs to add additional header on egress
>>>>>> has this problem in the bridge scenario once the peer of the IP tunnel
>>>>>> signals a PMTU event.    
>>>>>
>>>>> Yes :(  
>>>>
>>>> The vxlan/tunnel device knows it is a bridge port, and it knows it is
>>>> going to push a udp and ip{v6} header. So why not use that information
>>>> in setting / updating the MTU? That's what I was getting at on Monday
>>>> with my comment about lwtunnel_headroom equivalent.  
>>>
>>> If I understand correctly, you're proposing something similar to my
>>> earlier draft from:
>>>
>>>     <20200713003813.01f2d5d3@elisabeth>
>>>     https://lore.kernel.org/netdev/20200713003813.01f2d5d3@elisabeth/
>>>
>>> the problem with it is that it wouldn't help: the MTU is already set to
>>> the right value for both port and bridge in the case Florian originally
>>> reported.  
>>
>> I am definitely hand waving; I have not had time to create a setup
>> showing the problem. Is there a reproducer using only namespaces?
> 
> And I'm laser pointing: check the bottom of that email ;)
> 

With this test case, the lookup fails:

[  144.689378] vxlan: vxlan_xmit_one: dev vxlan_a 10.0.1.1/57864 -> 10.0.0.0/4789 len 5010 gw 10.0.1.2
[  144.692755] vxlan: skb_tunnel_check_pmtu: dst dev br0 skb dev vxlan_a skb len 5010 encap_mtu 4000 headroom 50
[  144.697682] vxlan: skb_dst_update_pmtu_no_confirm: calling ip_rt_update_pmtu+0x0/0x160/ffffffff825ee850 for dev br0 mtu 3950
[  144.703601] IPv4: __ip_rt_update_pmtu: dev br0 mtu 3950 old_mtu 5000 192.168.2.1 -> 192.168.2.2
[  144.708177] IPv4: __ip_rt_update_pmtu: fib_lookup failed for 192.168.2.1 -> 192.168.2.2

Because the lookup fails, __ip_rt_update_pmtu skips creating the exception.

This hack gets the lookup to succeed:

fl4->flowi4_oif = dst->dev->ifindex;
or
fl4->flowi4_oif = 0;

and the test passes.
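
To illustrate why the oif matters, here is a toy model of an oif-restricted
route lookup. It is a sketch, not kernel code: every name in it (toy_route,
toy_flow, toy_lookup) is hypothetical. It also assumes that the flow key
carries the ifindex of vxlan_a while the matching route sits on br0; that is
inferred from the "dst dev br0 skb dev vxlan_a" line in the log above and is
not confirmed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_route {
	uint32_t prefix;	/* network address, host byte order */
	uint8_t plen;		/* prefix length, 0..32 */
	int ifindex;		/* device the route points at */
};

struct toy_flow {
	uint32_t daddr;		/* destination address, host byte order */
	int oif;		/* requested output device, 0 means "any" */
};

/* Netmask for a prefix length; plen 0 is special-cased to avoid a 32-bit shift. */
static uint32_t toy_mask(uint8_t plen)
{
	return plen ? ~(uint32_t)0 << (32 - plen) : 0;
}

/*
 * Longest-prefix match over a flat table. When the flow has a nonzero oif,
 * routes on any other device are skipped. Returns NULL if nothing matches.
 */
static const struct toy_route *toy_lookup(const struct toy_route *tbl,
					  size_t n,
					  const struct toy_flow *fl)
{
	const struct toy_route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((fl->daddr & toy_mask(tbl[i].plen)) != tbl[i].prefix)
			continue;	/* destination not covered */
		if (fl->oif && fl->oif != tbl[i].ifindex)
			continue;	/* route is on a different device */
		if (!best || tbl[i].plen > best->plen)
			best = &tbl[i];
	}
	return best;
}
```

With a single 192.168.2.0/24 route on the (made-up) ifindex of br0, a flow to
192.168.2.2 whose oif is the ifindex of vxlan_a finds nothing, while the same
flow with oif set to br0's ifindex or to 0 finds the route. Under the stated
assumption, that mirrors the two variants of the hack.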
