Steffen Klassert wrote:
> On Sun, Mar 17, 2019 at 11:37:55PM +0000, Bram Yvahk wrote:
>> We've experienced an issue with VTI when the path-mtu is smaller than the size
>> of the "client" packet.
>>
>> What happens: IPv4 packet from the client (i.e. another system in the LAN)
>> attempts to transmit some data; IPv4 header shows that 'DF' bit is not set but
>> still the client receives ICMPv4 "need-to-frag" message [which the client does
>> not expect and ignores].
>>
>> Example:
>>   $ ping -s 1300 -M dont -c5 192.168.235.2
>>   PING 192.168.235.3 (192.168.235.3) 1300(1328) bytes of data.
>>   From 192.168.236.254 icmp_seq=1 Frag needed and DF set (mtu = 1214)
>>   From 192.168.236.254 icmp_seq=2 Frag needed and DF set (mtu = 1214)
>>   From 192.168.236.254 icmp_seq=3 Frag needed and DF set (mtu = 1214)
>>   From 192.168.236.254 icmp_seq=4 Frag needed and DF set (mtu = 1214)
>>   From 192.168.236.254 icmp_seq=5 Frag needed and DF set (mtu = 1214)
>>
>>   --- 192.168.235.3 ping statistics ---
>>   5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 3999ms
>
> Hm, this works here. Can you show how you setup the vti device?
> Some tunnel configuration options (set ttl etc.) force to have
> the DF bit set.

I will provide these details tomorrow. What I can say is that ttl was set to
inherit.

When testing this there is one important bit, which in hindsight I should've
included in the previous message: the (IPsec) Gateway A needs to know the
path-mtu to (IPsec) Gateway B.
Some ways to accomplish this:
- transmit an ICMP packet with DF bit set and a packet size larger than the
  path-mtu from Gateway A to Gateway B
- ensure the "nopmtudisc" option is *not* set in the xfrm state and then let
  client A transmit an ICMP packet *with* DF bit set to client B.
  [when "nopmtudisc" is set then all outgoing IPv4 ESP packets have the DF bit
  cleared, when "nopmtudisc" is not set then the DF bit is copied from the
  client packet]

For testing purposes I recommend doing the ping from Gateway A to Gateway B.
(Otherwise tcpdumps/traffic get a bit more confusing.)

A more in-depth description of what happens:

Setup:
======
|----------|   |-----------|   |-------|   |-----------|   |----------|
| client A |---| Gateway A |---| Hop H |---| Gateway B |---| client B |
|----------|   |-----------|   |-------|   |-----------|   |----------|

- testing with linux 4.14.95 (setup with more recent kernel is WIP)
- link mtu between client A and Gateway A: 1500
- link mtu between Gateway A and Hop H: 1500
- link mtu between Hop H and Gateway B: 1280
- link mtu between Gateway B and client B: 1500
- path-mtu between Gateway A and Gateway B: 1280
- IPsec tunnel over *IPv4* between Gateway A and Gateway B
- tunneling IPv4 over the IPsec tunnel
- testing with VTI

Scenario:
==========
Before starting it's important to ensure that:
- Gateway A does *not* know the path-mtu to Gateway B
- Client A does *not* know the path-mtu to Gateway B

* Step 1: client A: $ ping -M dont -s 1300 ip_of_client_B
  - IPv4 ICMP packet of client A does not have DF bit set
  - IPv4 ESP packet of Gateway A does not have DF bit set
  - Hop H receives an IPv4 ESP packet that is too large for the link-mtu
    between Hop H and Gateway B: it fragments the IPv4 ESP packet.
  - Gateway B receives 2 IPv4 fragmented packets
  - (Client B receives one IPv4 ICMP packet from client A)

* Step 2: Gateway A: $ ping -M do -s 1300 ip_of_gateway_B
  - IPv4 ICMP packet of Gateway A does have DF bit set
  - Gateway A receives a 'need to frag' ICMP from Hop H

* Step 3: client A: $ ping -M dont -s 1300 ip_of_client_B
  - IPv4 ICMP packet of client A does not have DF bit set
  - Gateway A: it processes this packet in the VTI module, detects that
    packet size > path-mtu and then sends a 'need to frag' ICMP to client A.
    [this is the code I patched]

=> the critical bit in the above is that Gateway A learns the path-mtu to
Gateway B. If it doesn't then it keeps assuming path-mtu is 1500 and the check
in VTI will not trigger (since path-mtu of 1500 > packet size)
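
To make the dependency on the learned path-mtu easier to follow, here is a
small Python sketch of the three steps. It is a toy model, not the kernel code:
the class and method names are made up for illustration, ESP/tunnel overhead is
ignored, and forward_expected() only encodes the assumption that a packet
without DF should not trigger a 'need to frag' ICMP; it is not a description of
the actual patch. The packet length 1328 is the "1300(1328)" from the ping
output (1300 bytes payload + 8 bytes ICMP header + 20 bytes IPv4 header).

```python
from dataclasses import dataclass

DEFAULT_MTU = 1500  # value Gateway A assumes until it learns a smaller path-mtu


@dataclass
class Gateway:
    """Hypothetical toy model of Gateway A (not kernel code)."""

    cached_pmtu: int = DEFAULT_MTU

    def learn_pmtu(self, mtu: int) -> None:
        # Step 2: a 'need to frag' ICMP from Hop H lowers the cached path-mtu.
        self.cached_pmtu = min(self.cached_pmtu, mtu)

    def forward_observed(self, packet_len: int, df: bool) -> str:
        # Behaviour described in this mail: the size check ignores the DF bit.
        if packet_len > self.cached_pmtu:
            return "icmp-frag-needed"
        return "forward"

    def forward_expected(self, packet_len: int, df: bool) -> str:
        # Assumption: only packets with DF set should be answered with ICMP;
        # packets without DF are forwarded and may be fragmented on the way.
        if df and packet_len > self.cached_pmtu:
            return "icmp-frag-needed"
        return "forward"


if __name__ == "__main__":
    gw = Gateway()
    packet_len = 1300 + 8 + 20  # ping -s 1300

    # Step 1: path-mtu not yet known, the check does not trigger.
    print("step 1:", gw.forward_observed(packet_len, df=False))

    # Step 2: Gateway A learns the path-mtu of 1280.
    gw.learn_pmtu(1280)

    # Step 3: same client packet, still without DF.
    print("step 3 (observed):", gw.forward_observed(packet_len, df=False))
    print("step 3 (expected):", gw.forward_expected(packet_len, df=False))
```

The model shows why the problem only appears after step 2: before that the
cached value is still 1500 and a 1328 byte packet passes the check.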