I have a system that establishes four L2TP-over-IPv6 tunnels using unique local (fd00::/8) addresses, with the following commands:
ip l2tp add tunnel tunnel_id 1233 peer_tunnel_id 1233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp1 tunnel_id 1233 session_id 1233 peer_session_id 1233
ip link set dev net_l2tp1 up

ip l2tp add tunnel tunnel_id 1235 peer_tunnel_id 1235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp2 tunnel_id 1235 session_id 1235 peer_session_id 1235
ip link set dev net_l2tp2 up

ip l2tp add tunnel tunnel_id 2233 peer_tunnel_id 2233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp3 tunnel_id 2233 session_id 2233 peer_session_id 2233
ip link set dev net_l2tp3 up

ip l2tp add tunnel tunnel_id 2235 peer_tunnel_id 2235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp4 tunnel_id 2235 session_id 2235 peer_session_id 2235
ip link set dev net_l2tp4 up
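For anyone trying to reproduce this, the same twelve commands can be generated by a short loop. This is only a sketch: the setup_tunnels function and the DRY_RUN variable are hypothetical conveniences, not part of iproute2. With DRY_RUN unset or set to 1 the script only prints the commands; with DRY_RUN=0 (run as root) it executes them.

```shell
#!/bin/sh
# Sketch: recreate the four L2TPv3-over-IPv6 tunnels listed above.
# DRY_RUN is a hypothetical knob (not an iproute2 option): it defaults to 1,
# which only prints each command; set DRY_RUN=0 and run as root to execute.

LOCAL=fd23:2355:accd::2:4

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_tunnels() {
    n=1
    # Each entry is "<tunnel/session id> <remote address>", in the original order.
    for spec in "1233 fd23:2355:accd::2:3" "1235 fd23:2355:accd::2:2" \
                "2233 fd23:2355:accd::2:3" "2235 fd23:2355:accd::2:2"; do
        id=${spec% *}       # text before the space
        remote=${spec#* }   # text after the space
        run ip l2tp add tunnel tunnel_id "$id" peer_tunnel_id "$id" \
            encap ip local "$LOCAL" remote "$remote"
        run ip l2tp add session name "net_l2tp$n" tunnel_id "$id" \
            session_id "$id" peer_session_id "$id"
        run ip link set dev "net_l2tp$n" up
        n=$((n + 1))
    done
}

setup_tunnels
```

The tunnel ID is reused as the peer tunnel ID, session ID and peer session ID, exactly as in the original commands, so each list entry only needs the ID and the remote address.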

These tunnels worked correctly on kernel 4.4. Kernel 4.15 had a bug that caused intermittent L2TP packet errors, but everything worked after I applied commit 4522a70db7aa5e77526a4079628578599821b193.

However, with kernel 4.18 plus 4522a70d (and also with kernel 5.0, which already includes 4522a70d, and with the current master branch), two of the four tunnels always fail to work properly after a reboot. Which two work and which two fail appears to be random.

When I say "fail to work properly", I mean that packets generated by the l2tp kernel modules (in response to a packet being sent to the associated net_l2tpX interface) are silently dropped. The l2tp_debugfs kernel module reports that L2TP packets are transmitted with no errors, and iptables counters and NFLOG rules confirm that well-formed packets are generated and sent, yet tcpdump does not see the packets on any interface on the system. iptables reports the output interface of the lost packets as "lo" (which is clearly incorrect and probably a clue to the underlying issue), but `tcpdump -nnn -i lo` shows no packets. Incoming L2TP packets appear to be processed correctly; only outgoing L2TP packets appear to be affected.

Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by 
bisection) fixes this issue.

IPv4 L2TP tunnels do not appear to be affected by this issue. Based on a few quick tests, switching from unique local addresses to publicly routable IPv6 addresses seems to prevent the problem. However, I haven't tested this thoroughly, and it is not clear to me how the code in 93531c67 could depend on the type of IPv6 address, so this observation may be a red herring. Manually deleting and re-creating a broken interface seems to make it work again, but I have not experimented enough with post-boot changes to tell whether the problem is entirely random, depends on the number of existing interfaces, is caused by a boot-time timing issue, etc.
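The delete/re-create workaround mentioned above can be sketched for a single tunnel as follows. This illustrates the sequence and is not necessarily the exact set of commands I ran: recreate_tunnel and DRY_RUN are hypothetical names, and deleting the session before its tunnel is done here for clarity rather than because I have verified it is required. The `ip l2tp del` lines use the standard iproute2 syntax.

```shell
#!/bin/sh
# Sketch: delete and re-create one tunnel/session pair with the boot-time parameters.
# recreate_tunnel and DRY_RUN are hypothetical; DRY_RUN defaults to 1 (print only),
# set DRY_RUN=0 and run as root to actually execute the commands.

LOCAL=fd23:2355:accd::2:4

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: recreate_tunnel <tunnel/session id> <remote address> <interface name>
recreate_tunnel() {
    id=$1
    remote=$2
    ifname=$3
    # Tear down the session first, then the tunnel that carries it.
    run ip l2tp del session tunnel_id "$id" session_id "$id"
    run ip l2tp del tunnel tunnel_id "$id"
    # Re-create both with the same parameters used at boot.
    run ip l2tp add tunnel tunnel_id "$id" peer_tunnel_id "$id" \
        encap ip local "$LOCAL" remote "$remote"
    run ip l2tp add session name "$ifname" tunnel_id "$id" \
        session_id "$id" peer_session_id "$id"
    run ip link set dev "$ifname" up
}

# Example: rebuild net_l2tp1.
recreate_tunnel 1233 fd23:2355:accd::2:3 net_l2tp1
```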

It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c 
causes this issue, or how it should be fixed.  Could someone take a look and 
point me in the right direction for further troubleshooting?

Thanks!
