Hello,
it took a while to build a test system for bisecting the issue. Finally I've
identified the patch that causes my problems.
By the way, the fq packet scheduler is in use.
The patch is:
[PATCH net-next] tcp/fq: move back to CLOCK_MONOTONIC
In the recent TCP/EDT patch series, I switched TCP and sch_fq clocks from
MONOTONIC to TAI, in order to match the choice made
earlier for the sch_etf packet scheduler.
But sure enough, this broke some setups where the TAI clock jumps forward (by
almost 50 years...), as reported by Leonard Crestez.
If we want to converge later, we'll probably need to add an skb field to
differentiate the clock bases, or a socket option.
In the meantime, a UDP application will need to use the CLOCK_MONOTONIC base
for its SCM_TXTIME timestamps if using the fq
packet scheduler.
Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI")
Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI")
Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Reported-by: Leonard Crestez <leonard.crestez@xxxxxxx>
----
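As an aside, if I read the commit message above correctly, a UDP application
driving fq via SO_TXTIME has to supply CLOCK_MONOTONIC based timestamps. A
minimal sketch of what that looks like (my own illustration, not part of the
patch; destination and timing values are arbitrary, error handling omitted):

/* Minimal sketch: a UDP sender pairing the fq qdisc with SCM_TXTIME
 * timestamps taken from CLOCK_MONOTONIC, as required by the patch above.
 */
#include <linux/net_tstamp.h>	/* struct sock_txtime */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#ifndef SO_TXTIME
#define SO_TXTIME	61	/* older libc headers may lack these */
#define SCM_TXTIME	SO_TXTIME
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sock_txtime cfg = {
		.clockid = CLOCK_MONOTONIC,	/* must match fq's clock base */
		.flags	 = 0,
	};
	struct sockaddr_in dst = {
		.sin_family	 = AF_INET,
		.sin_port	 = htons(9),	/* arbitrary example target */
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	char payload[] = "x";
	struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
	char cbuf[CMSG_SPACE(sizeof(uint64_t))];
	struct msghdr msg = {
		.msg_name	= &dst,	.msg_namelen	= sizeof(dst),
		.msg_iov	= &iov,	.msg_iovlen	= 1,
		.msg_control	= cbuf,	.msg_controllen	= sizeof(cbuf),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
	struct timespec now;
	uint64_t txtime_ns;

	setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));

	/* schedule the transmit 1 ms ahead, on the MONOTONIC clock */
	clock_gettime(CLOCK_MONOTONIC, &now);
	txtime_ns = (uint64_t)now.tv_sec * 1000000000ULL + now.tv_nsec + 1000000;

	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type  = SCM_TXTIME;
	cm->cmsg_len   = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));

	sendmsg(fd, &msg, 0);
	close(fd);
	return 0;
}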
After reverting this patch in a current 5.2.18 kernel, the problem disappears.
There were some follow-up fixes for other issues caused by this patch. These
fixed other, similar issues, but not mine. I've already tried setting the
tstamp to zero in xfrm4_output.c, but with no luck so far. I'm pretty sure
that reverting the clock patch isn't the proper solution for upstream. So in
what other way can this be fixed?
---
[PATCH net] net: clear skb->tstamp in bridge forwarding path
Matteo reported forwarding issues inside the Linux bridge when the enslaved
interfaces use the fq qdisc.
Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths"),
we need to clear the tstamp field in
the bridge forwarding path.
Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Reported-and-tested-by: Matteo Croce <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
and
net: clear skb->tstamp in forwarding paths
Sergey reported that forwarding was no longer working if the fq packet
scheduler was used.
This is caused by the recent switch to the EDT model, since incoming packets
might have been timestamped by __net_timestamp().
__net_timestamp() uses ktime_get_real(), while fq expects packets to use the
CLOCK_MONOTONIC base.
The fix is to clear skb->tstamp in forwarding paths.
Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.")
Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Sergey Matyukevich <[email protected]>
Tested-by: Sergey Matyukevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
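The common pattern in both fixes quoted above is simply to zero skb->tstamp
before a forwarded packet reaches the fq qdisc again; my attempt in
xfrm4_output.c followed the same idea. A rough kernel-side sketch of that
pattern (the function name and placement are made up for illustration):

/* Illustrative sketch only: forwarded skbs may still carry a CLOCK_REALTIME
 * receive timestamp set by __net_timestamp(), while sch_fq interprets
 * skb->tstamp as a CLOCK_MONOTONIC transmit time. The quoted fixes therefore
 * clear the field before the skb is handed to the egress qdisc.
 */
#include <linux/skbuff.h>
#include <linux/netdevice.h>

static int example_forward_finish(struct sk_buff *skb)
{
	skb->tstamp = 0;		/* drop the stale rx timestamp */
	return dev_queue_xmit(skb);	/* let fq compute its own pacing time */
}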
Best regards,
--
Thomas Bartschies
CVK IT Systeme
-----Original Message-----
From: Bartschies, Thomas
Sent: Tuesday, 17 September 2019 09:28
To: 'David Ahern' <[email protected]>; '[email protected]'
<[email protected]>
Subject: RE: big ICMP requests get disrupted on IPSec tunnel activation
Hello,
thanks for the suggestion. Running pmtu.sh with kernel versions 4.19, 4.20 and
even 5.2.13 made no difference. All tests were successful every time, although
my external ping tests still fail with the newer kernels. I ran the script
after triggering my problem, to make sure any possible side effects were
present.
Please keep in mind that even when the ICMP requests stall, other connections,
e.g. ssh or tracepath, still go through. I would expect all connection types
to be affected if this were an MTU problem. Am I wrong?
Any suggestions for more tests to isolate the cause?
Best regards,
--
Thomas Bartschies
CVK IT Systeme
-----Original Message-----
From: David Ahern [mailto:[email protected]]
Sent: Friday, 13 September 2019 19:13
To: Bartschies, Thomas <[email protected]>; '[email protected]'
<[email protected]>
Subject: Re: big ICMP requests get disrupted on IPSec tunnel activation
On 9/13/19 9:59 AM, Bartschies, Thomas wrote:
> Hello everyone,
>
> since kernel 4.20 we're observing strange behaviour when sending big ICMP
> packets, for example with a packet size of 3000 bytes.
> The packets should be forwarded by a Linux gateway (firewall) with multiple
> interfaces, which also acts as a VPN gateway.
>
> Test steps:
> 1. Disable all iptables rules.
> 2. Enable the VPN IPsec policies.
> 3. Start a ping with a packet size of e.g. 3000 bytes from a client in the
>    DMZ, passing through the machine and targeting another LAN machine.
> 4. The ping works.
> 5. Enable a VPN policy by sending pings from the gateway to a tunnel target.
>    The system tries to create the tunnel.
> 6. The ping from step 3 immediately stalls. No error messages. It just stops.
> 7. Stop the ping from step 3. Start another one without the packet size
>    parameter. It stalls as well.
>
> Result:
> Connections from the client to other services on the LAN machine still
> work. Tracepath works. Only ICMP requests do not pass the gateway
> anymore. tcpdump sees them on the incoming interface, but not on the
> outgoing LAN interface. ICMP requests to any other target IP address in the
> LAN still work, until one uses a bigger packet size. Then these alternative
> connections stall as well.
>
> Flushing the policy table has no effect. Flushing the conntrack table has no
> effect. Setting rp_filter to loose (2) has no effect.
> Flushing the route cache has no effect.
>
> Only a reboot of the gateway restores normal behavior.
>
> What can be the cause? Is this a networking bug?
>
Some of these tests will most likely fail for other reasons, but can you run
'tools/testing/selftests/net/pmtu.sh' [1] on 4.19 and then on 4.20 and compare
the results? Hopefully that will shed some light on the problem and can be
used to bisect to the commit that caused the regression.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh