On Fri, Jan 19, 2018 at 03:45:46PM +0100, Tobias Hommel wrote:
>
> I tried to strip down the system configuration and was able to reproduce the
> problem with a minimal configuration:
> * ipsets are not used anymore
> * no firewall markings are used any longer
> * iptables are "completely empty", i.e. all policies are set to ACCEPT and
>   there is no rule in any table
> * no additional routing policies (ip rule) except the default ones
> * only main routing table is used
> * using a "minimal" kernel config:
>   * run `make defconfig`
>   * add basic things (ESP, IGB driver, some crypto algorithms)
>   * add options required to boot up the system (TPM crypt, some device
>     mapper options, overlayfs)
>
> I attached the minimal config (minimal.config) and the defconfig for reference
> (minimal.defconfig).
>
> The setup is really simple now: the gateway is forwarding HTTP connections
> between eth1 (IPSec tunnels) and eth0 without any firewall, NAT or anything
> else.
Thanks a lot for your debugging effort!
>
> The only thing I can think of are the rather aggressive roadwarrior clients.
> There are 750 roadwarriors that are controlled by a script which starts and
> stops the IPSec connection.
I still can't reproduce it with my tests. This is probably a race
triggered by your aggressive roadwarrior setup, which I don't have.
> I tried 4.15-rc8 and have the same problem here (see attached
> kernel-4.15-rc8.log). SMP affinity for IRQs has changed in 4.15 and something's
There is one patch that could influence this and that is not in v4.15-rc8:
commit 76a4201191814a0061cb5c861fafb9ecaa764846
("xfrm: Fix a race in the xdst pcpu cache.")
It is included in v4.15-rc9.
If this does not fix your problem, I'm out of ideas. In that case,
I have to ask you to do a bisection to find the offending commit.