Hello,

On Tue, Jun 12, 2018 at 10:29 AM, Kristian Evensen <kristian.even...@gmail.com> wrote:
> Thanks for spending time on this. I will see what I can manage in
> terms of a bisect. Our last good kernel was 4.9, so at least it
> narrows the scope down a bit compared to 4.4 or 4.1.

I think we may have gotten somewhere. While looking further into IPsec on 4.14, we noticed a large performance regression (roughly -20%) on some low-powered devices we also use. We quickly identified the removal of the flow cache as the "culprit"; the regression is discussed in the netdev thread for the removal of the cache ("xfrm: remove flow cache"). For the time being, to restore performance, we have reverted the patch series that removed the flow cache.

When running our tests (on the APU) after the revert, we no longer see the crash. Before the revert, the APU would always crash within a few hours. After the revert, our tests have been running for more than 24 hours.

Our test is quite basic: we establish 1, 2, 3, ..., 50 tunnels and then run iperf on all tunnels in parallel. The tunnels are torn down between iterations.

We are still running the test and will keep doing so, but I thought I should share this finding in case it helps in fixing the error. I will report back if we find out anything more, and please let me know if you have suggestions for things I can test. For example, I don't know whether it is safe to revert the flow cache commits one at a time in order to narrow down the crash further.

BR,
Kristian