On Wed, Jan 31, 2018 at 09:26:51PM +0100, Markus Berner wrote:
> > I'm running into a NULL pointer dereference after updating from Linux
> > 4.1.6 to 4.14.11 (see kernel log below).
>
> We are running into the same problem on our production machine, running
> CoreOS 1576.5.0 Stable with the 4.14.11 kernel on a KVM Cloud VM. It is not
> as easy to reproduce though in our case - we observed a total of 5 crashes
> in the last 2 weeks - all except one on the production machine.
>
> > I still can't reproduce it with my tests. This is probably some race
> > triggered due to your aggressive roadwarrior setup which I don't have.
>
> We have a similar setup to Tobias
> - 2 Network Interfaces (KVM/virtio): Public and local VLAN
> - Strongswan VPN in Tunnel mode between local VLAN and on-premise network,
>   running in a Docker container
> - Quite a few iptables NAT and forwarding rules regarding other local Docker
>   containers
>
> Some Observations:
> - The workaround of locking the IRQs of the Rx/Tx queues of all network
>   interfaces to CPU0 Tobias described a while back did not prevent the crashes
>   in our case
> - The bug does not seem to correlate with load in our case, but load in
>   general is quite low.
>
> I am happy to help if I can, but unfortunately our possibilities are a bit
> limited; both due to lack of kernel dev know-how as well as trying out
> changes to configuration on the production machine. I subscribed to LKML
> only now to respond, so I hope the reply works (and to the correct message).

Thanks for offering help, but I fear we have to wait until Tobias has bisected it.