On Fri, Jan 05, 2018 at 10:13:23PM +0100, Tobias Hommel wrote:
> Hi,
>
> I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
> 4.14.11 (see kernel log below). I tried 4.14.3 initially which did not work
> either.
> Anyone has an idea what is happening here?
>
> The affected machine has 2 active ethernet interfaces (igb driver) and acts as
> a VPN gateway running strongswan. There are several hundreds of IPSec
> roadwarriors connecting to eth1. eth0 connects to an infrastructure running an
> HTTP server.
> During my tests these roadwarriors connect to the gateway, sometimes download a
> large file from the HTTP server, disconnect and after a random delay repeat
> these steps.
>
> Some observations I made:
> * SMP Affinity for IRQs of the NICs Rx/Tx queues (/proc/irq/$IRQ/smp_affinity)
>   * all affinities set to default ff is broken
>   * setting affinity for all queues of both interfaces to the same CPU seems to
>     work fine (running stable for more than 1 day now)
>   * setting affinity of eth0 queues to CPU 1 and affinity of eth1 queues to CPU
>     2 is broken and seems to always trigger the bug on CPU 1
> * the top 6 entries of the call trace are the same every time the system
>   crashes, the other entries differ sometimes
>
> The bug is 100% reproducible on the Intel Atom machine from the log below and
> also on a HP ProLiant Gen6 (also igb driver).
> I can, of course, provide further information (CPU, NIC, kernel config, more
> traces, etc.) if required.
> If helpful I could also run tests on HP ProLiant Gen9 which has different NICs
> (tg3).
>
> [ 7998.489094] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> [ 7998.496993] IP: xfrm_lookup+0x2a/0x7e0

xfrm_lookup+0x2a is at the very beginning of xfrm_lookup(), where we find:

	u16 family = dst_orig->ops->family;

ops has an offset of 32 bytes (0x20) in dst_orig, which matches the faulting
address, so it looks like dst_orig is NULL.

In the forwarding case, we get dst_orig from the skb, and dst_orig can't be
NULL here unless the skb itself is already fishy.

Can you provide the following information:

- Your kernel config
- The output of 'ip x p' and 'ip x s'
- An object dump of xfrm_policy.o if possible:
  'objdump -d -S net/xfrm/xfrm_policy.o'
  (The path to xfrm_policy.o depends on how you build your kernels)
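
To illustrate the offset reasoning above, here is a minimal userspace sketch.
The struct names and the padding member are hypothetical stand-ins and not the
real kernel definitions; the only thing taken from the analysis is that ops
sits 0x20 bytes into the structure. It shows why loading ->ops through a NULL
base pointer would touch address 0x20, the address reported in the oops.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical stand-in for the ops structure; only 'family' matters here. */
struct mock_dst_ops {
	unsigned short family;
};

/*
 * Hypothetical stand-in for the dst structure. The real members in front
 * of 'ops' are replaced by opaque padding (an assumption) so that 'ops'
 * lands at offset 0x20, as described in the analysis.
 */
struct mock_dst_entry {
	unsigned char leading_fields[0x20];
	struct mock_dst_ops *ops;
};

static_assert(offsetof(struct mock_dst_entry, ops) == 0x20,
	      "ops is expected at offset 0x20 in this mock layout");

/*
 * Address that would be read when evaluating base->ops. This only does
 * the arithmetic; it never dereferences the pointer.
 */
uintptr_t ops_load_address(uintptr_t base)
{
	return base + offsetof(struct mock_dst_entry, ops);
}
```

Calling ops_load_address(0) models a NULL dst_orig and yields 0x20. A non-NULL
but corrupted pointer would give a different faulting address, which is why
the 0x20 in the log points at a plain NULL dst_orig.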