On Fri, Jan 05, 2018 at 10:13:23PM +0100, Tobias Hommel wrote:
> Hi,
> 
> I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
> 4.14.11 (see kernel log below). I tried 4.14.3 initially which did not work
> either.
> Does anyone have an idea what is happening here?
> 
> The affected machine has 2 active ethernet interfaces (igb driver) and acts as
> a VPN gateway running strongswan. There are several hundreds of IPSec
> roadwarriors connecting to eth1. eth0 connects to an infrastructure running an
> HTTP server.
> During my tests these roadwarriors connect to the gateway, sometimes download a
> large file from the HTTP server, disconnect and after a random delay repeat
> these steps.
> 
> Some observations I made:
> * SMP Affinity for IRQs of the NICs Rx/Tx queues (/proc/irq/$IRQ/smp_affinity)
>   * all affinities set to default ff is broken
>   * setting affinity for all queues of both interfaces to the same CPU seems to
>     work fine (running stable for more than 1 day now)
>   * setting affinity of eth0 queues to CPU 1 and affinity of eth1 queues to CPU
>     2 is broken and seems to always trigger the bug on CPU 1
> * the top 6 entries of the call trace are the same every time the system
>   crashes, the other entries differ sometimes
> 
> The bug is 100% reproducible on the Intel Atom machine from the log below and
> also on a HP ProLiant Gen6 (also igb driver).
> I can, of course, provide further information (CPU, NIC, kernel config, more
> traces, etc.) if required.
> If helpful I could also run tests on HP ProLiant Gen9 which has different NICs
> (tg3).
> 
> [ 7998.489094] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> [ 7998.496993] IP: xfrm_lookup+0x2a/0x7e0

xfrm_lookup+0x2a is at the very beginning of xfrm_lookup(), where we
find:

u16 family = dst_orig->ops->family;

ops is at an offset of 32 bytes (0x20) in dst_orig, which matches the
faulting address 0000000000000020, so it looks like dst_orig is NULL.
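
To illustrate the arithmetic, here is a minimal userspace sketch. The
struct names and the 32-byte placeholder are made up for the example and
are not the real struct dst_entry layout; the only assumption carried over
from above is that ops sits 32 bytes into the structure. Loading the ops
member through a NULL base pointer then touches address 0x20, which is
what the oops reports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dst_ops; only 'family' is modelled. */
struct mock_dst_ops {
    unsigned short family;
};

/*
 * Hypothetical stand-in for struct dst_entry. The real layout is not
 * reproduced; the 32-byte placeholder only encodes the assumption that
 * 'ops' is at offset 32 (0x20).
 */
struct mock_dst_entry {
    unsigned char before_ops[32];   /* placeholder for the preceding fields */
    struct mock_dst_ops *ops;
};

/*
 * Address that is read when evaluating base->ops, computed without
 * actually dereferencing anything.
 */
static uintptr_t ops_load_address(uintptr_t base)
{
    return base + offsetof(struct mock_dst_entry, ops);
}
```

Calling ops_load_address(0) models dst_orig == NULL and yields 0x20. To
check the real offset on a given build, something like
'pahole -C dst_entry vmlinux' should show it, provided the kernel was
built with debug info.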

In the forwarding case, we get dst_orig from the skb and dst_orig
can't be NULL here unless the skb itself is already fishy.

Can you provide the following information:

- Your kernel config

- The output of 'ip x p' and 'ip x s'

- An object dump of xfrm_policy.o if possible:
  'objdump -d -S net/xfrm/xfrm_policy.o'
  (The path to xfrm_policy.o depends on how you build your kernels)
