On Tue, Jan 09, 2018 at 09:19:39AM +0100, Steffen Klassert wrote:
> On Mon, Jan 08, 2018 at 02:53:48PM +0100, Tobias Hommel wrote:
> 
> ...
> 
> > [  439.095554] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> > [  439.103664] IP: xfrm_lookup+0x2a/0x7d0
> > [  439.107551] PGD 0 P4D 0 
> > [  439.110144] Oops: 0000 [#1] SMP PTI
> > [  439.113653] Modules linked in:
> > [  439.116774] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.14.12 #1
> > [  439.122900] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> > [  439.130769] task: ffff8cf33b0ea280 task.stack: ffff9492c0090000
> > [  439.136726] RIP: 0010:xfrm_lookup+0x2a/0x7d0
> > [  439.141005] RSP: 0018:ffff8cf33fd83bd0 EFLAGS: 00010246
> > [  439.146315] RAX: 0000000000000000 RBX: ffffffff87074080 RCX: 0000000000000000
> > [  439.153537] RDX: ffff8cf33fd83c48 RSI: 0000000000000000 RDI: ffffffff87074080
> > [  439.160780] RBP: ffffffff87074080 R08: 0000000000000002 R09: 0000000000000000
> > [  439.167958] R10: 0000000000000020 R11: 0000000000000020 R12: ffff8cf33fd83c48
> > [  439.175115] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8cf33b240078
> > [  439.182337] FS:  0000000000000000(0000) GS:ffff8cf33fd80000(0000) knlGS:0000000000000000
> > [  439.190456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  439.196227] CR2: 0000000000000020 CR3: 000000013200a000 CR4: 00000000001006e0
> > [  439.203386] Call Trace:
> > [  439.205869]  <IRQ>
> > [  439.207886]  __xfrm_route_forward+0xa4/0x110
> > [  439.212195]  ip_forward+0x3da/0x450
> > [  439.215696]  ? ip_rcv_finish+0x61/0x390
> > [  439.219542]  ip_rcv+0x2b5/0x380
> > [  439.222716]  ? inet_del_offload+0x30/0x30
> > [  439.226736]  __netif_receive_skb_core+0x751/0xb00
> > [  439.231469]  ? netif_receive_skb_internal+0x47/0xf0
> > [  439.236391]  netif_receive_skb_internal+0x47/0xf0
> > [  439.241150]  napi_gro_flush+0x50/0x70
> > [  439.244831]  napi_complete_done+0x90/0xd0
> > [  439.248872]  igb_poll+0x8fd/0xe80
> > [  439.252190]  net_rx_action+0x1fc/0x310
> > [  439.255978]  __do_softirq+0xd5/0x1cf
> > [  439.259584]  irq_exit+0xa3/0xb0
> > [  439.262763]  do_IRQ+0x45/0xc0
> > [  439.265772]  common_interrupt+0x95/0x95
> > [  439.269609]  </IRQ>
> > [  439.271733] RIP: 0010:cpuidle_enter_state+0x120/0x200
> > [  439.276810] RSP: 0018:ffff9492c0093eb8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff5d
> > [  439.284436] RAX: ffff8cf33fd9ea80 RBX: 0000000000000002 RCX: 000000663c21ea0f
> > [  439.291604] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
> > [  439.298772] RBP: ffff8cf33fda71e8 R08: 0000000000000003 R09: 0000000000000018
> > [  439.305930] R10: 00000000ffffffff R11: 000000000000057c R12: 000000663c21ea0f
> > [  439.313089] R13: 000000663c1c6c33 R14: 0000000000000002 R15: 0000000000000000
> > [  439.320259]  ? cpuidle_enter_state+0x11c/0x200
> > [  439.324740]  do_idle+0xd6/0x170
> > [  439.327885]  cpu_startup_entry+0x67/0x70
> > [  439.331837]  start_secondary+0x167/0x190
> > [  439.335788]  secondary_startup_64+0xa5/0xb0
> > [  439.340001] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
> > [  439.358988] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff8cf33fd83bd0
> > [  439.364759] CR2: 0000000000000020
> > [  439.368105] ---[ end trace c6b298b556ea7769 ]---
> > [  439.372752] Kernel panic - not syncing: Fatal exception in interrupt
> > [  439.379255] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [  439.390029] Rebooting in 10 seconds..
> 
> ...
> 
> > 0000000000004230 <xfrm_lookup>:
> >     4230:   41 57                   push   %r15
> >     4232:   41 56                   push   %r14
> >     4234:   45 89 c6                mov    %r8d,%r14d
> >     4237:   41 55                   push   %r13
> >     4239:   41 54                   push   %r12
> >     423b:   49 89 f5                mov    %rsi,%r13
> >     423e:   55                      push   %rbp
> >     423f:   53                      push   %rbx
> >     4240:   49 89 d4                mov    %rdx,%r12
> >     4243:   48 89 fb                mov    %rdi,%rbx
> >     4246:   48 83 ec 40             sub    $0x40,%rsp
> >     424a:   65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
> >     4251:   00 00 
> >     4253:   48 89 44 24 38          mov    %rax,0x38(%rsp)
> >     4258:   31 c0                   xor    %eax,%eax
> >     425a:   48 8b 46 20             mov    0x20(%rsi),%rax
> 
> 
> The above is the failing instruction. RSI holds the second argument
> of the called function, and it is a NULL pointer. The second argument
> of xfrm_lookup() is dst_orig, so it is as I thought. Now let's find
> out why. I don't see anything obvious, so we need to narrow it down.
> 
> > CONFIG_INET_ESP=y
> > CONFIG_INET_ESP_OFFLOAD=y
> 
> You have CONFIG_INET_ESP_OFFLOAD enabled. This is new, so maybe it
> still has some problems. You should not hit an offload code path,
> though, because all your SAs are configured with UDP encapsulation,
> which is not yet supported with offload.
> 
> Please try to disable GRO on both interfaces and see what happens:
> 
> ethtool -K eth0 gro off
> ethtool -K eth1 gro off
I had actually already tried that with GRO disabled on eth1 only. To verify, I
have now turned GRO off on both interfaces: same problem, see attached
panic.gro_off.log

> 
> Then disable CONFIG_INET_ESP_OFFLOAD and try again.
Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled: same problem, see attached
panic.esp_offload_disabled.log

> 
> This should show us if this feature is responsible for the bug.
> 

For now, I will try to narrow down the problem by testing some older kernels.

[  510.217190] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[  510.225167] IP: xfrm_lookup+0x2a/0x7d0
[  510.228934] PGD 0 P4D 0 
[  510.231508] Oops: 0000 [#1] SMP PTI
[  510.235006] Modules linked in:
[  510.238085] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.14.12 #2
[  510.244116] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
[  510.251881] task: ffff9cb6bb0e8000 task.stack: ffffb2ffc0088000
[  510.257829] RIP: 0010:xfrm_lookup+0x2a/0x7d0
[  510.262127] RSP: 0018:ffff9cb6bfd43c40 EFLAGS: 00010246
[  510.267387] RAX: 0000000000000000 RBX: ffffffff83074080 RCX: 0000000000000000
[  510.274570] RDX: ffff9cb6bfd43cb8 RSI: 0000000000000000 RDI: ffffffff83074080
[  510.281729] RBP: ffffffff83074080 R08: 0000000000000002 R09: 0000000000000001
[  510.288888] R10: 0000000000000032 R11: 0000000000000000 R12: ffff9cb6bfd43cb8
[  510.296055] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9cb6bb244078
[  510.303215] FS:  0000000000000000(0000) GS:ffff9cb6bfd40000(0000) knlGS:0000000000000000
[  510.311361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  510.317135] CR2: 0000000000000020 CR3: 000000014c00a000 CR4: 00000000001006e0
[  510.324317] Call Trace:
[  510.326798]  <IRQ>
[  510.328855]  __xfrm_route_forward+0xa4/0x110
[  510.333152]  ip_forward+0x3da/0x450
[  510.336644]  ? ip_rcv_finish+0x61/0x390
[  510.340507]  ip_rcv+0x2b5/0x380
[  510.343654]  ? inet_del_offload+0x30/0x30
[  510.347693]  __netif_receive_skb_core+0x751/0xb00
[  510.352426]  ? inet_gro_receive+0x1fb/0x2b0
[  510.356646]  ? netif_receive_skb_internal+0x47/0xf0
[  510.361550]  netif_receive_skb_internal+0x47/0xf0
[  510.366284]  napi_gro_receive+0x70/0x90
[  510.370132]  gro_cell_poll+0x53/0x90
[  510.373736]  net_rx_action+0x1fc/0x310
[  510.377518]  __do_softirq+0xd5/0x1cf
[  510.381123]  irq_exit+0xa3/0xb0
[  510.384294]  do_IRQ+0x45/0xc0
[  510.387282]  common_interrupt+0x95/0x95
[  510.391148]  </IRQ>
[  510.393272] RIP: 0010:cpuidle_enter_state+0x120/0x200
[  510.398350] RSP: 0018:ffffb2ffc008beb8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff3c
[  510.405967] RAX: ffff9cb6bfd5ea80 RBX: 0000000000000002 RCX: 00000076cb4fb64c
[  510.413150] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
[  510.420326] RBP: ffff9cb6bfd671e8 R08: 0000000000000003 R09: 0000000000000018
[  510.427485] R10: 00000000ffffffff R11: 0000000000000139 R12: 00000076cb4fb64c
[  510.434695] R13: 00000076cb468ed2 R14: 0000000000000002 R15: 0000000000000000
[  510.441866]  ? cpuidle_enter_state+0x11c/0x200
[  510.446330]  do_idle+0xd6/0x170
[  510.449491]  cpu_startup_entry+0x67/0x70
[  510.453443]  start_secondary+0x167/0x190
[  510.457397]  secondary_startup_64+0xa5/0xb0
[  510.461616] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
[  510.480605] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff9cb6bfd43c40
[  510.486411] CR2: 0000000000000020
[  510.489758] ---[ end trace dc7eee0efd22329c ]---
[  510.494411] Kernel panic - not syncing: Fatal exception in interrupt
[  510.500923] Kernel Offset: 0x1000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  510.511676] Rebooting in 10 seconds..

[ 1425.327056] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1425.335100] IP: xfrm_lookup+0x2a/0x7d0
[ 1425.339062] PGD 0 P4D 0 
[ 1425.341645] Oops: 0000 [#1] SMP PTI
[ 1425.345275] Modules linked in:
[ 1425.348484] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.14.12 #1
[ 1425.354958] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
[ 1425.363044] task: ffff986b3b210000 task.stack: ffff9c23000f0000
[ 1425.369234] RIP: 0010:xfrm_lookup+0x2a/0x7d0
[ 1425.373667] RSP: 0018:ffff9c23000f3b20 EFLAGS: 00010246
[ 1425.379032] RAX: 0000000000000000 RBX: ffffffff8e074080 RCX: 0000000000000000
[ 1425.386505] RDX: ffff9c23000f3b98 RSI: 0000000000000000 RDI: ffffffff8e074080
[ 1425.394062] RBP: ffffffff8e074080 R08: 0000000000000002 R09: 0000000000000000
[ 1425.401420] R10: 0000000000000020 R11: 0000000000000020 R12: ffff9c23000f3b98
[ 1425.408786] R13: 0000000000000000 R14: 0000000000000002 R15: ffff986b3b244078
[ 1425.416162] FS:  0000000000000000(0000) GS:ffff986b3fc80000(0000) knlGS:0000000000000000
[ 1425.424568] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1425.430566] CR2: 0000000000000020 CR3: 000000006e00a000 CR4: 00000000001006e0
[ 1425.437940] Call Trace:
[ 1425.440486]  __xfrm_route_forward+0xa4/0x110
[ 1425.444886]  ip_forward+0x3da/0x450
[ 1425.448552]  ? ip_rcv_finish+0x61/0x390
[ 1425.452519]  ip_rcv+0x2b5/0x380
[ 1425.455727]  ? inet_del_offload+0x30/0x30
[ 1425.459896]  __netif_receive_skb_core+0x751/0xb00
[ 1425.464819]  ? __alloc_pages_nodemask+0xc6/0x1f0
[ 1425.469723]  ? netif_receive_skb_internal+0x47/0xf0
[ 1425.474931]  netif_receive_skb_internal+0x47/0xf0
[ 1425.479768]  napi_gro_receive+0x70/0x90
[ 1425.483781]  igb_poll+0x600/0xe80
[ 1425.487283]  ? xfrm4_dst_destroy+0x6d/0x90
[ 1425.491544]  net_rx_action+0x1fc/0x310
[ 1425.495465]  __do_softirq+0xd5/0x1cf
[ 1425.499156]  run_ksoftirqd+0x14/0x30
[ 1425.502943]  smpboot_thread_fn+0xf9/0x150
[ 1425.507250]  kthread+0xf2/0x130
[ 1425.510577]  ? sort_range+0x20/0x20
[ 1425.514269]  ? kthread_park+0x60/0x60
[ 1425.518021]  ret_from_fork+0x1f/0x30
[ 1425.521757] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
[ 1425.541379] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff9c23000f3b20
[ 1425.547367] CR2: 0000000000000020
[ 1425.550748] ---[ end trace 9cc9a035940887e0 ]---
[ 1425.555444] Kernel panic - not syncing: Fatal exception in interrupt
[ 1425.562184] Kernel Offset: 0xc000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1425.572955] Rebooting in 10 seconds..
