On Tue, Jan 09, 2018 at 09:19:39AM +0100, Steffen Klassert wrote:
> On Mon, Jan 08, 2018 at 02:53:48PM +0100, Tobias Hommel wrote:
> > ...
> >
> > [ 439.095554] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> > [ 439.103664] IP: xfrm_lookup+0x2a/0x7d0
> > [ 439.107551] PGD 0 P4D 0
> > [ 439.110144] Oops: 0000 [#1] SMP PTI
> > [ 439.113653] Modules linked in:
> > [ 439.116774] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.14.12 #1
> > [ 439.122900] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> > [ 439.130769] task: ffff8cf33b0ea280 task.stack: ffff9492c0090000
> > [ 439.136726] RIP: 0010:xfrm_lookup+0x2a/0x7d0
> > [ 439.141005] RSP: 0018:ffff8cf33fd83bd0 EFLAGS: 00010246
> > [ 439.146315] RAX: 0000000000000000 RBX: ffffffff87074080 RCX: 0000000000000000
> > [ 439.153537] RDX: ffff8cf33fd83c48 RSI: 0000000000000000 RDI: ffffffff87074080
> > [ 439.160780] RBP: ffffffff87074080 R08: 0000000000000002 R09: 0000000000000000
> > [ 439.167958] R10: 0000000000000020 R11: 0000000000000020 R12: ffff8cf33fd83c48
> > [ 439.175115] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8cf33b240078
> > [ 439.182337] FS: 0000000000000000(0000) GS:ffff8cf33fd80000(0000) knlGS:0000000000000000
> > [ 439.190456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 439.196227] CR2: 0000000000000020 CR3: 000000013200a000 CR4: 00000000001006e0
> > [ 439.203386] Call Trace:
> > [ 439.205869] <IRQ>
> > [ 439.207886] __xfrm_route_forward+0xa4/0x110
> > [ 439.212195] ip_forward+0x3da/0x450
> > [ 439.215696] ? ip_rcv_finish+0x61/0x390
> > [ 439.219542] ip_rcv+0x2b5/0x380
> > [ 439.222716] ? inet_del_offload+0x30/0x30
> > [ 439.226736] __netif_receive_skb_core+0x751/0xb00
> > [ 439.231469] ? netif_receive_skb_internal+0x47/0xf0
> > [ 439.236391] netif_receive_skb_internal+0x47/0xf0
> > [ 439.241150] napi_gro_flush+0x50/0x70
> > [ 439.244831] napi_complete_done+0x90/0xd0
> > [ 439.248872] igb_poll+0x8fd/0xe80
> > [ 439.252190] net_rx_action+0x1fc/0x310
> > [ 439.255978] __do_softirq+0xd5/0x1cf
> > [ 439.259584] irq_exit+0xa3/0xb0
> > [ 439.262763] do_IRQ+0x45/0xc0
> > [ 439.265772] common_interrupt+0x95/0x95
> > [ 439.269609] </IRQ>
> > [ 439.271733] RIP: 0010:cpuidle_enter_state+0x120/0x200
> > [ 439.276810] RSP: 0018:ffff9492c0093eb8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff5d
> > [ 439.284436] RAX: ffff8cf33fd9ea80 RBX: 0000000000000002 RCX: 000000663c21ea0f
> > [ 439.291604] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
> > [ 439.298772] RBP: ffff8cf33fda71e8 R08: 0000000000000003 R09: 0000000000000018
> > [ 439.305930] R10: 00000000ffffffff R11: 000000000000057c R12: 000000663c21ea0f
> > [ 439.313089] R13: 000000663c1c6c33 R14: 0000000000000002 R15: 0000000000000000
> > [ 439.320259] ? cpuidle_enter_state+0x11c/0x200
> > [ 439.324740] do_idle+0xd6/0x170
> > [ 439.327885] cpu_startup_entry+0x67/0x70
> > [ 439.331837] start_secondary+0x167/0x190
> > [ 439.335788] secondary_startup_64+0xa5/0xb0
> > [ 439.340001] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
> > [ 439.358988] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff8cf33fd83bd0
> > [ 439.364759] CR2: 0000000000000020
> > [ 439.368105] ---[ end trace c6b298b556ea7769 ]---
> > [ 439.372752] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 439.379255] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 439.390029] Rebooting in 10 seconds..
> > ...
> >
> > 0000000000004230 <xfrm_lookup>:
> >     4230:  41 57                  push   %r15
> >     4232:  41 56                  push   %r14
> >     4234:  45 89 c6               mov    %r8d,%r14d
> >     4237:  41 55                  push   %r13
> >     4239:  41 54                  push   %r12
> >     423b:  49 89 f5               mov    %rsi,%r13
> >     423e:  55                     push   %rbp
> >     423f:  53                     push   %rbx
> >     4240:  49 89 d4               mov    %rdx,%r12
> >     4243:  48 89 fb               mov    %rdi,%rbx
> >     4246:  48 83 ec 40            sub    $0x40,%rsp
> >     424a:  65 48 8b 04 25 28 00   mov    %gs:0x28,%rax
> >     4251:  00 00
> >     4253:  48 89 44 24 38         mov    %rax,0x38(%rsp)
> >     4258:  31 c0                  xor    %eax,%eax
> >     425a:  48 8b 46 20            mov    0x20(%rsi),%rax
> >
> The above is the failing instruction, RSI holds the second argument
> of the called function which is a NULL pointer. The second argument
> of xfrm_lookup() is dst_orig, so it is as I thought. Now let's find
> out why. I don't see anything obvious, so we need to narrow it down.
>
> > CONFIG_INET_ESP=y
> > CONFIG_INET_ESP_OFFLOAD=y
>
> You have CONFIG_INET_ESP_OFFLOAD enabled, this is new, maybe it
> still has some problems. You should not hit an offload codepath
> because all your SAs are configured with UDP encapsulation which
> is still not supported with offload.
>
> Please try to disable GRO on both interfaces and see what happens:
>
> ethtool -K eth0 gro off
> ethtool -K eth1 gro off

I had actually already tried that with offloading turned off only on eth1.
To verify, I have now turned it off on both interfaces. The problem is the
same: see attached panic.gro_off.log

>
> Then disable CONFIG_INET_ESP_OFFLOAD and try again.

Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled, same problem:
see attached panic.esp_offload_disabled.log

>
> This should show us if this feature is responsible for the bug.
>

For now I will try to narrow down the problem by testing some older kernels.
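
As a sanity check on the analysis above (failing instruction
mov 0x20(%rsi),%rax with RSI == 0), here is a minimal Python sketch that
confirms the faulting address in CR2 equals RSI + 0x20. It is not part of
any kernel tooling: parse_registers() and fault_matches() are hypothetical
helper names, and the excerpt is copied from the first oops quoted above.

```python
import re

# Hypothetical helper regex: matches general-purpose registers (RAX, RSI,
# R08, ...) and CR2 when they are followed by a 16-digit hex value.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}|CR2):\s*([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Return {register name: integer value}, keeping the first occurrence."""
    regs = {}
    for name, value in REG_RE.findall(oops_text):
        regs.setdefault(name, int(value, 16))
    return regs


def fault_matches(regs, base_reg, displacement):
    """True if CR2 (the faulting address) equals base register + displacement."""
    return regs["CR2"] == regs[base_reg] + displacement


# Register lines copied from the first oops in this thread.
OOPS_EXCERPT = """
[ 439.153537] RDX: ffff8cf33fd83c48 RSI: 0000000000000000 RDI: ffffffff87074080
[ 439.196227] CR2: 0000000000000020 CR3: 000000013200a000 CR4: 00000000001006e0
"""

regs = parse_registers(OOPS_EXCERPT)
# mov 0x20(%rsi),%rax reads from RSI + 0x20
print(fault_matches(regs, "RSI", 0x20))  # prints True
```

The same check holds for the attached logs, since they also show RSI as 0
and CR2 as 0x20.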
[ 510.217190] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 510.225167] IP: xfrm_lookup+0x2a/0x7d0
[ 510.228934] PGD 0 P4D 0
[ 510.231508] Oops: 0000 [#1] SMP PTI
[ 510.235006] Modules linked in:
[ 510.238085] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.14.12 #2
[ 510.244116] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
[ 510.251881] task: ffff9cb6bb0e8000 task.stack: ffffb2ffc0088000
[ 510.257829] RIP: 0010:xfrm_lookup+0x2a/0x7d0
[ 510.262127] RSP: 0018:ffff9cb6bfd43c40 EFLAGS: 00010246
[ 510.267387] RAX: 0000000000000000 RBX: ffffffff83074080 RCX: 0000000000000000
[ 510.274570] RDX: ffff9cb6bfd43cb8 RSI: 0000000000000000 RDI: ffffffff83074080
[ 510.281729] RBP: ffffffff83074080 R08: 0000000000000002 R09: 0000000000000001
[ 510.288888] R10: 0000000000000032 R11: 0000000000000000 R12: ffff9cb6bfd43cb8
[ 510.296055] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9cb6bb244078
[ 510.303215] FS: 0000000000000000(0000) GS:ffff9cb6bfd40000(0000) knlGS:0000000000000000
[ 510.311361] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 510.317135] CR2: 0000000000000020 CR3: 000000014c00a000 CR4: 00000000001006e0
[ 510.324317] Call Trace:
[ 510.326798] <IRQ>
[ 510.328855] __xfrm_route_forward+0xa4/0x110
[ 510.333152] ip_forward+0x3da/0x450
[ 510.336644] ? ip_rcv_finish+0x61/0x390
[ 510.340507] ip_rcv+0x2b5/0x380
[ 510.343654] ? inet_del_offload+0x30/0x30
[ 510.347693] __netif_receive_skb_core+0x751/0xb00
[ 510.352426] ? inet_gro_receive+0x1fb/0x2b0
[ 510.356646] ? netif_receive_skb_internal+0x47/0xf0
[ 510.361550] netif_receive_skb_internal+0x47/0xf0
[ 510.366284] napi_gro_receive+0x70/0x90
[ 510.370132] gro_cell_poll+0x53/0x90
[ 510.373736] net_rx_action+0x1fc/0x310
[ 510.377518] __do_softirq+0xd5/0x1cf
[ 510.381123] irq_exit+0xa3/0xb0
[ 510.384294] do_IRQ+0x45/0xc0
[ 510.387282] common_interrupt+0x95/0x95
[ 510.391148] </IRQ>
[ 510.393272] RIP: 0010:cpuidle_enter_state+0x120/0x200
[ 510.398350] RSP: 0018:ffffb2ffc008beb8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff3c
[ 510.405967] RAX: ffff9cb6bfd5ea80 RBX: 0000000000000002 RCX: 00000076cb4fb64c
[ 510.413150] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
[ 510.420326] RBP: ffff9cb6bfd671e8 R08: 0000000000000003 R09: 0000000000000018
[ 510.427485] R10: 00000000ffffffff R11: 0000000000000139 R12: 00000076cb4fb64c
[ 510.434695] R13: 00000076cb468ed2 R14: 0000000000000002 R15: 0000000000000000
[ 510.441866] ? cpuidle_enter_state+0x11c/0x200
[ 510.446330] do_idle+0xd6/0x170
[ 510.449491] cpu_startup_entry+0x67/0x70
[ 510.453443] start_secondary+0x167/0x190
[ 510.457397] secondary_startup_64+0xa5/0xb0
[ 510.461616] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
[ 510.480605] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff9cb6bfd43c40
[ 510.486411] CR2: 0000000000000020
[ 510.489758] ---[ end trace dc7eee0efd22329c ]---
[ 510.494411] Kernel panic - not syncing: Fatal exception in interrupt
[ 510.500923] Kernel Offset: 0x1000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 510.511676] Rebooting in 10 seconds..
[ 1425.327056] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1425.335100] IP: xfrm_lookup+0x2a/0x7d0
[ 1425.339062] PGD 0 P4D 0
[ 1425.341645] Oops: 0000 [#1] SMP PTI
[ 1425.345275] Modules linked in:
[ 1425.348484] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.14.12 #1
[ 1425.354958] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
[ 1425.363044] task: ffff986b3b210000 task.stack: ffff9c23000f0000
[ 1425.369234] RIP: 0010:xfrm_lookup+0x2a/0x7d0
[ 1425.373667] RSP: 0018:ffff9c23000f3b20 EFLAGS: 00010246
[ 1425.379032] RAX: 0000000000000000 RBX: ffffffff8e074080 RCX: 0000000000000000
[ 1425.386505] RDX: ffff9c23000f3b98 RSI: 0000000000000000 RDI: ffffffff8e074080
[ 1425.394062] RBP: ffffffff8e074080 R08: 0000000000000002 R09: 0000000000000000
[ 1425.401420] R10: 0000000000000020 R11: 0000000000000020 R12: ffff9c23000f3b98
[ 1425.408786] R13: 0000000000000000 R14: 0000000000000002 R15: ffff986b3b244078
[ 1425.416162] FS: 0000000000000000(0000) GS:ffff986b3fc80000(0000) knlGS:0000000000000000
[ 1425.424568] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1425.430566] CR2: 0000000000000020 CR3: 000000006e00a000 CR4: 00000000001006e0
[ 1425.437940] Call Trace:
[ 1425.440486] __xfrm_route_forward+0xa4/0x110
[ 1425.444886] ip_forward+0x3da/0x450
[ 1425.448552] ? ip_rcv_finish+0x61/0x390
[ 1425.452519] ip_rcv+0x2b5/0x380
[ 1425.455727] ? inet_del_offload+0x30/0x30
[ 1425.459896] __netif_receive_skb_core+0x751/0xb00
[ 1425.464819] ? __alloc_pages_nodemask+0xc6/0x1f0
[ 1425.469723] ? netif_receive_skb_internal+0x47/0xf0
[ 1425.474931] netif_receive_skb_internal+0x47/0xf0
[ 1425.479768] napi_gro_receive+0x70/0x90
[ 1425.483781] igb_poll+0x600/0xe80
[ 1425.487283] ? xfrm4_dst_destroy+0x6d/0x90
[ 1425.491544] net_rx_action+0x1fc/0x310
[ 1425.495465] __do_softirq+0xd5/0x1cf
[ 1425.499156] run_ksoftirqd+0x14/0x30
[ 1425.502943] smpboot_thread_fn+0xf9/0x150
[ 1425.507250] kthread+0xf2/0x130
[ 1425.510577] ? sort_range+0x20/0x20
[ 1425.514269] ? kthread_park+0x60/0x60
[ 1425.518021] ret_from_fork+0x1f/0x30
[ 1425.521757] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
[ 1425.541379] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff9c23000f3b20
[ 1425.547367] CR2: 0000000000000020
[ 1425.550748] ---[ end trace 9cc9a035940887e0 ]---
[ 1425.555444] Kernel panic - not syncing: Fatal exception in interrupt
[ 1425.562184] Kernel Offset: 0xc000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1425.572955] Rebooting in 10 seconds..