We already ship lib/systemd/system-sleep/hibinit-agent that does post
things in hibinit-resume to workaround things. Including stuff about
networking, due to previous issues with xen-netfront.

Given we already restart systemd-networkd, I wonder if we should do
better and not keep xen-netfront loaded in the kernel across suspend and
resume.

as in
1) extend hibinit-agent.system-sleep to also have pre command
2) in the pre step unload xen-netfront
3) in the post step only load xen-netfront
4) potentially still stop systemd-networkd in pre; which will be restarted 
after loading xen-netfront in post

Since xen-netfront is causing us issues; and we are already unloading
and reloading it; it makes sense to unload it before suspending
altogether.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1934424

Title:
  kernel NULL pointer dereference during xen hibernation

Status in linux-aws package in Ubuntu:
  Confirmed

Bug description:
  SRU Justification

  [Impact]

  Encountered the following panic while doing hibernation/resume testing
  with linux-aws 5.8 on Focal on an m3.xlarge (xen) instance type:

  [  594.291317] ACPI: Hardware changed while hibernated, success doubtful!
  [  594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
  [  594.424658] #PF: supervisor write access in kernel mode
  [  594.424660] #PF: error_code(0x0002) - not-present page
  [  594.424661] PGD 0 P4D 0
  [  594.424665] Oops: 0002 [#1] SMP PTI
  [  594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 
5.8.0-1036-aws #38~20.04.1-Ubuntu
  [  594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
  [  594.424675] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
  [  594.424678] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 
58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 
17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
  [  594.424679] RSP: 0018:ffffc900004e3848 EFLAGS: 00010046
  [  594.424680] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 
0000000000000e02
  [  594.424681] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
00000000000001f4
  [  594.424682] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 
000000000000005a
  [  594.424683] R10: ffffc900004e3910 R11: 0000000000000000 R12: 
0000000000000206
  [  594.424684] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 
0000000000000001
  [  594.424686] FS:  00007f65ba055980(0000) GS:ffff8883c0ac0000(0000) 
knlGS:0000000000000000
  [  594.424687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  594.424688] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 
00000000001606e0
  [  594.424692] Call Trace:
  [  594.424699]  xennet_start_xmit+0x158/0x570
  [  594.424704]  dev_hard_start_xmit+0x91/0x1f0
  [  594.424706]  ? validate_xmit_skb+0x300/0x340
  [  594.424710]  sch_direct_xmit+0x113/0x340
  [  594.424712]  __dev_queue_xmit+0x57c/0x8e0
  [  594.424714]  ? neigh_add_timer+0x37/0x60
  [  594.424716]  dev_queue_xmit+0x10/0x20
  [  594.424717]  neigh_resolve_output+0x112/0x1c0
  [  594.424721]  ip_finish_output2+0x19b/0x590
  [  594.424723]  __ip_finish_output+0xc8/0x1e0
  [  594.424725]  ip_finish_output+0x2d/0xb0
  [  594.424728]  ip_output+0x7a/0xf0
  [  594.424730]  ? __ip_finish_output+0x1e0/0x1e0
  [  594.424732]  ip_local_out+0x3d/0x50
  [  594.424734]  ip_send_skb+0x19/0x40
  [  594.424737]  udp_send_skb.isra.0+0x165/0x390
  [  594.424739]  udp_sendmsg+0xb0e/0xd50
  [  594.424742]  ? ip_reply_glue_bits+0x50/0x50
  [  594.424747]  ? delete_from_swap_cache+0x6a/0x90
  [  594.424750]  ? _cond_resched+0x19/0x30
  [  594.424754]  ? aa_sk_perm+0x43/0x1b0
  [  594.424757]  inet_sendmsg+0x65/0x70
  [  594.424759]  ? security_socket_sendmsg+0x35/0x50
  [  594.424760]  ? inet_sendmsg+0x65/0x70
  [  594.424764]  sock_sendmsg+0x5e/0x70
  [  594.424766]  __sys_sendto+0x113/0x190
  [  594.424770]  ? __secure_computing+0x42/0xe0
  [  594.424774]  ? syscall_trace_enter+0x10d/0x280
  [  594.424777]  __x64_sys_sendto+0x29/0x30
  [  594.424781]  do_syscall_64+0x49/0xc0
  [  594.424783]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [  594.424785] RIP: 0033:0x7f65baee4844
  [  594.424788] Code: 42 3f f7 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 
24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 
f0 ff ff 77 30 89 ef 48 89 44 24 08 e8 68 3f f7 ff 48 8b
  [  594.424789] RSP: 002b:00007ffe9b5fd3a0 EFLAGS: 00000293 ORIG_RAX: 
000000000000002c
  [  594.424790] RAX: ffffffffffffffda RBX: 00007ffe9b5fd4e0 RCX: 
00007f65baee4844
  [  594.424791] RDX: 0000000000000030 RSI: 00007ffe9b5fd3f0 RDI: 
0000000000000010
  [  594.424792] RBP: 0000000000000000 R08: 0000560426541678 R09: 
0000000000000010
  [  594.424793] R10: 0000000000000040 R11: 0000000000000293 R12: 
0000000000000000
  [  594.424794] R13: 00007ffe9b5fd3e4 R14: 0000000000000068 R15: 
0000000000000000
  [  594.424796] Modules linked in: btrfs blake2b_generic xor raid6_pq ufs 
msdos xfs libcrc32c dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd 
psmouse cryptd input_leds glue_helper serio_raw floppy sch_fq_codel drm 
ip_tables x_tables autofs4
  [  594.424813] CR2: 00000000000001f4
  [  594.424821] ---[ end trace bb5f35055c1a8060 ]---
  [  594.424822] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
  [  594.424824] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 
58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 
17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
  [  594.424824] RSP: 0018:ffffc900004e3848 EFLAGS: 00010046
  [  594.424825] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 
0000000000000e02
  [  594.424826] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
00000000000001f4
  [  594.424826] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 
000000000000005a
  [  594.424827] R10: ffffc900004e3910 R11: 0000000000000000 R12: 
0000000000000206
  [  594.424827] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 
0000000000000001
  [  594.424828] FS:  00007f65ba055980(0000) GS:ffff8883c0ac0000(0000) 
knlGS:0000000000000000
  [  594.424829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  594.424830] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 
00000000001606e0
  [  594.424830] Kernel panic - not syncing: Fatal exception in interrupt
  [  594.425091] Kernel Offset: disabled

  This log was captured from the AWS system log (which is not a live
  console as this is a xen instance type). From the user perspective,
  the instance appears to have just rebooted during the hibernation /
  resume. It's only by logging the console that we have any information
  at all.

  This is not a frequent event. From 100 test runs on this instance, two
  failures were observed: this failure and another which left no traces
  of the issue.

  [Test Plan]

  Repeatedly hibernate/resume an EC2 instance.

  [Where problems could occur]

  The network must be started as if from boot. DHCP could fail.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1934424/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to