This is documented and fixed by mainline commit:

commit d1ecfa9d1f402366b1776fbf84e635678a51414f
Author: van der Linden, Frank <fllin...@amazon.com>
Date:   Fri May 4 16:11:00 2018 -0400

    x86/xen: Reset VCPU0 info pointer after shared_info remap

    This patch fixes crashes during boot for HVM guests on older (pre HVM
    vector callback) Xen versions. Without this, current kernels will
    always fail to boot on those Xen versions.

    Sample stack trace:

       BUG: unable to handle kernel paging request at ffffffffff200000
       IP: __xen_evtchn_do_upcall+0x1e/0x80
       PGD 1e0e067 P4D 1e0e067 PUD 1e10067 PMD 235c067 PTE 0
       Oops: 0002 [#1] SMP PTI
       Modules linked in:
       CPU: 0 PID: 512 Comm: kworker/u2:0 Not tainted 4.14.33-52.13.amzn1.x86_64 #1
       Hardware name: Xen HVM domU, BIOS 3.4.3.amazon 11/11/2016
       task: ffff88002531d700 task.stack: ffffc90000480000
       RIP: 0010:__xen_evtchn_do_upcall+0x1e/0x80
       RSP: 0000:ffff880025403ef0 EFLAGS: 00010046
       RAX: ffffffff813cc760 RBX: ffffffffff200000 RCX: ffffc90000483ef0
       RDX: ffff880020540a00 RSI: ffff880023c78000 RDI: 000000000000001c
       RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: ffff880025403f5c R14: 0000000000000000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff880025400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffffff200000 CR3: 0000000001e0a000 CR4: 00000000000006f0
       Call Trace:
        <IRQ>
        do_hvm_evtchn_intr+0xa/0x10
        __handle_irq_event_percpu+0x43/0x1a0
        handle_irq_event_percpu+0x20/0x50
        handle_irq_event+0x39/0x60
        handle_fasteoi_irq+0x80/0x140
        handle_irq+0xaf/0x120
        do_IRQ+0x41/0xd0
        common_interrupt+0x7d/0x7d
        </IRQ>

    During boot, the HYPERVISOR_shared_info page gets remapped to make it
    work with KASLR. This means that any pointer derived from it needs to
    be adjusted.

    The only value that this applies to is the vcpu_info pointer for
    VCPU 0. For PV and HVM with the callback vector feature, this gets
    done via the smp_ops prepare_boot_cpu callback. Older Xen versions do
    not support the HVM callback vector, so there is no Xen-specific
    smp_ops set up in that scenario. So, the vcpu_info pointer for VCPU 0
    never gets set to the proper value, and the first reference of it
    will be bad. Fix this by resetting it immediately after the remap.

    Signed-off-by: Frank van der Linden <fllin...@amazon.com>
    Reviewed-by: Eduardo Valentin <edu...@amazon.com>
    Reviewed-by: Alakesh Haloi <alake...@amazon.com>
    Reviewed-by: Vallish Vaidyeshwara <vall...@amazon.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
    Cc: Juergen Gross <jgr...@suse.com>
    Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
    Cc: xen-de...@lists.xenproject.org
    Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>

** Changed in: linux-aws (Ubuntu Bionic)
       Status: New => In Progress

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Kamal Mostafa (kamalmostafa)

** Changed in: linux (Ubuntu Bionic)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1771679

Title:
  Kernel panic on boot (m1.small in cn-north-1)

Status in linux package in Ubuntu:
  New
Status in linux-aws package in Ubuntu:
  New
Status in linux source package in Bionic:
  In Progress
Status in linux-aws source package in Bionic:
  In Progress

Bug description:
  We're observing the following panic on boot when trying to boot an
  m1.small in cn-north-1 (which is a region in AWS China):

  [ 2.271681] Hardware name: Xen HVM domU, BIOS 3.4.3.amazon 11/11/2016
  [ 2.271681] RIP: 0010:__xen_evtchn_do_upcall+0x24/0x80
  [ 2.271681] RSP: 0000:ffff8e21aa003ea0 EFLAGS: 00010046
  [ 2.271681] RAX: ffffffff9dd82920 RBX: ffffffffff200000 RCX: 000000008728b34e
  [ 2.271681] RDX: ffff8e21a3252800 RSI: ffff8e21a724e000 RDI: 000000000000001c
  [ 2.271681] RBP: ffff8e21aa003eb8 R08: ffffffff9ea05040 R09: 0000000000000000
  [ 2.271681] R10: ffff8e21aa003f28 R11: 0000000000000000 R12: 0000000000000001
  [ 2.271681] R13: 0000000000000000 R14: 0000000000000022 R15: ffff8e21a3246900
  [ 2.271681] FS: 0000000000000000(0000) GS:ffff8e21aa000000(0000) knlGS:0000000000000000
  [ 2.271681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 2.271681] CR2: ffffffffff200000 CR3: 000000000b80a000 CR4: 00000000000006f0
  [ 2.271681] Call Trace:
  [ 2.271681] <IRQ>
  [ 2.271681] xen_hvm_evtchn_do_upcall+0xe/0x10
  [ 2.271681] do_hvm_evtchn_intr+0xe/0x20
  [ 2.271681] __handle_irq_event_percpu+0x44/0x1a0
  [ 2.271681] handle_irq_event_percpu+0x32/0x80
  [ 2.271681] handle_irq_event+0x3b/0x60
  [ 2.271681] handle_fasteoi_irq+0x75/0x130
  [ 2.271681] handle_irq+0x20/0x30
  [ 2.271681] do_IRQ+0x46/0xd0
  [ 2.271681] common_interrupt+0x84/0x84
  [ 2.271681] </IRQ>
  [ 2.271681] RIP: 0010:native_safe_halt+0x6/0x10
  [ 2.271681] RSP: 0000:ffffffff9ea03e28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
  [ 2.271681] RAX: ffffffff9e12fba0 RBX: 0000000000000000 RCX: 0000000000000000
  [ 2.271681] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  [ 2.271681] RBP: ffffffff9ea03e28 R08: 0000000000000002 R09: ffffffff9ea03e18
  [ 2.271681] R10: ffffffff9ea03da0 R11: 00000000eb8a1d25 R12: 0000000000000000
  [ 2.271681] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  [ 2.271681] ? __cpuidle_text_start+0x8/0x8
  [ 2.271681] default_idle+0x20/0x100
  [ 2.271681] arch_cpu_idle+0x15/0x20
  [ 2.271681] default_idle_call+0x23/0x30
  [ 2.271681] do_idle+0x17f/0x1b0
  [ 2.271681] cpu_startup_entry+0x73/0x80
  [ 2.271681] rest_init+0xae/0xb0
  [ 2.271681] start_kernel+0x4dc/0x4fd
  [ 2.271681] x86_64_start_reservations+0x24/0x26
  [ 2.271681] x86_64_start_kernel+0x74/0x77
  [ 2.271681] secondary_startup_64+0xa5/0xb0
  [ 2.271681] Code: 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 65 48 8b 1d 92 dc 29 62 41 bc 01 00 00 00 65 44 8b 2d 8c 8c 29 62 <c6> 03 00 44 89 e0 65 0f c1 05 2e 96 2a 62 85 c0 75 3b 48 8b 05
  [ 2.271681] RIP: __xen_evtchn_do_upcall+0x24/0x80 RSP: ffff8e21aa003ea0
  [ 2.271681] CR2: ffffffffff200000
  [ 2.271681] ---[ end trace 956d0f4244642614 ]---
  [ 2.271681] Kernel panic - not syncing: Fatal exception in interrupt

  I've attached the full console log.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1771679/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
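
For readers who want to see the failure mode from the commit message in isolation (a pointer derived from the shared_info page goes stale when the page is remapped, unless it is reset right after the remap), here is a minimal userspace sketch in C. It is not the kernel patch. The struct layouts, the two static "mappings", and the names map_shared_info, remap_shared_info, vcpu0_pointer_is_current and ack_upcall are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Xen structures. */
struct vcpu_info {
    unsigned char evtchn_upcall_pending;
};

struct shared_info {
    struct vcpu_info vcpu_info[4];
};

/* Two distinct addresses model the mapping before and after the remap. */
static struct shared_info early_mapping;
static struct shared_info final_mapping;

static struct shared_info *shared_info_page; /* models HYPERVISOR_shared_info */
static struct vcpu_info *vcpu0_info;         /* models the VCPU 0 vcpu_info pointer */

/* Initial setup: the VCPU 0 pointer is derived from the early mapping. */
void map_shared_info(void)
{
    shared_info_page = &early_mapping;
    vcpu0_info = &shared_info_page->vcpu_info[0];
}

/*
 * Move the page to its final address. With reset_vcpu0 == 0 the derived
 * pointer keeps referring to the old mapping (the bug); with
 * reset_vcpu0 != 0 it is re-derived right after the remap (the fix).
 */
void remap_shared_info(int reset_vcpu0)
{
    assert(shared_info_page != NULL); /* must be mapped first */
    final_mapping = *shared_info_page;
    shared_info_page = &final_mapping;
    if (reset_vcpu0)
        vcpu0_info = &shared_info_page->vcpu_info[0];
}

/* Returns 1 if the VCPU 0 pointer refers to the current mapping. */
int vcpu0_pointer_is_current(void)
{
    return vcpu0_info == &shared_info_page->vcpu_info[0];
}

/*
 * Models the first use of the pointer in the interrupt path. Returns -1
 * where the real kernel would fault on the stale address, 0 otherwise.
 */
int ack_upcall(void)
{
    if (!vcpu0_pointer_is_current())
        return -1;
    vcpu0_info->evtchn_upcall_pending = 0;
    return 0;
}
```

Calling map_shared_info() followed by remap_shared_info(0) leaves ack_upcall() returning -1, which corresponds to the bad first reference described in the commit message. Calling remap_shared_info(1) instead models the reset that the patch adds immediately after the remap.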