Based on a response to a xen-devel post, I've cherry-picked these commits into our 5.15 kernel build, and since then we have not encountered this problem.
6cf3e4c0d29102c74aca1ce0c1710be9d02e440e # x86/entry: Cleanup PARAVIRT
1462eb381b4c27576a3e818bc9f918765d327fdf # x86/xen: Rework the xen_{cpu,irq,mmu}_opsarrays
8b87d8cec1b31ea710568ae49ba5f5146318da0d # x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()
bbf92368b0b1fe472d489e62d3340d7897e9c697 # x86/text-patching: Make text_gen_insn() play nice with ANNOTATE_NOENDBR
ba27d1a80871eb8dbeddf34ec7d396c149cbb8d7 # x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch()

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-meta in Ubuntu.
https://bugs.launchpad.net/bugs/2045248

Title:
focal: 5.15.0-91 crashes on boot as Xen PV guest

Status in linux-meta package in Ubuntu:
New

Bug description:
We have a custom build of the kernel based on the Ubuntu-hwe-5.15-5.15.0-91.101_20.04.1 tag. It includes a small number of patches but nothing in the area of the early boot code. Xen is based on the upstream 4.15.5 stable branch with all patches up to and including XSA-444.

In approximately 1% of pv guest boots we get the following crash which looks like it involves the entry_64.S code. We have seen this on different hardware models but only with an Intel processor although we don't have any AMD based systems. The problem was also observed with the 5.15.0-85 tag.

I have had a look on the main line kernel branch for arch/x86/entry changes but I can't obviously connect this problem to anything there based on the commit messages. I don't have the knowledge to understand the code though and whether there is actually something relevant.

```
[ 0.303715] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.303727] Spectre V2 : Mitigation: Enhanced IBRS
[ 0.303733] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.303740] Spectre V2 : Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT
[ 0.303746] RETBleed: Mitigation: Enhanced IBRS
[ 0.303752] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.303760] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.303771] MMIO Stale Data: Mitigation: Clear CPU buffers
[ 0.303777] GDS: Unknown: Dependent on hypervisor status
[ 0.303827] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.303835] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.303840] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.303846] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.303851] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.303857] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.303865] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.303871] x86/fpu: xstate_offset[5]: 1088, xstate_sizes[5]: 64
[ 0.303877] x86/fpu: xstate_offset[6]: 1152, xstate_sizes[6]: 512
[ 0.303882] x86/fpu: xstate_offset[7]: 1664, xstate_sizes[7]: 1024
[ 0.303888] x86/fpu: Enabled xstate features 0xe7, context size is 2688 bytes, using 'standard' format.
[ 0.327588] segment-related general protection fault: e030 [#1] SMP NOPTI
[ 0.327604] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-91-generic #101~20.04.1custom1
[ 0.327614] RIP: e030:native_irq_return_iret+0x0/0x2
[ 0.327627] Code: 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
[ 0.327640] RSP: e02b:ffffffff82e03bc8 EFLAGS: 00010046
[ 0.327647] RAX: 0000000000000000 RBX: ffffffff82e03c30 RCX: ffffffff81e01101
[ 0.327653] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001f
[ 0.327660] RBP: ffffffff82e03bf8 R08: ffffffff81e011ef R09: 0000000000000005
[ 0.327666] R10: 0000000000000006 R11: e8ae0feb75ccff49 R12: ffffffff81e011ef
[ 0.327672] R13: 0000000000000006 R14: ffffffff81e011f1 R15: 0000000000000002
[ 0.327684] FS: 0000000000000000(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000
[ 0.327691] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.327696] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[ 0.327705] Call Trace:
[ 0.327709] <TASK>
[ 0.327713] ? show_trace_log_lvl+0x1d6/0x2ea
[ 0.327723] ? show_trace_log_lvl+0x1d6/0x2ea
[ 0.327729] ? insn_decode+0xec/0x100
[ 0.327738] ? show_regs.part.0+0x23/0x29
[ 0.327743] ? __die_body.cold+0x8/0xd
[ 0.327748] ? die_addr+0x3e/0x60
[ 0.327756] ? exc_general_protection+0x1c1/0x350
[ 0.327766] ? asm_exc_general_protection+0x27/0x30
[ 0.327772] ? restore_regs_and_return_to_kernel+0x1d/0x2c
[ 0.327778] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327784] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327789] ? asm_sysvec_xen_hvm_callback+0x11/0x20
[ 0.327796] ? native_iret+0x7/0x7
[ 0.327801] ? insn_get_displacement+0x4d/0x110
[ 0.327807] insn_decode+0xec/0x100
[ 0.327813] optimize_nops+0x68/0x150
[ 0.327819] ? restore_regs_and_return_to_kernel+0x1d/0x2c
[ 0.327825] ? restore_regs_and_return_to_kernel+0x2c/0x2c
[ 0.327830] ? restore_regs_and_return_to_kernel+0x20/0x2c
[ 0.327837] apply_alternatives+0x181/0x3a0
[ 0.327843] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327848] ? fb_is_primary_device+0x25/0x73
[ 0.327855] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327861] ? apply_alternatives+0x8/0x3a0
[ 0.327867] ? fb_is_primary_device+0x6e/0x73
[ 0.327872] ? apply_returns+0xfc/0x180
[ 0.327878] ? fb_is_primary_device+0x6e/0x73
[ 0.327883] ? sanitize_boot_params.constprop.0+0xa/0xef
[ 0.327889] ? fb_is_primary_device+0x73/0x73
[ 0.327895] alternative_instructions+0xa9/0x173
[ 0.327904] arch_cpu_finalize_init+0x2c/0x51
[ 0.327909] start_kernel+0x425/0x4ce
[ 0.327916] x86_64_start_reservations+0x24/0x2a
[ 0.327922] xen_start_kernel+0x41e/0x429
[ 0.327928] startup_xen+0x3e/0x3e
[ 0.327934] </TASK>
[ 0.327937] Modules linked in:
[ 0.327943] ---[ end trace c275641b4f1eba81 ]---
[ 0.327948] RIP: e030:native_irq_return_iret+0x0/0x2
[ 0.327954] Code: 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
[ 0.327967] RSP: e02b:ffffffff82e03bc8 EFLAGS: 00010046
[ 0.327972] RAX: 0000000000000000 RBX: ffffffff82e03c30 RCX: ffffffff81e01101
[ 0.327978] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001f
[ 0.327984] RBP: ffffffff82e03bf8 R08: ffffffff81e011ef R09: 0000000000000005
[ 0.327990] R10: 0000000000000006 R11: e8ae0feb75ccff49 R12: ffffffff81e011ef
[ 0.327996] R13: 0000000000000006 R14: ffffffff81e011f1 R15: 0000000000000002
[ 0.328006] FS: 0000000000000000(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000
[ 0.328012] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.328018] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[ 0.328027] Kernel panic - not syncing: Attempted to kill the idle task!
```

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

# uname -a
Linux hostname 5.15.0-91-generic #101~20.04.1custom1 SMP Thu Nov 23 12:37:35 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/version_signature
Ubuntu 5.15.0-91.101~20.04.1custom1-generic 5.15.131

# xl info
host                  : hostname
release               : 5.15.0-91-generic
version               : #101~20.04.1custom1 SMP Thu Nov 23 12:37:35 UTC 2023
machine               : x86_64
nr_cpus               : 80
max_cpu_id            : 79
nr_nodes              : 2
cores_per_socket      : 20
threads_per_core      : 2
cpu_mhz               : 2294.609
hw_caps               : bfebfbff:77fef3ff:2c100800:00000121:0000000f:f3bfbfff:00405f4e:00000100
virt_caps             : pv hvm hvm_directio pv_directio hap shadow iommu_hap_pt_share vmtrace
total_memory          : 130523
free_memory           : 79395
sharing_freed_memory  : 0
sharing_used_memory   : 0
outstanding_claims    : 0
free_cpus             : 0
xen_major             : 4
xen_minor             : 15
xen_extra             : .5
xen_version           : 4.15.5
xen_caps              : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler         : credit2
xen_pagesize          : 4096
platform_params       : virt_start=0xffff800000000000
xen_changeset         : Mon Nov 20 09:36:08 2023 +0000 git:0196200b35-dirty
xen_commandline       : placeholder console=vga,com2 com2=115200,8n1 dom0_max_vcpus=4-8 dom0_mem=min:6144,max:65536m iommu=on,required,intpost,verbose,debug x2apic=off sched=credit2 flask=enforcing gnttab_max_frames=128 xpti=off smt=on cpufreq=xen:performance spec-ctrl=gds-mit=0
cc_compiler           : gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
cc_compile_by         :
cc_compile_domain     :
cc_compile_date       : Mon Nov 20 09:37:08 UTC 2023
build_id              : 986e88b638105b0dfc4ecf5c9bbb9743a61b2677
xend_config_format    : 4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/2045248/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp