On 28/11/2025 1:19 pm, Julian Vetter wrote:
> On 11/27/25 16:20, Andrew Cooper wrote:
>> On 27/11/2025 2:31 pm, Julian Vetter wrote:
>>> Currently Intel CPUs in EFI mode with the "Execute Disable Bit" disabled
>>> and the 'CONFIG_REQUIRE_NX=y' fail to boot, because this check is
>>> performed before trampoline_setup is called, which determines if NX is
>>> supported or if it's hidden by 'MSR_IA32_MISC_ENABLE[34] = 1' (if so,
>>> re-enables NX).
>>>
>>> Signed-off-by: Julian Vetter <[email protected]>
>> Lovely... This isn't the only bug; there's another one from the Vates
>> forums about AMD CPUs which I haven't gotten around to fixing yet.
>>
> Thank you. I will have a look. I haven't seen this thread.

https://xcp-ng.org/forum/post/80714

But the tl;dr is that AMD have introduced a firmware option to disable
NX. Unlike Intel, there's no positive way to know you've reactivated it.

A conversation with AMD has revealed that there's no capability to
prevent setting EFER.NXE, and that NX is always available in practice.
I'm pretty sure the firmware is just clearing NX in the CPUID Override MSR.

However, to reactivate this safely, we need to do a wrmsr_safe(), which
means we need to delay setting NXE until exception handling is available
which is rather later on boot. There's also a tangle with the
order-of-initialisation of the CPUID Override MSRs which I found recently
while doing something else.

The other observation is that, even on a STRICT_NX build of Xen, we can
boot into __start_xen() because we can't insert NX into the pagetables
that early. In fact it's quite late that we lock down permissions; see
the calls to modify_xen_mappings() in __start_xen().

Given that we need to be this late for AMD, we can also move the Intel
logic later (effectively reverts part of the original work; sorry
Alejandro) which means we can also use safe accessors, and we don't need
to worry about the divergent early paths.

>
>> Do you have any more information about which system looks like this?
>>
> I'm not sure if I understand your question correctly, but I was just
> booting an Intel based machine newer than ~2012. I have tested this on 4
> different machines, on which 3 hit this code path. One was a HPE
> ProLiant m510 Server with a XEON CPU

Broadwell.

> , second was a Mini PC with Celeron CPU,

Sorry, not enough information here to figure out the microarchitecture.

> and third was an old Intel NUC DCCP847DYE also with a Celeron CPU.

Sandy Bridge.

> The only system where I couldn't reproduce the issue was an old
> workstation with a Gigabyte mainboard. It has the flag in the Bios to
> set MSR_IA32_MISC_ENABLE, but I'm not sure if it was actually booting
> via UEFI.

Same, not enough information here.

But, it's clear that Intel's XD-disable is still honoured in EFI mode on
a wide range of systems, and that we need a fix for UEFI.

> I will verify this on monday. I booted all the 3 other systems
> via UEFI -> Grub -> multiboot2. My grub entry looks like this:
>
> multiboot2 /boot/xen.gz dom0_mem=2656M,max:2656M watchdog ucode=scan
> dom0_max_vcpus=1-8 crashkernel=256M,below=4G console=vga vga=mode-0x0311
> module2 boot/vmlinuz console=hvc0 console=tty0 init=/bin/sh
> module2 boot/initrd-dom0
>
>> trampoline_setup isn't executed on all EFI boots. I had a different fix
>> in mind, but it's a little more complicated.
> Aha. yes, I didn't thought about other code paths.

https://xenbits.xen.org/docs/latest/hypervisor-guide/x86/how-xen-boots.html

Here's something I put together to cover some of these details. But,
most of the detail is in the source only.

~Andrew
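
For anyone following along, here is a rough sketch of the Intel side of
the logic discussed above: if NX is not visible in CPUID, check whether
firmware set XD_DISABLE (bit 34 of MSR_IA32_MISC_ENABLE), clear it, and
look again. This is only an illustration, not the Xen code. The
struct fake_cpu type and the fake_cpuid_nx() and try_enable_nx() helpers
are hypothetical stand-ins for real MSR and CPUID accesses, and the AMD
case (which needs wrmsr_safe()) is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural: "XD Bit Disable" is bit 34 of IA32_MISC_ENABLE. */
#define MISC_ENABLE_XD_DISABLE (1ULL << 34)

/*
 * Hypothetical model of a CPU. Real code would use rdmsr/wrmsr on
 * MSR_IA32_MISC_ENABLE and execute CPUID instead of touching a struct.
 */
struct fake_cpu {
    uint64_t misc_enable; /* models MSR_IA32_MISC_ENABLE */
    bool     hw_has_nx;   /* whether the silicon implements NX at all */
};

/* Models the NX feature bit in CPUID: hidden while XD_DISABLE is set. */
bool fake_cpuid_nx(const struct fake_cpu *c)
{
    return c->hw_has_nx && !(c->misc_enable & MISC_ENABLE_XD_DISABLE);
}

/*
 * Returns true if NX is usable afterwards. If firmware hid NX via
 * XD_DISABLE, clear the bit and re-check CPUID.
 */
bool try_enable_nx(struct fake_cpu *c)
{
    if ( fake_cpuid_nx(c) )
        return true;                      /* NX already visible */

    if ( c->misc_enable & MISC_ENABLE_XD_DISABLE )
    {
        c->misc_enable &= ~MISC_ENABLE_XD_DISABLE;
        return fake_cpuid_nx(c);          /* re-check after unhiding */
    }

    return false;                         /* NX genuinely absent */
}
```

With a model CPU whose misc_enable has XD_DISABLE set and hw_has_nx true,
try_enable_nx() clears the bit and reports NX as available. With
hw_has_nx false it reports failure, which is the case a REQUIRE_NX check
should reject, and only after this step has run.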
