From: Hou Wenlong <[email protected]> [ Upstream commit db9aded979b491a24871e1621cd4e8822dbca859 ]
The PVH entry is available for 32-bit KVM guests, and 32-bit KVM guests do not depend on CONFIG_X86_PAE. However, mk_early_pgtbl_32() builds different pagetables depending on whether CONFIG_X86_PAE is set. Therefore, enabling PAE mode for 32-bit KVM guests without CONFIG_X86_PAE being set would result in a boot failure during CR3 loading. Signed-off-by: Hou Wenlong <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Message-ID: <d09ce9a134eb9cbc16928a5b316969f8ba606b81.1768017442.git.houwenlong....@antgroup.com> Signed-off-by: Sasha Levin <[email protected]> --- LLM Generated explanations, may be completely bogus: Now I have all the information needed for a complete analysis. --- ## Detailed Analysis ### 1. Commit Message Analysis The subject clearly states the problem: PAE mode is being unconditionally enabled for all PVH guests, but for 32-bit guests it should only be enabled when `CONFIG_X86_PAE` is set. The commit message explicitly describes a **boot failure** scenario: loading CR3 fails when the page table format doesn't match the CPU's paging mode expectation. The commit is: - Reviewed by Juergen Gross (Xen subsystem maintainer) - Signed off by Juergen Gross (subsystem maintainer sign-off) ### 2. Code Change Analysis **The Bug Mechanism:** The PVH entry path in `arch/x86/platform/pvh/head.S` is the boot entry point for PVH (Para-Virtualized Hardware) guests, used by Xen and KVM. The flow is: 1. **Line 94-97** (before fix): PAE mode is unconditionally enabled in CR4: ```94:97:arch/x86/platform/pvh/head.S /* Enable PAE mode. */ mov %cr4, %eax orl $X86_CR4_PAE, %eax mov %eax, %cr4 ``` 2. For **64-bit** guests (`CONFIG_X86_64`), this is correct — PAE is always needed as a prerequisite for long mode (line 99-104). 3. For **32-bit** guests (the `#else` path starting at line 196), the code calls `mk_early_pgtbl_32()` to build early page tables: ```196:205:arch/x86/platform/pvh/head.S #else /* CONFIG_X86_64 */ call mk_early_pgtbl_32 mov $_pa(initial_page_table), %eax mov %eax, %cr3 mov %cr0, %eax or $(X86_CR0_PG | X86_CR0_PE), %eax mov %eax, %cr0 ``` 4. `mk_early_pgtbl_32()` in `arch/x86/kernel/head32.c` builds **fundamentally different** page table structures depending on `CONFIG_X86_PAE`: ```95:103:arch/x86/kernel/head32.c #ifdef CONFIG_X86_PAE typedef pmd_t pl2_t; #define pl2_base initial_pg_pmd #define SET_PL2(val) { .pmd = (val), } #else typedef pgd_t pl2_t; #define pl2_base initial_page_table #define SET_PL2(val) { .pgd = (val), } #endif ``` - With `CONFIG_X86_PAE`: Builds **3-level PAE page tables** (PGDIR_SHIFT=30, uses PMDs + PDPTEs) - Without `CONFIG_X86_PAE`: Builds **2-level non-PAE page tables** (PGDIR_SHIFT=22, uses PGDs directly) **The crash**: When PAE is enabled in CR4 but non-PAE page tables are loaded into CR3, the CPU interprets the 2-level page directory as a PAE PDPT (Page Directory Pointer Table). When paging is activated (CR0.PG=1), the processor tries to load the PDPTE entries from the address in CR3. The non-PAE page directory entries are completely incompatible with PAE PDPTE format, causing a **#GP fault or triple fault**, resulting in an immediate boot failure. **The Fix:** Simply wrapping the PAE enablement with proper `#ifdef` guards: ```asm #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) /* Enable PAE mode. */ mov %cr4, %eax orl $X86_CR4_PAE, %eax mov %eax, %cr4 #endif ``` This ensures PAE is only enabled when: - `CONFIG_X86_64` is set (64-bit always needs PAE for long mode), or - `CONFIG_X86_PAE` is set (32-bit with PAE — page tables match) Note that the 32-bit path at lines 212-220 already has code to disable PAE before jumping to `startup_32`, which confirms the original author was aware that PAE and non-PAE modes exist, but the initial enablement was not properly guarded. ### 3. Classification This is a **boot failure fix**. It's not a feature, cleanup, or optimization. It fixes a configuration where a 32-bit PVH guest without `CONFIG_X86_PAE` completely fails to boot. ### 4. Scope and Risk Assessment - **Lines changed**: 2 lines added (`#if defined(...)` and `#endif`), 0 lines removed - **Files touched**: 1 (`arch/x86/platform/pvh/head.S`) - **Complexity**: Minimal — conditional compilation guard - **Risk**: Extremely low - For `CONFIG_X86_64`: No change (the `#if` is always true) - For `CONFIG_X86_32` with `CONFIG_X86_PAE`: No change (the `#if` is true) - For `CONFIG_X86_32` without `CONFIG_X86_PAE`: PAE is no longer enabled, matching the page table format — this is the bug fix - **Regression potential**: Near zero. The only behavioral change is for the broken configuration that currently crashes. ### 5. User Impact - **Who is affected**: Anyone running a 32-bit kernel without PAE as a PVH/KVM guest. This is a legitimate configuration since `config PVH` has no dependency on `CONFIG_X86_PAE` or `CONFIG_X86_64`. - **Severity**: Complete boot failure — the system cannot boot at all - **Workaround**: Users must enable `CONFIG_X86_PAE` for 32-bit PVH guests, which may not be obvious and adds unnecessary configuration constraints ### 6. Stability Indicators - **Reviewed-by**: Juergen Gross (Xen subsystem maintainer) — very strong indicator - **Signed-off-by**: Juergen Gross (accepted through the Xen tree) - The fix is trivially correct by inspection — when `CONFIG_X86_PAE` is not set, enabling PAE in CR4 is wrong because the page tables aren't in PAE format ### 7. Dependency Check - **No dependencies** on other patches — this is a completely self- contained fix - The affected code (`pvh_start_xen` with the unconditional PAE enablement) exists in **all current stable trees** (confirmed present from v5.4 through v6.12) - The fix context is slightly different in older stable trees (e.g., `rep\n movsl` vs `rep movsl`, PIC vs absolute addressing), so minor backport adjustments may be needed for older trees, but the fix itself (adding `#if`/`#endif` around the PAE lines) is trivial to adapt ### 8. Conclusion This is a textbook stable kernel fix: - **Fixes a real, severe bug**: Complete boot failure for a valid kernel configuration - **Obviously correct**: The fix is a 2-line conditional compilation guard that makes PAE enablement match page table format - **Small and contained**: 2 lines added to 1 file - **No new features**: Just correctness fix - **No risk of regression**: Only changes behavior for the currently- broken configuration - **Present in all stable trees**: The bug has existed since the original PVH commit (v4.11, 2017) - **Reviewed by subsystem maintainer**: Juergen Gross **YES** arch/x86/platform/pvh/head.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S index 344030c1a81d4..53ee2d53fcf8e 100644 --- a/arch/x86/platform/pvh/head.S +++ b/arch/x86/platform/pvh/head.S @@ -91,10 +91,12 @@ SYM_CODE_START(pvh_start_xen) leal rva(early_stack_end)(%ebp), %esp +#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) /* Enable PAE mode. */ mov %cr4, %eax orl $X86_CR4_PAE, %eax mov %eax, %cr4 +#endif #ifdef CONFIG_X86_64 /* Enable Long mode. */ -- 2.51.0
