On Wed, 09 Mar, at 01:02:44AM, Alexis Murzeau wrote:
>
> Thanks for you suggestion.
> Unfortunately, this patch doesn't make it works, the crash still
> occurs (at the same RIP and traceback).
>
> Using /dev/mem on a running system (with kernel 4.3), the memory
> around RIP (0xaa9462ee) is :
> aa9462d0 sub rsp,0x28
> aa9462d4 lea rdx,[rip+0x2445] # 0xaa948720
> aa9462db mov ecx,0x4
> aa9462e0 call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
> aa9462e5 mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
> aa9462ec xor eax,eax
> aa9462ee mov BYTE PTR [r11+0x1],0x1
> aa9462f3 add rsp,0x28
> aa9462f7 ret

Interesting. This code sequence is pretty typical of runtime drivers
that have registered to be notified when SetVirtualAddressMap() is
invoked. It basically just calls ConvertPointer() and updates an
internal pointer with the new virtual address from the memory map
passed to SetVirtualAddressMap().

The first argument to ConvertPointer() isn't actually contained in the
UEFI spec (go figure). Digging around in the Tianocore source reveals
that it's EFI_INTERNAL_POINTER, which is distinct from
EFI_INTERNAL_FUNCTION (0x00000002). Not all that helpful.

> The QWORD at address 0xaa948720 is 0 though on the running system.

My first reaction was: weird, 0x0 is an invalid address, and I'd
always expect dereferencing it to cause a page fault. But we're
dealing with physical addresses, and 0 is a completely legitimate
address that, in fact, contains Boot Services Code on your machine,

  [ 0.000000] efi: mem00: [Boot Code | | | | | | | |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000001000) (0MB)

Comparing the mapping for the first page in the working and
non-working kernels shows,

  ---[ User Space ]---
  [Good] 0x0000000000000000-0x0000000000001000 4K RW GLB NX pte
  [Bad]  0x0000000000000000-0x0000000000001000 4K pte

Oops. The zero page isn't mapped at all with the new scheme, which
explains why working kernels don't fault but the new one does.

This probably used to work because trim_bios_range() inserts a mapping
for the first page into the e820 map, which is used to construct the
kernel page tables. It's that code path, rather than the EFI mapping
code, that allowed this to work in the past (I'm guessing).

Could you boot a working kernel with memblock=debug on the kernel
command line and look out for,

  memblock: Could not reserve boot range [0x0000000000-0x0000000fff]

or similar? I'd like to confirm what's going on here. If
memblock=debug results in too much output, you could simply change the
memblock_dbg() call in efi_reserve_boot_services() to a printk().

Because if this analysis is true, this patch should fix things,

---

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 2326bf51978f..7db49e975b11 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -194,8 +194,6 @@ void __init efi_reserve_boot_services(void)
 				&& start <= __pa_symbol(_end)) ||
 			!e820_all_mapped(start, start+size, E820_RAM) ||
 			memblock_is_region_reserved(start, size)) {
-			/* Could not reserve, skip it */
-			md->num_pages = 0;
 			memblock_dbg("Could not reserve boot range [0x%010llx-0x%010llx]\n",
 						start, start+size-1);
 		} else
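
To make the theory above concrete, here is a minimal userspace sketch
in C. It is not kernel code and not the actual EFI mapping logic: the
struct mem_desc type, the helper names, and the 0xaa944000 driver
region are all hypothetical, and it assumes that a descriptor whose
num_pages has been zeroed contributes no mapping, and that the
firmware's converted pointer stays 0. Under those assumptions it shows
why zeroing num_pages for the page-zero Boot Code region would make
the store to [r11+0x1] fault, and why leaving it alone would not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/* Hypothetical, heavily simplified stand-in for an EFI memory descriptor. */
struct mem_desc {
	uint64_t phys_start;
	uint64_t num_pages;
};

/*
 * Models the quirk: if a boot services region cannot be reserved, the
 * old behaviour zeroes num_pages, the patched behaviour leaves it alone.
 */
static void reserve_boot_services(struct mem_desc *md, bool can_reserve,
				  bool zero_on_failure)
{
	if (!can_reserve && zero_on_failure)
		md->num_pages = 0;
}

/* Assumption: an address is mapped only if a descriptor with pages covers it. */
static bool is_mapped(const struct mem_desc *map, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t start = map[i].phys_start;
		uint64_t end = start + map[i].num_pages * MODEL_PAGE_SIZE;

		if (addr >= start && addr < end)
			return true;
	}
	return false;
}

/*
 * Models "mov BYTE PTR [r11+0x1],0x1": returns false where the real
 * store would page fault because the target is not mapped.
 */
static bool firmware_callback(const struct mem_desc *map, size_t n,
			      uint64_t converted_ptr)
{
	return is_mapped(map, n, converted_ptr + 1);
}

/* Returns true if the modelled callback survives, false if it would fault. */
bool scenario(bool zero_on_failure)
{
	struct mem_desc map[] = {
		{ 0x0, 1 },		/* Boot Code, [0x0-0x1000) */
		{ 0xaa944000ULL, 8 },	/* hypothetical runtime driver region */
	};

	/* Assume page zero cannot be reserved by the quirk. */
	reserve_boot_services(&map[0], false, zero_on_failure);

	/* The firmware's internal pointer holds 0, so the store hits 0x1. */
	return firmware_callback(map, 2, 0x0);
}
```

Calling scenario(true) models the current code (num_pages zeroed) and
scenario(false) models the patched code; only the latter keeps page
zero covered by a descriptor.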