On Wed, 09 Mar, at 01:02:44AM, Alexis Murzeau wrote:
> 
> Thanks for you suggestion.
> Unfortunately, this patch doesn't make it works, the crash still
> occurs (at the same RIP and traceback).
> 
> Using /dev/mem on a running system (with kernel 4.3), the memory
> around RIP (0xaa9462ee) is :
> aa9462d0  sub rsp,0x28
> aa9462d4  lea rdx,[rip+0x2445] # 0xaa948720
> aa9462db  mov ecx,0x4
> aa9462e0  call func_aa9447c0  ; call to ConvertPointer(4, & 0xaa948720)
> aa9462e5  mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
> aa9462ec  xor eax,eax
> aa9462ee  mov BYTE PTR [r11+0x1],0x1
> aa9462f3  add rsp,0x28
> aa9462f7  ret

Interesting. This code sequence is pretty typical of runtime drivers
that have registered to be notified when SetVirtualAddressMap() is
invoked. It basically just calls ConvertPointer() and updates an
internal pointer with the new virtual address in the memory map passed
to SetVirtualAddressMap().

The value passed as the first argument to ConvertPointer() (0x4) isn't
actually defined in the UEFI spec (go figure). Digging around in the
Tianocore source reveals that it's EFI_INTERNAL_POINTER (0x00000004),
which is distinct from EFI_INTERNAL_FUNCTION (0x00000002). Not all that
helpful.
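
To make the pattern concrete, here is a rough userspace model of what
such a notification handler does. This is only a sketch: struct
mem_desc, convert_pointer() and virt_map_notify() are hypothetical
names of mine, addresses are plain integers rather than real pointers,
and the real firmware's ConvertPointer() may well behave differently.
Like the disassembly, the handler ignores the return value and then
stores to pointer+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Disposition values as discussed above; the disassembly passes 0x4. */
#define EFI_INTERNAL_FUNCTION 0x00000002u
#define EFI_INTERNAL_POINTER  0x00000004u

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical, simplified stand-in for an EFI memory descriptor. */
struct mem_desc {
    uint64_t phys_start;
    uint64_t virt_start;
    uint64_t num_pages;
};

/*
 * Hypothetical model of ConvertPointer(): rewrite *addr from its
 * physical value to the matching virtual one. Returns 0 on success and
 * -1 (leaving *addr untouched) if no descriptor covers the address.
 */
static int convert_pointer(const struct mem_desc *map, size_t n,
                           unsigned disposition, uint64_t *addr)
{
    assert(addr != NULL);
    (void)disposition; /* not interpreted in this model */

    for (size_t i = 0; i < n; i++) {
        uint64_t size = map[i].num_pages * MODEL_PAGE_SIZE;

        if (*addr >= map[i].phys_start &&
            *addr - map[i].phys_start < size) {
            *addr = map[i].virt_start + (*addr - map[i].phys_start);
            return 0;
        }
    }
    return -1;
}

/* Models the QWORD at 0xaa948720 that the driver keeps internally. */
static uint64_t internal_ptr;

/*
 * Models the handler: convert the internal pointer, ignore the result,
 * and return the address that "mov BYTE PTR [r11+0x1],0x1" would hit.
 */
static uint64_t virt_map_notify(const struct mem_desc *map, size_t n)
{
    (void)convert_pointer(map, n, EFI_INTERNAL_POINTER, &internal_ptr);
    return internal_ptr + 1;
}
```

In this model, if internal_ptr is 0 and no descriptor in the map covers
address 0, the pointer is left alone and the store lands at address 0x1,
i.e. inside the first page.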

> The QWORD at address 0xaa948720 is 0 though on the running system.

My first reaction was: weird, 0x0 is an invalid address, and I'd
always expect dereferencing it to cause a page fault.

But we're dealing with physical addresses, and 0 is a completely
legitimate address that, in fact, contains Boot Services Code on your
machine,


[    0.000000] efi: mem00: [Boot Code          |   |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000001000) (0MB)


Comparing the mapping of the first page on the working and
non-working kernels shows,


 ---[ User Space ]---
[Good] 0x0000000000000000-0x0000000000001000           4K       RW                 GLB NX pte
[Bad]  0x0000000000000000-0x0000000000001000           4K                                 pte


Oops. The zero page isn't mapped at all with the new scheme, which
explains why working kernels don't fault but the new one does.

This probably used to work because trim_bios_range() inserts a mapping
for the first page into the e820 map, which is used to construct the
kernel page tables. It's that code path rather than the EFI mapping
code that allowed this to work in the past (I'm guessing).

Could you boot a working kernel with memblock=debug on the kernel
command line and look out for,

  memblock: Could not reserve boot range [0x0000000000-0x0000000fff]

or similar. I'd like to confirm what's going on here. If
memblock=debug results in too much output you could simply change the
memblock_dbg() call in efi_reserve_boot_services() to a printk().

If this analysis is correct, this patch should fix things,

---
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 2326bf51978f..7db49e975b11 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -194,8 +194,6 @@ void __init efi_reserve_boot_services(void)
                                && start <= __pa_symbol(_end)) ||
                        !e820_all_mapped(start, start+size, E820_RAM) ||
                        memblock_is_region_reserved(start, size)) {
-                       /* Could not reserve, skip it */
-                       md->num_pages = 0;
                        memblock_dbg("Could not reserve boot range [0x%010llx-0x%010llx]\n",
                                     start, start+size-1);
                } else
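
For what it's worth, here is a toy model of why I think the removed
line matters. It is a sketch under my own assumptions, not the kernel
code: reserve_boot_region() and pages_mapped() are hypothetical helpers,
can_reserve stands in for the checks in efi_reserve_boot_services(),
and I'm assuming that a later mapping pass sizes its work from
md->num_pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an EFI memory descriptor. */
struct mem_desc {
    uint64_t phys_addr;
    uint64_t num_pages;
};

/*
 * Hypothetical model of the reservation step. When the region cannot
 * be reserved, the old behaviour (zero_on_failure == true) zeroes
 * num_pages; the patched behaviour leaves the descriptor alone.
 */
static void reserve_boot_region(struct mem_desc *md, bool can_reserve,
                                bool zero_on_failure)
{
    assert(md != NULL);
    if (!can_reserve && zero_on_failure)
        md->num_pages = 0; /* the line the patch removes */
}

/*
 * Hypothetical model of a later mapping pass: the number of pages it
 * would map is taken directly from the descriptor.
 */
static uint64_t pages_mapped(const struct mem_desc *md)
{
    assert(md != NULL);
    return md->num_pages;
}
```

With zero_on_failure set, a one-page region at physical address 0 that
fails the reservation checks ends up with nothing to map; without it,
the page is still there for the mapping code to pick up.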
