Re: Recent patches break ACPI tables

luca Mon, 19 Jun 2023 12:40:48 -0700

On 19/06/23 20:35, Almudena Garcia wrote:

But the code which starts the secondary CPUs runs much later than the crash.


So the crash could be caused by reading the ACPI tables, which are 
supposed to be in a certain memory region, defined by a physical address.

phystokv doesn't fully solve the problem, because the lapic address is outside 
the range allowed by this function. Currently, we use paging to map 
every ACPI table that we need to access (to get a virtual address for it).

But the search for the initial ACPI address is based on a physical address range.

I could go a bit further with debugging, and it seems that the problem is a bit different: removing the 1:1 map exposed an issue that had stayed hidden so far.

In my test the CPU is reset by a triple fault (you can see this by enabling interrupt and cpu_reset logging with qemu, e.g. using -d int,cpu_reset), which is triggered after the first call to splvm:


(gdb) bt
#0  splvm () at ../i386/i386/spl.S:122
#1  0xc1001da6 in pmap_enter (pmap=<optimized out>, v=<optimized out>, pa=<optimized out>, prot=<optimized out>, wired=<optimized out>) at ../i386/intel/pmap.c:2171
#2  0xc1029b99 in pmap_steal_memory (size=<optimized out>) at ../vm/vm_resident.c:278
#3  0xc1029c48 in vm_page_bootstrap (startp=<optimized out>, endp=<optimized out>) at ../vm/vm_resident.c:207
#4  0xc101b893 in vm_mem_bootstrap () at ../vm/vm_init.c:65
#5  0xc10161d1 in setup_main () at ../kern/startup.c:115
#6  0xc1004652 in c_boot_entry (bi=<optimized out>) at ../i386/i386at/model_dep.c:578
#7  0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103
(gdb) si
124             cli
1: x/i $pc
=> 0xc100ac5d <splvm+5>:       cli
(gdb)
125             CPU_NUMBER(%edx)
1: x/i $pc
=> 0xc100ac5e <splvm+6>:       mov    %cs:0xc109bc6c,%edx
(gdb)
0xc100ac65      125             CPU_NUMBER(%edx)
1: x/i $pc
=> 0xc100ac65 <splvm+13>:      mov    %cs:0x20(%edx),%edx
(gdb)
t_page_fault () at ../i386/i386/locore.S:435
435             pushl   $(T_PAGE_FAULT)         /* mark a page fault trap */
1: x/i $pc
=> 0xc100a42c <t_page_fault>:  push   $0xe

... and from here it enters t_page_fault recursively, because in trap_from_kernel there is another CPU_NUMBER. I guess the triple fault is triggered because at some point the exception stack overflows.


With --enable-ncpu=2 it seems that CPU_NUMBER is

#define CPU_NUMBER(reg) \
        movl    %cs:lapic, reg          ;\
        movl    %cs:APIC_ID(reg), reg   ;\
        shrl    $24, reg                ;\

and at this stage the lapic pointer is not yet initialized:

(gdb) p lapic
$4 = (volatile ApicLocalUnit *) 0x0
(gdb) x &lapic
0xc109bc6c <lapic>:       0x00000000

I guess so far this worked because the address 0 was mapped, and now it isn't.
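
To make the failure mode concrete, here is a minimal user-space C sketch of what CPU_NUMBER computes: it loads the 32-bit register at offset 0x20 from the lapic base (the 0x20(%edx) in the trace above) and shifts it right by 24. With lapic still NULL, that load hits address 0x20, which is no longer mapped. The names cpu_number_guarded and apic_id_from_reg and the fake register page are made up for illustration and are not gnumach code; the NULL fallback to CPU 0 is only an assumption that nothing but the boot CPU runs this early, not a tested fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APIC_ID_OFFSET 0x20u  /* matches the 0x20(%edx) displacement in the trace */
#define APIC_ID_SHIFT  24     /* matches "shrl $24, reg" in CPU_NUMBER */

/* Extract the APIC ID from the raw value of the ID register. */
static uint32_t apic_id_from_reg(uint32_t reg)
{
    return reg >> APIC_ID_SHIFT;
}

/* Hypothetical guarded lookup: report CPU 0 while the base is still NULL. */
static uint32_t cpu_number_guarded(const uint8_t *lapic_base)
{
    uint32_t reg;

    if (lapic_base == NULL)
        return 0;  /* assumption: only the boot CPU runs this early */

    /* Same load as CPU_NUMBER: 32 bits at base + 0x20. */
    memcpy(&reg, lapic_base + APIC_ID_OFFSET, sizeof reg);
    return apic_id_from_reg(reg);
}

/* Usage with a fake register page standing in for the mapped local APIC. */
static void example_usage(void)
{
    uint8_t fake_page[0x40] = {0};
    uint32_t id_reg = 1u << APIC_ID_SHIFT;  /* APIC ID 1 in bits 31..24 */

    memcpy(fake_page + APIC_ID_OFFSET, &id_reg, sizeof id_reg);
    assert(cpu_number_guarded(fake_page) == 1);
    assert(cpu_number_guarded(NULL) == 0);
}
```

In the real macro such a check would have to be done in assembly and would add a test and branch to every CPU_NUMBER use, so this only shows the idea.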

I'm not sure what would be the proper way to solve this. I tried moving the call to machine_init() before vm_mem_bootstrap() (to have lapic initialized), but this triggers another assert.


Any idea?


Luca
