Re: Recent patches break ACPI tables

Almudena Garcia Mon, 19 Jun 2023 15:44:16 -0700

Maybe adding a small conditional to the assembly code (return 0 if lapic is 0,
using cmp and je) could fix the problem in a simpler way.
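
A minimal sketch of that idea, written in C rather than assembly so that it can be compiled and checked outside the kernel. The names mock_apic_t, mock_lapic and guarded_cpu_number are hypothetical stand-ins, not gnumach identifiers; only the 0x20 register offset and the shift by 24 are taken from the quoted disassembly and the CPU_NUMBER macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ApicLocalUnit: only the APIC ID
 * register is modeled, at offset 0x20 as in the quoted disassembly. */
typedef struct {
    uint8_t  pad[0x20];
    uint32_t apic_id;   /* the APIC ID sits in bits 31..24 */
} mock_apic_t;

/* Stand-in for the global lapic pointer; NULL until it is initialized. */
static volatile mock_apic_t *mock_lapic = NULL;

/* Guarded equivalent of CPU_NUMBER: report CPU 0 while lapic is unset,
 * otherwise read the APIC ID register and shift it down by 24 bits. */
static int guarded_cpu_number(void)
{
    volatile mock_apic_t *l = mock_lapic;

    if (l == NULL)      /* corresponds to the cmp/je early-out */
        return 0;
    return (int)(l->apic_id >> 24);
}
```

In the real macro the same check would presumably be a cmp of reg against 0 followed by a je to a label past the APIC_ID load, so that reg is left at 0; that assembly form is not tested here.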


On Monday, 19 June 2023, l...@orpolo.org wrote:
> On 19/06/23 20:35, Almudena Garcia wrote:
> > But the code that starts the secondary CPUs runs much later than the 
> > crash.
> > 
> > So the crash could be caused by reading the ACPI tables, which are 
> > supposed to be in a certain memory region, defined by a physical address.
> > 
> > phystokv won't fully solve the problem, because the lapic address is 
> > outside the range accepted by this function. Currently, we use paging 
> > to map every ACPI table that we need to access (to get a virtual address 
> > for it).
> > 
> > But the search for the initial ACPI address is based on a physical address 
> > range.
> 
> I could go a bit further with debugging, and it seems that the problem 
> is a bit different: removing the 1:1 map exposed an issue that 
> had gone unnoticed so far.
> 
> In my test the cpu is reset by a triple fault (you can see this by 
> enabling interrupt and cpu_reset logging with qemu, e.g. using -d 
> int,cpu_reset) which is triggered after the first call to splvm:
> 
> (gdb) bt
> #0  splvm () at ../i386/i386/spl.S:122
> #1  0xc1001da6 in pmap_enter (pmap=<optimized out>, v=<optimized out>, 
> pa=<optimized out>, prot=<optimized out>, wired=<optimized out>) at 
> ../i386/intel/pmap.c:2171
> #2  0xc1029b99 in pmap_steal_memory (size=<optimized out>) at 
> ../vm/vm_resident.c:278
> #3  0xc1029c48 in vm_page_bootstrap (startp=<optimized out>, 
> endp=<optimized out>) at ../vm/vm_resident.c:207
> #4  0xc101b893 in vm_mem_bootstrap () at ../vm/vm_init.c:65
> #5  0xc10161d1 in setup_main () at ../kern/startup.c:115
> #6  0xc1004652 in c_boot_entry (bi=<optimized out>) at 
> ../i386/i386at/model_dep.c:578
> #7  0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103
> (gdb) si
> 124           cli
> 1: x/i $pc
> => 0xc100ac5d <splvm+5>:      cli
> (gdb)
> 125           CPU_NUMBER(%edx)
> 1: x/i $pc
> => 0xc100ac5e <splvm+6>:      mov    %cs:0xc109bc6c,%edx
> (gdb)
> 0xc100ac65    125             CPU_NUMBER(%edx)
> 1: x/i $pc
> => 0xc100ac65 <splvm+13>:     mov    %cs:0x20(%edx),%edx
> (gdb)
> t_page_fault () at ../i386/i386/locore.S:435
> 435           pushl   $(T_PAGE_FAULT)         /* mark a page fault trap */
> 1: x/i $pc
> => 0xc100a42c <t_page_fault>: push   $0xe
> 
> ... and here it will recursively enter t_page_fault, because 
> trap_from_kernel contains another CPU_NUMBER. I guess the triple fault 
> is triggered because at some point the exception stack overflows.
> 
> With --enable-ncpu=2 it seems that CPU_NUMBER is
> 
> #define       CPU_NUMBER(reg) \
>       movl    %cs:lapic, reg          ;\
>       movl    %cs:APIC_ID(reg), reg   ;\
>       shrl    $24, reg                ;\
> 
> and at this stage the lapic pointer is not yet initialized:
> 
> (gdb) p lapic
> $4 = (volatile ApicLocalUnit *) 0x0
> (gdb) x &lapic
> 0xc109bc6c <lapic>:   0x00000000
> 
> I guess so far this worked because the address 0 was mapped, and now it 
> isn't.
> 
> I'm not sure what the proper way to solve this would be. I tried 
> moving the call to machine_init() earlier, before vm_mem_bootstrap() 
> (to have lapic initialized), but this triggers another assertion.
> 
> Any idea?
> 
> 
> Luca
> 
>

-- 
Sent from my Sailfish device
