Maybe adding a small conditional to the assembly code (return 0 if lapic is 0, using cmp and je) could fix the problem in a simpler way.
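To make the idea concrete, here is a rough C model of the guarded lookup. It is only a sketch, not the actual macro: FakeApicLocalUnit and cpu_number_guarded are hypothetical names, the 0x20 offset of the ID register is taken from the disassembly quoted below, and returning 0 assumes that only the boot processor runs before lapic is set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ApicLocalUnit: only the ID register at
   offset 0x20 matters for this sketch. */
typedef struct {
    uint32_t reserved[8];   /* 0x00 .. 0x1f, unused here */
    uint32_t apic_id;       /* 0x20: APIC ID lives in bits 31..24 */
} FakeApicLocalUnit;

/* Not yet initialized during early boot, as in the gdb session. */
static volatile FakeApicLocalUnit *lapic = NULL;

/* Model of CPU_NUMBER with the proposed guard: the NULL test plays the
   role of the cmp/je pair in the assembly version. */
static inline int cpu_number_guarded(void)
{
    volatile FakeApicLocalUnit *l = lapic;

    if (l == NULL)
        return 0;   /* assumption: we are still on the boot CPU */

    return (int)(l->apic_id >> 24);
}
```

In the real macro the guard would need a local label for the je target, since CPU_NUMBER is expanded in several places.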

On Monday, 19 June 2023, l...@orpolo.org wrote:
> On 19/06/23 20:35, Almudena Garcia wrote:
> > But the code which starts the secondary CPUs runs much later than the
> > crash.
> >
> > So the crash could be caused by reading the ACPI tables, which are
> > supposed to be in a certain memory region, defined by a physical address.
> >
> > phystokv won't fully solve the problem, because the lapic address is
> > outside the range allowed by this function. Currently, we use paging
> > to map every ACPI table that we need to access (to get a virtual address
> > for it).
> >
> > But the search for the initial ACPI address is based on a physical address
> > range.
>
> I could go a bit further with debugging, and it seems that the problem
> is a bit different: removing the 1:1 map exposed an issue that
> had been hidden so far.
>
> In my test the CPU is reset by a triple fault (you can see this by
> enabling interrupt and cpu_reset logging with qemu, e.g. using -d
> int,cpu_reset), which is triggered after the first call to splvm:
>
> (gdb) bt
> #0 splvm () at ../i386/i386/spl.S:122
> #1 0xc1001da6 in pmap_enter (pmap=<optimized out>, v=<optimized out>,
> pa=<optimized out>, prot=<optimized out>, wired=<optimized out>) at
> ../i386/intel/pmap.c:2171
> #2 0xc1029b99 in pmap_steal_memory (size=<optimized out>) at
> ../vm/vm_resident.c:278
> #3 0xc1029c48 in vm_page_bootstrap (startp=<optimized out>,
> endp=<optimized out>) at ../vm/vm_resident.c:207
> #4 0xc101b893 in vm_mem_bootstrap () at ../vm/vm_init.c:65
> #5 0xc10161d1 in setup_main () at ../kern/startup.c:115
> #6 0xc1004652 in c_boot_entry (bi=<optimized out>) at
> ../i386/i386at/model_dep.c:578
> #7 0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103
> (gdb) si
> 124 cli
> 1: x/i $pc
> => 0xc100ac5d <splvm+5>: cli
> (gdb)
> 125 CPU_NUMBER(%edx)
> 1: x/i $pc
> => 0xc100ac5e <splvm+6>: mov %cs:0xc109bc6c,%edx
> (gdb)
> 0xc100ac65 125 CPU_NUMBER(%edx)
> 1: x/i $pc
> => 0xc100ac65 <splvm+13>: mov %cs:0x20(%edx),%edx
> (gdb)
> t_page_fault () at ../i386/i386/locore.S:435
> 435 pushl $(T_PAGE_FAULT) /* mark a page fault trap */
> 1: x/i $pc
> => 0xc100a42c <t_page_fault>: push $0xe
>
> ... and here it will recursively enter t_page_fault, because in
> trap_from_kernel there is another CPU_NUMBER. I guess the triple fault
> is triggered because at some point the exception stack overflows.
>
> With --enable-ncpu=2 it seems that CPU_NUMBER is
>
> #define CPU_NUMBER(reg) \
> movl %cs:lapic, reg ;\
> movl %cs:APIC_ID(reg), reg ;\
> shrl $24, reg ;\
>
> and at this stage the lapic pointer is not yet initialized:
>
> (gdb) p lapic
> $4 = (volatile ApicLocalUnit *) 0x0
> (gdb) x &lapic
> 0xc109bc6c <lapic>: 0x00000000
>
> I guess this has worked so far because address 0 was mapped, and now it
> isn't.
>
> I'm not sure what the proper way to solve this would be. I tried
> moving the call to machine_init() earlier, before vm_mem_bootstrap()
> (to have lapic initialized), but this triggers another assert.
>
> Any ideas?
>
>
> Luca
>
>

--
Sent from my Sailfish device