On Thu, 1 Feb 2024 at 14:01, Jonathan Cameron <jonathan.came...@huawei.com> wrote: > > Can you run QEMU under gdb and give the backtrace when it stops > > on the abort() ? That will probably have a helpful clue. I > > suspect something is failing to pass a valid retaddr in > > when it calls a load/store function.
> [Switching to Thread 0x7ffff56ff6c0 (LWP 21916)] > __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) > at ./nptl/pthread_kill.c:44 > 44 ./nptl/pthread_kill.c: No such file or directory. > (gdb) bt > #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized > out>) at ./nptl/pthread_kill.c:44 > #1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at > ./nptl/pthread_kill.c:78 > #2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at > ./nptl/pthread_kill.c:89 > #3 0x00007ffff77c43b6 in __GI_raise (sig=sig@entry=6) at > ../sysdeps/posix/raise.c:26 > #4 0x00007ffff77aa87c in __GI_abort () at ./stdlib/abort.c:79 > #5 0x0000555555c0d4ce in cpu_abort > (cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 > "cpu_io_recompile: could not find TB for pc=%p") > at ../../cpu-target.c:359 > #6 0x0000555555c59435 in cpu_io_recompile (cpu=cpu@entry=0x555556fd9000, > retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611 > #7 0x0000555555c5c956 in io_prepare > (retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, > cpu=0x555556fd9000, out_offset=<synthetic pointer>) > at ../../accel/tcg/cputlb.c:1339 > #8 do_ld_mmio_beN > (cpu=0x555556fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, > addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at > ../../accel/tcg/cputlb.c:2030 > #9 0x0000555555c5dfad in do_ld_8 > (cpu=cpu@entry=0x555556fd9000, p=p@entry=0x7ffff56fddc0, > mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized > out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356 > #10 0x0000555555c6026f in do_ld8_mmu > (cpu=cpu@entry=0x555556fd9000, addr=addr@entry=19595792376, > oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) > at ../../accel/tcg/cputlb.c:2439 > #11 0x0000555555c629f9 in cpu_ldq_mmu (ra=0, oi=52, addr=19595792376, > env=0x555556fdb7c0) at ../../accel/tcg/ldst_common.c.inc:169 > #12 cpu_ldq_le_mmuidx_ra (env=0x555556fdb7c0, addr=19595792376, > mmu_idx=<optimized out>, ra=ra@entry=0) > at ../../accel/tcg/ldst_common.c.inc:301 > #13 0x0000555555b18ede in ptw_ldq (in=0x7ffff56fdf00) at > ../../target/i386/tcg/sysemu/excp_helper.c:98 > #14 ptw_ldq (in=0x7ffff56fdf00) at > ../../target/i386/tcg/sysemu/excp_helper.c:93 > #15 mmu_translate (env=env@entry=0x555556fdb7c0, in=in@entry=0x7ffff56fdfa0, > out=out@entry=0x7ffff56fdf70, err=err@entry=0x7ffff56fdf80) > at ../../target/i386/tcg/sysemu/excp_helper.c:173 > #16 0x0000555555b19c95 in get_physical_address > (err=0x7ffff56fdf80, out=0x7ffff56fdf70, mmu_idx=0, > access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x555556fdb7c0) > at ../../target/i386/tcg/sysemu/excp_helper.c:578 > #17 x86_cpu_tlb_fill > (cs=0x555556fd9000, addr=18446744072116178925, size=<optimized out>, > access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at > ../../target/i386/tcg/sysemu/excp_helper.c:604 > #18 0x0000555555c5dd2b in probe_access_internal > (cpu=<optimized out>, addr=18446744072116178925, > fault_size=fault_size@entry=1, access_type=access_type@entry=MMU_INST_FETCH, > mmu_idx=0, nonfault=nonfault@entry=false, phost=0x7ffff56fe0d0, > pfull=0x7ffff56fe0c8, retaddr=0, check_mem_cbs=false) > at ../../accel/tcg/cputlb.c:1432 > #19 0x0000555555c61ff8 in get_page_addr_code_hostp (env=<optimized out>, > addr=addr@entry=18446744072116178925, hostp=hostp@entry=0x0) > at ../../accel/tcg/cputlb.c:1603 > #20 0x0000555555c50a2b in get_page_addr_code (addr=18446744072116178925, > env=<optimized out>) > at /home/jic23/src/qemu/include/exec/exec-all.h:594 > #21 tb_htable_lookup (cpu=<optimized out>, pc=pc@entry=18446744072116178925, > cs_base=0, flags=415285936, cflags=4278353920) > at ../../accel/tcg/cpu-exec.c:231 > #22 0x0000555555c50c08 in tb_lookup > (cpu=cpu@entry=0x555556fd9000, pc=pc@entry=18446744072116178925, > cs_base=cs_base@entry=0, flags=<optimized out>, cflags=<optimized out>) at > ../../accel/tcg/cpu-exec.c:267 > #23 0x0000555555c51e23 in helper_lookup_tb_ptr (env=0x555556fdb7c0) at > ../../accel/tcg/cpu-exec.c:423 > #24 0x00007fffa9076ead in code_gen_buffer () > #25 0x0000555555c50fab in cpu_tb_exec (cpu=cpu@entry=0x555556fd9000, > itb=<optimized out>, tb_exit=tb_exit@entry=0x7ffff56fe708) > at ../../accel/tcg/cpu-exec.c:458 > #26 0x0000555555c51492 in cpu_loop_exec_tb > (tb_exit=0x7ffff56fe708, last_tb=<synthetic pointer>, > pc=18446744072116179169, tb=<optimized out>, cpu=0x555556fd9000) > at ../../accel/tcg/cpu-exec.c:920 > #27 cpu_exec_loop (cpu=cpu@entry=0x555556fd9000, sc=sc@entry=0x7ffff56fe7a0) > at ../../accel/tcg/cpu-exec.c:1041 > #28 0x0000555555c51d11 in cpu_exec_setjmp (cpu=cpu@entry=0x555556fd9000, > sc=sc@entry=0x7ffff56fe7a0) at ../../accel/tcg/cpu-exec.c:1058 > #29 0x0000555555c523b4 in cpu_exec (cpu=cpu@entry=0x555556fd9000) at > ../../accel/tcg/cpu-exec.c:1084 > #30 0x0000555555c74053 in tcg_cpus_exec (cpu=cpu@entry=0x555556fd9000) at > ../../accel/tcg/tcg-accel-ops.c:76 > #31 0x0000555555c741a0 in mttcg_cpu_thread_fn (arg=arg@entry=0x555556fd9000) > at ../../accel/tcg/tcg-accel-ops-mttcg.c:95 > #32 0x0000555555dfb580 in qemu_thread_start (args=0x55555703c3e0) at > ../../util/qemu-thread-posix.c:541 > #33 0x00007ffff78176ba in start_thread (arg=<optimized out>) at > ./nptl/pthread_create.c:444 > #34 0x00007ffff78a60d0 in clone3 () at > ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 So, that looks like: * we call cpu_tb_exec(), which executes some generated code * that generated code calls the lookup_tb_ptr helper to see if we have a generated TB already for the address we're going to execute next * lookup_tb_ptr probes the TLB to see if we know the host RAM address for the guest address * this results in a TLB walk for an instruction fetch * the page table descriptor load is to IO memory * io_prepare assumes it needs to do a TLB recompile, because can_do_io is clear I am not surprised that the corner case of "the guest put its page tables in an MMIO device" has not yet come up :-) I'm really not sure how the icount handling should interact with that... -- PMM