Hi, I've noticed what I believe to be an error in the RISC-V implementation. The RISC-V spec[1] states:

    Note that load and load-reserved instructions generate load exceptions, whereas store, store-conditional, and AMO instructions generate store/AMO exceptions.

So for an AMO operation, a translation page fault should record in m/scause the value 15 (Store/AMO Page Fault). Instead, I observe it recording 13 (Load Page Fault).

Here is a look at this using the execlog plugin:

    0, 0x3fbf6dfee8, 0x8f5b02f, "amoswap.d zero,a5,(a1)", priv -> 0x0000000000000001, scause -> 0x000000000000000d
    0, 0xffffffff8031b5ac, 0x14021273, "csrrw tp,sscratch,tp", tp -> 0xffffffd600926400

In this scenario a Linux user-space app issues the AMO to an address that is not yet mapped, which raises exception 0xd (13, Load Page Fault).

I've spent time looking into this. The AMO operation, at run-time (after TCG), has a call tree that looks like:

    code_gen_buffer
      do_ld8_mmu
        mmu_lookup
          mmu_lookup1
            tlb_fill
              cpu_riscv_tlb_fill
                get_physical_address

Because the AMO starts with the read/load part of the operation, the translation fails since pte.V=0 (not yet mapped). get_physical_address() is told access_type=MMU_DATA_LOAD, so upon return raise_mmu_exception() turns that MMU_DATA_LOAD into a RISCV_EXCP_LOAD_PAGE_FAULT.

This example is a Linux scenario. I think what I see is the Load Page Fault happening first, and then, upon return from the exception handler, a Store Page Fault, after which the AMO is able to complete. So Linux has been "gracefully" covering this situation.

I've looked into trying to detect this condition and change the exception_index/cause accordingly. Fundamentally there would need to be state reachable via get_physical_address() that can indicate an AMO. In the short time I've looked at this, I can't find it. Advice/tips welcome.

I'm on stable-9.1 <v9.1.1> sha 0ff5ab6f57a2.

Thanks,
Eric

[1] riscv-privileged-20211203.pdf, 3.1.15 Machine Cause Register
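
To make the mismatch concrete, here is a minimal standalone sketch in C of the cause selection described above. It is not QEMU code: the enum names, both functions, and the is_amo flag are hypothetical and exist only for illustration. Only the numeric cause values (12, 13, 15) come from the privileged spec. The first function models what I observe (the cause follows the access type of the failing lookup), and the second models what the spec appears to ask for, assuming some AMO indication could be made available at that point.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; these are not QEMU identifiers. */
enum access_type { ACCESS_FETCH, ACCESS_LOAD, ACCESS_STORE };

/* Page-fault cause codes as listed in the m/scause table of the privileged spec. */
enum page_fault_cause {
    CAUSE_INST_PAGE_FAULT      = 12,
    CAUSE_LOAD_PAGE_FAULT      = 13,
    CAUSE_STORE_AMO_PAGE_FAULT = 15,
};

/* Observed behaviour: the cause depends only on the access type of the
 * lookup that failed, so the read half of an AMO reports a load fault. */
static int cause_from_access(enum access_type t)
{
    switch (t) {
    case ACCESS_FETCH:
        return CAUSE_INST_PAGE_FAULT;
    case ACCESS_STORE:
        return CAUSE_STORE_AMO_PAGE_FAULT;
    case ACCESS_LOAD:
    default:
        return CAUSE_LOAD_PAGE_FAULT;
    }
}

/* Expected behaviour, assuming a (hypothetical) is_amo hint is available:
 * a failing load that belongs to an AMO reports the store/AMO cause. */
static int cause_with_amo_hint(enum access_type t, bool is_amo)
{
    if (is_amo && t == ACCESS_LOAD) {
        return CAUSE_STORE_AMO_PAGE_FAULT;
    }
    return cause_from_access(t);
}
```

With these definitions, cause_from_access(ACCESS_LOAD) gives 13, matching the scause value in the execlog output, while cause_with_amo_hint(ACCESS_LOAD, true) gives 15. The open question from the report remains where such a hint could come from in the real call path.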
