On 4/21/25 10:58, Eric DeVolder wrote:
Hi,
I've noticed what I believe to be an error in the RISC-V
implementation. The RISC-V spec[1] states:
Note that load and load-reserved instructions generate load
exceptions, whereas store, store-conditional, and AMO instructions
generate store/AMO exceptions.
For an AMO operation, a translation page fault should record the
value 15 (Store/AMO Page Fault) in m/scause. Instead, I observe it
recording 13 (Load Page Fault).
Here is a look at this using the execlog plugin:
0, 0x3fbf6dfee8, 0x8f5b02f, "amoswap.d zero,a5,(a1)", priv ->
0x0000000000000001, scause -> 0x000000000000000d
0, 0xffffffff8031b5ac, 0x14021273, "csrrw tp,sscratch,tp", tp
-> 0xffffffd600926400
In this scenario a Linux user-space app issues the AMO to an address
that is not yet mapped, which causes exception 0xd (13, Load Page
Fault).
I've spent time looking into this. The AMO operation, at run-time
(after TCG), has a call tree that looks like:
code_gen_buffer
 do_ld8_mmu
  mmu_lookup
   mmu_lookup1
    tlb_fill
     cpu_riscv_tlb_fill
      get_physical_address
As the AMO starts with the read/load part of the operation, the
translation fails since pte.V=0 (not yet mapped), and because
get_physical_address is told access_type=MMU_DATA_LOAD, upon return
the raise_mmu_exception() turns that MMU_DATA_LOAD into a
RISCV_EXCP_LOAD_PAGE_FAULT.
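
To make the mapping concrete, here is a minimal sketch in C of how an
access type could be turned into a page-fault cause. It is not QEMU code:
the enum names are simplified stand-ins, and the is_amo flag is purely
hypothetical, standing for whatever state would have to be reachable from
the lookup path. Only the cause values (12, 13, 15) come from the RISC-V
privileged spec.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the access types; not QEMU's definitions. */
enum access_type {
    ACCESS_DATA_LOAD,
    ACCESS_DATA_STORE,
    ACCESS_INST_FETCH,
};

/* Page-fault cause codes as defined by the RISC-V privileged spec. */
enum {
    CAUSE_INST_PAGE_FAULT      = 12,
    CAUSE_LOAD_PAGE_FAULT      = 13,
    CAUSE_STORE_AMO_PAGE_FAULT = 15,
};

/*
 * Map a failed translation to a page-fault cause.
 * is_amo is a hypothetical flag: it represents the "this access belongs
 * to an AMO" state that the lookup path would need to see.
 */
int page_fault_cause(enum access_type type, bool is_amo)
{
    if (is_amo) {
        /* Every part of an AMO, including its read, faults as store/AMO. */
        return CAUSE_STORE_AMO_PAGE_FAULT;
    }
    switch (type) {
    case ACCESS_INST_FETCH:
        return CAUSE_INST_PAGE_FAULT;
    case ACCESS_DATA_LOAD:
        return CAUSE_LOAD_PAGE_FAULT;
    case ACCESS_DATA_STORE:
        return CAUSE_STORE_AMO_PAGE_FAULT;
    }
    assert(!"unknown access type");
    return -1;
}
```

With is_amo false, the load half of the AMO yields 13, which matches the
behaviour observed above; with is_amo true, the same access yields 15 as
the spec requires.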
This example is a Linux scenario, and I think that I see the Load Page
Fault happen, and then upon return from the exception handler, a Store
Page Fault happens, after which the AMO is able to complete. So Linux
has been "gracefully" covering this situation.
I've looked into trying to detect this condition and change the
exception_index/cause accordingly. Fundamentally there would need to
be state reachable via get_physical_address() that can indicate an AMO.
In the short time I've looked at this, I can't find it.
Advice/tips welcome.
I'm on stable-9.1 <v9.1.1> sha 0ff5ab6f57a2.
Fixed in 9.2 by 98f21c30f5beffc45232721ae79c019df58ae9f1.
r~