On 5/10/25 07:17, Jeff Law wrote: > On 5/9/25 2:27 PM, Vineet Gupta wrote: >> This showed up when debugging the testcase for PR119164. >> >> RISC-V FRM mode-switching state machine has special handling for transitions >> to and from a call_insn as FRM needs to saved/restored around calls (any >> call is considered potentially FRM clobbering). Consider the following >> canonical example where insns 2, 4, 6 come are due to user code, while >> the rest of frm save/restore insns 1, 3, 5, 7 need to be generated for the >> ABI semantics. >> >> test_float_point_frm_static: >> 1: frrm a5 <-- >> 2: fsrmi 2 >> 3: fsrm a5 <-- >> 4: call normalize_vl >> 5: frrm a5 <-- >> 6: fsrmi 3 >> 7: fsrm a5 <-- > Not the focus of this series, but isn't the frrm@5 unnecessary?
It is necessary. Any function call can potentially change frm. Assuming normalize_vl does, we need to read the latest "global" so that the restore in insn 7 does the right thing. > If there is no use of FRM after a call, but before the next set of FRM, > then the restore isn't needed. This is a pretty standard optimization > in caller-saves style code generation. Though maybe it's really an > artifact of not showing the uses of FRM in your little example? Please see above. >> Current implementation of RISC-V TARGET_MODE_NEEDED has special handling >> if the call_insn is last insn of BB, to ensure FRM save/reads are emitted >> on all the edges. However it doesn't work as intended and is borderline >> bogus for following reasons: >> >> - It fails to detect call_insn as last of BB (PR119164 test) if the >> next BB starts with a code label (say due to call being conditional). >> Granted this is a deficiency of API next_nonnote_nondebug_insn_bb () >> which incorrectly returns next BB code_label as opposed to returning >> NULL (and this behavior is kind of relied upon by much of gcc). >> This causes missed/delayed state transition to DYN. >> >> - If code is tightened to actually detect above such as: >> >> - rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn); >> - if (!insn) >> + if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn) > I'm guessing the concern here was CUR_INSN could be a call and the last > real insn in the block, but there could be notes and debug statements > between the call and the actual end of the block? Yeah in the test case, there was a label as actual last insn of BB, but generic mode_sw after call skips over labels (NONDEBUG_INSN_P used in the iterator). Whereas the {prev,next}_nonnote_nondebug_insn_bb () doesn't hence the impedance mismatch. > Would it be safer as: > rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn); > if (!insn || BLOCK_FOR_INSN (cur_insn) != BLOCK_FOR_INSN (insn) > > But it's all academic since this code really just needs to go away. Exactly. >> edge insertion happens but ends up splitting the BB which generic >> mode-sw doesn't expect and ends up hittng an ICE. >> >> - TARGET_MODE_NEEDED hook typically don't modify the CFG. > Yea, we probably don't want TARGET_MODE_NEEDED to be modifying the CFG. > >> - For abnormal edges, insert_insn_end_basic_block () is called, which >> by design on encountering call_insn as last in BB, inserts new insn >> BEFORE the call, not after. > Right. In theory we're not supposed to be asking for insertion on > abnormals anyway. It would be interesting to know if that's still > happening or if it's legacy from the late 90s when we first started > lighting up LCM algorithms and made all the common mistakes WRT abnormals. Well the RISC-V code was looping thru all the edges and handling regular and abnormal edges. When I added the debug prints there, it was sometimes hitting the abnormal edge case and putting before call, not after. >> So this is just all wrong and ripe for removal. Moreover there seems to >> be no testsuite coverage for this code path at all. Results don't change >> at all if this is removed. >> >> The total number of FRM read/writes emitted (static count) across all >> benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decrease slightly >> so its a net win even if minimal but the real gain is reduced complexity >> and maintenance. >> >> Before Patch >> ---------------- --------------- >> frrm fsrmi fsrm frrm fsrmi frrm >> perlbench_r 42 0 4 42 0 4 >> cpugcc_r 167 0 17 167 0 17 >> bwaves_r 16 0 1 16 0 1 >> mcf_r 11 0 0 11 0 0 >> cactusBSSN_r 79 0 27 76 0 27 >> namd_r 119 0 63 119 0 63 >> parest_r 218 0 114 168 0 114 <-- >> povray_r 123 1 17 123 1 17 >> lbm_r 6 0 0 6 0 0 >> omnetpp_r 17 0 1 17 0 1 >> wrf_r 2287 13 1956 2287 13 1956 >> cpuxalan_r 17 0 1 17 0 1 >> ldecod_r 11 0 0 11 0 0 >> x264_r 14 0 1 14 0 1 >> blender_r 724 12 182 724 12 182 >> cam4_r 324 13 169 324 13 169 >> deepsjeng_r 11 0 0 11 0 0 >> imagick_r 265 16 34 265 16 34 >> leela_r 12 0 0 12 0 0 >> nab_r 13 0 1 13 0 1 >> exchange2_r 16 0 1 16 0 1 >> fotonik3d_r 20 0 11 20 0 11 >> roms_r 33 0 23 33 0 23 >> xz_r 6 0 0 6 0 0 >> ---------------- --------------- >> 4551 55 2623 4498 55 2623 >> >> gcc/ChangeLog: >> * config/riscv/riscv.cc (riscv_frm_emit_after_bb_end): Delete. >> (riscv_frm_mode_needed): Remove call riscv_frm_emit_after_bb_end. > OK once all the pre/post requisites are installed. Thx, -Vineet