On 5/10/25 07:17, Jeff Law wrote:
> On 5/9/25 2:27 PM, Vineet Gupta wrote:
>> This showed up when debugging the testcase for PR119164.
>>
>> RISC-V FRM mode-switching state machine has special handling for transitions
>> to and from a call_insn as FRM needs to saved/restored around calls (any
>> call is considered potentially FRM clobbering). Consider the following
>> canonical example where insns 2, 4, 6 come are due to user code, while
>> the rest of frm save/restore insns 1, 3, 5, 7 need to be generated for the
>> ABI semantics.
>>
>> test_float_point_frm_static:
>> 1: frrm a5 <--
>> 2: fsrmi 2
>> 3: fsrm a5 <--
>> 4: call normalize_vl
>> 5: frrm a5 <--
>> 6: fsrmi 3
>> 7: fsrm a5 <--
> Not the focus of this series, but isn't the frrm@5 unnecessary?
It is necessary. Any function call can potentially change frm. Assuming
normalize_vl does, we need to read the latest "global" so that the restore in
insn 7 does the right thing.
> If there is no use of FRM after a call, but before the next set of FRM,
> then the restore isn't needed. This is a pretty standard optimization
> in caller-saves style code generation. Though maybe it's really an
> artifact of not showing the uses of FRM in your little example?
Please see above.
>> Current implementation of RISC-V TARGET_MODE_NEEDED has special handling
>> if the call_insn is last insn of BB, to ensure FRM save/reads are emitted
>> on all the edges. However it doesn't work as intended and is borderline
>> bogus for following reasons:
>>
>> - It fails to detect call_insn as last of BB (PR119164 test) if the
>> next BB starts with a code label (say due to call being conditional).
>> Granted this is a deficiency of API next_nonnote_nondebug_insn_bb ()
>> which incorrectly returns next BB code_label as opposed to returning
>> NULL (and this behavior is kind of relied upon by much of gcc).
>> This causes missed/delayed state transition to DYN.
>>
>> - If code is tightened to actually detect above such as:
>>
>> - rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
>> - if (!insn)
>> + if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn)
> I'm guessing the concern here was CUR_INSN could be a call and the last
> real insn in the block, but there could be notes and debug statements
> between the call and the actual end of the block?
Yeah in the test case, there was a label as actual last insn of BB, but generic
mode_sw after call skips over labels (NONDEBUG_INSN_P used in the iterator).
Whereas the {prev,next}_nonnote_nondebug_insn_bb () doesn't hence the impedance
mismatch.
> Would it be safer as:
> rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
> if (!insn || BLOCK_FOR_INSN (cur_insn) != BLOCK_FOR_INSN (insn)
>
> But it's all academic since this code really just needs to go away.
Exactly.
>> edge insertion happens but ends up splitting the BB which generic
>> mode-sw doesn't expect and ends up hittng an ICE.
>>
>> - TARGET_MODE_NEEDED hook typically don't modify the CFG.
> Yea, we probably don't want TARGET_MODE_NEEDED to be modifying the CFG.
>
>> - For abnormal edges, insert_insn_end_basic_block () is called, which
>> by design on encountering call_insn as last in BB, inserts new insn
>> BEFORE the call, not after.
> Right. In theory we're not supposed to be asking for insertion on
> abnormals anyway. It would be interesting to know if that's still
> happening or if it's legacy from the late 90s when we first started
> lighting up LCM algorithms and made all the common mistakes WRT abnormals.
Well the RISC-V code was looping thru all the edges and handling regular and
abnormal edges.
When I added the debug prints there, it was sometimes hitting the abnormal edge
case and putting before call, not after.
>> So this is just all wrong and ripe for removal. Moreover there seems to
>> be no testsuite coverage for this code path at all. Results don't change
>> at all if this is removed.
>>
>> The total number of FRM read/writes emitted (static count) across all
>> benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decrease slightly
>> so its a net win even if minimal but the real gain is reduced complexity
>> and maintenance.
>>
>> Before Patch
>> ---------------- ---------------
>> frrm fsrmi fsrm frrm fsrmi frrm
>> perlbench_r 42 0 4 42 0 4
>> cpugcc_r 167 0 17 167 0 17
>> bwaves_r 16 0 1 16 0 1
>> mcf_r 11 0 0 11 0 0
>> cactusBSSN_r 79 0 27 76 0 27
>> namd_r 119 0 63 119 0 63
>> parest_r 218 0 114 168 0 114 <--
>> povray_r 123 1 17 123 1 17
>> lbm_r 6 0 0 6 0 0
>> omnetpp_r 17 0 1 17 0 1
>> wrf_r 2287 13 1956 2287 13 1956
>> cpuxalan_r 17 0 1 17 0 1
>> ldecod_r 11 0 0 11 0 0
>> x264_r 14 0 1 14 0 1
>> blender_r 724 12 182 724 12 182
>> cam4_r 324 13 169 324 13 169
>> deepsjeng_r 11 0 0 11 0 0
>> imagick_r 265 16 34 265 16 34
>> leela_r 12 0 0 12 0 0
>> nab_r 13 0 1 13 0 1
>> exchange2_r 16 0 1 16 0 1
>> fotonik3d_r 20 0 11 20 0 11
>> roms_r 33 0 23 33 0 23
>> xz_r 6 0 0 6 0 0
>> ---------------- ---------------
>> 4551 55 2623 4498 55 2623
>>
>> gcc/ChangeLog:
>> * config/riscv/riscv.cc (riscv_frm_emit_after_bb_end): Delete.
>> (riscv_frm_mode_needed): Remove call riscv_frm_emit_after_bb_end.
> OK once all the pre/post requisites are installed.
Thx,
-Vineet