On 5/10/25 07:17, Jeff Law wrote:
> On 5/9/25 2:27 PM, Vineet Gupta wrote:
>> This showed up when debugging the testcase for PR119164.
>>
>> RISC-V FRM mode-switching state machine has special handling for transitions
>> to and from a call_insn as FRM needs to saved/restored around calls (any
>> call is considered potentially FRM clobbering). Consider the following
>> canonical example where insns 2, 4, 6 come are due to user code, while
>> the rest of frm save/restore insns 1, 3, 5, 7 need to be generated for the
>> ABI semantics.
>>
>>          test_float_point_frm_static:
>>          1:  frrm    a5 <--
>>       2:     fsrmi   2
>>          3:  fsrm    a5 <--
>>       4:     call    normalize_vl
>>          5:  frrm    a5 <--
>>       6:     fsrmi   3
>>          7:  fsrm    a5 <--
> Not the focus of this series, but isn't the frrm@5 unnecessary?  

It is necessary. Any function call can potentially change frm. Assuming
normalize_vl does, we need to read the latest "global" so that the restore in
insn 7 does the right thing.


> If there is no use of FRM after a call, but before the next set of FRM, 
> then the restore isn't needed.  This is a pretty standard optimization 
> in caller-saves style code generation.  Though maybe it's really an 
> artifact of not showing the uses of FRM in your little example?

Please see above.

>> Current implementation of RISC-V TARGET_MODE_NEEDED has special handling
>> if the call_insn is last insn of BB, to ensure FRM save/reads are emitted
>> on all the edges. However it doesn't work as intended and is borderline
>> bogus for following reasons:
>>
>>   - It fails to detect call_insn as last of BB (PR119164 test) if the
>>     next BB starts with a code label (say due to call being conditional).
>>     Granted this is a deficiency of API next_nonnote_nondebug_insn_bb ()
>>     which incorrectly returns next BB code_label as opposed to returning
>>     NULL (and this behavior is kind of relied upon by much of gcc).
>>     This causes missed/delayed state transition to DYN.
>>
>>   - If code is tightened to actually detect above such as:
>>
>>       -  rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
>>       -  if (!insn)
>>       +  if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn)
> I'm guessing the concern here was CUR_INSN could be a call and the last 
> real insn in the block, but there could be notes and debug statements 
> between the call and the actual end of the block?

Yeah in the test case, there was a label as actual last insn of BB, but generic
mode_sw after call skips over labels (NONDEBUG_INSN_P used in the iterator).
Whereas the {prev,next}_nonnote_nondebug_insn_bb () doesn't hence the impedance
mismatch.

> Would it be safer as:
> rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
> if (!insn || BLOCK_FOR_INSN (cur_insn) != BLOCK_FOR_INSN (insn)
>
> But it's all academic since this code really just needs to go away.

Exactly.

>>     edge insertion happens but ends up splitting the BB which generic
>>     mode-sw doesn't expect and ends up hittng an ICE.
>>
>>   - TARGET_MODE_NEEDED hook typically don't modify the CFG.
> Yea, we probably don't want TARGET_MODE_NEEDED to be modifying the CFG.
>
>>   - For abnormal edges, insert_insn_end_basic_block () is called, which
>>     by design on encountering call_insn as last in BB, inserts new insn
>>     BEFORE the call, not after.
> Right.  In theory we're not supposed to be asking for insertion on 
> abnormals anyway.  It would be interesting to know if that's still 
> happening or if it's legacy from the late 90s when we first started 
> lighting up LCM algorithms and made all the common mistakes WRT abnormals.

Well the RISC-V code was looping thru all the edges and handling regular and
abnormal edges.
When I added the debug prints there, it was sometimes hitting the abnormal edge
case and putting before call, not after.

>> So this is just all wrong and ripe for removal. Moreover there seems to
>> be no testsuite coverage for this code path at all. Results don't change
>> at all if this is removed.
>>
>> The total number of FRM read/writes emitted (static count) across all
>> benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decrease slightly
>> so its a net win even if minimal but the real gain is reduced complexity
>> and maintenance.
>>
>>                     Before             Patch
>>                 ----------------  ---------------
>>                  frrm fsrmi fsrm   frrm fsrmi frrm
>>      perlbench_r   42    0    4      42    0    4
>>         cpugcc_r  167    0   17     167    0   17
>>         bwaves_r   16    0    1      16    0    1
>>            mcf_r   11    0    0      11    0    0
>>     cactusBSSN_r   79    0   27      76    0   27
>>           namd_r  119    0   63     119    0   63
>>         parest_r  218    0  114     168    0  114 <--
>>         povray_r  123    1   17     123    1   17
>>            lbm_r    6    0    0       6    0    0
>>        omnetpp_r   17    0    1      17    0    1
>>            wrf_r 2287   13 1956    2287   13 1956
>>       cpuxalan_r   17    0    1      17    0    1
>>         ldecod_r   11    0    0      11    0    0
>>           x264_r   14    0    1      14    0    1
>>        blender_r  724   12  182     724   12  182
>>           cam4_r  324   13  169     324   13  169
>>      deepsjeng_r   11    0    0      11    0    0
>>        imagick_r  265   16   34     265   16   34
>>          leela_r   12    0    0      12    0    0
>>            nab_r   13    0    1      13    0    1
>>      exchange2_r   16    0    1      16    0    1
>>      fotonik3d_r   20    0   11      20    0   11
>>           roms_r   33    0   23      33    0   23
>>             xz_r    6    0    0       6    0    0
>>                ----------------   ---------------
>>                  4551   55 2623    4498   55 2623
>>
>> gcc/ChangeLog:
>>      * config/riscv/riscv.cc (riscv_frm_emit_after_bb_end): Delete.
>>      (riscv_frm_mode_needed): Remove call riscv_frm_emit_after_bb_end.
> OK once all the pre/post requisites are installed.

Thx,
-Vineet

Reply via email to