Re: [PATCH 0/6] RISC-V: frm state-machine improvements

钟居哲 Sun, 11 May 2025 19:23:05 -0700

Hi, Vineet.

>> I have a feeling this has to do with following:
>> https://godbolt.org/z/Px9es7j1r


I see that the Clang-generated asm has 2 fsrm instructions inside the main
loop, so I think GCC does better here.

Correct me if I am wrong. Thanks.



 
From: Vineet Gupta
Date: 2025-05-10 04:27
To: gcc-patches
CC: gnu-toolchain; Jeff Law; Robin Dapp; Juzhe Zhong; Pan Li; Kito Cheng; Vineet Gupta
Subject: [PATCH 0/6] RISC-V: frm state-machine improvements
Hi,
 
This came out of the Rivos perf team reporting (shoutout to Siavash) that
some of the SPEC2017 workloads had FRM wiggles where none were needed.
The writes in particular can be expensive.
 
I started with a reduced test for PR/119164 from blender:node_texture_util.c.
 
However, in trying to understand the code (and after a botched rewrite of
the whole thing), it turned out that a lot of it was simply unnecessary,
adding more complexity than warranted. As a result, this series is mostly
deletions, and the actual improvements come from just a few lines of changes.
 
I've verified each patch incrementally with
- Testsuite run (unchanged, 1 unexpected pass:
   gcc.target/riscv/rvv/autovec/pr119114.c)
- SPEC build
- Static analysis of FRM read/write insns emitted in all the SPEC binaries.
- There's BPI data for some of this too, but the delta there is not
   significant as this could really be uarch specific.
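
The message does not say how the static counts were collected. As a rough sketch of one way to produce a similar tally, the Python below counts frrm/fsrmi/fsrm mnemonics in disassembly text. It assumes GNU `objdump -d` style lines (address, raw bytes, mnemonic and operands separated by tabs) and assumes the disassembler prints the frm pseudo-instruction names rather than raw csr* forms. The name `count_frm_insns` is made up for illustration.

```python
from collections import Counter

# Mnemonics of interest: frm read, frm write-immediate, frm write.
FRM_MNEMONICS = ("frrm", "fsrmi", "fsrm")


def count_frm_insns(disassembly: str) -> Counter:
    """Count frm read/write instructions in objdump-style disassembly text.

    Assumed line layout: "<addr>:\t<raw bytes>\t<mnemonic>\t<operands>".
    Lines without that layout (symbol labels, section headers) are skipped.
    """
    counts = Counter({name: 0 for name in FRM_MNEMONICS})
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not an instruction line
        mnemonic = fields[2].strip()
        # Exact comparison, so "fsrmi" is never counted as "fsrm".
        if mnemonic in counts:
            counts[mnemonic] += 1
    return counts
```

Feeding it the text output of a RISC-V objdump for each binary and summing the three counters would give one row per benchmark in the same frrm / fsrmi / fsrm order as the results table.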
 
Here's the result for static analysis.
 
 
                1. revert-confluence    2. remove-edge-insert   4. fewer-frm-restore    5. call-backtrack
                                        3. remove-mode-after
                ---------------------   ---------------------   ---------------------   ---------------------
                     frrm fsrmi  fsrm        frrm fsrmi  fsrm        frrm fsrmi  fsrm        frrm fsrmi  fsrm
  perlbench_r          42     0     4          42     0     4          17     0     1          17     0     1
     cpugcc_r         167     0    17         167     0    17          11     0     0          11     0     0
     bwaves_r          16     0     1          16     0     1          16     0     1          16     0     1
        mcf_r          11     0     0          11     0     0          11     0     0          11     0     0
 cactusBSSN_r          79     0    27          76     0    27          19     0     1          19     0     1
       namd_r         119     0    63         119     0    63          14     0     1          14     0     1
     parest_r         218     0   114         168     0   114          24     0     1          24     0     1
     povray_r         123     1    17         123     1    17          26     1     6          26     1     6
        lbm_r           6     0     0           6     0     0           6     0     0           6     0     0
    omnetpp_r          17     0     1          17     0     1          17     0     1          17     0     1
        wrf_r        2287    13  1956        2287    13  1956        1268    13  1603         613    13    82
   cpuxalan_r          17     0     1          17     0     1          17     0     1          17     0     1
     ldecod_r          11     0     0          11     0     0          11     0     0          11     0     0
       x264_r          14     0     1          14     0     1          11     0     0          11     0     0
    blender_r         724    12   182         724    12   182          61    12    42          39    12    16
       cam4_r         324    13   169         324    13   169          45    13    20          40    13    17
  deepsjeng_r          11     0     0          11     0     0          11     0     0          11     0     0
    imagick_r         265    16    34         265    16    34         132    16    25          33    16    18
      leela_r          12     0     0          12     0     0          12     0     0          12     0     0
        nab_r          13     0     1          13     0     1          13     0     1          13     0     1
  exchange2_r          16     0     1          16     0     1          16     0     1          16     0     1
  fotonik3d_r          20     0    11          20     0    11          19     0     1          19     0     1
       roms_r          33     0    23          33     0    23          21     0     1          21     0     1
         xz_r           6     0     0           6     0     0           6     0     0           6     0     0
                ---------------------   ---------------------   ---------------------   ---------------------
                     4551    55  2623        4498    55  2623        1804    55  1707        1023    55   150
                ---------------------   ---------------------   ---------------------   ---------------------
                           7229                    7176                    3566                    1228
                ---------------------   ---------------------   ---------------------   ---------------------
 
It seems wrf still accounts for more than half of all reads/writes (708 of 1228):
                 613   13   82
 
with one function having the bulk of them
      solve_em_  555    1   50
 
It uses a single static rounding mode (RM), so ideally it needs only 1 save
and 1 restore.
 
I have a feeling this has to do with the following:
    https://godbolt.org/z/Px9es7j1r
 
The function call code path need not bother with frm save/restore at
all. This is currently being investigated but could take more time.
 
Please review.
 
Thx,
-Vineet
 
Vineet Gupta (6):
  emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into
    next BB
  RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE
  RISC-V: frm/mode-switch: remove dubious frm edge insertion before
    call_insn
  RISC-V: frm/mode-switch: TARGET_MODE_AFTER not needed for frm
    switching
  RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition
  RISC-V: frm/mode-switch: robustify call_insn backtracking
    [PR119164][PR120203]
 
gcc/config/riscv/riscv.cc                     | 166 +++---------------
gcc/emit-rtl.cc                               |   6 +-
.../rvv/base/float-point-dynamic-frm-74.c     |   2 +-
.../gcc.target/riscv/rvv/base/pr119164.c      |  22 +++
4 files changed, 50 insertions(+), 146 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c
 
-- 
2.43.0
