On 5/9/25 2:27 PM, Vineet Gupta wrote:
Hi,

This came out of the Rivos perf team (shoutout to Siavash) reporting that
some of the SPEC2017 workloads had unnecessary FRM wiggles. The writes in
particular could be expensive.

I started with the reduced test case for PR/119164 from blender:node_texture_util.c.

However, in trying to understand the code (and after a botched rewrite of the
whole thing), it turned out that a lot of it was simply unnecessary, leading to
more complexity than warranted. As a result, there are more deletions than
additions here, and the actual improvements come from just a few lines of real
changes.

I've verified each patch incrementally with:
  - Testsuite run (unchanged; 1 unexpected pass:
    gcc.target/riscv/rvv/autovec/pr119114.c)
  - SPEC build
  - Static analysis of the FRM read/write insns emitted in all of the SPEC binaries.
  - There's BPI data for some of this too, but the delta there is not
    significant, as it could really be uarch-specific.
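For anyone wanting to reproduce static counts like these, here is a minimal Python sketch that tallies FRM instructions in a disassembly listing. It assumes GNU `objdump -d` text output with tab-separated address, encoding, mnemonic and operand fields. `count_frm_insns` is a hypothetical helper, not the script actually used for this series.

```python
import sys
from collections import Counter
from typing import Iterable

# The three FRM-accessing mnemonics counted in the report.
FRM_MNEMONICS = ("frrm", "fsrmi", "fsrm")


def count_frm_insns(disasm_lines: Iterable[str]) -> Counter:
    """Count frrm/fsrmi/fsrm in GNU `objdump -d` text.

    Assumption: instruction lines are tab-separated into address ("1007c:"),
    encoding, mnemonic and operands. Symbol headers such as
    "0000000000010078 <main>:" contain no tabs and are skipped.
    """
    counts = Counter({m: 0 for m in FRM_MNEMONICS})
    for line in disasm_lines:
        fields = line.split("\t")
        # Skip anything that is not "<addr>:<TAB><encoding><TAB><mnemonic>...".
        if len(fields) < 3 or not fields[0].rstrip().endswith(":"):
            continue
        mnemonic = fields[2].strip()
        if mnemonic in FRM_MNEMONICS:
            counts[mnemonic] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: objdump -d bench_r > bench.dis; python count_frm.py bench.dis
    with open(sys.argv[1]) as f:
        c = count_frm_insns(f)
    print(" ".join(f"{m}={c[m]}" for m in FRM_MNEMONICS))
```

Running it once per binary gives one frrm/fsrmi/fsrm triple per benchmark. That has the same shape as the rows of the static-analysis table.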

Here are the static-analysis results (static counts of frrm, fsrmi and fsrm
instructions per benchmark; patches 2 and 3 share a column):


                1. revert-confluence   2. remove-edge-insert  4. fewer-frm-restore   5. call-backtrack
                                       3. remove-mode-after
                ---------------------  ---------------------  ---------------------  ---------------------
                   frrm  fsrmi   fsrm     frrm  fsrmi   fsrm     frrm  fsrmi   fsrm     frrm  fsrmi   fsrm
   perlbench_r       42      0      4       42      0      4       17      0      1       17      0      1
      cpugcc_r      167      0     17      167      0     17       11      0      0       11      0      0
      bwaves_r       16      0      1       16      0      1       16      0      1       16      0      1
         mcf_r       11      0      0       11      0      0       11      0      0       11      0      0
  cactusBSSN_r       79      0     27       76      0     27       19      0      1       19      0      1
        namd_r      119      0     63      119      0     63       14      0      1       14      0      1
      parest_r      218      0    114      168      0    114       24      0      1       24      0      1
      povray_r      123      1     17      123      1     17       26      1      6       26      1      6
         lbm_r        6      0      0        6      0      0        6      0      0        6      0      0
     omnetpp_r       17      0      1       17      0      1       17      0      1       17      0      1
         wrf_r     2287     13   1956     2287     13   1956     1268     13   1603      613     13     82
    cpuxalan_r       17      0      1       17      0      1       17      0      1       17      0      1
      ldecod_r       11      0      0       11      0      0       11      0      0       11      0      0
        x264_r       14      0      1       14      0      1       11      0      0       11      0      0
     blender_r      724     12    182      724     12    182       61     12     42       39     12     16
        cam4_r      324     13    169      324     13    169       45     13     20       40     13     17
   deepsjeng_r       11      0      0       11      0      0       11      0      0       11      0      0
     imagick_r      265     16     34      265     16     34      132     16     25       33     16     18
       leela_r       12      0      0       12      0      0       12      0      0       12      0      0
         nab_r       13      0      1       13      0      1       13      0      1       13      0      1
   exchange2_r       16      0      1       16      0      1       16      0      1       16      0      1
   fotonik3d_r       20      0     11       20      0     11       19      0      1       19      0      1
        roms_r       33      0     23       33      0     23       21      0      1       21      0      1
          xz_r        6      0      0        6      0      0        6      0      0        6      0      0
                ---------------------  ---------------------  ---------------------  ---------------------
         total     4551     55   2623     4498     55   2623     1804     55   1707     1023     55    150
      combined                   7229                   7176                   3566                   1228
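As a cross-check on the table's arithmetic, a small Python sketch can re-sum the per-benchmark rows. The names `ROWS`, `STAGES` and `stage_totals` are illustrative and not part of the patch series. Summed this way, the rows give combined frrm+fsrmi+fsrm totals of 7229, 7176, 3566 and 1228 for the four columns. That is roughly an 83% drop in static FRM instructions from the first column to the last.

```python
# Static (frrm, fsrmi, fsrm) counts per benchmark, copied from the table above.
# Each tuple holds 4 columns x 3 counts = 12 numbers.
STAGES = [
    "1. revert-confluence",
    "2+3. remove-edge-insert/mode-after",
    "4. fewer-frm-restore",
    "5. call-backtrack",
]
ROWS = {
    "perlbench_r":  (42, 0, 4, 42, 0, 4, 17, 0, 1, 17, 0, 1),
    "cpugcc_r":     (167, 0, 17, 167, 0, 17, 11, 0, 0, 11, 0, 0),
    "bwaves_r":     (16, 0, 1) * 4,
    "mcf_r":        (11, 0, 0) * 4,
    "cactusBSSN_r": (79, 0, 27, 76, 0, 27, 19, 0, 1, 19, 0, 1),
    "namd_r":       (119, 0, 63, 119, 0, 63, 14, 0, 1, 14, 0, 1),
    "parest_r":     (218, 0, 114, 168, 0, 114, 24, 0, 1, 24, 0, 1),
    "povray_r":     (123, 1, 17, 123, 1, 17, 26, 1, 6, 26, 1, 6),
    "lbm_r":        (6, 0, 0) * 4,
    "omnetpp_r":    (17, 0, 1) * 4,
    "wrf_r":        (2287, 13, 1956, 2287, 13, 1956, 1268, 13, 1603, 613, 13, 82),
    "cpuxalan_r":   (17, 0, 1) * 4,
    "ldecod_r":     (11, 0, 0) * 4,
    "x264_r":       (14, 0, 1, 14, 0, 1, 11, 0, 0, 11, 0, 0),
    "blender_r":    (724, 12, 182, 724, 12, 182, 61, 12, 42, 39, 12, 16),
    "cam4_r":       (324, 13, 169, 324, 13, 169, 45, 13, 20, 40, 13, 17),
    "deepsjeng_r":  (11, 0, 0) * 4,
    "imagick_r":    (265, 16, 34, 265, 16, 34, 132, 16, 25, 33, 16, 18),
    "leela_r":      (12, 0, 0) * 4,
    "nab_r":        (13, 0, 1) * 4,
    "exchange2_r":  (16, 0, 1) * 4,
    "fotonik3d_r":  (20, 0, 11, 20, 0, 11, 19, 0, 1, 19, 0, 1),
    "roms_r":       (33, 0, 23, 33, 0, 23, 21, 0, 1, 21, 0, 1),
    "xz_r":         (6, 0, 0) * 4,
}


def stage_totals(rows):
    """Column-wise sums: one (frrm, fsrmi, fsrm) tuple per column."""
    sums = [0] * 12
    for counts in rows.values():
        for i, n in enumerate(counts):
            sums[i] += n
    return [tuple(sums[i:i + 3]) for i in range(0, 12, 3)]


if __name__ == "__main__":
    totals = stage_totals(ROWS)
    first = sum(totals[0])  # all FRM insns in column 1
    for stage, t in zip(STAGES, totals):
        combined = sum(t)
        drop = 100 * (first - combined) / first
        print(f"{stage:36s} {t}  combined={combined:5d}  ({drop:.1f}% below column 1)")
```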

It seems wrf still accounts for more than half of all FRM reads/writes
(708 of 1228; frrm, fsrmi, fsrm):
                  613   13   82

with one function, solve_em_, accounting for the bulk of them:
       solve_em_  555    1   50
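A per-function breakdown like the solve_em_ line can be produced by attributing each FRM instruction to the enclosing symbol in the disassembly. The sketch assumes GNU `objdump -d` text output. `frm_counts_by_function` and `top_functions` are hypothetical helpers, not the tooling used for the numbers in this thread.

```python
import re
import sys
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

FRM_MNEMONICS = ("frrm", "fsrmi", "fsrm")
# Symbol header in GNU objdump -d output, e.g. "0000000000012345 <solve_em_>:"
FUNC_RE = re.compile(r"^[0-9a-f]+ <(?P<name>[^>]+)>:\s*$")


def frm_counts_by_function(disasm_lines: Iterable[str]) -> Dict[str, Counter]:
    """Attribute each frrm/fsrmi/fsrm to the symbol whose body contains it."""
    per_func: Dict[str, Counter] = defaultdict(Counter)
    func = "<unknown>"  # used until the first symbol header is seen
    for line in disasm_lines:
        header = FUNC_RE.match(line)
        if header:
            func = header.group("name")
            continue
        # Instruction lines: address, encoding, mnemonic, operands (tab-separated).
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0].rstrip().endswith(":"):
            mnemonic = fields[2].strip()
            if mnemonic in FRM_MNEMONICS:
                per_func[func][mnemonic] += 1
    return dict(per_func)


def top_functions(per_func: Dict[str, Counter], n: int = 5) -> List[Tuple[str, Counter]]:
    """Functions with the most FRM instructions, highest total first."""
    return sorted(per_func.items(), key=lambda kv: sum(kv[1].values()), reverse=True)[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for name, c in top_functions(frm_counts_by_function(f)):
            print(f"{name:>24s} " + " ".join(f"{c[m]:5d}" for m in FRM_MNEMONICS))
```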

That function uses a single static rounding mode (RM), so ideally it needs just
one FRM save and one restore.
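To make the "one save, one restore" expectation concrete, here is a toy Python model of lazy FRM placement over straight-line code. It covers no control flow and no calls. It reads FRM once before the first static-rounding op, switches only when the required mode changes, and restores the original mode before anything that needs it and at exit. The op names ("rtz", "dyn") and the pseudo-assembly strings are hypothetical; real `fsrmi` takes an immediate, not a mode name. In GCC this placement is handled by the generic mode-switching pass over the CFG, which is considerably more involved than this sketch.

```python
from typing import List, Optional


def place_frm_insns(ops: List[str]) -> List[str]:
    """Toy FRM save/set/restore placement for straight-line code.

    Each op is either a static rounding-mode name such as "rtz" (an FP op
    needing that mode) or "dyn" (an op that must see the original FRM).
    Returns a pseudo-assembly stream with frrm/fsrmi/fsrm inserted lazily.
    """
    out: List[str] = []
    saved = False                   # original FRM copied to "saved" yet?
    current: Optional[str] = None   # static mode now in FRM (None = original)
    for op in ops:
        if op == "dyn":
            if current is not None:
                out.append("fsrm  saved")   # restore original FRM before use
                current = None
            out.append("use-dyn")
            continue
        if not saved:
            out.append("frrm  saved")       # save the original FRM only once
            saved = True
        if current != op:
            out.append(f"fsrmi {op}")       # switch only when the mode changes
            current = op
        out.append(f"fp-op[{op}]")
    if current is not None:
        out.append("fsrm  saved")           # restore on function exit
    return out


if __name__ == "__main__":
    for insn in place_frm_insns(["rtz", "rtz", "rtz"]):
        print(insn)
```

With a single static mode and no intervening "dyn" ops, the stream contains exactly one frrm, one fsrmi and one fsrm, however many FP ops there are.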

I suspect this is related to the following:
     https://godbolt.org/z/Px9es7j1r

The function-call code path need not bother with FRM save/restore at
all. This is currently being investigated, but it could take more time.
Frankly, I'm surprised we need FRM adjustments as often as we do. Presumably there's some builtin (or similar) that requires twiddling FRM to implement, so any use of it leads to FRM games. But the numbers still seem high. For example, what does xz do that triggers any FRM adjustments at all, even statically?!?



Please review.
Will do.

Thanks!

jeff
