Hi, vineet. >> I have a feeling this has to do with following: >> https://godbolt.org/z/Px9es7j1r
I saw in there are 2 fsrm instruction inside the main loop in Clang generated ASM which I think GCC is better. Correct me if I am wrong. Thanks. juzhe.zh...@rivai.ai From: Vineet Gupta Date: 2025-05-10 04:27 To: gcc-patches CC: gnu-toolchain; Jeff Law; Robin Dapp; Juzhe Zhong; Pan Li; Kito Cheng; Vineet Gupta Subject: [PATCH 0/6] RISC-V: frm state-machine improvements Hi, This came out of Rivos perf team reporting (shoutout to Siavash) that some of the SPEC2017 workloads had unnecessary FRM wiggles, when none were needed. The writes in particular could be expensive. I started with reduced test for PR/119164 from blender:node_testure_util.c. However in trying to understand (and a botched rewrite of whole thing) it turned out that lot of code was just unnecessary leading to more complexity than warranted. As a result there are more deletions here and the actual improvements come from just a few lines of actual changes. I've verified each patch incrementally with - Testsuite run (unchanged, 1 unexpected pass gcc.target/riscv/rvv/autovec/pr119114.c) - SPEC build - Static analysis of FRM read/write insns emitted in all of SPEC binaries. - There's BPI date for some of this too, but the delta there is not significant as this could really be uarch specific. Here's the result for static analysis. 1. revert-confluence 2. remove-edge-insert 4-fewer-frm-restore 5-call-backtrack 3. remove-mode-after ------------------- -------------------- ------------------- --------------- frrm fsrmi fsrm frrm fsrmi fsrm frrm fsrmi fsrm frrm fsrmi fsrm perlbench_r 42 0 4 42 0 4 17 0 1 17 0 1 cpugcc_r 167 0 17 167 0 17 11 0 0 11 0 0 bwaves_r 16 0 1 16 0 1 16 0 1 16 0 1 mcf_r 11 0 0 11 0 0 11 0 0 11 0 0 cactusBSSN_r 79 0 27 76 0 27 19 0 1 19 0 1 namd_r 119 0 63 119 0 63 14 0 1 14 0 1 parest_r 218 0 114 168 0 114 24 0 1 24 0 1 povray_r 123 1 17 123 1 17 26 1 6 26 1 6 lbm_r 6 0 0 6 0 0 6 0 0 6 0 0 omnetpp_r 17 0 1 17 0 1 17 0 1 17 0 1 wrf_r 2287 13 1956 2287 13 1956 1268 13 1603 613 13 82 cpuxalan_r 17 0 1 17 0 1 17 0 1 17 0 1 ldecod_r 11 0 0 11 0 0 11 0 0 11 0 0 x264_r 14 0 1 14 0 1 11 0 0 11 0 0 blender_r 724 12 182 724 12 182 61 12 42 39 12 16 cam4_r 324 13 169 324 13 169 45 13 20 40 13 17 deepsjeng_r 11 0 0 11 0 0 11 0 0 11 0 0 imagick_r 265 16 34 265 16 34 132 16 25 33 16 18 leela_r 12 0 0 12 0 0 12 0 0 12 0 0 nab_r 13 0 1 13 0 1 13 0 1 13 0 1 exchange2_r 16 0 1 16 0 1 16 0 1 16 0 1 fotonik3d_r 20 0 11 20 0 11 19 0 1 19 0 1 roms_r 33 0 23 33 0 23 21 0 1 21 0 1 xz_r 6 0 0 6 0 0 6 0 0 6 0 0 -------------------- ------------------- ------------------- ---------------- 4551 55 2623 4498 55 2623 1804 55 1707 1023 55 150 -------------------- ------------------- ------------------- ---------------- 7729 7176 3566 1228 -------------------- ------------------- ------------------- ---------------- It seems wrf still has half of all read/writes 613 13 82 with one function having the bulk of them solve_em_ 555 1 50 This is 1 static RM so ideally needs 1 save and 1 restore. I have a feeling this has to do with following: https://godbolt.org/z/Px9es7j1r The function call code path need not bother with frm save/restore at all. This is currently being investigated but could take more time. Please review. Thx, -Vineet Vineet Gupta (6): emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into next BB RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn RISC-V: frm/mode-switch: TARGET_MODE_AFTER not needed for frm switching RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition RISC-V: frm/mode-switch: robustify call_insn backtracking [PR119164][PR120203] gcc/config/riscv/riscv.cc | 166 +++--------------- gcc/emit-rtl.cc | 6 +- .../rvv/base/float-point-dynamic-frm-74.c | 2 +- .../gcc.target/riscv/rvv/base/pr119164.c | 22 +++ 4 files changed, 50 insertions(+), 146 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c -- 2.43.0