https://gcc.gnu.org/g:2cc7dc042140083dee3a9459fd79e1eb0b4c1a71

commit 2cc7dc042140083dee3a9459fd79e1eb0b4c1a71
Author: Jeff Law <j...@ventanamicro.com>
Date:   Thu Sep 5 15:45:25 2024 -0600

    [V2][RISC-V] Avoid unnecessary extensions after sCC insns
    
    So the first patch failed the pre-commit CI; it didn't fail in my testing
    because I'm using --with-arch to set a default configuration that includes
    things like zicond to ensure that's always tested.  And the failing test is
    skipped when zicond is enabled by default.
    
    The failing test is designed to ensure that we don't miss an if-conversion 
due
    to costing issues around the extension that was typically done in an sCC
    sequence (which is why it's only run when zicond is off).
    
    The test failed because we have a little routine that is highly dependent on
    the code generated by the sCC expander and will adjust the costing to 
account
    for expansion quirks that usually go away in register allocation.
    
    That code needs to be enhanced to work after the sCC expansion change.
    Essentially it needs to account for the subreg extraction that shows up in 
the
    sequence as well as being a bit looser on mode checking.
    
    I kept the code working for the old sequences -- in theory a user could 
conjure
    up the old sequence so handling them seems useful.
    
    This also drops the testsuite changes.  Palmer's change makes them 
unnecessary.
    
    ---
    
    So I was looking at a performance regression in spec with Ventana's internal
    tree.  Ultimately the problem was a bad interaction with an internal patch
    (REP_MODE_EXTENDED), fwprop and ext-dce.  The details of that problem aren't
    particularly important.
    
    Removal of the local patch went reasonably well.  But I did see some 
secondary
    cases where we had redundant sign extensions.  The most notable cases come 
from
    the integer sCC insns.
    
    Expansion of those cases for rv64 can be improved using Jivan's trick. ie, 
if
    the target is not DImode, then create a DImode temporary for the result and
    copy the low bits out with a promoted subreg to the real target.
    
    With the change in expansion the final code we generate is slightly 
different
    for a few tests at -O1/-Og, but should perform the same.  The key for the
    affected tests is we're not seeing the introduction of unnecessary 
extensions.
    Rather than adjust the regexps to handle the -O1/-Og output, skipping for 
those
    seemed OK to me.  I didn't extract a testcase.  I'm a bit fried from digging
    through LTO'd code right now.
    
    gcc/
            * config/riscv/riscv.cc (riscv_expand_int_scc): For rv64, use a DI
            temporary for the output and a promoted subreg to extract it into SI
            arget.
            (riscv_noce_conversion_profitable_p): Recognize new output from
            sCC expansion too.
    
    (cherry picked from commit b567e5ead5d54f022c57b48f31653f6ae6ece007)

Diff:
---
 gcc/config/riscv/riscv.cc | 46 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 74de8c7f3d7..e2e2d2db342 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4217,11 +4217,29 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq,
              riscv_if_info.original_cost += COSTS_N_INSNS (1);
              riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
            }
-         last_dest = NULL_RTX;
+
          rtx dest = SET_DEST (x);
-         if (COMPARISON_P (src)
+
+         /* Do something similar for the  moves that are likely to
+            turn into NOP moves by the time the register allocator is
+            done.  These are also side effects of how our sCC expanders
+            work.  We'll want to check and update LAST_DEST here too.  */
+         if (last_dest
              && REG_P (dest)
-             && GET_MODE (dest) == SImode)
+             && GET_MODE (dest) == SImode
+             && SUBREG_P (src)
+             && SUBREG_PROMOTED_VAR_P (src)
+             && REGNO (SUBREG_REG (src)) == REGNO (last_dest))
+           {
+             riscv_if_info.original_cost += COSTS_N_INSNS (1);
+             riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
+             if (last_dest)
+               last_dest = dest;
+           }
+         else
+           last_dest = NULL_RTX;
+
+         if (COMPARISON_P (src) && REG_P (dest))
            last_dest = dest;
        }
       else
@@ -4903,13 +4921,31 @@ riscv_expand_int_scc (rtx target, enum rtx_code code, 
rtx op0, rtx op1, bool *in
   riscv_extend_comparands (code, &op0, &op1);
   op0 = force_reg (word_mode, op0);
 
+  /* For sub-word targets on rv64, do the computation in DImode
+     then extract the lowpart for the final target, marking it
+     as sign extended.  Note that it's also properly zero extended,
+     but it's probably more profitable to expose it as sign extended.  */
+  rtx t;
+  if (TARGET_64BIT && GET_MODE (target) == SImode)
+    t = gen_reg_rtx (DImode);
+  else
+    t = target;
+
   if (code == EQ || code == NE)
     {
       rtx zie = riscv_zero_if_equal (op0, op1);
-      riscv_emit_binary (code, target, zie, const0_rtx);
+      riscv_emit_binary (code, t, zie, const0_rtx);
     }
   else
-    riscv_emit_int_order_test (code, invert_ptr, target, op0, op1);
+    riscv_emit_int_order_test (code, invert_ptr, t, op0, op1);
+
+  if (t != target)
+    {
+      t = gen_lowpart (SImode, t);
+      SUBREG_PROMOTED_VAR_P (t) = 1;
+      SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+      emit_move_insn (target, t);
+    }
 }
 
 /* Like riscv_expand_int_scc, but for floating-point comparisons.  */

Reply via email to