https://gcc.gnu.org/g:36daa0da95fba18d0d4eb4e10fa07ac3b76fa426

commit r16-512-g36daa0da95fba18d0d4eb4e10fa07ac3b76fa426
Author: Shreya Munnangi <smunnan...@ventanamicro.com>
Date:   Sat May 10 07:18:33 2025 -0600

    [V2][RISC-V] Synthesize more efficient IOR/XOR sequences
    
    So mvconst_internal's primary benefit is in constant synthesis not impacting
    the combine budget in terms of the number of instructions it is willing to
    combine together at any given time.  The downside is mvconst_internal breaks
    combine's toplevel costing model and as a result many other patterns have 
to be
    implemented as define_insn_and_splits rather than the often more natural
    define_splits.
    
    This primarily impacts logical operations where we want to see the constant
    operand and potentially simplify the logical with other nearby logicals or
    shifts.
    
    We can reduce our reliance on mvconst_internal and generate better code for
    various cases by generating better initial code for logical operations.
    
    So let's assume we have a inclusive-or of a register with a nontrivial
    constant.  Right now we will load the nontrivial constant into a new pseudo
    (using multiple instructions), then emit a two register source ior 
operation.
    
    For some cases we can just generate the code we want at expansion time.
    Concretely let's take this testcase:
    
    > unsigned long foo(unsigned long src) { return src | 0x8800000000000007; }
    
    Right now we generate this code:
    
    >         li      a5,-15
    >         slli    a5,a5,59
    >         addi    a5,a5,7
    >         or      a0,a0,a5
    
    The first three instructions are synthesizing the constant.  The last
    instruction performs the desired operation.  But we can do better:
    
    >         ori     a0,a0,7
    >         bseti   a0,a0,59
    >         bseti   a0,a0,63
    
    Notice how we never even bother to synthesize the constant.
    
    IOR/XOR are pretty simple and this patch focuses exclusively on those. We 
use
    [x]ori to set whatever low 11 bits we need, then bset/binv for a small 
number
    of higher bits.  We use the cost of constant synthesis as our budget.
    
    We also support a couple special cases.  First, we might be able to rotate 
the
    source value such that all the bits we want to manipulate are in the low 11
    bits.  So we rotate the source, manipulate the bits, then rotate things 
back to
    where they belong.  I didn't see this trigger in spec, but I did trivially 
find
    a testcase where it was likely faster.
    
    Second, we can have cases where we want to invert most of the bits, but a 
small
    number are supposed to be preserved.  We can pre-flip the bits we want to
    preserve with binv, then invert the whole register with not (which puts the
    bits to be preserved back in their original state).
    
    I suspect there are likely a few more cases that could be improved, but the
    patch should stand on its own now and getting it out of the way allows us to
    focus on logical AND which is far tougher, but also more important in the 
task
    of removing mvconst_internal.
    
    As we're not removing mvconst_internal yet, this patch is mostly a nop. I 
did
    look at spec before/after and didn't see anything particular interesting.  I
    also temporarily removed mvconst_internal and looked at spec before/after to
    hopefully ensure we weren't missing anything obvious in the XOR/IOR cases.
    Obviously that latter test showed all kinds of regressions with AND.
    
    We're still working through implementation details on the AND case and
    determining what bridge patterns we're going to need to ensure we don't
    regress.   But this XOR/IOR patch is in good enough shape that it can go
    forward now.
    
    Naturally this has been run through my tester (bootstrap & regression test 
is
    in flight, but won't finish for many more hours).  Obviously I'm quite
    interested in anything spit out by the pre-commit CI system.
    
    gcc/
    
            * config/riscv/iterators.md (OPTAB): New iterator.
            * config/riscv/predicates.md (arith_or_zbs_operand): Remove.
            (reg_or_const_int_operand): New predicate.
            * config/riscv/riscv-protos.h (synthesize_ior_xor): Prototype.
            * config/riscv/riscv.cc (synthesize_ior_xor): New function.
            * config/riscv/riscv.md (ior/xor expander): Use synthesize_ior_xor.
    
    gcc/testsuite/
    
            * gcc.target/riscv/ior-synthesis-1.c: New test.
            * gcc.target/riscv/ior-synthesis-2.c: New test.
            * gcc.target/riscv/xor-synthesis-1.c: New test.
            * gcc.target/riscv/xor-synthesis-2.c: New test.
            * gcc.target/riscv/xor-synthesis-3.c: New test.
    
                Co-authored-by: Jeff Law  <j...@ventanamicro.com>

Diff:
---
 gcc/config/riscv/iterators.md                    |   3 +
 gcc/config/riscv/predicates.md                   |  12 +-
 gcc/config/riscv/riscv-protos.h                  |   1 +
 gcc/config/riscv/riscv.cc                        | 199 +++++++++++++++++++++++
 gcc/config/riscv/riscv.md                        |  11 +-
 gcc/testsuite/gcc.target/riscv/ior-synthesis-1.c |   8 +
 gcc/testsuite/gcc.target/riscv/ior-synthesis-2.c |   8 +
 gcc/testsuite/gcc.target/riscv/xor-synthesis-1.c |   8 +
 gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c |  10 ++
 gcc/testsuite/gcc.target/riscv/xor-synthesis-3.c |   8 +
 10 files changed, 258 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 214c20ba7b80..584b345f02c4 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -262,6 +262,9 @@
 
 (define_code_attr fix_uns [(fix "fix") (unsigned_fix "fixuns")])
 
+(define_code_attr OPTAB [(ior "IOR")
+                         (xor "XOR")])
+
 
 ;; -------------------------------------------------------------------
 ;; Code Attributes
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index c9a638cd1034..23690792b32e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -380,14 +380,6 @@
   (and (match_code "const_int")
        (match_test "SINGLE_BIT_MASK_OPERAND (UINTVAL (op))")))
 
-;; Register, small constant or single bit constant for use in
-;; bseti/binvi.
-(define_predicate "arith_or_zbs_operand"
-  (ior (match_operand 0 "const_arith_operand")
-       (match_operand 0 "register_operand")
-       (and (match_test "TARGET_ZBS")
-           (match_operand 0 "single_bit_mask_operand"))))
-
 (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
        (match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
@@ -689,3 +681,7 @@
 (define_predicate "bitpos_mask_operand"
   (and (match_code "const_int")
        (match_test "TARGET_64BIT ? INTVAL (op) == 63 : INTVAL (op) == 31")))
+
+(define_predicate "reg_or_const_int_operand"
+  (ior (match_operand 0 "const_int_operand")
+       (match_operand 0 "register_operand")))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b0d5bbb8570b..271a9a3228d8 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -140,6 +140,7 @@ extern void riscv_expand_sssub (rtx, rtx, rtx);
 extern void riscv_expand_ustrunc (rtx, rtx);
 extern void riscv_expand_sstrunc (rtx, rtx);
 extern int riscv_register_move_cost (machine_mode, reg_class_t, reg_class_t);
+extern bool synthesize_ior_xor (rtx_code, rtx [3]);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool 
*invert_ptr = 0);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a28e4f7abb87..8b77a3539bc5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -14210,6 +14210,205 @@ bool need_shadow_stack_push_pop_p ()
   return is_zicfiss_p () && riscv_save_return_addr_reg_p ();
 }
 
+/* Synthesize OPERANDS[0] = OPERANDS[1] CODE OPERANDS[2].
+
+    OPERANDS[0] and OPERANDS[1] will be a REG and may be the same
+    REG.
+
+    OPERANDS[2] is a CONST_INT.
+
+    CODE is IOR or XOR.
+
+    Return TRUE if the operation was fully synthesized and the caller
+    need not generate additional code.  Return FALSE if the operation
+    was not synthesized and the caller is responsible for emitting the
+    proper sequence.  */
+
+bool
+synthesize_ior_xor (rtx_code code, rtx operands[3])
+{
+  /* Trivial cases that don't need synthesis.  */
+  if (SMALL_OPERAND (INTVAL (operands[2]))
+     || ((TARGET_ZBS || TARGET_XTHEADBS || TARGET_ZBKB)
+        && single_bit_mask_operand (operands[2], word_mode)))
+    return false;
+
+  /* The number of instructions to synthesize the constant is a good
+     estimate of the budget.  That does not account for out of order
+     execution an fusion in the constant synthesis those would naturally
+     decrease the budget.  It also does not account for the IOR/XOR at
+     the end of the sequence which would increase the budget.  */
+  int budget = (TARGET_ZBS ? riscv_const_insns (operands[2], true) : -1);
+  int original_budget = budget;
+
+  /* Bits we need to set in operands[0].  As we synthesize the operation,
+     we clear bits in IVAL.  Once IVAL is zero, then synthesis of the
+     operation is complete.  */
+  unsigned HOST_WIDE_INT ival = INTVAL (operands[2]);
+  
+  /* Check if we want to use [x]ori. Then get the remaining bits
+     and decrease the budget by one. */
+  if ((ival & HOST_WIDE_INT_UC (0x7ff)) != 0)
+    {
+      ival &= ~HOST_WIDE_INT_UC (0x7ff);
+      budget--;
+    }
+
+  /* Check for bseti cases. For each remaining bit in ival,
+     decrease the budget by one. */
+  while (ival)
+    {
+      HOST_WIDE_INT tmpval = HOST_WIDE_INT_UC (1) << ctz_hwi (ival);
+      ival &= ~tmpval;
+      budget--;
+    }
+
+  /* If we're flipping all but a small number of bits we can pre-flip
+     the outliers, then flip all the bits, which would restore those
+     bits that were pre-flipped. */
+  if ((TARGET_ZBS || TARGET_XTHEADBS || TARGET_ZBKB)
+      && budget < 0
+      && code == XOR
+      && popcount_hwi (~INTVAL (operands[2])) < original_budget)
+    {
+      /* Pre-flipping bits we want to preserve.  */
+      rtx input = operands[1];
+      ival = ~INTVAL (operands[2]);
+      while (ival)
+       {
+         HOST_WIDE_INT tmpval = HOST_WIDE_INT_UC (1) << ctz_hwi (ival);
+         rtx x = GEN_INT (tmpval);
+         x = gen_rtx_XOR (word_mode, input, x);
+         emit_insn (gen_rtx_SET (operands[0], x));
+         input = operands[0];
+         ival &= ~tmpval;
+       }
+
+      /* Now flip all the bits, which restores the bits we were
+        preserving.  */
+      rtx x = gen_rtx_NOT (word_mode, input);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
+
+  /* One more approach we can try.  If our budget is 3+ instructions,
+     then we can try to rotate the source so that the bits we want to
+     set are in the low 11 bits.  We then use [x]ori to set those low
+     bits, then rotate things back into their proper place.  */
+  if ((TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB)
+      && budget < 0
+      && popcount_hwi (INTVAL (operands[2])) <= 11
+      && riscv_const_insns (operands[2], true) >= 3)
+    {
+      ival = INTVAL (operands[2]);
+      /* First see if the constant trivially fits into 11 bits in the LSB.  */
+      int lsb = ctz_hwi (ival);
+      int msb = BITS_PER_WORD - 1 - clz_hwi (ival);
+      if (msb - lsb + 1 <= 11)
+       {
+         /* Rotate the source right by LSB bits.  */
+         rtx x = GEN_INT (lsb);
+         x = gen_rtx_ROTATERT (word_mode, operands[1], x);
+         emit_insn (gen_rtx_SET (operands[0], x));
+
+         /* Shift the constant right by LSB bits.  */
+         x = GEN_INT (ival >> lsb);
+
+         /* Perform the IOR/XOR operation.  */
+         x = gen_rtx_fmt_ee (code, word_mode, operands[0], x);
+         emit_insn (gen_rtx_SET (operands[0], x));
+
+         /* And rotate left to put everything back in place, we don't
+            have rotate left by a constant, so use rotate right by
+            an adjusted constant.  */
+         x = GEN_INT (BITS_PER_WORD - lsb);
+         x = gen_rtx_ROTATERT (word_mode, operands[1], x);
+         emit_insn (gen_rtx_SET (operands[0], x));
+         return true;
+       }
+
+      /* Maybe the bits are split between the high and low parts
+        of the constant.  A bit more complex, but still manageable.
+
+        Conceptually we want to rotate left the constant by the number
+        of leading zeros after masking off all but the low 11 bits.  */
+      int rotcount = clz_hwi (ival & 0x7ff) - (BITS_PER_WORD - 11);
+
+      /* Rotate the constant left by MSB bits.  */
+      ival = (ival << rotcount) | (ival >> (BITS_PER_WORD - rotcount));
+
+      /* Now we can do the same tests as before. */
+      lsb = ctz_hwi (ival);
+      msb = BITS_PER_WORD - clz_hwi (ival);
+      if ((INTVAL (operands[2]) & HOST_WIDE_INT_UC (0x7ff)) != 0
+         && msb - lsb + 1 <= 11)
+       {
+         /* Rotate the source left by ROTCOUNT bits, we don't have
+            rotate left by a constant, so use rotate right by an
+            adjusted constant.  */
+         rtx x = GEN_INT (BITS_PER_WORD - rotcount);
+         x = gen_rtx_ROTATERT (word_mode, operands[1], x);
+         emit_insn (gen_rtx_SET (operands[0], x));
+
+         /* We've already rotated the constant.  So perform the IOR/XOR
+            operation.  */
+         x = GEN_INT (ival);
+         x = gen_rtx_fmt_ee (code, word_mode, operands[0], x);
+         emit_insn (gen_rtx_SET (operands[0], x));
+
+         /* And rotate right to put everything into its proper place.  */
+         x = GEN_INT (rotcount);
+         x = gen_rtx_ROTATERT (word_mode, operands[0], x);
+         emit_insn (gen_rtx_SET (operands[0], x));
+         return true;
+       }
+    }
+
+  /* If after accounting for bseti the remaining budget has 
+     gone to less than zero, it forces the value into a
+     register and performs the IOR operation.  It returns
+     TRUE to the caller so the caller knows code generation
+     is complete. */
+  if (budget < 0)
+    {
+      rtx x = force_reg (word_mode, operands[2]); 
+      x = gen_rtx_fmt_ee (code, word_mode, operands[1], x);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
+
+  /* Synthesis is better than loading the constant.  */
+  ival = INTVAL (operands[2]);
+  rtx input = operands[1];
+
+  /* Emit the [x]ori insn that sets the low 11 bits into
+     the proper state.  */
+  if ((ival & HOST_WIDE_INT_UC (0x7ff)) != 0)
+    {
+      rtx x = GEN_INT (ival & HOST_WIDE_INT_UC (0x7ff));
+      x = gen_rtx_fmt_ee (code, word_mode, input, x);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      input = operands[0];
+      ival &= ~HOST_WIDE_INT_UC (0x7ff);
+    }
+
+  /* We figure out a single bit as a constant and
+     generate a CONST_INT node for that.  Then we 
+     construct the IOR node, then the SET node and 
+     emit it.  An IOR with a suitable constant that is
+     a single bit will be implemented with a bseti. */
+  while (ival)
+    {
+      HOST_WIDE_INT tmpval = HOST_WIDE_INT_UC (1) << ctz_hwi (ival);
+      rtx x = GEN_INT (tmpval);
+      x = gen_rtx_fmt_ee (code, word_mode, input, x);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      input = operands[0];
+      ival &= ~tmpval;
+    }
+  return true;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 259997fef68f..154b49d55c56 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1767,8 +1767,15 @@
 (define_expand "<optab><mode>3"
   [(set (match_operand:X 0 "register_operand")
        (any_or:X (match_operand:X 1 "register_operand" "")
-                  (match_operand:X 2 "arith_or_zbs_operand" "")))]
-  "")
+                  (match_operand:X 2 "reg_or_const_int_operand" "")))]
+  ""
+
+{
+  /* If synthesis of the logical op is successful, then no further code
+     generation is necessary.  Else just generate code normally.  */
+  if (CONST_INT_P (operands[2]) && synthesize_ior_xor (<OPTAB>, operands))
+    DONE;
+})
 
 (define_insn "*<optab><mode>3"
   [(set (match_operand:X                0 "register_operand" "=r,r")
diff --git a/gcc/testsuite/gcc.target/riscv/ior-synthesis-1.c 
b/gcc/testsuite/gcc.target/riscv/ior-synthesis-1.c
new file mode 100644
index 000000000000..04644cddbdba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/ior-synthesis-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target { rv64 } } } */
+/* { dg-options "-march=rv64gb -mabi=lp64d" } */
+
+unsigned long foo(unsigned long src) { return src | 0x8c00000000000001; }
+
+/* { dg-final { scan-assembler-times "\\srori\t" 2 } } */
+/* { dg-final { scan-assembler-times "\\sori\t" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/ior-synthesis-2.c 
b/gcc/testsuite/gcc.target/riscv/ior-synthesis-2.c
new file mode 100644
index 000000000000..f28fe5e79b93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/ior-synthesis-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target { rv64 } } } */
+/* { dg-options "-march=rv64gb -mabi=lp64d" } */
+
+unsigned long foo(unsigned long src) { return src | 0x8800000000000007; }
+
+/* { dg-final { scan-assembler-times "\\sbseti\t" 2 } } */
+/* { dg-final { scan-assembler-times "\\sori\t" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/xor-synthesis-1.c 
b/gcc/testsuite/gcc.target/riscv/xor-synthesis-1.c
new file mode 100644
index 000000000000..c630a79e4710
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xor-synthesis-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target { rv64 } } } */
+/* { dg-options "-march=rv64gb -mabi=lp64d" } */
+
+unsigned long foo(unsigned long src) { return src ^ 0xffffffffefffffffUL; }
+
+/* { dg-final { scan-assembler-times "\\sbinvi\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\snot\t" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c 
b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
new file mode 100644
index 000000000000..25457d260750
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { rv64 } } } */
+/* { dg-options "-march=rv64gb -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
+
+unsigned long foo(unsigned long src) { return src ^ 0x8800000000000007; }
+
+/* xfailed until we remove mvconst_internal.  */
+/* { dg-final { scan-assembler-times "\\sbinvi\t" 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "\\sxori\t" 1 { xfail *-*-* } } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/xor-synthesis-3.c 
b/gcc/testsuite/gcc.target/riscv/xor-synthesis-3.c
new file mode 100644
index 000000000000..765904b4a495
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xor-synthesis-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target { rv64 } } } */
+/* { dg-options "-march=rv64gb -mabi=lp64d" } */
+
+unsigned long foo(unsigned long src) { return src ^ 0x8c00000000000001; }
+
+/* { dg-final { scan-assembler-times "\\srori\t" 2 } } */
+/* { dg-final { scan-assembler-times "\\sxori\t" 1 } } */
+

Reply via email to