https://gcc.gnu.org/g:68251c8c320a33ea36c4b16f50e67d12fa404908

commit 68251c8c320a33ea36c4b16f50e67d12fa404908
Author: Jovan Vukic <jovan.vu...@rt-rk.com>
Date:   Sun Sep 29 10:06:43 2024 -0600

    [PATCH v2] RISC-V: Improve code generation for select of consecutive constants
    
    Based on the valuable feedback I received, I decided to implement the patch
    in the RTL pipeline. Since a similar optimization already exists in
    simplify_binary_operation_1, I chose to generalize my original approach
    and place it directly below that code.
    
    The expression (X xor C1) + C2 is simplified to X xor (C1 xor C2) under
    the conditions described in the patch. This is a more general optimization,
    but it still applies to the RISC-V case, which was my initial goal:
    
    long f1(long x, long y) {
        return (x > y) ? 2 : 3;
    }
    
    Before the patch, the generated assembly is:
    
    f1(long, long):
            sgt     a0,a0,a1
            xori    a0,a0,1
            addi    a0,a0,2
            ret
    
    After the patch, the generated assembly is:
    
    f1(long, long):
            sgt     a0,a0,a1
            xori    a0,a0,3
            ret
    
    The patch optimizes cases like x LT/GT y ? 2 : 3 (and x GE/LE y ? 3 : 2),
    as initially intended. Since this optimization is more general, I noticed
    it also optimizes cases like x < CONST ? 3 : 2 when CONST < 0. I’ve added
    tests for these cases as well.
    
    A bit of logic behind the patch: the equality A + B == (A ^ B) + 2 * (A & B)
    always holds, so A + B simplifies to A ^ B whenever 2 * (A & B) == 0.
    In our case, A == X ^ C1, B == C2, and X is either 0 or 1.
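    The identity above can be checked directly. The following standalone C
    sketch (illustrative only, not part of the patch) verifies it for the
    constants from the f1 example, where C1 == 1 and C2 == 2 and X is the
    0-or-1 result of the comparison:

    ```c
    /* Illustrative check of A + B == (A ^ B) + 2 * (A & B), and of the
       degenerate case where addition becomes XOR: with X in {0, 1},
       C1 == 1, C2 == 2, both carry conditions from the patch hold
       (2 * (C1 & C2) == 0 and 2 * ((1 ^ C1) & C2) == 0), so
       (X ^ C1) + C2 == X ^ (C1 ^ C2).  */
    #include <assert.h>
    #include <stdio.h>

    int main (void)
    {
      long C1 = 1, C2 = 2;
      for (long X = 0; X <= 1; X++)
        {
          long A = X ^ C1, B = C2;
          /* The general identity always holds.  */
          assert (A + B == (A ^ B) + 2 * (A & B));
          /* Here the carry term 2 * (A & B) is zero, so + collapses to ^.  */
          assert ((X ^ C1) + C2 == (X ^ (C1 ^ C2)));
        }
      printf ("identity holds\n");
      return 0;
    }
    ```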
    
            PR target/108038
    
    gcc/ChangeLog:
    
            * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
            simplification.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/riscv/slt-1.c: New test.
    
    (cherry picked from commit a0f1f504b2c49a3695b91d3323d2e2419ef970db)

Diff:
---
 gcc/simplify-rtx.cc                    | 12 +++++++
 gcc/testsuite/gcc.target/riscv/slt-1.c | 59 ++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 260c77584de2..6c2ea43607a3 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2979,6 +2979,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
                                    simplify_gen_binary (XOR, mode, op1,
                                                         XEXP (op0, 1)));
 
+      /* (plus (xor X C1) C2) is (xor X (C1^C2)) if X is either 0 or 1 and
+        2 * ((X ^ C1) & C2) == 0; based on A + B == A ^ B + 2 * (A & B). */
+      if (CONST_SCALAR_INT_P (op1)
+         && GET_CODE (op0) == XOR
+         && CONST_SCALAR_INT_P (XEXP (op0, 1))
+         && nonzero_bits (XEXP (op0, 0), mode) == 1
+         && 2 * (INTVAL (XEXP (op0, 1)) & INTVAL (op1)) == 0
+         && 2 * ((1 ^ INTVAL (XEXP (op0, 1))) & INTVAL (op1)) == 0)
+       return simplify_gen_binary (XOR, mode, XEXP (op0, 0),
+                                   simplify_gen_binary (XOR, mode, op1,
+                                                        XEXP (op0, 1)));
+
       /* Canonicalize (plus (mult (neg B) C) A) to (minus A (mult B C)).  */
       if (!HONOR_SIGN_DEPENDENT_ROUNDING (mode)
          && GET_CODE (op0) == MULT
diff --git a/gcc/testsuite/gcc.target/riscv/slt-1.c b/gcc/testsuite/gcc.target/riscv/slt-1.c
new file mode 100644
index 000000000000..29a640660810
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/slt-1.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include <stdint.h>
+
+#define COMPARISON(TYPE, OP, OPN, RESULT_TRUE, RESULT_FALSE) \
+    TYPE test_##OPN(TYPE x, TYPE y) { \
+        return (x OP y) ? RESULT_TRUE : RESULT_FALSE; \
+    }
+
+/* Signed comparisons */
+COMPARISON(int64_t, >, GT1, 2, 3)
+COMPARISON(int64_t, >, GT2, 5, 6)
+
+COMPARISON(int64_t, <, LT1, 2, 3)
+COMPARISON(int64_t, <, LT2, 5, 6)
+
+COMPARISON(int64_t, >=, GE1, 3, 2)
+COMPARISON(int64_t, >=, GE2, 6, 5)
+
+COMPARISON(int64_t, <=, LE1, 3, 2)
+COMPARISON(int64_t, <=, LE2, 6, 5)
+
+/* Unsigned comparisons */
+COMPARISON(uint64_t, >, GTU1, 2, 3)
+COMPARISON(uint64_t, >, GTU2, 5, 6)
+
+COMPARISON(uint64_t, <, LTU1, 2, 3)
+COMPARISON(uint64_t, <, LTU2, 5, 6)
+
+COMPARISON(uint64_t, >=, GEU1, 3, 2)
+COMPARISON(uint64_t, >=, GEU2, 6, 5)
+
+COMPARISON(uint64_t, <=, LEU1, 3, 2)
+COMPARISON(uint64_t, <=, LEU2, 6, 5)
+
+#define COMPARISON_IMM(TYPE, OP, OPN, RESULT_TRUE, RESULT_FALSE) \
+    TYPE testIMM_##OPN(TYPE x) { \
+        return (x OP -3) ? RESULT_TRUE : RESULT_FALSE; \
+    }
+
+/* Signed comparisons with immediate */
+COMPARISON_IMM(int64_t, >, GT1, 3, 2)
+
+COMPARISON_IMM(int64_t, <, LT1, 2, 3)
+
+COMPARISON_IMM(int64_t, >=, GE1, 3, 2)
+
+COMPARISON_IMM(int64_t, <=, LE1, 2, 3)
+
+/* { dg-final { scan-assembler-times "sgt\\t" 4 } } */
+/* { dg-final { scan-assembler-times "sgtu\\t" 4 } } */
+/* { dg-final { scan-assembler-times "slt\\t" 4 } } */
+/* { dg-final { scan-assembler-times "sltu\\t" 4 } } */
+/* { dg-final { scan-assembler-times "slti\\t" 4 } } */
+/* { dg-final { scan-assembler-times "xori\\ta0,a0,1" 8 } } */
+/* { dg-final { scan-assembler-times "xori\\ta0,a0,3" 12 } } */
+
