https://gcc.gnu.org/g:ea1a75bfd580788b6c1f0a772051797bbddedf82
commit r17-2097-gea1a75bfd580788b6c1f0a772051797bbddedf82
Author: Jeff Law <[email protected]>
Date:   Thu Jul 2 11:41:09 2026 -0600

    [RISC-V] Improve ADD synthesis

    So testing passed for the V2 of this patch, but there were some minor
    issues I felt needed to be addressed.

    First, in the new pattern we can use %S to output the right value rather
    than recomputing it ourselves.  Second, the test was tightened up
    slightly by adding missing escapes.  Finally, typos in the ChangeLog and
    a comment in bitmanip.md were fixed.

    I didn't go through a full test cycle on those changes, but I did test
    those on riscv32-elf and riscv64-elf with no regressions.

    Attached is the final patch I'm pushing to the trunk.

    --

    So I instrumented the 3 ALU synthesis routines and built 502.gcc, then
    fed those results into some python code that allows me to compare
    instruction counts and total size for tests across LLVM and GCC.
    Naturally the idea was to see if there were cases we should handle but
    were missing.

    This fixes cases in add_synthesis.  First we weren't utilizing add.uw, so
    there's a relatively small set of cases where we can take the original
    constant C, sign extend it from 32 to 64 bits resulting in C'.  If C' is
    cheaper to synthesize than C, then we can load up C' into a GPR, then use
    add.uw.  This (of course) requires the upper 32 bits of C to be zero and
    bit 31 to be on.

    The second case is for INT_MIN.  Adding INT_MIN to a register ultimately
    just flips the uppermost bit and thus can be implemented with a binvi.
    Combine (of course) collapses the bit inversion case back into
    arithmetic.  Given the result is just a binvi, this patch recognizes that
    special case as a new pattern.  That has a secondary effect of fixing the
    xfail for xor-synthesis-2.c which was failing for precisely this reason.

    While exploring the logical space it also came to light that we should be
    using riscv_integer_cost rather than riscv_const_insns.  The latter
    clamps at 3.  So if we had C with cost 5 and C' with cost 4 and we can
    use either, we really want to use C', but didn't have a way to make that
    selection.  Using riscv_integer_cost resolves that *and* we generate less
    junk RTL since we don't have to call GEN_INT so often.  I haven't
    included a testcase for that in this patch, but definitely will in the
    ior/xor/and space.

    At this time the synthesis side for addition looks good relative to LLVM,
    but sometimes combine is going to undo its work.  I checked every case
    from that set where GCC has more instructions than LLVM and each and
    every one was a scenario where combine+mvconst_internal undid the early
    synthesis work.  So just more reasons to keep pushing on that problem.

    I did add a special pattern for the INT_MIN case.  That was trivial and
    since it collapses to a single insn with Zbs it seemed like the right
    thing to do in case combine discovers it from some other path.

    Both GCC and LLVM seem to be missing shNadd.uw support; after some
    head-banging I did manage to characterize some cases where shNadd.uw was
    unique enough to be useful.  That exploration was ongoing when the latest
    test run fired up so that support will land in a later patch.

    I mentioned my evaluation also looked at code size differences.  That
    brings in general constant synthesis and there's a significant cluster of
    cases where LLVM consistently does better (li|lui+shift sometimes encodes
    better than lui+addi).  That's already being tracked in bugzilla.

    The other insight from this effort is that ADD, IOR, XOR are relatively
    minor when compared to AND.  I'm filtering out simm12 constants because
    those are trivially handled.  What was left was ~1k unique constants
    passed to AND, ~100 to ADD and ~100 to IOR/XOR.  Point being the larger
    effort towards AND handling seems more likely to pay dividends.  Given
    the larger set of primitives for AND it's no surprise we've already spent
    considerably more effort there.

    Tested on riscv32-elf and riscv64-elf with no regressions.
    Bootstrapped and regression tested on the K3 and c920 platforms.  Waiting
    on pre-commit CI before pushing.

    gcc/
            * config/riscv/bitmanip.md (xor_for_plus_minint): New pattern.
            * config/riscv/riscv.cc (synthesize_add): Handle INT_MIN as bit
            inversion.  Add support for add.uw.  Use riscv_integer_cost rather
            than riscv_const_insns.
            (synthesize_add_extended): Use riscv_integer_cost rather than
            riscv_const_insns.

    gcc/testsuite/
            * gcc.target/riscv/add-synthesis-3.c: New test.
            * gcc.target/riscv/xor-synthesis-2.c: No longer xfail.

Diff:
---
 gcc/config/riscv/bitmanip.md                     | 11 ++++++
 gcc/config/riscv/riscv.cc                        | 44 +++++++++++++++++++-----
 gcc/testsuite/gcc.target/riscv/add-synthesis-3.c |  8 +++++
 gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c |  5 ++-
 4 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 992e949a0990..be893f12d917 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -899,6 +899,17 @@
   "<bit_optab>i\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
+;; This form can be created by combine.
+(define_insn "*xor_for_plus_minint"
+  [(set (match_operand:X 0 "register_operand" "=r")
+        (plus:X (match_operand:X 1 "register_operand" "r")
+                (match_operand 2 "const_int_operand")))]
+  "(TARGET_ZBS
+    && (INTVAL (operands[2])
+        == sext_hwi ((HOST_WIDE_INT_1U << (BITS_PER_WORD - 1)), BITS_PER_WORD)))"
+  "binvi\t%0,%1,%S2"
+  [(set_attr "type" "bitmanip")])
+
 ;; We can easily handle zero extensions
 (define_split
   [(set (match_operand:DI 0 "register_operand")
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d5eab5421318..09dc41930aa7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -16166,10 +16166,20 @@ synthesize_add (rtx operands[3])
   if (SMALL_OPERAND (INTVAL (operands[2])))
     return false;
 
-  int budget1 = riscv_const_insns (operands[2], true);
-  int budget2 = riscv_const_insns (GEN_INT (-INTVAL (operands[2])), true);
-
   HOST_WIDE_INT ival = INTVAL (operands[2]);
+  int budget1 = riscv_integer_cost (ival, true);
+  int budget2 = riscv_integer_cost (-ival, true);
+
+  /* If the constant is MIN_INT for the target, then it's just a bit flip
+     of the highest bit.  */
+  HOST_WIDE_INT sextval = sext_hwi (HOST_WIDE_INT_1U << (BITS_PER_WORD - 1),
+                                    BITS_PER_WORD);
+  if (TARGET_ZBS && ival == sextval)
+    {
+      rtx x = gen_rtx_XOR (word_mode, operands[1], operands[2]);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
 
   /* If we can emit two addi insns then that's better than synthesizing
      the constant into a temporary, then adding the temporary to the
@@ -16200,11 +16210,11 @@ synthesize_add (rtx operands[3])
   ival = INTVAL (operands[2]);
   if (TARGET_ZBA
       && (((ival % 2) == 0 && budget1
-           > riscv_const_insns (GEN_INT (ival >> 1), true))
+           > riscv_integer_cost (ival >> 1, true))
           || ((ival % 4) == 0 && budget1
-              > riscv_const_insns (GEN_INT (ival >> 2), true))
+              > riscv_integer_cost (ival >> 2, true))
           || ((ival % 8) == 0 && budget1
-              > riscv_const_insns (GEN_INT (ival >> 3), true))))
+              > riscv_integer_cost (ival >> 3, true))))
     {
       // Load the shifted constant into a temporary
       int shct = ctz_hwi (ival);
@@ -16225,6 +16235,24 @@ synthesize_add (rtx operands[3])
       return true;
     }
 
+  /* If the constant has the upper 32 bits clear and if after sign
+     extension from 32 to 64 bits it's synthesizable cheaply,
+     then synthesize C' and use add.uw.  */
+  if ((TARGET_64BIT && TARGET_ZBA)
+      && (ival & HOST_WIDE_INT_UC (0xffffffff00000000)) == 0
+      && riscv_integer_cost (sext_hwi (ival, 32), true) < budget1)
+    {
+      /* Load the sign extended constant into a temporary.  */
+      rtx tempreg = force_reg (word_mode, GEN_INT (sext_hwi (ival, 32)));
+
+      /* Add the zero-extended temporary to the other input to construct
+         the add.uw insn.  */
+      rtx x = gen_rtx_ZERO_EXTEND (word_mode, gen_lowpart (SImode, tempreg));
+      x = gen_rtx_PLUS (word_mode, x, operands[1]);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
+
   /* If the negated constant is cheaper than the original, then negate
      the constant and use sub.  */
   if (budget2 < budget1)
@@ -16272,8 +16300,8 @@ synthesize_add_extended (rtx operands[3])
     return false;
 
   HOST_WIDE_INT ival = INTVAL (operands[2]);
-  int budget1 = riscv_const_insns (operands[2], true);
-  int budget2 = riscv_const_insns (GEN_INT (-INTVAL (operands[2])), true);
+  int budget1 = riscv_integer_cost (INTVAL (operands[2]), true);
+  int budget2 = riscv_integer_cost (-UINTVAL (operands[2]), true);
 
   /* If operands[2] can be split into two 12-bit signed immediates,
      split add into two adds.  */
diff --git a/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c b/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c
new file mode 100644
index 000000000000..ffed5735f3d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target rv64 } } */
+/* { dg-options "-march=rv64gcb -mabi=lp64d" } */
+
+long F1 (long x) { return x + 0xffffffff; }
+long F2 (long x) { return x + (1UL << (sizeof (long) * 8 - 1) ); }
+
+/* { dg-final { scan-assembler-times "\\tadd\\.uw\\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\tbinvi\\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
index 25457d260750..b250cc2e6d6c 100644
--- a/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
+++ b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
@@ -4,7 +4,6 @@
 
 unsigned long foo(unsigned long src) { return src ^ 0x8800000000000007; }
 
-/* xfailed until we remove mvconst_internal.  */
-/* { dg-final { scan-assembler-times "\\sbinvi\t" 2 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "\\sxori\t" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "\\sbinvi\t" 2 } } */
+/* { dg-final { scan-assembler-times "\\sxori\t" 1 } } */
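
The two arithmetic identities the patch relies on can be checked on a host machine. The following is a minimal sketch, not part of the patch: model_add_uw and model_binvi are hypothetical host-side models of the RV64 Zba add.uw and Zbs binvi instructions, and every other helper name is likewise an assumption made for illustration. It checks that x + C equals add.uw applied to C' (C sign extended from 32 bits) when the upper 32 bits of C are zero, and that adding the 64-bit INT_MIN equals flipping bit 63.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of RV64 Zba "add.uw rd,rs1,rs2":
   rd = rs2 + zero_extend (rs1[31:0]).  */
static uint64_t
model_add_uw (uint64_t rs1, uint64_t rs2)
{
  return rs2 + (rs1 & UINT64_C (0xffffffff));
}

/* Hypothetical model of RV64 Zbs "binvi rd,rs1,shamt": flip one bit.  */
static uint64_t
model_binvi (uint64_t rs1, unsigned shamt)
{
  return rs1 ^ (UINT64_C (1) << (shamt & 63));
}

/* Sign extend the low 32 bits of C to 64 bits, giving the C' of the
   commit message.  Done in unsigned arithmetic to stay portable.  */
static uint64_t
sext32 (uint64_t c)
{
  uint64_t low = c & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* x + C computed as: materialize C', then add.uw.  Only valid when the
   upper 32 bits of C are zero.  */
static uint64_t
add_via_add_uw (uint64_t x, uint64_t c)
{
  return model_add_uw (sext32 (c), x);
}

/* x + INT_MIN (64-bit) computed as a single bit flip of bit 63.  */
static uint64_t
add_min_via_binvi (uint64_t x)
{
  return model_binvi (x, 63);
}

/* Return 1 if both identities hold for a handful of sample values.  */
static int
check_identities (void)
{
  static const uint64_t xs[] = {
    0, 1, UINT64_C (0x7fffffff), UINT64_C (0x80000000),
    UINT64_C (0xffffffff), UINT64_C (0x123456789abcdef0), UINT64_MAX
  };
  /* Constants with the upper 32 bits clear and bit 31 set.  */
  static const uint64_t cs[] = {
    UINT64_C (0x80000000), UINT64_C (0xffffffff),
    UINT64_C (0xfffff800), UINT64_C (0x80000001)
  };

  for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
    {
      if (add_min_via_binvi (xs[i]) != xs[i] + (UINT64_C (1) << 63))
        return 0;
      for (size_t j = 0; j < sizeof cs / sizeof cs[0]; j++)
        if (add_via_add_uw (xs[i], cs[j]) != xs[i] + cs[j])
          return 0;
    }
  return 1;
}

#ifdef ADD_SYNTH_DEMO_MAIN
int
main (void)
{
  printf ("identities hold: %d\n", check_identities ());
  return 0;
}
#endif
```

Building with -DADD_SYNTH_DEMO_MAIN enables the small driver. The sketch says nothing about cost: whether C' is actually cheaper to materialize than C is decided by riscv_integer_cost in the patch, which is not modelled here.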
