On 9/23/25 1:25 PM, Shreya Munnangi wrote:
In pr120811, we have cases where GCC is emitting an extra |addi| instruction
instead of using the 12-bit signed-immediate of |ld|.
|addi t1, t1, 1|
|ld t1, 0(t1)|
This problem occurs when |fp -> sp+offset| elimination results in an
out-of-range constant and we generate an address reload in LRA using
|addsi|/|adddi| expanders.
We've already adjusted the expanders to widen the set of valid operands to
allow more constants for the 2nd input operand. These expanders, rather than
constructing the constant into a register and using an |add| instruction,
will generate two |addi| instructions (or |shNadd|) during initial RTL
generation.
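
As a rough illustration of the two-|addi| idea (not the actual code in
riscv.cc), the sketch below splits an offset into two 12-bit signed
immediates whose sum is the original value. The helper names |fits_s12| and
|split_two_s12| are hypothetical, and the order in which the two terms are
emitted here is arbitrary; the real expanders choose their own ordering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v fits a RISC-V 12-bit signed immediate,
   i.e. the range accepted by addi and by the offset field of ld.  */
static bool fits_s12(int64_t v)
{
    return v >= -2048 && v <= 2047;
}

/* Hypothetical helper: split v into two s12 immediates such that
   *first + *second == v.  Returns false when v lies outside
   [-4096, 4094], the range two addi instructions can cover.  */
static bool split_two_s12(int64_t v, int64_t *first, int64_t *second)
{
    if (v < -4096 || v > 4094)
        return false;

    if (fits_s12(v)) {
        /* A single immediate is enough; the second term is zero.  */
        *first = v;
        *second = 0;
        return true;
    }

    /* Saturate the first term at the s12 limit and put the rest
       in the second term, which is then guaranteed to fit.  */
    *first = v > 0 ? 2047 : -2048;
    *second = v - *first;
    return true;
}
```

For example, an offset of 2048 would split into 2047 and 1 under this
sketch, which is the shape of case where the second term could instead be
carried in the offset field of a following |ld|.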
We define a new pattern for cases where we need to access the current frame
and the offsets are too large. This gets reasonable code out of LRA in a
form |fold-mem-offsets| can handle, rather than having to wait for |sched2|
to do the height reduction transformation and leaving in the unnecessary
|add| instruction in the RTL stream.
To avoid the two |addi| instructions being squashed back together in the
post-reload combine, we remove the |adddi3_const_sum_of_two_s12| pattern.
We are seeing about 100 billion dynamic instructions saved on cactuBSSN,
which is about 5%, and a 2% improvement in performance on the BPI.
PR target/120811
gcc/
* config/riscv/riscv.cc (synthesize_add): Exchange constant terms when
generating addi pairs.
(synthesize_addsi): Similarly.
* config/riscv/riscv.md (addptr<mode>3): New define_expand.
(*add<mode>3_const_sum_of_two_s12): Remove pattern.
gcc/testsuite/
* gcc.target/riscv/add-synthesis-1.c: Adjust const to fit in range.
* gcc.target/riscv/pr120811.c: Add new test case.
* gcc.target/riscv/sum-of-two-s12-const-1.c: Adjust const to fit in
range.
I've added a comment per Vineet's suggestion, fixed the pr120811
testcase (which was inadvertently added a little while back) to match what
you posted, and pushed this to the trunk!
Thanks again. 100b instructions saved!
Jeff