On Wed, Oct 8, 2025 at 8:57 PM Jeff Law <[email protected]> wrote: > > > > On 10/8/25 12:54 PM, Vineet Gupta wrote: > > > > > > On 10/8/25 09:47, Shreya Munnangi wrote: > >> In pr120811, we have cases where GCC is emitting an extra |addi| > >> instruction > >> instead of using the 12-bit signed-immediate of |ld|. > >> |addi t1, t1, 1 ld t1, 0(t1) | > >> > >> This problem occurs when |fp -> sp+offset| elimination results in an > >> out-of-range constant and we generate an address reload in LRA using > >> |addsi|/|adddi| expanders. > >> > >> We've already adjusted the expanders to widen the set of valid operands to > >> allow more constants for the 2nd input operand. These expanders, rather > >> than > >> constructing the constant into a register and using an |add| instruction, > >> will > >> generate two |addi| instructions (or |shNadd|) during initial RTL > >> generation. > >> > >> We define a new pattern for cases where we need to access the current frame > >> and the offsets are too large. This gets reasonable code out of LRA in a > >> form > >> |fold-mem-offsets| can handle, rather than having to wait for |sched2| to > >> do > >> the height reduction transformation and leaving in the unnecessary |add| > >> instruction in the RTL stream. > >> > >> To avoid the two |addi| instructions being squashed back together in the > >> post-reload combine, we remove the |adddi3_const_sum_of_two_s12| pattern. > >> > > > > RIP |adddi3_const_sum_of_two_s12, you served well :-)| > It definitely served us well for its relatively brief time in the tree. > > > > > > > So it seems -4096 is a special little snowflake. > > > > Prev: > > minus1: > > addi a0,a0,-2048 > > addi a0,a0,-2048 > > ret > > > > W/o Patch: > > minus1: > > li a5,-4096 > > add a0,a0,a5 > > ret > It is. I consider the newer code marginally better on OOO cores. > Essentially the li+add variant has one less data dependency than the > addi+addi variant. So the "li" can execute whenever it's convenient in > an OOO core. In contrast the first addi in the old code has to wait for > its input operand to become available. In an ideal case the li would > bubble up into an unused execution slot and execute "for free".
And of course the compiler is more likely to be able to reschedule the li, so it helps with in-order cores, too. (Modulo register pressure, that is. But my sense is that the scheduling opportunity is more often more important than the extra live register.) > > It probably doesn't matter in any meaningful way. > > > > > We are burning an extra register, perhaps something to improve for future ? > > > >> TM (4100) > >> TM (8200) > >> > >> diff --git a/gcc/testsuite/gcc.target/riscv/pr120811.c > > b/gcc/testsuite/gcc.target/riscv/pr120811.c > > > > This causes a collision as the the file already sneaked into upstream from > > commit > > afdf44154fd "(RISC-V: Only Save/Restore required registers for > > ILP32E/LP64E)" > Yea, I think that was a mistake of mine. I'll clean that up. > > Jeff
