https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #29 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #28)
> With my latest patch I bootstrapped a configuration with
> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
> --with-float=hard
>
> I noticed a single regression in gcc.target/arm/pr53447-*.c
>
> That is caused by disabling the adddi3 expansion.
>
> void t0p(long long * p)
> {
> *p += 0x100000001;
> }
>
> used to get compiled to this at -O2:
>
> ldrd r2, [r0]
> adds r2, r2, #1
> adc r3, r3, #1
> strd r2, [r0]
> bx lr
>
> but without the adddi3 pattern I have at -O2:
>
> ldr r3, [r0]
> ldr r1, [r0, #4]
> cmn r3, #1
> add r3, r3, #1
> movcc r2, #0
> movcs r2, #1
> add r1, r1, #1
> str r3, [r0]
> add r3, r2, r1
> str r3, [r0, #4]
> bx lr
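For reference, here is a register-by-register C model of the sequence above. It is a sketch that assumes a little-endian target (low word at [r0], high word at [r0, #4]), and the function name t0p_no_adddi3 is hypothetical. cmn r3, #1 computes r3 + 1 and sets the carry flag exactly when that wraps around (r3 == 0xffffffff). The movcc/movcs pair then materializes the flag as 0 or 1 in r2 so it can be added to the high word separately.

```c
#include <stdint.h>

/* Hypothetical model of the -O2 output without the adddi3 pattern.
   Assumes little-endian layout: p[0] is the low word ([r0]) and
   p[1] is the high word ([r0, #4]). */
void t0p_no_adddi3(uint32_t p[2])
{
    uint32_t r3 = p[0];                         /* ldr r3, [r0]     */
    uint32_t r1 = p[1];                         /* ldr r1, [r0, #4] */
    /* cmn r3, #1: C is set iff r3 + 1 wraps past 2^32 - 1. */
    uint32_t r2 = (r3 == UINT32_MAX) ? 1u : 0u; /* movcc / movcs    */
    r3 = r3 + 1u;                               /* add r3, r3, #1   */
    r1 = r1 + 1u;                               /* add r1, r1, #1   */
    p[0] = r3;                                  /* str r3, [r0]     */
    r3 = r2 + r1;                               /* add r3, r2, r1   */
    p[1] = r3;                                  /* str r3, [r0, #4] */
}
```

The result matches a native 64-bit `*p += 0x100000001`. The cost comes from moving the carry out of the flags and into a register, which adds/adc avoids.
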
That's because your patch disables adddi3 completely, which is not correct. We
want to use the existing integer sequence, just split earlier. Instead of your
change, removing the "&& reload_completed" condition from the arm_adddi3
pattern means it is split before register allocation:
ldr r3, [r0]
ldr r2, [r0, #4]
adds r3, r3, #1
str r3, [r0]
adc r2, r2, #1
str r2, [r0, #4]
bx lr
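A minimal C sketch of what the split does: the DImode add becomes two SImode operations. adds produces the carry from the low halves, and adc consumes it when adding the high halves. The di_pair type and both helpers are hypothetical names for illustration. The carry is recovered with an unsigned wrap-around check, which is what the C flag records after adds.

```c
#include <stdint.h>

/* Hypothetical: a DImode value viewed as two SImode halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} di_pair;

/* Split a 64-bit value into its low and high 32-bit words. */
static di_pair split_di(uint64_t x)
{
    di_pair r = { (uint32_t)x, (uint32_t)(x >> 32) };
    return r;
}

/* 64-bit add done as two 32-bit adds, mirroring adds/adc. */
static di_pair add_di_split(di_pair a, di_pair b)
{
    di_pair r;
    r.lo = a.lo + b.lo;              /* adds: low halves            */
    uint32_t carry = r.lo < a.lo;    /* C flag: unsigned wrap-around */
    r.hi = a.hi + b.hi + carry;      /* adc: high halves plus carry */
    return r;
}
```

For 0x100000001, split_di yields lo = 1 and hi = 1, so both the low and the high word are added with the immediate #1.
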
> Note that the ldrd instructions are also missing.
Yes, that's yet another bug.
> I think this is the effect on ldrd that you already mentioned,
> and it gets worse when the expansion breaks the DImode registers up
> into two SImode registers.
Indeed, splitting early means we end up with two separate loads (and two
stores). However, in most cases we should be able to merge them back and emit
LDRD/STRD on Thumb-2. ARM-state LDRD/STRD is far more limited (it needs an
even/odd consecutive register pair and has a smaller offset range), so it is
not as useful there. Combine could help with merging two loads/stores into a
single instruction.
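To illustrate why the ARM-state form is less useful, here is a simplified C model of the register and offset constraints on LDRD with an immediate offset and no writeback. It follows my understanding of the ARMv7-A encodings. The helper names are hypothetical, and the model ignores alignment and writeback rules.

```c
#include <stdbool.h>

/* Hypothetical, simplified legality checks for merging word accesses at
   [base, #off] and [base, #off + 4] into one LDRD. Register numbers are
   0..15, with 13 = sp, 14 = lr, 15 = pc. */

/* ARM state (A1): first register even and not lr, second register its
   successor, byte offset encoded in 8 bits (-255..255). */
static bool ldrd_ok_arm(int rt, int rt2, int off)
{
    return rt % 2 == 0 && rt != 14 && rt2 == rt + 1
        && off >= -255 && off <= 255;
}

/* Thumb-2 (T1): any two distinct registers other than sp and pc,
   offset a multiple of 4 in -1020..1020. */
static bool ldrd_ok_thumb2(int rt, int rt2, int off)
{
    return rt != rt2
        && rt != 13 && rt != 15 && rt2 != 13 && rt2 != 15
        && off % 4 == 0 && off >= -1020 && off <= 1020;
}
```

The early-split output above puts the low word in r3 and the high word in r2. With those registers, ldrd_ok_thumb2(3, 2, 0) holds but ldrd_ok_arm(3, 2, 0) does not. ARM state needs an even/odd pair such as r2/r3, as in the original ldrd r2, [r0].
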