On 10/28/23 07:05, Roger Sayle wrote:

This patch improves the code generated for X << 1 (and for X + X) when
X is 64-bit DImode, using the same two instruction code sequence used
for DImode addition.

For the test case:

long long foo(long long x) { return x << 1; }

GCC -O2 currently generates the following code:

foo:    lsr     r2,r0,31
         asl_s   r1,r1,1
         asl_s   r0,r0,1
         j_s.d   [blink]
         or_s    r1,r1,r2

and on a CPU without a barrel shifter, i.e. -mcpu=em:

foo:    add.f   0,r0,r0
         asl_s   r1,r1
         rlc     r2,0
         asl_s   r0,r0
         j_s.d   [blink]
         or_s    r1,r1,r2

with this patch (both with and without a barrel shifter):

foo:    add.f   r0,r0,r0
         j_s.d   [blink]
         adc     r1,r1,r1
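
To see why the two instruction sequence is sufficient, here is a minimal C sketch that models it on the two 32-bit halves of a 64-bit value. It is only an illustration, not part of the patch: the function name shl1_via_add_adc is hypothetical, and the carry flag is modelled with an ordinary variable.

```c
#include <stdint.h>

/* Hypothetical model of "add.f r0,r0,r0 ; adc r1,r1,r1":
   r0 holds the low word and r1 the high word of the DImode value. */
uint64_t shl1_via_add_adc(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    /* add.f: double the low word; the carry out is the old bit 31. */
    uint32_t new_lo = lo + lo;
    uint32_t carry = new_lo < lo;

    /* adc: double the high word and add the carry into bit 0. */
    uint32_t new_hi = hi + hi + carry;

    return ((uint64_t)new_hi << 32) | new_lo;
}
```

The carry out of the low-word addition is exactly the bit that x << 1 moves from bit 31 to bit 32, so no separate lsr/or pair is needed to transfer it.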

[For Jeff Law's benefit a similar optimization is also applicable to
H8300H, that could also use a two instruction sequence (plus rts) but
currently GCC generates 16 instructions (plus an rts) for foo above.]

Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?

WRT H8: bug filed so we don't lose track of it. We don't have DImode operations defined on the H8, so the first step would be DImode loads/stores and basic arithmetic.

Jeff
