https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

            Bug ID: 66511
           Summary: [avr] whole-byte shifts not optimized away for
                    uint64_t
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl
  Target Milestone: ---

For shifts by a whole number of bytes, gcc usually optimizes the shift
away and simply moves data between registers instead. However, this
does not seem to happen when the operand is a uint64_t.

Here's an example (assembler output slightly trimmed of unrelated
comments and annotations etc.):

matthijs@grubby:~$ cat test.cpp
#include <stdint.h>

uint8_t foo64_8(uint64_t a) {
        return a >> 8;
}

uint16_t foo64_16(uint64_t a) {
        return a >> 8;
}

uint8_t foo32_8(uint32_t a) {
        return a >> 8;
}

uint16_t foo32_16(uint32_t a) {
        return (a >> 8);
}

matthijs@grubby:~$ avr-gcc -Os test.cpp -S -o -
_Z7foo64_8y:
        push r16
        ldi r16,lo8(8)
        rcall __lshrdi3
        mov r24,r18
        pop r16
        ret

_Z8foo64_16y:
        push r16
        ldi r16,lo8(8)
        rcall __lshrdi3
        mov r24,r18
        mov r25,r19
        pop r16
        ret


_Z7foo32_8m:
        mov r24,r23
        ret

_Z8foo32_16m:
        clr r27
        mov r26,r25
        mov r25,r24
        mov r24,r23
        ret

        .ident  "GCC: (GNU) 4.9.2 20141224 (prerelease)"

The output is identical for 4.8.1 on Debian, and the above 4.9.2 on
Arch. I haven't found a readily available 5.x package yet to test.

As you can see, the versions operating on 64-bit values keep the 8-bit
shift as a call to __lshrdi3 (which is very inefficient on AVR), while
the versions operating on 32-bit values simply copy the right registers.
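
Since only the low 16 bits of the shifted value survive the return type,
narrowing the argument to uint32_t before shifting yields the same
result. The sketch below shows that idea as a possible workaround; the
function names are hypothetical and not part of the test case above,
and I have not checked whether avr-gcc actually reduces these variants
to register moves the way it does for the 32-bit versions.

```cpp
#include <stdint.h>

// Hypothetical variants of foo64_8 / foo64_16: truncate to 32 bits
// first, then shift. Bits 8..23 of the argument are unaffected by the
// truncation, so the returned value matches a >> 8 on the full
// 64-bit value.
uint8_t foo64_8_narrow(uint64_t a) {
        return (uint32_t)a >> 8;
}

uint16_t foo64_16_narrow(uint64_t a) {
        return (uint32_t)a >> 8;
}
```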

The foo32_16 function still contains some useless instructions: the
uint16_t result is returned in r25:r24, so r27 and r26 are not part of
the return value, and I'm not sure why they are set. That is probably
an unrelated problem, though.

I've marked this with component "target", since I think these
optimizations are avr-specific (or at least not applicable to bigger
architectures).
