https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511
Bug ID: 66511
Summary: [avr] whole-byte shifts not optimized away for uint64_t
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
Target Milestone: ---

When doing whole-byte shifts, gcc usually optimizes away the shifts and ends up
moving data between registers instead. However, it seems this doesn't happen
when uint64_t is used.

Here's an example (assembler output slightly trimmed of unrelated comments and
annotations etc.):

matthijs@grubby:~$ cat test.cpp
#include <stdint.h>

uint8_t foo64_8(uint64_t a) {
    return a >> 8;
}

uint16_t foo64_16(uint64_t a) {
    return a >> 8;
}

uint8_t foo32_8(uint32_t a) {
    return a >> 8;
}

uint16_t foo32_16(uint32_t a) {
    return (a >> 8);
}

matthijs@grubby:~$ avr-gcc -Os test.cpp -S -o -
_Z7foo64_8y:
        push r16
        ldi r16,lo8(8)
        rcall __lshrdi3
        mov r24,r18
        pop r16
        ret
_Z8foo64_16y:
        push r16
        ldi r16,lo8(8)
        rcall __lshrdi3
        mov r24,r18
        mov r25,r19
        pop r16
        ret
_Z7foo32_8m:
        mov r24,r23
        ret
_Z8foo32_16m:
        clr r27
        mov r26,r25
        mov r25,r24
        mov r24,r23
        ret
        .ident "GCC: (GNU) 4.9.2 20141224 (prerelease)"

The output is identical for 4.8.1 on Debian, and the above 4.9.2 on Arch. I
haven't found a readily available 5.x package yet to test.

As you can see, the versions operating on 64 bit values preserve the 8-bit
shift (which is very inefficient on AVR), while the versions running on 32 bit
values simply copy the right registers.

The foo32_16 function still has some useless instructions (r27 and r26 are not
part of the return value, not sure why these are set) but that is probably an
unrelated problem.

I've marked this with component "target", since I think these optimizations are
avr-specific (or at least not applicable to bigger architectures).
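
For illustration only, here is a sketch of a possible source-level workaround.
It is untested against avr-gcc, so whether it actually avoids the __lshrdi3
call is an assumption. Both 64-bit functions above only use bits 8..23 of
the argument, so narrowing to uint32_t before shifting yields the same value
and might let the compiler take the 32-bit path. The function names are
hypothetical and not part of the original test case.

```cpp
#include <stdint.h>

// Hypothetical variant of foo64_8: narrow first, then shift.
// Bits 8..15 of a 64-bit value sit entirely in its low 32 bits, so the
// result equals (uint8_t)(a >> 8) for every input.
uint8_t foo64_8_narrowed(uint64_t a) {
    return static_cast<uint8_t>(static_cast<uint32_t>(a) >> 8);
}

// Hypothetical variant of foo64_16: bits 8..23 also sit in the low 32 bits,
// so the result equals (uint16_t)(a >> 8) for every input.
uint16_t foo64_16_narrowed(uint64_t a) {
    return static_cast<uint16_t>(static_cast<uint32_t>(a) >> 8);
}
```

This only demonstrates that the two forms are equivalent at the C++ level. It
does not change the fact that the plain uint64_t shift should be optimized by
the compiler itself.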