The following C function:

    unsigned long long foo( unsigned a, unsigned b )
    {
        return a | ((unsigned long long) b << 32 );
    }

is actually a no-op on a 32-bit ARM target. Argument a is received in r0, argument b is received in r1, and the returned long long's lower half should be in r0 and the upper half in r1. Yet with -O2, -O3 or -Os the compiler generates the following code (the comments describe what each instruction does, for readers who don't know ARM assembly):

    mov     r3, #0            // tmp1 = 0
    orr     r3, r3, r0        // tmp1 |= a
    str     r4, [sp, #-4]!    // push a register to create tmp2
    mov     r0, r3            // return value low = tmp1
    mov     r4, r1            // tmp2 = b
    ldmfd   sp!, {r4}         // (void) tmp2, pop the register
    bx      lr                // return

While this is significantly better than what 4.3.x and earlier generated (they saved 3 registers and went through the OR-with-0 business for both a and b), it is still 7 instructions and 2 memory accesses instead of the required 1 instruction and 0 memory accesses.

Furthermore, it is interesting that the compiler decides to use r4, which must be saved, when r2 and r12 are not callee-saved registers and could be used freely as temporaries.

It should also be noted that this problem is specific to the ARM instruction set. If the compiler generates code for the Thumb instruction set, none of the above appears and the function compiles, as expected, to a single bx lr instruction.

--
           Summary: Very inefficient code generated
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: zoltan at bendor dot com dot au
  GCC host triplet: x86-elf-linux
GCC target triplet: arm-elf-unknown

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41366
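
For anyone who wants to convince themselves that foo() really only passes its arguments through, here is a minimal host-side sketch. It assumes that unsigned is 32 bits wide, as it is on the ARM target in question. The helpers low_half(), high_half() and foo_preserves_halves() are hypothetical names introduced only for this illustration; they are not part of the report or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the report, behaviour unchanged. */
unsigned long long foo(unsigned a, unsigned b)
{
    return a | ((unsigned long long) b << 32);
}

/* Hypothetical helpers: split a 64-bit value into its 32-bit halves.
 * On 32-bit ARM the low half is returned in r0 and the high half in r1. */
static uint32_t low_half(unsigned long long v)
{
    return (uint32_t) v;
}

static uint32_t high_half(unsigned long long v)
{
    return (uint32_t) (v >> 32);
}

/* Hypothetical check: returns 1 when the low half of the result equals a
 * and the high half equals b, i.e. when the values that arrived in r0 and
 * r1 are exactly the values that have to leave in r0 and r1.
 * Assumes unsigned is 32 bits wide. */
int foo_preserves_halves(unsigned a, unsigned b)
{
    unsigned long long r = foo(a, b);
    return low_half(r) == a && high_half(r) == b;
}
```

If foo_preserves_halves() returns 1 for every input pair, no data movement is needed on a target that passes a and b in the same registers that carry the result, so a bare return is sufficient.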