The following C function:
unsigned long long foo( unsigned a, unsigned b )
{
return a | ((unsigned long long) b << 32 );
}
is effectively a no-op on a 32-bit ARM target: argument a is received in r0
and argument b in r1, while the returned long long's lower half is expected
in r0 and its upper half in r1, so both values are already in place on entry.
Yet with -O2, -O3 or -Os the compiler generates the following code (the
comments explain each step for readers who don't know ARM assembly):
mov r3, #0 // tmp1 = 0
orr r3, r3, r0 // tmp1 |= a
str r4, [sp, #-4]! // push a register to create tmp2
mov r0, r3 // return value low = tmp1
mov r4, r1 // tmp2 = b
ldmfd sp!, {r4} // (void) tmp2, pop the register
bx lr // return
While it is significantly better than what 4.3.x and earlier generated (they
saved 3 registers and went through the OR with 0 business for both a and b),
it is still 7 instructions and 2 memory accesses instead of the required 1
instruction and 0 memory accesses.
Furthermore, it is interesting that the compiler decides to use r4, which must
be saved, when r2 and r12 are not callee-saved registers and could be used
freely as temporaries.
It should also be noted that this problem is specific to the ARM instruction
set. If the compiler generates code for the Thumb instruction set, none of the
above appears and the function compiles, as expected, to a single
bx lr
instruction.
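
The claim that foo() merely repackages its two arguments can be checked on any
host with a small sketch like the one below. It is not part of the original
test case: it assumes a 32-bit unsigned and a 64-bit unsigned long long (made
explicit here with the <stdint.h> types), and halves_match() is a hypothetical
helper name introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the report, rewritten with fixed-width types so the
   32-bit / 64-bit assumption is explicit. */
uint64_t foo( uint32_t a, uint32_t b )
{
    return a | ((uint64_t) b << 32 );
}

/* Hypothetical helper: returns 1 if the lower half of foo(a, b) is a and
   the upper half is b, i.e. the function only repackages its arguments. */
int halves_match( uint32_t a, uint32_t b )
{
    uint64_t r = foo( a, b );

    return (uint32_t) r == a && (uint32_t) (r >> 32) == b;
}
```

Calling halves_match() with a few argument pairs from a main() should return 1
every time. On a 32-bit ARM target, where the lower half is returned in r0 and
the upper half in r1, this is why a bare bx lr would suffice.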
--
Summary: Very inefficient code generated
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: zoltan at bendor dot com dot au
GCC host triplet: x86-elf-linux
GCC target triplet: arm-elf-unknown
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41366