The following C function:

unsigned long long foo( unsigned a, unsigned b )
{
   return a | ((unsigned long long) b << 32 );
}

is actually a no-op on a 32-bit ARM target. Argument a is received in r0,
argument b is received in r1 and the returned long long's lower half should be
in r0 and the upper half should be in r1. Yet with -O2, -O3 or -Os the compiler
generates the following code; the comments tell you what it does if you don't
know ARM assembly:

   mov     r3, #0             // tmp1 = 0
   orr     r3, r3, r0         // tmp1 |= a
   str     r4, [sp, #-4]!     // push a register to create tmp2
   mov     r0, r3             // return value low = tmp1
   mov     r4, r1             // tmp2 = b
   ldmfd   sp!, {r4}          // (void) tmp2, pop the register
   bx      lr                 // return

While it is significantly better than what 4.3.x and earlier generated (they
saved 3 registers and went through the OR with 0 business for both a and b),
it is still 7 instructions and 2 memory accesses instead of the required 1
instruction and 0 memory accesses.

Furthermore, it is interesting that the compiler decides to use r4, which must
be saved, when r2 and r12 are not callee-saved registers and could be used
freely as temporaries.

It should also be noted that this problem is specific to the ARM instruction
set. If the compiler generates code for the Thumb instruction set, none of the
above appears and the function compiles, as expected, to a single

   bx   lr

instruction.


-- 
           Summary: Very inefficient code generated
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: zoltan at bendor dot com dot au
  GCC host triplet: x86-elf-linux
GCC target triplet: arm-elf-unknown


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41366
