[Bug middle-end/40887] New: GCC generates suboptimal code for indirect function calls on ARM

2009-07-27 Thread lessen42 at gmail dot com
Consider the following code:

int (*indirect_func)();

int indirect_call()
{
    return indirect_func();
}

gcc 4.4.0 generates the following with -O2 -mcpu=cortex-a8 -S:

indirect_call:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        movw    r3, #:lower16:indirect_func
        stmfd   sp!, {r4, lr}
        movt    r3, #:upper16:indirect_func
        mov     lr, pc
        ldr     pc, [r3, #0]
        ldmfd   sp!, {r4, pc}

The problem is that the instruction "ldr pc, [r3, #0]" is not considered a
function call by the Cortex-A8's branch predictor, as noted in DDI0344J section
5.2.1, Return stack predictions. Thus, the return from the called function is
mispredicted, resulting in a penalty of 13 cycles compared to a direct call.

Rather than doing
        mov     lr, pc
        ldr     pc, [r3]
it should instead use the blx instruction, like so:
        ldr     lr, [r3]
        blx     lr
which is considered a function call by the branch predictor, and has an
overhead of only one cycle compared to a direct call.
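
To check which of the two sequences a given compiler emits, a slightly
extended version of the test case can help. This is only a sketch: the
function target() and its return value are hypothetical additions that are not
part of the original report; they just give the pointer something to point at
so the file can also be linked into a small test program.

```c
/* Function pointer called indirectly, as in the test case above.
   It is a non-static global, so the compiler cannot assume its value. */
int (*indirect_func)(void);

/* Hypothetical callee, not in the original report. */
int target(void)
{
    return 42;
}

/* The call of interest: inspect the assembly generated for this function. */
int indirect_call(void)
{
    return indirect_func();
}
```

Compiling this with the flags from the report (-O2 -mcpu=cortex-a8 -S) and
reading the body of indirect_call should show either the "mov lr, pc" /
"ldr pc, [r3]" pair or a blx instruction.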

gcc -v:
Using built-in specs.
Target: arm-none-linux-gnueabi
Configured with: ../gcc-4.4.0/configure --target=arm-none-linux-gnueabi
--prefix=/usr/local/arm --enable-threads
--with-sysroot=/usr/local/arm/arm-none-linux-gnueabi/libc
Thread model: posix
gcc version 4.4.0 (GCC)


-- 
           Summary: GCC generates suboptimal code for indirect function calls on ARM
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: lessen42 at gmail dot com
  GCC host triplet: i386-apple-darwin
GCC target triplet: arm-none-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40887



[Bug middle-end/40887] GCC generates suboptimal code for indirect function calls on ARM

2009-07-27 Thread lessen42 at gmail dot com


--- Comment #1 from lessen42 at gmail dot com  2009-07-28 05:14 ---
Created an attachment (id=18261)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18261&action=view)
Use blx for indirect function calls on armv5+

This fixes the test case and the obvious cases of this I found in x264. There
may be more call or return sequences that the branch predictors of the
Cortex-A8 and A9 (and possibly other cores) do not recognize.

