------- Comment #2 from ramana at gcc dot gnu dot org  2009-07-28 08:29 -------
(In reply to comment #0)
> Consider the following code:
>
> int (*indirect_func)();
>
> int indirect_call()
> {
>   return indirect_func();
> }
>
> gcc 4.4.0 generates the following with -O2 -mcpu=cortex-a8 -S:
>
> indirect_call:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         movw    r3, #:lower16:indirect_func
>         stmfd   sp!, {r4, lr}
>         movt    r3, #:upper16:indirect_func
>         mov     lr, pc
>         ldr     pc, [r3, #0]
>         ldmfd   sp!, {r4, pc}
>
> The problem is that the instruction "ldr pc, [r3, #0]" is not considered a
> function call by the Cortex-A8's branch predictor, as noted in DDI0344J
> section 5.2.1, Return stack predictions. Thus, the return from the called
> function is mispredicted resulting in a penalty of 13 cycles compared to a
> direct call
>
> Rather than doing
>         mov     lr, pc
>         ldr     pc, [r3]
> it should instead use the blx instruction as so:
>         ldr     lr, [r3]
>         blx     lr
> which is considered a function call by the branch predictor, and has an
> overhead of only one cycle compared to a direct call.
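To make the return-stack argument concrete, here is a toy model in C, offered only as a sketch. Assumptions: the event kinds, the 8-entry depth (`RS_DEPTH`), the drop-oldest overflow policy and the addresses are all hypothetical and are not taken from DDI0344J; the model simply encodes the report's premise that bl/blx push a predicted return address while `mov lr, pc; ldr pc, [rN]` does not. In this model the missing push also leaves the caller's own return unmatched; that is a consequence of the toy, not a measured Cortex-A8 result.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: how a return-stack predictor might classify branches. */
typedef enum {
    EV_CALL_BL_BLX,   /* bl / blx: recognized as a call, pushes return address */
    EV_CALL_LDR_PC,   /* mov lr, pc; ldr pc, [rN]: branch taken, but no push */
    EV_RETURN         /* bx lr or ldmfd sp!, {..., pc}: pops a predicted target */
} event_kind;

typedef struct {
    event_kind kind;
    unsigned addr;    /* call: return address; return: actual return target */
} event;

#define RS_DEPTH 8    /* hypothetical depth chosen for the sketch */

/* Replays a call/return trace and counts returns whose predicted target
   (the popped entry) differs from the actual target. */
static int count_return_mispredicts(const event *ev, size_t n)
{
    unsigned stack[RS_DEPTH];
    size_t top = 0;   /* number of valid entries */
    int mispredicts = 0;

    for (size_t i = 0; i < n; i++) {
        switch (ev[i].kind) {
        case EV_CALL_BL_BLX:
            if (top == RS_DEPTH) {            /* full: drop the oldest entry */
                for (size_t j = 1; j < RS_DEPTH; j++)
                    stack[j - 1] = stack[j];
                top--;
            }
            stack[top++] = ev[i].addr;
            break;
        case EV_CALL_LDR_PC:
            break;                             /* not seen as a call: no push */
        case EV_RETURN:
            /* Empty stack or wrong popped entry both count as a mispredict. */
            if (top == 0 || stack[--top] != ev[i].addr)
                mispredicts++;
            break;
        }
        assert(top <= RS_DEPTH);
    }
    return mispredicts;
}
```

For a trace where some caller does `bl indirect_call` (return to a made-up 0x2000) and indirect_call then calls through the pointer (return to a made-up 0x1008), the `ldr pc` variant gives 2 mispredicted returns in this model, while the same trace with `blx` gives 0.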
The point made is correct, but there is something you've missed in your patch!
Loading lr with the address of the function you want to call destroys the
return address, so your code is never going to return. Instead you want:

        ldr     r3, [r3]
        blx     r3

Or, better still, a tail call with bx r3, but that is PR19599 :)

>
> gcc -v:
> Using built-in specs.
> Target: arm-none-linux-gnueabi
> Configured with: ../gcc-4.4.0/configure --target=arm-none-linux-gnueabi
> --prefix=/usr/local/arm --enable-threads
> --with-sysroot=/usr/local/arm/arm-none-linux-gnueabi/libc
> Thread model: posix
> gcc version 4.4.0 (GCC)
>

--

ramana at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-07-28 08:29:41
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40887