Compile the following code with the options -march=armv7-a -mthumb -Os:
extern void foo(int*);
void tr(int array[], int n)
{
  int i;
  for (i=0; i<n; i++)
    foo(&array[i]);
}
GCC 4.6 generates:
push {r4, r5, r6, lr}
mov r6, r1
mov r5, r0
movs r4, #0
b .L2
.L3:
mov r0, r5
adds r4, r4, #1
bl foo
adds r5, r5, #4
.L2:
cmp r4, r6
blt .L3
pop {r4, r5, r6, pc}
Both r4 and r5 are loop induction variables, but r4 serves only as the loop
counter. The exit test can therefore be rewritten in terms of r5 and r4
eliminated:
push {r4, r5, r6, lr}
mov r5, r0
add r6, r5, r1, lsl #2
b .L2
.L3:
mov r0, r5
bl foo
adds r5, r5, #4
.L2:
cmp r5, r6
blt .L3
pop {r4, r5, r6, pc}
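The rewritten exit test is equivalent to the original because r5 always equals the array base plus 4·r4. A short derivation, treating addresses as plain integers and assuming n ≥ 0 and that array + 4n does not overflow the comparison (a = initial r0, n = initial r1, k = iteration number):

```latex
r_4^{(k)} = k, \qquad r_5^{(k)} = a + 4k
\\
r_4 < n \iff 4 r_4 < 4n \iff a + 4 r_4 < a + 4n \iff r_5 < r_6,
\qquad \text{where } r_6 = a + 4n = r_0 + (r_1 \ll 2)
```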
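The new code is shorter and faster than the original: the loop body loses the
counter increment (five instructions instead of six), the prologue is one
instruction shorter, and one fewer register is needed (r4 is still pushed,
presumably only to keep the stack 8-byte aligned).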
Both the tree-SSA and the RTL loop optimizers miss this optimization.
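At the source level, the proposed assembly corresponds roughly to the pointer-based loop below. This is a hedged sketch, not GCC output: tr_ivopt is a hypothetical name, and foo is given a stub body that records its arguments so the two versions can be compared (the report only declares foo as extern). The n <= 0 guard is an assumption added here, because array + n is not a valid pointer for negative n, while the original loop simply runs zero times.

```c
#include <stddef.h>

/* Hypothetical stand-in for the external foo(): records each pointer it
   receives so the original and transformed loops can be compared. */
#define MAX_CALLS 64
static int *calls[MAX_CALLS];
static size_t ncalls;

void foo(int *p)
{
    if (ncalls < MAX_CALLS)
        calls[ncalls] = p;
    ncalls++;
}

/* Original loop: a counter i (r4) plus the address &array[i] (r5). */
void tr(int array[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        foo(&array[i]);
}

/* Hand-transformed loop (hypothetical name): the counter is eliminated and
   the exit test compares the address IV against an end pointer computed
   once, mirroring "add r6, r5, r1, lsl #2" and "cmp r5, r6". */
void tr_ivopt(int array[], int n)
{
    int *p, *end;
    if (n <= 0)          /* array + n is only valid for n >= 0 */
        return;
    end = array + n;
    for (p = array; p < end; p++)
        foo(p);
}
```

Calling tr(a, 10) and tr_ivopt(a, 10) should pass foo the same ten addresses, &a[0] through &a[9], in the same order.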
--
Summary: Missed induction variable optimization
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: carrot at google dot com
GCC build triplet: i686-linux
GCC host triplet: i686-linux
GCC target triplet: arm-eabi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45098