https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89332
Bug ID: 89332
Summary: Missed detection of dead stores to array in a loop
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
Target Milestone: ---
Hi,
For the following test-case:
#define ARR_MAX 6

__attribute__((const)) int f(int);

int foo()
{
  int arr[ARR_MAX];

  for (int i = 0; i < ARR_MAX; i++)
    arr[i] = f(i);

  return arr[0];
}
With -O3, gcc generates a call to f(i) and a store to arr[i] on every
iteration, while clang detects that the stores to arr are dead (except for
arr[0]), removes the loop, and emits a tail call to f(0).
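For reference, the reduction clang performs can be written out by hand. This is only a sketch: the body of f below is a hypothetical stand-in (the report only declares f), and foo_loop, foo_reduced and check_same_result are names made up for illustration.

```c
#include <assert.h>

#define ARR_MAX 6

/* Hypothetical stand-in for the external f() in the report. Any function
   whose result depends only on its argument fits __attribute__((const)). */
__attribute__((const, noinline)) int f(int x)
{
  return 3 * x + 7;
}

/* The loop as written in the report. */
int foo_loop(void)
{
  int arr[ARR_MAX];

  for (int i = 0; i < ARR_MAX; i++)
    arr[i] = f(i);

  return arr[0];
}

/* What remains once the stores to arr[1..5] are treated as dead and the
   calls that fed them are dropped, which is legal because f is const. */
int foo_reduced(void)
{
  return f(0);
}

/* Asserts that both versions return the same value. */
void check_same_result(void)
{
  assert(foo_loop() == foo_reduced());
}
```

Calling check_same_result() from a small driver should pass for any f that really is const.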
aarch64-linux-gnu-gcc -O3:
foo:
.LFB0:
        .cfi_startproc
        stp     x29, x30, [sp, -64]!
        .cfi_def_cfa_offset 64
        .cfi_offset 29, -64
        .cfi_offset 30, -56
        mov     x29, sp
        stp     x19, x20, [sp, 16]
        .cfi_offset 19, -48
        .cfi_offset 20, -40
        add     x20, sp, 40
        mov     w19, 0
        .p2align 3,,7
.L2:
        mov     w0, w19
        bl      f
        str     w0, [x20], 4
        add     w19, w19, 1
        cmp     w19, 6
        bne     .L2
        ldr     w0, [sp, 40]
        ldp     x19, x20, [sp, 16]
        ldp     x29, x30, [sp], 64
        .cfi_restore 30
        .cfi_restore 29
        .cfi_restore 19
        .cfi_restore 20
        .cfi_def_cfa_offset 0
        ret
clang -O3 --target=aarch64-linux-gnu:
foo:                                    // @foo
// %bb.0:
        mov     w0, wzr
        b       f
It seems clang takes advantage of loop unrolling for the above case, while
gcc does not. After increasing ARR_MAX from 6 to 512, clang generates the
same or similar code as gcc.
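To illustrate why unrolling might matter here, this is a hand-unrolled version of the loop for ARR_MAX == 6. It is only my guess at what the compiler sees after full unrolling, not a dump from either compiler. The body of f is again a hypothetical stand-in, and foo_unrolled and check_unrolled are made-up names.

```c
#include <assert.h>

/* Hypothetical stand-in for the external f() in the report. */
__attribute__((const, noinline)) int f(int x)
{
  return 3 * x + 7;
}

/* The loop fully unrolled by hand. Every store now has a constant index,
   so it is visible that only arr[0] is ever read back. */
int foo_unrolled(void)
{
  int arr[6];

  arr[0] = f(0);  /* live: read by the return below */
  arr[1] = f(1);  /* dead store, never read */
  arr[2] = f(2);  /* dead store, never read */
  arr[3] = f(3);  /* dead store, never read */
  arr[4] = f(4);  /* dead store, never read */
  arr[5] = f(5);  /* dead store, never read */

  return arr[0];
}

/* Asserts that the unrolled form still returns f(0). */
void check_unrolled(void)
{
  assert(foo_unrolled() == f(0));
}
```

In this form, removing the five dead stores also leaves the calls f(1) to f(5) unused, and since f is const they can be deleted too. With 512 iterations the loop is presumably too large to unroll fully, which would match clang keeping the loop in that case.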
I am not sure, though, whether such code is written in practice or can result
from abstraction lowering. It was just a contrived test case I made up.
Thanks,
Prathamesh