https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526
Bug ID: 69526
Summary: ivopts candidate strangeness
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at linux dot vnet.ibm.com
Target Milestone: ---
Target: s390, x86
While inspecting loop generation code on s390, I saw ivopts choose an IV
candidate which does something peculiar. On x86 the same candidate is never
chosen because of different cost estimates, yet the behavior can be reproduced
on x86 by forcing GCC to use the same candidate as on s390. I did this to
confirm my suspicions, and I'll show the x86 version here:
This source
void v(unsigned long *in, unsigned long *out, unsigned int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      out[i] = in[i];
    }
}
results in the following assembly, when ivopts candidate 7 is used:
v:
        testl   %edx, %edx
        je      .L1
        leal    -1(%rdx), %eax
        leaq    8(,%rax,8), %rcx
        xorl    %eax, %eax
.L3:
        movq    (%rdi,%rax), %rdx
        movq    %rdx, (%rsi,%rax)
        addq    $8, %rax
        cmpq    %rcx, %rax
        jne     .L3
.L1:
        rep ret
Should the following be happening?
        leal    -1(%rdx), %eax
        leaq    8(,%rax,8), %rcx
i.e. %eax = n - 1
     %rcx = 8 * (%rax + 1) = 8 * n
The pattern can already be observed in ivopts' GIMPLE:
<bb 4>:
  _15 = n_5(D) + 4294967295;
  _2 = (sizetype) _15;
  _1 = _2 + 1;
  _24 = _1 * 8;
Why do we need the - 1 and the subsequent + 1 when %eax is zeroed afterwards
anyway? Granted, this exact situation won't ever be observed on x86, as another
ivopts candidate is chosen, but on s390 the sequence amounts to three
instructions.
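To make the arithmetic concrete, here is a small C sketch of the two ways the
bound could be computed. It is only an illustration: the function names are
made up, and sizetype is assumed to be 64 bits wide (as on x86-64) and modeled
with uint64_t. It suggests one possible reason the - 1 / + 1 pair is not simply
folded away: the subtraction happens in 32 bits and the addition in 64 bits, so
the two forms only agree when n != 0.

```c
#include <assert.h>
#include <stdint.h>

/* Bound as it appears in the ivopts GIMPLE:
   _15 = n + 4294967295;  _2 = (sizetype) _15;  _1 = _2 + 1;  _24 = _1 * 8;
   sizetype is ASSUMED to be 64 bits here.  */
uint64_t bound_as_generated(uint32_t n)
{
  uint32_t t15 = n + 4294967295u;  /* n - 1, wrapping in 32 bits */
  uint64_t t2 = (uint64_t) t15;    /* zero-extension to 64 bits */
  uint64_t t1 = t2 + 1;            /* + 1 is done in 64 bits */
  return t1 * 8;
}

/* The folded form one might expect instead: 8 * n.  */
uint64_t bound_simplified(uint32_t n)
{
  return (uint64_t) n * 8;
}

/* Spot checks: both forms agree for nonzero n and differ for n == 0,
   where n - 1 wraps to 0xffffffff before the widening conversion.  */
void check_bounds(void)
{
  assert(bound_as_generated(1) == bound_simplified(1));
  assert(bound_as_generated(1000) == bound_simplified(1000));
  assert(bound_as_generated(4294967295u) == bound_simplified(4294967295u));
  assert(bound_as_generated(0) != bound_simplified(0));
}
```

In the generated code the n == 0 case is already excluded by the testl/je guard
before the loop, so within the loop preheader the two forms would be
interchangeable; whether the optimizers are able to use that fact at this point
is a separate question.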
If I understand it correctly, the n - 1 comes from estimating the number of
loop iterations, while the + 1 is then correctly added by cand_value_at()
because the loop counter is incremented before the exit test. Perhaps this is
intended behavior and there is nothing wrong with it?
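For reference, the loop shape described above can be written out by hand in C
roughly as follows. This is only a model of the assembly, not what GCC does
internally: v_model, niter, bound and off are illustrative names, and the
element size is taken from sizeof rather than hard-coded as 8.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written model of the loop with a byte-offset IV.
   All names here are illustrative, not GCC internals.  */
void v_model(unsigned long *in, unsigned long *out, unsigned int n)
{
  if (n == 0)                         /* testl %edx, %edx; je .L1 */
    return;

  unsigned int niter = n - 1;         /* iteration-count estimate, 32 bits */

  /* Value of the IV at the exit test.  The IV is incremented before the
     test, so the bound is one step past niter steps.  */
  size_t bound = ((size_t) niter + 1) * sizeof (unsigned long);

  size_t off = 0;                     /* xorl %eax, %eax */
  do
    {
      *(unsigned long *) ((char *) out + off)
        = *(unsigned long *) ((char *) in + off);
      off += sizeof (unsigned long);  /* addq $8, %rax */
    }
  while (off != bound);               /* cmpq %rcx, %rax; jne .L3 */
}
```

Calling v_model(in, out, n) copies n elements just like the original v(); the
only point of the sketch is to show where niter and the extra step in bound
sit relative to the increment and the exit test.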
Regards
Robin