https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66598
Bug ID: 66598
Summary: With -O3 gcc incorrectly assumes aligned SSE
instructions (e.g. movapd) can be used
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: michael.l...@uni-ulm.de
Target Milestone: ---
When compiled with -O3 by gcc-4.9 or gcc-5.0, the following code causes a
"Segmentation fault: 11" on all my Intel machines with SSE:
---
double Q[4*64];
double P[5*64];

int
main()
{
    int i, j;
    double *p = P;
    double *q = Q;

    for (j=0; j<32; ++j) {
        for (i=0; i<4; ++i) {
            q[i] = p[i];
        }
        q += 4;
        p += 5;
    }
    return 0;
}
---
Looking at the assembly code, the problem is in this loop:
---
L2:
    movapd  16(%rax), %xmm0
    addq    $40, %rax
    addq    $32, %rdx
    movapd  -40(%rax), %xmm1
    movaps  %xmm0, -16(%rdx)
    movaps  %xmm1, -32(%rdx)
    cmpq    %rcx, %rax
    jne     L2
---
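So %rax contains the address of p. But even if p=P is initially aligned
correctly on a 16-byte boundary, P+5 is not: advancing by 5 doubles moves
the pointer by 40 bytes, which is not a multiple of 16. So movapd must not
be used for the loads from p.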
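The alignment argument can be checked with a little arithmetic. The following is only a sketch: the helper names p_offset_bytes and p_is_16_aligned are made up for illustration, and it assumes that P itself starts on a 16-byte boundary and that sizeof(double) is 8.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of p from P after j outer iterations.
 * Each iteration does p += 5, i.e. 5 * sizeof(double) = 40 bytes
 * (assuming 8-byte doubles). */
size_t p_offset_bytes(size_t j)
{
    return j * 5 * sizeof(double);
}

/* Hypothetical helper: nonzero if p is 16-byte aligned after j iterations,
 * under the assumption that P itself is 16-byte aligned. */
int p_is_16_aligned(size_t j)
{
    return p_offset_bytes(j) % 16 == 0;
}
```

Under these assumptions p is 16-byte aligned only for even j; for every odd j it sits 8 bytes past a 16-byte boundary, so an aligned load from p would fault there.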
Changing the assembly code manually to
---
L2:
    movupd  16(%rax), %xmm0
    addq    $40, %rax
    addq    $32, %rdx
    movupd  -40(%rax), %xmm1
    movaps  %xmm0, -16(%rdx)
    movaps  %xmm1, -32(%rdx)
    cmpq    %rcx, %rax
    jne     L2
---
fixed the problem.
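For comparison, the same load/store pattern as in the hand-edited loop can be written with SSE2 intrinsics. This is only an illustrative sketch, not what gcc generates: the function name copy4_unaligned_src is made up, it requires an x86 target with SSE2, and it assumes the destination is 16-byte aligned (as q is, if Q is aligned).

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */

/* Hypothetical helper: copy 4 doubles from src (any alignment) to dst.
 * dst is assumed to be 16-byte aligned. */
void copy4_unaligned_src(double *dst, const double *src)
{
    __m128d lo = _mm_loadu_pd(src);      /* unaligned load, like movupd */
    __m128d hi = _mm_loadu_pd(src + 2);  /* unaligned load, like movupd */
    _mm_store_pd(dst, lo);               /* aligned store: dst must be 16-byte aligned */
    _mm_store_pd(dst + 2, hi);           /* aligned store */
}
```

Calling this once per outer iteration with dst = q and src = p would correspond to the loop body above: unaligned loads from p, aligned stores to q.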
Cheers,
Michael