https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67167
Bug ID: 67167
Summary: cilkplus vectorization problems
Product: gcc
Version: 5.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: marcin.krotkiewski at gmail dot com
Target Milestone: ---
I think there is a problem with the vectorization of arithmetic operations in
the Cilk Plus implementation in gcc. I inspected the generated assembly for the
following two implementations of vector addition (a = a + b). The code is
compiled with 'gcc -O3 -mavx -ftree-vectorize -fopt-info-vec -fcilkplus test.c'.
// ICC compatibility - alignment hint
#ifdef __GNUC__
#define __assume_aligned(lvalueptr, align) lvalueptr = __builtin_assume_aligned(lvalueptr, align)
#endif
#define RESTRICT __restrict__
typedef double Double;

// Plain C loop: a = a + b
void test(Double * RESTRICT a, Double * RESTRICT b, int size)
{
    int i;
    __assume_aligned(a, 64);
    __assume_aligned(b, 64);
    for(i=0; i<size; i++)
        a[i] = a[i] + b[i];
}

// Same operation written with Cilk Plus array notation
void test_cilkplus1(Double * RESTRICT a, Double * RESTRICT b, int size)
{
    __assume_aligned(a, 64);
    __assume_aligned(b, 64);
    a[0:size] = a[0:size] + b[0:size];
}
The first function (test) is vectorized as expected. Here is the generated
assembly for the loop:
.L4:
        vmovapd (%rdi,%r8), %ymm0
        addl    $1, %r9d
        vaddpd  (%rsi,%r8), %ymm0, %ymm0
        vmovapd %ymm0, (%rdi,%r8)
        addq    $32, %r8
        cmpl    %r9d, %ecx
        ja      .L4
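In contrast, the second function (test_cilkplus1) is not vectorized. Its loop
uses scalar instructions (vmovsd, vaddsd) and advances by 8 bytes, i.e. one
double, per iteration: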
.L21:
        vmovsd  (%rdi,%rax), %xmm0
        movl    %ecx, %r8d
        addl    $1, %ecx
        vaddsd  (%rsi,%rax), %xmm0, %xmm0
        vmovsd  %xmm0, (%rdi,%rax)
        addq    $8, %rax
        cmpl    %r8d, %edx
        jg      .L21
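For reference, the difference between the two loops can be written out in C.
The sketch below is not part of test.c. It uses GCC's vector_size extension to
spell out what .L4 does (four doubles, 32 bytes, per iteration) next to what
.L21 does (one double, 8 bytes, per iteration). The names v4df, add_by_vectors
and add_by_scalars are hypothetical. The vector version assumes 32-byte-aligned
pointers and a size that is a multiple of 4; the compiler-generated code
presumably handles any remainder outside the loop shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Four doubles in one 32-byte vector, the width of a %ymm register.
   may_alias allows plain double arrays to be accessed through this type. */
typedef double v4df __attribute__((vector_size(32), may_alias));

/* Hypothetical hand-written counterpart of loop .L4: 4 elements per iteration.
   Assumption: a and b are 32-byte aligned and size is a multiple of 4. */
static void add_by_vectors(double *a, const double *b, int size)
{
    int i;
    assert(size % 4 == 0);
    assert((uintptr_t)a % 32 == 0 && (uintptr_t)b % 32 == 0);
    for (i = 0; i < size; i += 4) {
        v4df va = *(v4df *)(a + i);        /* with -mavx: vmovapd load       */
        v4df vb = *(const v4df *)(b + i);  /* memory operand of vaddpd       */
        *(v4df *)(a + i) = va + vb;        /* vaddpd, then the vmovapd store */
    }
}

/* Counterpart of loop .L21: 1 element per iteration (vmovsd/vaddsd). */
static void add_by_scalars(double *a, const double *b, int size)
{
    int i;
    for (i = 0; i < size; i++)
        a[i] = a[i] + b[i];
}
```

Both functions produce the same values; only the number of elements handled
per iteration differs, which is what the addq $32 versus addq $8 increments in
the two listings show.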
I have made sure that the compiler knows there is no aliasing (restrict) and
that the arrays are aligned in memory. Clearly this is enough for the plain
loop to be vectorized, but not for the Cilk Plus array notation.
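Note that __assume_aligned is a promise to the compiler, not a check, so any
driver that calls these functions must pass buffers that really are 64-byte
aligned. Below is a minimal sketch of such a driver, assuming C11 aligned_alloc
is available. The names add_loop, alloc_aligned64 and check_add are
hypothetical and not part of test.c; add_loop repeats the plain loop from test
so that the block compiles without -fcilkplus.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same loop as test(): a = a + b on 64-byte-aligned, non-aliasing arrays. */
static void add_loop(double *__restrict__ a, double *__restrict__ b, int size)
{
    int i;
    a = __builtin_assume_aligned(a, 64);
    b = __builtin_assume_aligned(b, 64);
    for (i = 0; i < size; i++)
        a[i] = a[i] + b[i];
}

/* Hypothetical helper: allocate room for n doubles on a 64-byte boundary.
   aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up (and never zero). */
static double *alloc_aligned64(size_t n)
{
    size_t bytes = (n * sizeof(double) + 63) / 64 * 64;
    if (bytes == 0)
        bytes = 64;
    return aligned_alloc(64, bytes);
}

/* Hypothetical check: returns the number of wrong elements (0 on success),
   or -1 if allocation failed. */
static int check_add(int size)
{
    double *a = alloc_aligned64((size_t)size);
    double *b = alloc_aligned64((size_t)size);
    int i, bad = 0;

    if (!a || !b) {
        free(a);
        free(b);
        return -1;
    }
    /* The alignment promised to the compiler must actually hold. */
    assert((uintptr_t)a % 64 == 0 && (uintptr_t)b % 64 == 0);

    for (i = 0; i < size; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }
    add_loop(a, b, size);
    for (i = 0; i < size; i++)
        if (a[i] != 3.0 * i)   /* i + 2i is exact in double for these sizes */
            bad++;

    free(a);
    free(b);
    return bad;
}
```

When building with -fcilkplus, the same check can call test_cilkplus1 in place
of add_loop to confirm that the two versions differ only in the generated code
and not in their results.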