https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84234
            Bug ID: 84234
           Summary: #pragma omp declare simd is ignored
           Product: gcc
           Version: 7.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gcc.account at lemaitre dot re
  Target Milestone: ---

Created attachment 43344
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43344&action=edit
Simple example showing the bug: gcc -O3 -fopenmp-simd

When I use #pragma omp declare simd on a forward declaration, it seems to be ignored during vectorization at the call site. For example:

#pragma omp declare simd
float add2(float a, float b);

void ADD2() {
  for (int i = 0; i < 1024; i++) {
    A[i] = add2(A[i], B[i]);
  }
}

is compiled into:

ADD2:
.LFB2:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        xorl    %ebx, %ebx
        .p2align 4,,10
        .p2align 3
.L19:
        movss   B(%rbx), %xmm1
        addq    $4, %rbx
        movss   A-4(%rbx), %xmm0
        call    add2
        movss   %xmm0, A-4(%rbx)
        cmpq    $4096, %rbx
        jne     .L19
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

whereas

#pragma omp declare simd
float __attribute((noinline)) add1(float a, float b) {
  return a+b;
}

void ADD1() {
  for (int i = 0; i < 1024; i++) {
    A[i] = add1(A[i], B[i]);
  }
}

is compiled into:

ADD1:
.LFB1:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        xorl    %ebx, %ebx
        .p2align 4,,10
        .p2align 3
.L15:
        movaps  A(%rbx), %xmm0
        addq    $16, %rbx
        movaps  B-16(%rbx), %xmm1
        call    _ZGVbN4vv_add1
        movaps  %xmm0, A-16(%rbx)
        cmpq    $4096, %rbx
        jne     .L15
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

When the function has no definition in the translation unit, the compiler doesn't use the vectorized variant of the function: ADD2 makes one scalar call to add2 per element, while ADD1 calls the 4-lane variant once per 4 elements. The same thing happens if the definition is provided but the symbol is declared weak. This is really annoying, as we have to put the definition of such a function in the same translation unit as the code that uses it, with all the problems that this entails.
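To make the ADD1 listing easier to read, here is a hand-written C sketch of what the vectorized loop does, assuming x86-64 with SSE. Each iteration loads 4 floats (16 bytes, matching `movaps` and `addq $16`) from A and B, makes one call to a 4-lane function, and stores the 4 results back into A; 1024 floats give the 4096-byte bound seen in `cmpq $4096`. The names `add1_x4` and `ADD1_by_hand` are hypothetical: the real clone `_ZGVbN4vv_add1` is generated by GCC, and this is only an illustration of the expected shape of the loop, not GCC's implementation. A vectorized ADD2 would have the same shape, with the call going to the corresponding clone of add2.

```c
#include <assert.h>     /* static_assert (C11) */
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

#define LEN 1024

static_assert(LEN % 4 == 0, "LEN must be a multiple of the SSE vector length");

/* 16-byte alignment so the aligned loads/stores (movaps) are valid. */
_Alignas(16) float A[LEN];
_Alignas(16) float B[LEN];

/* Hypothetical stand-in for the SSE, unmasked, 4-lane clone of add1:
   each of the 4 lanes computes a + b, like the scalar add1. */
__attribute__((noinline)) static __m128 add1_x4(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

/* Hand-written equivalent of the vectorized ADD1 loop:
   one call per 4 elements instead of one call per element. */
void ADD1_by_hand(void)
{
    for (int i = 0; i < LEN; i += 4) {
        __m128 a = _mm_load_ps(&A[i]);
        __m128 b = _mm_load_ps(&B[i]);
        _mm_store_ps(&A[i], add1_x4(a, b));
    }
}
```

With A[i] = i and B[i] = 2i, every element of A becomes 3i after `ADD1_by_hand()`, exactly as the scalar loop would compute.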
This bug is present in all GCC versions I tested, namely GCC 4.9 x86, GCC 5.5 x86, GCC 6.4 x86, GCC 7.3 x86 and GCC trunk x86 (from godbolt.org). On other architectures, the pragma seems to always be ignored, even when a definition is available (GCC 7.2 ARM, GCC 6.3 AArch64, GCC 6.3 PPC64). For reference, this works as expected with ICC.
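For reference, the call target `_ZGVbN4vv_add1` in the ADD1 listing is a mangled SIMD clone name. As we understand the x86-64 vector function ABI, it reads as `_ZGV`, then an ISA letter (`b` SSE, `c` AVX, `d` AVX2, `e` AVX-512), then `N` for an unmasked (notinbranch) or `M` for a masked (inbranch) variant, then the vector length, then one letter per parameter (`v` for a vector argument), then `_` and the function name. A vectorized ADD2 would need to call a name of this form, such as `_ZGVbN4vv_add2`, which the translation unit that defines add2 under the same pragma would normally provide. The helper below is a hypothetical sketch that only covers these cases (no linear or uniform parameters, no alignment clauses); `vfabi_x86_name` is not a GCC or libc function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build an x86-64 vector-function-ABI name
     _ZGV <isa> <mask> <vlen> <parameter kinds> _ <function name>
   isa:   'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512
   mask:  'N' unmasked (notinbranch), 'M' masked (inbranch)
   kinds: one letter per parameter, e.g. "vv" for two vector arguments
   Returns what snprintf returns (length of the full name). */
int vfabi_x86_name(char *buf, size_t size, char isa, int masked,
                   int vlen, const char *kinds, const char *fname)
{
    assert(isa != '\0' && strchr("bcde", isa) != NULL);
    assert(vlen > 0);
    return snprintf(buf, size, "_ZGV%c%c%d%s_%s",
                    isa, masked ? 'M' : 'N', vlen, kinds, fname);
}
```

For example, `vfabi_x86_name(name, sizeof name, 'b', 0, 4, "vv", "add1")` writes `_ZGVbN4vv_add1`, the symbol called in the ADD1 listing.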