https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87804
Bug ID: 87804
Summary: Omp simd loop with sin calls not vectorized when
inside omp parallel region and the sin parameter uses
value from shared array
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: pavel.ondracka at gmail dot com
Target Milestone: ---
Created attachment 44926
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44926&action=edit
testcase
First of all, I'm not sure if this is the right component; I'm selecting the
Fortran front end for now, since equivalent C code works fine.
I need to use vectorized math functions from libmvec (with omp simd) inside an
omp parallel region. The omp simd part works (not by default, but using the
instructions from here: https://gcc.gnu.org/ml/gcc/2017-11/msg00014.html).
However, when the argument to the function depends on a value from a shared
array, the loop doesn't vectorize. It doesn't matter if the array is declared
with constant dimension or allocated on the heap. See the attached minimal
testcase.
gfortran -O2 -fopenmp -fopt-info-omp-vec-optimized-all test.f03
Analyzing loop at test.f03:23
test.f03:23:0: note: = analyze_loop_nest =
test.f03:23:0: note: === vect_analyze_loop_form ===
test.f03:23:0: note: === get_loop_niters ===
test.f03:23:0: note: === vect_analyze_data_refs ===
test.f03:23:0: note: got vectype for stmt: _37 = *.omp_data_i_36(D).b;
vector(2) unsigned long
test.f03:23:0: note: got vectype for stmt: _38 = *_37.data;
vector(2) unsigned long
test.f03:23:0: note: not vectorized: not suitable for gather load _38 = *_37.data;
test.f03:23:0: note: bad data references.
test.f03:19:0: note: vectorized 0 loops in function.
If I remove the outer "omp parallel do", the inner loop vectorizes fine. So far
the only solution I have found that makes the two work together is to place the
array on the stack and make it firstprivate in the parallel region. However,
IMO this should not be needed, as I only read from the array inside the loops
(and this workaround has some overhead).
Gfortran: 8.2.1
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
This is my first bug report for gcc, so please let me know if more info is
needed or if I made some obvious mistake in my testcase.