https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87804
Bug ID: 87804
Summary: Omp simd loop with sin calls not vectorized when inside omp parallel region and the sin parameter uses value from shared array
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: pavel.ondracka at gmail dot com
Target Milestone: ---

Created attachment 44926
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44926&action=edit
testcase

First of all, I'm not sure if this is the right component; I'm selecting the fortran front end for now, since equivalent C code works fine.

I need to use vectorized math functions from libmvec (with omp simd) inside an omp parallel region. The omp simd part works (not by default, but using the instructions from here: https://gcc.gnu.org/ml/gcc/2017-11/msg00014.html). However, when the parameter to the function depends on some value from a shared array, the loop doesn't vectorize. It doesn't matter whether the array is declared with constant dimension or allocated on the heap. See the attached minimal testcase.

gfortran -O2 -fopenmp -fopt-info-omp-vec-optimized-all test.f03

Analyzing loop at test.f03:23
test.f03:23:0: note: ===== analyze_loop_nest =====
test.f03:23:0: note: === vect_analyze_loop_form ===
test.f03:23:0: note: === get_loop_niters ===
test.f03:23:0: note: === vect_analyze_data_refs ===
test.f03:23:0: note: got vectype for stmt: _37 = *.omp_data_i_36(D).b;
vector(2) unsigned long
test.f03:23:0: note: got vectype for stmt: _38 = *_37.data;
vector(2) unsigned long
test.f03:23:0: note: not vectorized: not suitable for gather load _38 = *_37.data;
test.f03:23:0: note: bad data references.
test.f03:19:0: note: vectorized 0 loops in function.

If I remove the outer "omp parallel do", the inner loop vectorizes fine. So far the only solution I have found that makes the two work together is to place the array on the stack and make it firstprivate in the parallel region.
However, this IMO should not be needed, as I'm using it only for reading inside the loops (and this workaround has some overhead).

Gfortran: 8.2.1
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

This is my first bug report for gcc, so please let me know if more info is needed or if I made some obvious mistake in my testcase.
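
For illustration only, the firstprivate workaround described above could look roughly like the sketch below. This is not the attached testcase: the program name, the array names a and b, and the sizes n and m are all made up here. The libmvec setup from the linked mailing-list message is left out, so the sketch shows only the data-sharing clause, and whether the inner loop actually vectorizes still depends on that setup and on the compiler version.

```fortran
! Hypothetical sketch, not the attached testcase.
! All names and sizes below are assumptions made for illustration.
program firstprivate_workaround
  implicit none
  integer, parameter :: n = 1024, m = 8
  double precision :: b(m)      ! fixed-size array, only read inside the loops
  double precision :: a(n, m)   ! results
  integer :: i, j

  do j = 1, m
     b(j) = 0.1d0 * j
  end do

  ! firstprivate(b) gives every thread its own initialized copy of b,
  ! so the inner loop does not read b through the shared-data structure.
  !$omp parallel do firstprivate(b)
  do j = 1, m
     !$omp simd
     do i = 1, n
        a(i, j) = sin(b(j) * i)
     end do
  end do
  !$omp end parallel do

  print *, sum(a)
end program firstprivate_workaround
```

Building this with the same command line as above (gfortran -O2 -fopenmp -fopt-info-omp-vec-optimized-all) should show whether the inner loop is reported as vectorized. The per-thread copy of b is the overhead mentioned above.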