https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87804

            Bug ID: 87804
           Summary: Omp simd loop with sin calls not vectorized when
                    inside omp parallel region and the sin parameter uses
                    value from shared array
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pavel.ondracka at gmail dot com
  Target Milestone: ---

Created attachment 44926
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44926&action=edit
testcase

First of all, I'm not sure this is the right component. I'm selecting the
Fortran front end for now, since the equivalent C code works fine.

I need to use vectorized math functions from libmvec (with omp simd) inside an
omp parallel region. The omp simd part works (not by default, but with the
instructions from https://gcc.gnu.org/ml/gcc/2017-11/msg00014.html). However,
when the argument to the function depends on a value from a shared array, the
loop is not vectorized. It doesn't matter whether the array is declared with
constant dimensions or allocated on the heap. See the attached minimal testcase.
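
For readers without access to the attachment, the loop shape described above
can be sketched roughly as follows. This is not the attached testcase: the
program name, array names and sizes are all made up for illustration, and the
libmvec setup from the linked message is left out, so I can't promise this
exact sketch produces the same diagnostics.

```fortran
! Hypothetical sketch of the pattern described in this report.
! NOT the attached testcase; all names and sizes are assumptions.
program simd_in_parallel
  implicit none
  integer, parameter :: n = 1024, m = 8
  double precision :: a(n, m), b(n)
  integer :: i, j

  ! Fill the array that is later only read inside the parallel region.
  do j = 1, n
    b(j) = dble(j) / dble(n)
  end do

  ! Outer loop is split across threads; b is shared between them.
  !$omp parallel do shared(a, b)
  do i = 1, m
    ! Inner loop is the one expected to be vectorized.
    !$omp simd
    do j = 1, n
      a(j, i) = sin(b(j) * dble(i))   ! sin argument depends on shared b
    end do
  end do
  !$omp end parallel do

  print *, sum(a)
end program simd_in_parallel
```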

gfortran -O2 -fopenmp -fopt-info-omp-vec-optimized-all test.f03

Analyzing loop at test.f03:23
test.f03:23:0: note: ===== analyze_loop_nest =====
test.f03:23:0: note: === vect_analyze_loop_form ===
test.f03:23:0: note: === get_loop_niters ===
test.f03:23:0: note: === vect_analyze_data_refs ===
test.f03:23:0: note: got vectype for stmt: _37 = *.omp_data_i_36(D).b;
vector(2) unsigned long
test.f03:23:0: note: got vectype for stmt: _38 = *_37.data;
vector(2) unsigned long
test.f03:23:0: note: not vectorized: not suitable for gather load _38 = *_37.data;
test.f03:23:0: note: bad data references.
test.f03:19:0: note: vectorized 0 loops in function.

If I remove the outer "omp parallel do", the inner loop vectorizes fine. So far
the only way I have found to get vectorization inside the parallel region is to
place the array on the stack and make it firstprivate in the parallel region.
However, IMO this should not be needed, as I only read from the array inside
the loops (and the workaround has some overhead).
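
A minimal sketch of that workaround, assuming a fixed-size array and made-up
names (again, this is not the attached testcase, and the libmvec setup is
omitted):

```fortran
! Hypothetical sketch of the firstprivate workaround described above.
! NOT the attached testcase; all names and sizes are assumptions.
program simd_firstprivate
  implicit none
  integer, parameter :: n = 1024, m = 8
  double precision :: a(n, m)
  double precision :: b(n)   ! constant bounds, not allocatable
  integer :: i, j

  do j = 1, n
    b(j) = dble(j) / dble(n)
  end do

  ! firstprivate(b) gives each thread its own initialized copy of b,
  ! so the inner loop no longer reads b through the shared-data struct.
  !$omp parallel do firstprivate(b) shared(a)
  do i = 1, m
    !$omp simd
    do j = 1, n
      a(j, i) = sin(b(j) * dble(i))
    end do
  end do
  !$omp end parallel do

  print *, sum(a)
end program simd_firstprivate
```

The per-thread copy of b made on entry to the parallel region is the overhead
mentioned above.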

Gfortran: 8.2.1
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

This is my first bug report for gcc, so please let me know if more info is
needed or if I made some obvious mistake in my testcase.
