https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89371
Bug ID: 89371
Summary: missed vectorisation with "#pragma omp simd collapse(2)"
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: arnaud02 at users dot sourceforge.net
Target Milestone: ---
void ff(double* res, double const* a, double const* b, int ncell, int neq)
{
#pragma omp simd collapse(2)
    for(int icell=0; icell < ncell; ++icell)
    {
        for(int ieq=0; ieq<neq; ++ieq)
        {
            res[icell*neq+ieq] = a[icell*neq+ieq]-b[icell*neq+ieq];
        }
    }
}
Building this with gcc 8.2 on x86_64 with "-std=c++14 -O3 -mavx -fopenmp-simd" results in
simd instruction emitted. Run-time tests with ncell=100'000 and neq=3, for
instance, confirm that the code is slower with "#pragma omp simd collapse(2)" than without it.
Am I missing something?
Ideally, I would like to be able to flatten the loop:
void ff(double* res, double const* a, double const* b, int ncell, int neq)
{
    for(int j=0; j < ncell*neq; ++j)
        res[j] = a[j]-b[j];
}
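
For anyone wanting to reproduce the comparison, here is a minimal sketch that places the two forms side by side and checks that they compute the same values. The names ff_collapsed, ff_flat and same_result are hypothetical and not part of the original test case. The wider index type and the plain "#pragma omp simd" on the flattened loop are assumptions made for illustration, not a confirmed workaround.

```cpp
#include <cstddef>
#include <vector>

// Nested form from the report, renamed (hypothetical name) so that both
// variants can live in one translation unit.
void ff_collapsed(double* res, double const* a, double const* b, int ncell, int neq)
{
#pragma omp simd collapse(2)
    for (int icell = 0; icell < ncell; ++icell)
    {
        for (int ieq = 0; ieq < neq; ++ieq)
        {
            res[icell*neq+ieq] = a[icell*neq+ieq] - b[icell*neq+ieq];
        }
    }
}

// Hand-flattened form. Assumption: the trip count is computed in a wider
// type so that ncell*neq cannot overflow int, and the single loop carries
// a plain "omp simd" pragma.
void ff_flat(double* res, double const* a, double const* b, int ncell, int neq)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(ncell) * neq;
#pragma omp simd
    for (std::ptrdiff_t j = 0; j < n; ++j)
        res[j] = a[j] - b[j];
}

// Hypothetical helper: fills two inputs, runs both variants and reports
// whether the outputs are element-wise identical.
bool same_result(int ncell, int neq)
{
    const std::size_t n = static_cast<std::size_t>(ncell) * static_cast<std::size_t>(neq);
    std::vector<double> a(n), b(n), r1(n, -1.0), r2(n, -2.0);
    for (std::size_t j = 0; j < n; ++j)
    {
        a[j] = 2.0 * j;
        b[j] = 0.5 * j;
    }
    ff_collapsed(r1.data(), a.data(), b.data(), ncell, neq);
    ff_flat(r2.data(), a.data(), b.data(), ncell, neq);
    return r1 == r2;
}
```

Calling same_result(100000, 3) exercises the sizes mentioned above. Compiling with the flags from the report and timing the two functions separately would show whether the flattened form avoids the slowdown on a given compiler version.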