http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46032

Feng Chen <fchen0000 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fchen0000 at gmail dot com

--- Comment #7 from Feng Chen <fchen0000 at gmail dot com> 2012-07-06 16:17:28 UTC ---
Any update on this? On gcc 4.6.2 I still see loops sometimes getting slower
after adding OpenMP, even for large nx*ny, e.g.,

#pragma omp parallel for
for(int iy=0; iy<ny; iy++) {
  for(int ix=0; ix<nx; ix++) {
    dest[(size_t)iy*nx + ix] = src[(size_t)iy*nx + ix] * 2;
  }
}

Sometimes gcc won't vectorize the inner loop; I have to move it into an inline
function to force vectorization, and even then the performance is only
marginally better.
PS: I split the loop into an outer and an inner loop because I noticed
previously that omp parallel inhibits auto-vectorization; I forget which gcc
version I was using at the time ...
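
For reference, the inline-function workaround mentioned above might look
roughly like the sketch below. The element type (float), the helper names
scale_row and scale2d, and the restrict qualifiers are assumptions made for
illustration; the original snippet does not show them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: processes one row. Keeping the inner loop in its own
   function with restrict-qualified pointers gives the vectorizer a simple
   loop with no aliasing between dest and src. */
static inline void scale_row(float *restrict dest,
                             const float *restrict src, int nx)
{
    for (int ix = 0; ix < nx; ix++)
        dest[ix] = src[ix] * 2;
}

/* Hypothetical wrapper: the outer loop is parallelized with OpenMP and
   each iteration hands one row to scale_row. Element type float is an
   assumption. */
static void scale2d(float *dest, const float *src, int nx, int ny)
{
    assert(nx >= 0 && ny >= 0);

    #pragma omp parallel for
    for (int iy = 0; iy < ny; iy++)
        scale_row(dest + (size_t)iy * nx, src + (size_t)iy * nx, nx);
}
```

Building with something like gcc -std=c99 -O3 -fopenmp
-ftree-vectorizer-verbose=2 on gcc 4.6 should report whether the loop in
scale_row was vectorized. Without -fopenmp the pragma is ignored and the code
runs serially.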

Graphite did improve the scalability of OpenMP programs in my experience, so
the fix (with tests) is important ...

(In reply to comment #6)
> Good. But if Graphite breaks it, let's add Sebastian in CC..
