http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46032
Feng Chen <fchen0000 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fchen0000 at gmail dot com

--- Comment #7 from Feng Chen <fchen0000 at gmail dot com> 2012-07-06 16:17:28 UTC ---
Any update on this? I do see loops sometimes getting slower with OpenMP on
gcc 4.6.2, even for large nx*ny, e.g.:

    #pragma omp parallel for
    for(int iy=0; iy<ny; iy++) {
        for(int ix=0; ix<nx; ix++) {
            dest[(size_t)iy*nx + ix] = src[(size_t)iy*nx + ix] * 2;
        }
    }

Sometimes gcc won't vectorize the inner loop; I have to put it into an inline
function to force it. The performance is only marginally better after that.

PS: I break the loop up because I noticed previously that omp parallel
inhibits auto-vectorization; I forgot which gcc version I used ...

Graphite did improve the scalability of OpenMP programs in my experience, so
the fix (with tests) is important ...

(In reply to comment #6)
> Good. But it Graphite breaks it, let's add Sebastian in CC..
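
For reference, the inline-function workaround mentioned in comment #7 might look roughly like the sketch below. The comment does not show the actual code, so the names scale_row and scale_image, the float element type, and the restrict qualifiers are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: the inner loop moved into its own inline function so
 * the vectorizer sees a simple loop over contiguous data.
 * The float element type and the restrict qualifiers are assumptions. */
static inline void scale_row(float *restrict dest, const float *restrict src,
                             int nx)
{
    for (int ix = 0; ix < nx; ix++) {
        dest[ix] = src[ix] * 2;
    }
}

/* Hypothetical wrapper: the outer loop over rows is parallelized with
 * OpenMP, and each iteration hands one row to the helper above. */
void scale_image(float *dest, const float *src, int nx, int ny)
{
    #pragma omp parallel for
    for (int iy = 0; iy < ny; iy++) {
        scale_row(dest + (size_t)iy * nx, src + (size_t)iy * nx, nx);
    }
}
```

Building with something like gcc -std=c99 -O3 -fopenmp should enable both OpenMP and the vectorizer. Whether the inner loop is actually vectorized depends on the gcc version and target, so it is worth checking the vectorizer report rather than assuming it.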