http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789
vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vincenzo.innocente at cern
                   |                            |dot ch

--- Comment #10 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2013-04-02 16:49:53 UTC ---
I was trying to see how gcc behaves w.r.t. this example:
http://software.intel.com/en-us/articles/bkm-coaxing-the-compiler-to-vectorize-structured-data-via-gathers

So I started from the example in comment 6 and "evolved" it as follows.
To my eyes f21() and f22() are equivalent, yet f21 vectorizes and f22 does not.
The variant f21b does not vectorize either…

c++ -v
Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin12.2.0/4.8.0/lto-wrapper
Target: x86_64-apple-darwin12.2.0
Configured with: ./configure --enable-languages=c,c++,fortran --disable-multilib --disable-bootstrap --enable-lto -disable-libitm : (reconfigured) ./configure --enable-languages=c,c++,fortran --disable-multilib --disable-bootstrap --enable-lto -disable-libitm : (reconfigured) ./configure --enable-languages=c,c++,fortran --disable-multilib --disable-bootstrap --enable-lto -disable-libitm
Thread model: posix
gcc version 4.8.0 20130313 (experimental) [trunk revision 196633] (GCC)

c++ -std=c++11 -Ofast -mavx2 -S gather.cc -ftree-vectorizer-verbose=2

struct float3 {
  float x;
  float y;
  float z;
};

#define N 1024
float fx[N], g[N];
float fy[N];
float fz[N];
int k[N];
float ff[3*N];
float3 f3[N];

void
f20 (void)
{
  int i;
  for (i = 0; i < N; i++)
    g[i] = fx[k[i]]+fy[k[i]]+fz[k[i]];
}

void
f21 (void)
{
  int i;
  for (i = 0; i < N; i++)
    g[i] = ff[3*k[i]]+ff[3*k[i]+1]+ff[3*k[i]+2];
}

void
f22 (void)
{
  int i;
  for (i = 0; i < N; i++)
    g[i] = f3[k[i]].x+f3[k[i]].y+f3[k[i]].z;
}

void
f21b (void)
{
  int i;
  for (i = 0; i < N; i++) {
    auto j = ff+3*k[i];
    g[i] = j[0]+j[1]+j[2];
  }
}
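
The statement that f21() and f22() are equivalent depends on float3 being laid out as three consecutive floats with no padding, so that f3[i].x, f3[i].y and f3[i].z sit at the same offsets as ff[3*i], ff[3*i+1] and ff[3*i+2]. The following is only a rough sketch of how that assumption could be checked; the names M, sum_flat, sum_struct and layouts_agree are made up for illustration and are not part of the test case above. It says nothing about why the vectorizer treats the two loops differently.

```cpp
#include <cassert>   // assert
#include <cstddef>   // offsetof
#include <cstring>   // std::memcpy

struct float3 { float x; float y; float z; };

// Layout assumptions under which the struct form and the flat form
// read the same bytes. These hold on common ABIs but are checked here
// rather than taken for granted.
static_assert(sizeof(float3) == 3 * sizeof(float), "float3 has padding");
static_assert(offsetof(float3, x) == 0 * sizeof(float), "x not in slot 0");
static_assert(offsetof(float3, y) == 1 * sizeof(float), "y not in slot 1");
static_assert(offsetof(float3, z) == 2 * sizeof(float), "z not in slot 2");

constexpr int M = 8;  // hypothetical small element count for the check

// Flat-array access with stride 3, as in the body of f21().
float sum_flat(const float* ff, int idx) {
    assert(idx >= 0);
    return ff[3 * idx] + ff[3 * idx + 1] + ff[3 * idx + 2];
}

// Member access on an array of structs, as in the body of f22().
float sum_struct(const float3* f3, int idx) {
    assert(idx >= 0);
    return f3[idx].x + f3[idx].y + f3[idx].z;
}

// Copy the same bytes into both representations and compare the
// per-element sums computed through each access pattern.
bool layouts_agree() {
    float ff[3 * M];
    float3 f3[M];
    for (int i = 0; i < 3 * M; ++i)
        ff[i] = 0.5f * static_cast<float>(i);
    std::memcpy(f3, ff, sizeof ff);
    for (int i = 0; i < M; ++i)
        if (sum_flat(ff, i) != sum_struct(f3, i))
            return false;
    return true;
}
```

If the static_asserts pass and layouts_agree() returns true, the two loop bodies load the same values, so the difference seen with -ftree-vectorizer-verbose=2 would come from how the accesses are analyzed and not from the data layout.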