http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645
Bug #: 55645
Summary: skipping unlike branch in vectorized loops using movmsk or equivalent
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: vincenzo.innoce...@cern.ch

I'm wondering whether the vectorization engine could accommodate some
mechanism to skip unlikely branches using a global test based on movmsk
or similar.

Below is a trivial example, including a possible SLP implementation that
happens to compile with 4.8 as

c++ -std=c++11 -Ofast -mavx2 -S divergent.cc; less divergent.s

float a[1024];
float b[1024];
float c[1024];

#define likely(x) (__builtin_expect(x, true))

// possible syntax
void compute() {
  for (int i=0;i!=1024;++i) {
    if likely(a[i]<b[i])   // very often
      c[i]=a[i]+b[i];
    else                   // rare
      c[i]=a[i]-b[i];
  }
}

// hand-made implementation that compiles with 4.8 today
#include <x86intrin.h>

typedef float __attribute__( ( vector_size( 32 ) ) ) float32x8_t;
typedef int __attribute__( ( vector_size( 32 ) ) ) int32x8_t;

float32x8_t va[1024];
float32x8_t vb[1024];
float32x8_t vc[1024];

void computeV() {
  for (int i=0;i!=1024;++i) {
    float32x8_t mask = va[i]<vb[i];
    // all 8 lanes take the likely branch: skip the blend entirely
    if likely(_mm256_movemask_ps(mask) == 255) {
      vc[i]=va[i]+vb[i];
    } else {
      vc[i]= va[i]<vb[i] ? va[i]+vb[i] : va[i]-vb[i];
    }
  }
}
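
For readers without AVX hardware, the control flow of the requested transformation can be modelled in portable scalar C++. This is only a sketch of the idea, not what the vectorizer emits: the names kLanes, compare_mask and compute_blocked are hypothetical, the 8-lane width is assumed to match one 256-bit register, and compare_mask is a scalar stand-in for movmsk rather than the real instruction. The function returns how many blocks took the fast path, so the effect of the global test is observable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical lane count: 8 floats, as in one 256-bit AVX register.
constexpr std::size_t kLanes = 8;

// Scalar stand-in for movmsk: bit k is set when a[k] < b[k].
inline unsigned compare_mask(const float* a, const float* b) {
    unsigned mask = 0;
    for (std::size_t k = 0; k != kLanes; ++k)
        mask |= static_cast<unsigned>(a[k] < b[k]) << k;
    return mask;
}

// Block-wise version of compute(): one global test per block decides
// whether the rare branch has to be evaluated at all.
// Returns the number of blocks that took the fast path.
std::size_t compute_blocked(const float* a, const float* b, float* c,
                            std::size_t n) {
    assert(n % kLanes == 0);  // sketch only: no remainder loop
    constexpr unsigned kAllLanes = (1u << kLanes) - 1;  // 255 for 8 lanes
    std::size_t fast_blocks = 0;
    for (std::size_t i = 0; i != n; i += kLanes) {
        const unsigned mask = compare_mask(a + i, b + i);
        if (__builtin_expect(mask == kAllLanes, true)) {
            // Likely: every lane takes the common branch, no select needed.
            for (std::size_t k = 0; k != kLanes; ++k)
                c[i + k] = a[i + k] + b[i + k];
            ++fast_blocks;
        } else {
            // Rare: at least one lane diverges, select per lane.
            for (std::size_t k = 0; k != kLanes; ++k)
                c[i + k] = ((mask >> k) & 1u) ? a[i + k] + b[i + k]
                                              : a[i + k] - b[i + k];
        }
    }
    return fast_blocks;
}
```

With data where a[i] < b[i] almost everywhere, nearly every block takes the first path and the subtraction is never computed, which is the saving the request is after.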