http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645



             Bug #: 55645
           Summary: skipping unlikely branch in vectorized loops using
                    movmsk or equivalent
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: vincenzo.innoce...@cern.ch





I'm wondering whether the vectorization engine could accommodate some mechanism to skip unlikely branches using a global test based on movmsk or a similar instruction.
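To make the "global test" concrete: movmskps (`_mm_movemask_ps` / `_mm256_movemask_ps`) packs the sign bit of each 32-bit lane into the low bits of an integer. The scalar model below is only a sketch of that behaviour, not the real instruction. `movemask_model` and `all_lanes_true8` are hypothetical names. The model assumes the comparison mask is stored as 32-bit integers, with -1 (all ones) for true and 0 for false, which is how GCC vector comparisons represent their result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of movmskps: bit i of the result is the sign bit
// of lane i. A true comparison lane is all ones (-1), so its sign bit is 1.
unsigned movemask_model(const std::int32_t* lanes, int n) {
    assert(n > 0 && n <= 32);
    unsigned bits = 0;
    for (int i = 0; i < n; ++i)
        bits |= (static_cast<std::uint32_t>(lanes[i]) >> 31) << i;
    return bits;
}

// The global test for 8 lanes: 0xFF (255) means every lane is on the likely path.
bool all_lanes_true8(const std::int32_t* lanes) {
    return movemask_model(lanes, 8) == 0xFFu;
}
```

A single rare lane clears one bit. The comparison against 0xFF then fails and the loop must take the slow path for that whole vector.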



Below is a trivial example, including a possible hand-written SLP implementation that happens to compile with GCC 4.8 using

c++ -std=c++11 -Ofast -mavx2 -S divergent.cc; less divergent.s



float a[1024];
float b[1024];
float c[1024];

// expands to a parenthesized condition, so "if likely(x)" is valid syntax
#define likely(x) (__builtin_expect(x, true))

// possible syntax
void compute() {
  for (int i=0;i!=1024;++i) {
    if likely(a[i]<b[i])  // very often
      c[i]=a[i]+b[i];
    else // rare
      c[i]=a[i]-b[i];
  }
}
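The transformation requested here can be modelled in portable scalar C++. The loop is split into chunks of one vector width and the global test runs once per chunk. A chunk takes the cheap path when every lane agrees, and a per-lane select otherwise. This is a sketch under stated assumptions, not code GCC generates: `compute_chunked`, `kLanes` (8, one AVX register of floats) and the requirement that `n` be a multiple of 8 are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vector width: 8 floats, as in one 256-bit AVX register.
constexpr std::size_t kLanes = 8;

// Returns the number of chunks that took the fast path.
std::size_t compute_chunked(const float* a, const float* b, float* c, std::size_t n) {
    assert(n % kLanes == 0);  // keep the sketch simple: no remainder loop
    std::size_t fast_chunks = 0;
    for (std::size_t i = 0; i < n; i += kLanes) {
        // Global test: does the likely condition hold in every lane of the chunk?
        bool all_likely = true;
        for (std::size_t j = 0; j < kLanes; ++j)
            all_likely = all_likely && (a[i + j] < b[i + j]);
        if (all_likely) {
            // Fast path: only the likely arm is computed.
            for (std::size_t j = 0; j < kLanes; ++j)
                c[i + j] = a[i + j] + b[i + j];
            ++fast_chunks;
        } else {
            // Slow path: compute both arms and select per lane, as an
            // if-converted vector loop would do for every chunk.
            for (std::size_t j = 0; j < kLanes; ++j) {
                float sum = a[i + j] + b[i + j];
                float diff = a[i + j] - b[i + j];
                c[i + j] = (a[i + j] < b[i + j]) ? sum : diff;
            }
        }
    }
    return fast_chunks;
}
```

Returning the fast-chunk count is only a convenience for checking which chunks the global test accepted. The results are identical to `compute()` either way.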





// hand-made implementation that compiles with 4.8 today
#include <x86intrin.h>

typedef float __attribute__( ( vector_size( 32 ) ) ) float32x8_t;
typedef int __attribute__( ( vector_size( 32 ) ) ) int32x8_t;

float32x8_t va[1024];
float32x8_t vb[1024];
float32x8_t vc[1024];

void computeV() {
  for (int i=0;i!=1024;++i) {
    // a vector comparison yields an integer mask: -1 (all ones) where true, 0 where false
    int32x8_t mask = va[i]<vb[i];
    // movmskps gathers the 8 lane sign bits; 255 means every lane takes the likely branch
    if likely(_mm256_movemask_ps((__m256)mask) == 255) {
      vc[i]=va[i]+vb[i];
    } else {
      vc[i]= mask ? va[i]+vb[i] : va[i]-vb[i];
    }
  }
}
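The same pattern also works with plain SSE, which every x86-64 CPU provides. Here `_mm_movemask_ps` yields 4 bits, so the all-likely value is `0xF` rather than 255. The per-lane select can use and/andnot/or instead of a blend instruction. This is a hedged sketch with these assumptions:

- `compute_sse` is a hypothetical name.
- Unaligned loads are used so that ordinary `float` arrays work.
- `n` is assumed to be a multiple of 4.
- Targets without `__SSE__` fall back to a scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

void compute_sse(const float* a, const float* b, float* c, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
#if defined(__SSE__)
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 sum = _mm_add_ps(va, vb);
        __m128 mask = _mm_cmplt_ps(va, vb);   // all-ones lanes where a < b
        if (__builtin_expect(_mm_movemask_ps(mask) == 0xF, 1)) {
            _mm_storeu_ps(c + i, sum);        // fast path: likely arm only
        } else {
            __m128 diff = _mm_sub_ps(va, vb);
            // Bitwise select: (mask & sum) | (~mask & diff)
            _mm_storeu_ps(c + i, _mm_or_ps(_mm_and_ps(mask, sum),
                                           _mm_andnot_ps(mask, diff)));
        }
#else
        // Scalar fallback for targets without SSE
        for (std::size_t j = i; j < i + 4; ++j)
            c[j] = a[j] < b[j] ? a[j] + b[j] : a[j] - b[j];
#endif
    }
}
```

The and/andnot/or select keeps the sketch within SSE1. On SSE4.1 or AVX targets a blend instruction could replace it.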
