http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52568
           Bug #: 52568
         Summary: suboptimal __builtin_shuffle on cycles with AVX
  Classification: Unclassified
         Product: gcc
         Version: 4.7.0
          Status: UNCONFIRMED
        Severity: normal
        Priority: P3
       Component: target
      AssignedTo: unassig...@gcc.gnu.org
      ReportedBy: marc.gli...@normalesup.org


Hello,

I compiled the following with -O3 (or -Os) and -mavx:

#include <x86intrin.h>
__m256d left(__m256d x){
  __m256i mask={1,2,3,0};
  return __builtin_shuffle(x,mask);
}

(by the way, for some reason, gcc insists that 'mask' is set but not used
with -Wall)

and got:

        vunpckhpd       %xmm0, %xmm0, %xmm3
        vmovapd         %xmm0, %xmm1
        vextractf128    $0x1, %ymm0, %xmm0
        vmovaps         %xmm0, %xmm2
        vunpckhpd       %xmm0, %xmm0, %xmm0
        vunpcklpd       %xmm1, %xmm0, %xmm1
        vunpcklpd       %xmm2, %xmm3, %xmm0
        vinsertf128     $0x1, %xmm1, %ymm0, %ymm0
        ret

That doesn't really match the code I currently use to do this:

#ifdef __AVX2__
  __m256d d=_mm256_permute4x64_pd(x,1+2*4+3*16+0*64);
#else
  __m256d b=_mm256_shuffle_pd(x,x,5);
  __m256d c=_mm256_permute2f128_pd(b,b,1);
  __m256d d=_mm256_blend_pd(b,c,10);
#endif

Could something recognizing this permutation pattern (and the right cyclic
shift) be added? I know there are too many shuffles to hand-code them all,
but cycles seem like they shouldn't be too uncommon. With -mavx2, I get a
single vpermq, which is close enough to the expected vpermpd.
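
For completeness, the right cyclic shift mentioned above would use the mask
{3,0,1,2}, and it can presumably be expanded the same way. A sketch, written
by analogy with the left-rotation sequence above (the function name 'right'
and the permute/blend immediates are my own choices, not taken from the
report):

/* right rotation: result = {x3, x0, x1, x2} */
__m256d right(__m256d x){
#ifdef __AVX2__
  /* imm = 3 + 0*4 + 1*16 + 2*64 selects lanes 3,0,1,2 */
  return _mm256_permute4x64_pd(x,3+0*4+1*16+2*64);
#else
  __m256d b=_mm256_shuffle_pd(x,x,5);       /* {x1,x0,x3,x2} */
  __m256d c=_mm256_permute2f128_pd(b,b,1);  /* {x3,x2,x1,x0} */
  return _mm256_blend_pd(b,c,5);            /* {x3,x0,x1,x2} */
#endif
}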