I've noticed that GCC (my current version is 4.4.1) does not merge consecutive
SSE shuffles into a single shuffle, as this example shows:

#include <xmmintrin.h>

extern void printv(__m128 m);

int main()
{
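        __m128 m = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f); // Same values as the constant pool below
        m = _mm_shuffle_ps(m, m, 0xC9); // Together, these two shuffles swap the 64-bit pairs
        m = _mm_shuffle_ps(m, m, 0x2D); // and could be merged into one shuffle with 0x4E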
        printv(m);

        return 0;
}
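The merge being requested is function composition on the 2-bit lane selectors of the immediate: lane i of the final result reads lane `sel(second, i)` of the intermediate vector, which itself came from lane `sel(first, sel(second, i))` of the original. A minimal sketch of that rule (not GCC code; `shuf_lane` and `compose_shuffles` are hypothetical names), assuming both shuffles use the same register for both operands, as `_mm_shuffle_ps(m, m, imm)` does here:

```c
#include <stdint.h>

/* Lane selector i (0..3) of an _mm_shuffle_ps immediate:
   bits [2i+1:2i] pick which source lane lands in result lane i. */
static unsigned shuf_lane(uint8_t imm, unsigned i)
{
    return (imm >> (2u * i)) & 3u;
}

/* Hypothetical helper: applying `first` and then `second` to one register
   equals a single shuffle with the returned immediate. Only valid for the
   single-register form _mm_shuffle_ps(m, m, imm). */
static uint8_t compose_shuffles(uint8_t first, uint8_t second)
{
    uint8_t merged = 0;
    for (unsigned i = 0; i < 4; ++i) {
        /* Final lane i <- intermediate lane shuf_lane(second, i)
           <- original lane shuf_lane(first, that lane). */
        unsigned src = shuf_lane(first, shuf_lane(second, i));
        merged |= (uint8_t)(src << (2u * i));
    }
    return merged;
}
```

For example, `compose_shuffles(0xC9, 0x2D)` yields `0x4E` (78), and composing with `0xE4` (the identity shuffle) on either side returns the other immediate unchanged.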

This code generates the following assembly:

        movaps  .LC1, %xmm1
        shufps  $201, %xmm1, %xmm1
        shufps  $45, %xmm1, %xmm1    ; <-- these two could merge into a single shufps $78
        movaps  %xmm1, %xmm0
        movaps  %xmm1, -24(%ebp)

        .LC0:
                .long   1065353216 ; 1.0f
                .long   1073741824 ; 2.0f
                .long   1077936128 ; 3.0f
                .long   1082130432 ; 4.0f
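As a sanity check against the constant pool above, a scalar model of the single-register shuffle shows that both sequences turn {1.0f, 2.0f, 3.0f, 4.0f} into {3.0f, 4.0f, 1.0f, 2.0f}. This is a sketch under the assumption that only the `_mm_shuffle_ps(m, m, imm)` form matters; `shuffle_model` is a hypothetical name, not an SSE intrinsic:

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_shuffle_ps(m, m, imm):
   lane i of the result is lane ((imm >> 2i) & 3) of the input.
   `in` and `out` must not alias. */
static void shuffle_model(const float in[4], uint8_t imm, float out[4])
{
    for (unsigned i = 0; i < 4; ++i)
        out[i] = in[(imm >> (2u * i)) & 3u];
}
```

Applying `0xC9` and then `0x2D` gives the same four lanes as a single `0x4E`. The separate output array matters: writing in place would overwrite lanes that later iterations still need to read.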

It would be nice to see this merge implemented as an enhancement!


-- 
           Summary: SSE shuffle merge
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: liranuna at gmail dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147
