https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119464

            Bug ID: 119464
           Summary: VEC_PERM_EXPR not optimized to pslldq instruction for
                    AVX2 and AVX512BW
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mkretz at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Test case (https://compiler-explorer.com/z/Pro5W6e4f):
---

typedef unsigned long long V2 __attribute__((vector_size(16)));
typedef unsigned long long V4 __attribute__((vector_size(32)));
typedef unsigned long long V8 __attribute__((vector_size(64)));

V2 shift(V2 x)
{ return __builtin_shufflevector(x, V2(), 2, 0); }

V4 shift(V4 x)
{ return __builtin_shufflevector(x, V4(), 4, 0, 4, 2); }

V8 shift(V8 x)
{ return __builtin_shufflevector(x, V8(), 8, 0, 8, 2, 8, 4, 8, 6); }

---
Clang translates this to the expected instructions:

shift(unsigned long long vector[2]):
        vpslldq xmm0, xmm0, 8
        ret

shift(unsigned long long vector[4]):
        vpslldq ymm0, ymm0, 8
        ret

shift(unsigned long long vector[8]):
        vpslldq zmm0, zmm0, 8
        ret


GCC only recognizes vpslldq for vector_size(16); the corresponding patterns for
vector_size(32) (AVX2) and vector_size(64) (AVX512BW) are missing.
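
For reference, here is a minimal scalar sketch of why the three shuffles above
should each map to a single byte shift. It is not GCC code: shuffle_model,
pslldq8_model and check_equivalence are hypothetical helper names, and the
sketch assumes the usual little-endian element order, where element 0 is the
low 64 bits of each 128-bit lane. It models the two-input shuffle and a
per-128-bit-lane left shift by 8 bytes, then checks that they agree for the
index patterns used in the test case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a two-input shuffle on 64-bit elements:
// indices below N select from a, indices N..2N-1 select from b.
template <std::size_t N>
std::array<std::uint64_t, N> shuffle_model(const std::array<std::uint64_t, N>& a,
                                           const std::array<std::uint64_t, N>& b,
                                           const std::array<int, N>& idx)
{
    std::array<std::uint64_t, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t k = static_cast<std::size_t>(idx[i]);
        r[i] = k < N ? a[k] : b[k - N];
    }
    return r;
}

// Hypothetical model of a left shift by 8 bytes applied independently to
// each 128-bit lane (two 64-bit elements per lane): the low element becomes
// zero and the old low element moves into the high element.
template <std::size_t N>
std::array<std::uint64_t, N> pslldq8_model(const std::array<std::uint64_t, N>& a)
{
    static_assert(N % 2 == 0, "need whole 128-bit lanes");
    std::array<std::uint64_t, N> r{};
    for (std::size_t lane = 0; lane < N; lane += 2) {
        r[lane] = 0;
        r[lane + 1] = a[lane];
    }
    return r;
}

// Checks the index patterns from the test case against the lane-shift model.
inline void check_equivalence()
{
    const std::array<std::uint64_t, 2> x2{11, 22}, z2{};
    const std::array<std::uint64_t, 4> x4{11, 22, 33, 44}, z4{};
    const std::array<std::uint64_t, 8> x8{11, 22, 33, 44, 55, 66, 77, 88}, z8{};

    assert(shuffle_model<2>(x2, z2, std::array<int, 2>{2, 0}) == pslldq8_model(x2));
    assert(shuffle_model<4>(x4, z4, std::array<int, 4>{4, 0, 4, 2}) == pslldq8_model(x4));
    assert(shuffle_model<8>(x8, z8, std::array<int, 8>{8, 0, 8, 2, 8, 4, 8, 6})
           == pslldq8_model(x8));
}
```

Because the shift never crosses a 128-bit lane boundary, the same per-lane
behaviour covers the 16-, 32- and 64-byte vectors.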
