https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119464
Bug ID: 119464
Summary: VEC_PERM_EXPR not optimized to pslldq instruction for AVX2 and AVX512BW
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mkretz at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

Test case (https://compiler-explorer.com/z/Pro5W6e4f):
---
typedef unsigned long long V2 __attribute__((vector_size(16)));
typedef unsigned long long V4 __attribute__((vector_size(32)));
typedef unsigned long long V8 __attribute__((vector_size(64)));

V2 shift(V2 x) {
  return __builtin_shufflevector(x, V2(), 2, 0);
}

V4 shift(V4 x) {
  return __builtin_shufflevector(x, V4(), 4, 0, 4, 2);
}

V8 shift(V8 x) {
  return __builtin_shufflevector(x, V8(), 8, 0, 8, 2, 8, 4, 8, 6);
}
---

Clang translates this to the expected code:

shift(unsigned long long vector[2]):
        vpslldq xmm0, xmm0, 8
        ret
shift(unsigned long long vector[4]):
        vpslldq ymm0, ymm0, 8
        ret
shift(unsigned long long vector[8]):
        vpslldq zmm0, zmm0, 8
        ret

GCC only recognizes vpslldq for vector_size(16); the patterns for
vector_size(32) and vector_size(64) are missing.
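
To illustrate why the three shuffles are each expected to map to a single vpslldq with an immediate of 8, here is a minimal scalar sketch. The names shuffle_model and pslldq8_model are hypothetical helpers, not part of GCC or the test case. It assumes the usual little-endian element order and models the 256-bit and 512-bit forms of vpslldq as shifting each 128-bit lane independently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the shuffles in the test case (hypothetical helper):
// even result elements come from the zero vector, odd result elements
// take the preceding element of x (indices 0, 2, 4, 6 of x).
template <std::size_t N>
std::array<std::uint64_t, N> shuffle_model(const std::array<std::uint64_t, N>& x) {
    std::array<std::uint64_t, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r[i] = (i % 2 == 0) ? 0 : x[i - 1];
    return r;
}

// Scalar model of (v)pslldq with an immediate of 8 (hypothetical helper):
// each 128-bit lane (two 64-bit elements) is shifted left by 8 bytes,
// so the low element becomes zero and the old low element moves up.
template <std::size_t N>
std::array<std::uint64_t, N> pslldq8_model(const std::array<std::uint64_t, N>& x) {
    static_assert(N % 2 == 0, "a 128-bit lane holds two 64-bit elements");
    std::array<std::uint64_t, N> r{};
    for (std::size_t lane = 0; lane < N / 2; ++lane) {
        r[2 * lane] = 0;
        r[2 * lane + 1] = x[2 * lane];
    }
    return r;
}
```

Both models produce the same result for N = 2, 4 and 8, which is the equivalence the missing patterns would need to recognize. This is only a model of the semantics and says nothing about how GCC matches VEC_PERM_EXPR internally.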