https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762
Bug ID: 95762
Summary: Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---

typedef int8_t v16i8 __attribute__((vector_size(16)));
typedef int16_t v16i16 __attribute__((vector_size(32)));

auto f(v16i8 a)
{
    return __builtin_convertvector(a, v16i16);
}

With -O3 -mavx2, LLVM outputs this:

f(signed char __vector(16)):
  vpmovsxbw ymm0, xmm0
  ret

GCC outputs this:

f(signed char __vector(16)):
  vpmovsxbw xmm1, xmm0
  vpsrldq xmm0, xmm0, 8
  vpmovsxbw xmm0, xmm0
  vinserti128 ymm0, ymm1, xmm0, 0x1
  ret
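For anyone reproducing this, the snippet above omits the header that declares int8_t and int16_t. The following is a minimal self-contained sketch, not part of the original report: it repeats the two typedefs, wraps the same conversion in a function named widen, and adds a helper, widen_matches_scalar, that checks each lane against a plain scalar sign extension. Both function names are hypothetical and chosen only for illustration. The check says nothing about which instructions are emitted; it only confirms that the single-instruction and the four-instruction sequences are expected to produce the same values.

```cpp
// Illustrative sketch only; widen and widen_matches_scalar are hypothetical names.
#include <cassert>
#include <cstdint>

typedef int8_t  v16i8  __attribute__((vector_size(16)));
typedef int16_t v16i16 __attribute__((vector_size(32)));

// Same conversion as f() in the report: sign-extend 16 x int8_t to 16 x int16_t.
static inline v16i16 widen(v16i8 a)
{
    return __builtin_convertvector(a, v16i16);
}

// Compare every lane of the vector conversion with a scalar sign extension.
static bool widen_matches_scalar()
{
    v16i8 in = {};
    for (int i = 0; i < 16; ++i)
        in[i] = static_cast<int8_t>(i * 17 - 128);  // covers -128 through 127

    v16i16 out = widen(in);
    for (int i = 0; i < 16; ++i)
        if (out[i] != static_cast<int16_t>(in[i]))
            return false;
    return true;
}
```

Calling widen_matches_scalar() from main and compiling with -O3 -mavx2 should let the generated code for widen be inspected (for example with -S) alongside a quick correctness check.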