https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762

            Bug ID: 95762
           Summary: Failure to optimize __builtin_convertvector from
                    vector of 16 chars to vector of 16 shorts in a single
                    instruction on AVX2
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int8_t v16i8 __attribute__((vector_size(16)));
typedef int16_t v16i16 __attribute__((vector_size(32)));

auto f(v16i8 a)
{
    return __builtin_convertvector(a, v16i16);
}

With -O3 -mavx2, LLVM outputs this:

f(signed char __vector(16)):
  vpmovsxbw ymm0, xmm0
  ret

GCC outputs this:

f(signed char __vector(16)):
  vpmovsxbw xmm1, xmm0
  vpsrldq xmm0, xmm0, 8
  vpmovsxbw xmm0, xmm0
  vinserti128 ymm0, ymm1, xmm0, 0x1
  ret
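
For reference, here is a minimal sketch (not part of the original testcase) of what the conversion is expected to compute: each signed 8-bit lane is sign-extended to a 16-bit lane, which is what a single vpmovsxbw ymm, xmm does. The helper names widen_builtin and widen_scalar are hypothetical and only serve to compare the builtin against a plain scalar loop. The vectors go through memory so the sketch also builds without -mavx2.

```cpp
#include <stdint.h>
#include <cstring>

typedef int8_t v16i8 __attribute__((vector_size(16)));
typedef int16_t v16i16 __attribute__((vector_size(32)));

// Hypothetical helper: widen 16 signed bytes to 16 signed shorts using the
// builtin from the report. Data is copied through memory so that no 32-byte
// vector is passed or returned across a function boundary.
void widen_builtin(const int8_t *in, int16_t *out)
{
    v16i8 a;
    std::memcpy(&a, in, sizeof a);
    v16i16 r = __builtin_convertvector(a, v16i16);
    std::memcpy(out, &r, sizeof r);
}

// Hypothetical scalar reference: per-lane sign extension.
void widen_scalar(const int8_t *in, int16_t *out)
{
    for (int i = 0; i < 16; ++i)
        out[i] = static_cast<int16_t>(in[i]);
}
```

Both helpers should produce identical output for any input, so the shorter LLVM sequence and the longer GCC sequence differ only in instruction count, not in result.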
