https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121892
Bug ID: 121892
Summary: Optimize AVX2 VEC_CONVERT from short to char
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mkretz at gcc dot gnu.org
Target Milestone: ---
Target: x86-64-*-*, i686-*-*
Test case (https://compiler-explorer.com/z/arn73Ps9f):
using From = short;
using To = char;
constexpr int N = 16;
using V0 [[gnu::vector_size(sizeof(From) * N)]] = From;
using V1 [[gnu::vector_size(sizeof(To) * N)]] = To;
V1 a(V0 x) { return __builtin_convertvector(x, V1); }
With -O2 -march=x86-64-v3, it compiles to:
vpshufb ymm1, ymm0, YMMWORD PTR .LC0[rip]
vpshufb ymm0, ymm0, YMMWORD PTR .LC1[rip]
vpermq ymm1, ymm1, 78
vpor ymm0, ymm0, ymm1
The same result can be obtained with one shuffle and one permute:
vpshufb ymm0, ymm0, YMMWORD PTR .LC0[rip]
vpermq ymm0, ymm0, 0xd8
With .LC0:
.byte 0, 2, 4, 6, 8, 10, 12, 14
.byte -128, -128, -128, -128, -128, -128, -128, -128
.byte 0, 2, 4, 6, 8, 10, 12, 14
.byte -128, -128, -128, -128, -128, -128, -128, -128
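I.e. use vpshufb to move the low byte of each short into the low 64 bits of
each 128-bit lane. vpshufb cannot move bytes across lanes, and the -128 mask
entries zero the high 64 bits of each lane. Then use vpermq to swap the two
inner 64-bit elements (0xd8 selects qwords 0, 2, 1, 3), which packs all 16
result bytes into the low 128 bits. The upper 128 bits of the result are
therefore already zero.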
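The following portable scalar model checks the .LC0 mask and the 0xd8 immediate without AVX2 hardware. It is a sketch, not GCC code. The names Ymm, pshufb256, permq256, kLC0 and narrow_via_model are hypothetical. The helpers only emulate the documented behavior of vpshufb (per-lane byte shuffle, where a mask byte with bit 7 set yields zero) and vpermq (each 2-bit field of the immediate selects a source qword).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a 256-bit ymm register: 32 bytes, byte 0 least significant.
using Ymm = std::array<std::uint8_t, 32>;

// Model of vpshufb ymm: each 128-bit lane is shuffled independently.
// A mask byte with bit 7 set (-128 == 0x80) produces 0; otherwise its
// low 4 bits select a byte from the same lane of the source.
Ymm pshufb256(const Ymm& src, const Ymm& mask) {
  Ymm dst{};
  for (std::size_t i = 0; i < 32; ++i) {
    const std::size_t lane_base = i & ~std::size_t{15};  // 0 or 16
    dst[i] = (mask[i] & 0x80) ? std::uint8_t{0}
                              : src[lane_base + (mask[i] & 0x0f)];
  }
  return dst;
}

// Model of vpermq ymm, ymm, imm8: destination qword q is source qword
// ((imm8 >> (2 * q)) & 3). This can cross 128-bit lanes.
Ymm permq256(const Ymm& src, unsigned imm8) {
  Ymm dst{};
  for (std::size_t q = 0; q < 4; ++q) {
    const std::size_t from = (imm8 >> (2 * q)) & 3;
    for (std::size_t b = 0; b < 8; ++b) dst[q * 8 + b] = src[from * 8 + b];
  }
  return dst;
}

// The .LC0 constant from the report: in each lane, byte indices
// 0, 2, ..., 14 (the low byte of each little-endian short), then eight
// 0x80 entries that zero the high 64 bits of the lane.
constexpr Ymm kLC0 = {0,    2,    4,    6,    8,    10,   12,   14,
                      0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
                      0,    2,    4,    6,    8,    10,   12,   14,
                      0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};

// Runs the proposed two-instruction sequence on 16 shorts and returns the
// full 256-bit result. The converted chars should be in bytes 0..15.
Ymm narrow_via_model(const std::array<std::int16_t, 16>& x) {
  Ymm reg{};
  for (std::size_t i = 0; i < 16; ++i) {
    const auto u = static_cast<std::uint16_t>(x[i]);
    reg[2 * i] = static_cast<std::uint8_t>(u & 0xff);  // low byte first (x86)
    reg[2 * i + 1] = static_cast<std::uint8_t>(u >> 8);
  }
  reg = pshufb256(reg, kLC0);  // vpshufb ymm0, ymm0, .LC0
  return permq256(reg, 0xd8);  // vpermq ymm0, ymm0, 0xd8
}
```

After the vpshufb step, the qwords hold [x0..x7, 0, x8..x15, 0]. The 0xd8 permute then reorders them to [x0..x7, x8..x15, 0, 0]. With real intrinsics, the sequence corresponds to _mm256_shuffle_epi8 with the .LC0 mask, followed by _mm256_permute4x64_epi64(v, 0xd8), followed by _mm256_castsi256_si128 to take the low half.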