https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36897|0                           |1
        is obsolete|                            |

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 36898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36898&action=edit
gcc6-pr68655.patch

Second attempt: this one tries it only for a single insn, when we couldn't get a
single insn otherwise, and then as a final fallback when nothing else worked.
Running the same command, I see only beneficial changes this time:
vshuf-v16qi.c -msse2 test_2, scalar to punpcklqdq
vshuf-v64qi.c -mavx512bw test_2
-       vpermi2w        %zmm1, %zmm1, %zmm0
-       vpshufb .LC3(%rip), %zmm0, %zmm1
-       vpshufb .LC4(%rip), %zmm0, %zmm0
-       vporq   %zmm0, %zmm1, %zmm0
+       vpermi2q        %zmm1, %zmm1, %zmm0
vshuf-v8hi.c -msse2 test_2, scalar to punpcklqdq
and that is it.
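
For reference, the pattern involved is a byte-granular mask that is really a
quadword interleave; the actual test_2 mask from vshuf-v16qi.c isn't quoted in
this comment, so the function below is only an assumed illustration of such a
shuffle, not the testcase itself:

typedef unsigned char v16qi __attribute__((vector_size(16)));

/* Hypothetical example (not the real test_2 mask): taking the low eight
   bytes of each operand is a quadword interleave, so it can be emitted as
   a single punpcklqdq instead of scalar code.  */
v16qi
g1 (v16qi a, v16qi b)
{
  return __builtin_shuffle (a, b, (v16qi) { 0, 1, 2, 3, 4, 5, 6, 7,
                                            16, 17, 18, 19, 20, 21, 22, 23 });
}
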
Also on:
typedef unsigned short v8hi __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

v2di
f1 (v2di a, v2di b)
{
  return __builtin_shuffle (a, b, (v2di) { 0, 2 });
}

v8hi
f2 (v8hi a, v8hi b)
{
  return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 8, 9, 10, 11 });
}

v4si
f3 (v4si a, v4si b)
{
  return __builtin_shuffle (a, b, (v4si) { 0, 1, 4, 5 });
}

v8hi
f4 (v8hi a, v8hi b)
{
  return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 12, 13, 14, 15 });
}

with -O2 -msse2 (or -msse3) the diff in f2 is scalar code to punpcklqdq;
with -O2 -mssse3 the diffs are, for f2:
- punpcklwd %xmm1, %xmm0
- pshufb .LC0(%rip), %xmm0
+ punpcklqdq %xmm1, %xmm0
f4:
- palignr $8, %xmm1, %xmm0
- palignr $8, %xmm0, %xmm0
+ shufpd $2, %xmm1, %xmm0
with -O2 -msse4 there is just the f2 change, and for each of -O2 -mavx{,2,512f,512vl} the f2 diff is:
- vpunpcklwd %xmm1, %xmm0, %xmm1
- vpshufb .LC0(%rip), %xmm1, %xmm0
+ vpunpcklqdq %xmm1, %xmm0, %xmm0
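
These equivalences can be read off from the masks: f2's v8hi mask
{ 0, 1, 2, 3, 8, 9, 10, 11 } is the same permutation as f1, just expressed on
v8hi instead of v2di (hence the single punpcklqdq), while f4's mask
{ 0, 1, 2, 3, 12, 13, 14, 15 } takes the low quadword of a and the high
quadword of b, which is what shufpd $2 encodes. A hypothetical v2di rewrite of
f4, reusing the v2di typedef above and given only as an illustration, would be:

v2di
f4_as_v2di (v2di a, v2di b)
{
  /* Low quadword of a, high quadword of b -- the same permutation f4
     expresses with a v8hi mask; illustration only, not from the patch.  */
  return __builtin_shuffle (a, b, (v2di) { 0, 3 });
}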
