https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36897|0                           |1
        is obsolete|                            |

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 36898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36898&action=edit
gcc6-pr68655.patch

Second attempt, this one tries it only for a single insn if we couldn't get a
single insn otherwise, and then as a final fallback if nothing else worked.
Running the same command, I see only beneficial changes this time:
vshuf-v16qi.c -msse2 test_2, scalar to punpcklqdq
vshuf-v64qi.c -mavx512bw test_2
-       vpermi2w        %zmm1, %zmm1, %zmm0
-       vpshufb .LC3(%rip), %zmm0, %zmm1
-       vpshufb .LC4(%rip), %zmm0, %zmm0
-       vporq   %zmm0, %zmm1, %zmm0
+       vpermi2q        %zmm1, %zmm1, %zmm0
vshuf-v8hi.c -msse2 test_2, scalar to punpcklqdq
and that is it.

Also on:
typedef unsigned short v8hi __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

v2di
f1 (v2di a, v2di b)
{
  return __builtin_shuffle (a, b, (v2di) { 0, 2 });
}

v8hi
f2 (v8hi a, v8hi b)
{
  return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 8, 9, 10, 11 });
}

v4si
f3 (v4si a, v4si b)
{
  return __builtin_shuffle (a, b, (v4si) { 0, 1, 4, 5 });
}

v8hi
f4 (v8hi a, v8hi b)
{
  return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 12, 13, 14, 15 });
}

with -O2 -msse2 (or -msse3) the diff in f2 is scalar code to punpcklqdq,
with -O2 -mssse3 the diff is
f2:
-       punpcklwd       %xmm1, %xmm0
-       pshufb  .LC0(%rip), %xmm0
+       punpcklqdq      %xmm1, %xmm0
f4:
-       palignr $8, %xmm1, %xmm0
-       palignr $8, %xmm0, %xmm0
+       shufpd  $2, %xmm1, %xmm0
for -O2 -msse4 just the f2 change, for each of -O2 -mavx{,2,512f,512vl}
f2:
-       vpunpcklwd      %xmm1, %xmm0, %xmm1
-       vpshufb .LC0(%rip), %xmm1, %xmm0
+       vpunpcklqdq     %xmm1, %xmm0, %xmm0
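
For anyone who wants to convince themselves that the single-insn forms in the diffs above are legitimate, here is a minimal sketch (not part of the patch) that checks the permutations at the byte level. It assumes GCC, since __builtin_shuffle is a GNU C vector extension. The names f4_qwords and check_qword_equivalence are hypothetical helpers introduced only for this illustration. The idea is that the f2 and f3 masks move whole 64-bit halves, so they should produce the same 16 bytes as the v2di shuffle in f1 (low half of a, low half of b, which is what punpcklqdq does), and f4 should match a v2di shuffle with mask { 0, 3 } (low half of a, high half of b, which is what shufpd $2 selects).

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <string.h>

typedef unsigned short v8hi __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* f1..f4 as given in the comment.  */
v2di f1 (v2di a, v2di b) { return __builtin_shuffle (a, b, (v2di) { 0, 2 }); }
v8hi f2 (v8hi a, v8hi b) { return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 8, 9, 10, 11 }); }
v4si f3 (v4si a, v4si b) { return __builtin_shuffle (a, b, (v4si) { 0, 1, 4, 5 }); }
v8hi f4 (v8hi a, v8hi b) { return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 12, 13, 14, 15 }); }

/* Hypothetical helper, not from the comment: f4 expressed on 64-bit
   elements (low qword of a, high qword of b).  */
v2di f4_qwords (v2di a, v2di b) { return __builtin_shuffle (a, b, (v2di) { 0, 3 }); }

/* Hypothetical helper: returns 1 if f2 and f3 yield the same bytes as f1,
   and f4 yields the same bytes as f4_qwords; 0 otherwise.  */
int
check_qword_equivalence (void)
{
  v8hi a = { 0, 1, 2, 3, 4, 5, 6, 7 };
  v8hi b = { 100, 101, 102, 103, 104, 105, 106, 107 };
  v2di aq, bq;
  v4si ad, bd;

  /* Reinterpret the same 16 bytes with wider element types.  */
  memcpy (&aq, &a, sizeof aq);
  memcpy (&bq, &b, sizeof bq);
  memcpy (&ad, &a, sizeof ad);
  memcpy (&bd, &b, sizeof bd);

  v2di r1 = f1 (aq, bq);
  v8hi r2 = f2 (a, b);
  v4si r3 = f3 (ad, bd);
  v8hi r4 = f4 (a, b);
  v2di r4q = f4_qwords (aq, bq);

  return memcmp (&r1, &r2, sizeof r1) == 0
         && memcmp (&r1, &r3, sizeof r1) == 0
         && memcmp (&r4q, &r4, sizeof r4q) == 0;
}
```

Calling check_qword_equivalence () from a small main and compiling with something like gcc -O2 -msse2 is one way to try it. The check compares results only and says nothing about which instructions the compiler actually emits; for that, compare the -S output before and after the patch as done above.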