https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #11 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 3 Dec 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>   Attachment #36897|0                           |1
>         is obsolete|                            |
> 
> --- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Created attachment 36898
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36898&action=edit
> gcc6-pr68655.patch
> 
> Second attempt: this one tries it only for a single insn if we couldn't get a
> single insn otherwise, and then as a final fallback if nothing else worked.
> Running the same command, I see only beneficial changes this time:
> vshuf-v16qi.c -msse2 test_2, scalar to punpcklqdq
> vshuf-v64qi.c -mavx512bw test_2
> -       vpermi2w        %zmm1, %zmm1, %zmm0
> -       vpshufb .LC3(%rip), %zmm0, %zmm1
> -       vpshufb .LC4(%rip), %zmm0, %zmm0
> -       vporq   %zmm0, %zmm1, %zmm0
> +       vpermi2q        %zmm1, %zmm1, %zmm0
> vshuf-v8hi.c -msse2 test_2, scalar to punpcklqdq
> and that is it.
> Also on:
> typedef unsigned short v8hi __attribute__((vector_size(16)));
> typedef int v4si __attribute__((vector_size(16)));
> typedef long long v2di __attribute__((vector_size(16)));
> 
> v2di
> f1 (v2di a, v2di b)
> {
>   return __builtin_shuffle (a, b, (v2di) { 0, 2 });
> }
> 
> v8hi
> f2 (v8hi a, v8hi b)
> {
>   return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 8, 9, 10, 11 });
> }
> 
> v4si
> f3 (v4si a, v4si b)
> {
>   return __builtin_shuffle (a, b, (v4si) { 0, 1, 4, 5 });
> }
> 
> v8hi
> f4 (v8hi a, v8hi b)
> {
>   return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 12, 13, 14, 15 });
> }
> 
> with -O2 -msse2 (or -msse3) the diff in f2 is scalar code to punpcklqdq;
> with -O2 -mssse3 the diff is f2:
> - punpcklwd %xmm1, %xmm0
> - pshufb .LC0(%rip), %xmm0
> + punpcklqdq %xmm1, %xmm0
> f4:
> - palignr $8, %xmm1, %xmm0
> - palignr $8, %xmm0, %xmm0
> + shufpd $2, %xmm1, %xmm0
> for -O2 -msse4 just the f2 change, for each of -O2 -mavx{,2,512f,512vl} f2:
> - vpunpcklwd %xmm1, %xmm0, %xmm1
> - vpshufb .LC0(%rip), %xmm1, %xmm0
> + vpunpcklqdq %xmm1, %xmm0, %xmm0

LGTM
