https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I guess it needs analysis.
Some examples of changes:
vshuf-v16qi.c -msse2 test_2, scalar code vs. punpcklqdq, clear win
vshuf-v16qi.c -msse4 test_2, pshufb -> punpcklqdq (is this a win or not?)
(similarly for -mavx, -mavx2, -mavx512f, -mavx512bw)
vshuf-v16si.c -mavx512{f,bw} test_2:
-       vpermi2d        %zmm1, %zmm1, %zmm0
+       vmovdqa64       .LC2(%rip), %zmm0
+       vpermi2q        %zmm1, %zmm1, %zmm0
looks like pessimization.
vshuf-v32hi.c -mavx512bw test_2, similar pessimization.
vshuf-v32hi.c -mavx512bw test_2, similarly:
-       vpermi2w        %zmm1, %zmm1, %zmm0
+       vmovdqa64       .LC2(%rip), %zmm0
+       vpermi2q        %zmm1, %zmm1, %zmm0
vshuf-v4si.c -msse2 test_183, another pessimization:
-       pshufd  $78, %xmm0, %xmm1
+       movdqa  %xmm0, %xmm1
        movd    b(%rip), %xmm4
        pshufd  $255, %xmm0, %xmm2
+       shufpd  $1, %xmm0, %xmm1
vshuf-v4si.c -msse4 test_183, another pessimization:
-       pshufd  $78, %xmm1, %xmm0
+       movdqa  %xmm1, %xmm0
+       palignr $8, %xmm0, %xmm0
vshuf-v4si.c -mavx test_183:
-       vpshufd $78, %xmm1, %xmm0
+       vpalignr        $8, %xmm1, %xmm1, %xmm0
vshuf-v64qi.c -mavx512bw, desirable change:
-       vpermi2w        %zmm1, %zmm1, %zmm0
-       vpshufb .LC3(%rip), %zmm0, %zmm1
-       vpshufb .LC4(%rip), %zmm0, %zmm0
-       vporq   %zmm0, %zmm1, %zmm0
+       vpermi2q        %zmm1, %zmm1, %zmm0
vshuf-v8hi.c -msse2 test_1 another scalar to punpcklqdq, win
vshuf-v8hi.c -msse4 test_2 (supposedly a win):
-       pshufb  .LC3(%rip), %xmm0
+       punpcklqdq      %xmm0, %xmm0
vshuf-v8hi.c -mavx test_2, similarly:
-       vpshufb .LC3(%rip), %xmm0, %xmm0
+       vpunpcklqdq     %xmm0, %xmm0, %xmm0
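For reference, a self-contained C function in the style of the vshuf-*.c tests showing why this kind of change is possible (the mask below is only illustrative, not necessarily the exact test_2 mask, and the function name is made up): every pair of adjacent 16-bit lanes moves together, so the shuffle can equally be expressed with 64-bit elements.

typedef unsigned short V8HI __attribute__ ((vector_size (16)));

/* Duplicate the low 64 bits of x into the high 64 bits.  A shuffle of
   this shape can be done as punpcklqdq %xmm0, %xmm0 rather than as a
   pshufb whose control mask has to be loaded from a .LC* constant.  */
V8HI
dup_low_half (V8HI x)
{
  return __builtin_shuffle (x, (V8HI) { 0, 1, 2, 3, 0, 1, 2, 3 });
}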
vshuf-v8si.c -mavx2 test_2, another win:
-       vmovdqa a(%rip), %ymm0
-       vperm2i128      $0, %ymm0, %ymm0, %ymm0
+       vpermq  $68, a(%rip), %ymm0
vshuf-v8si.c -mavx2 test_5, another win:
-       vmovdqa .LC6(%rip), %ymm0
-       vmovdqa .LC7(%rip), %ymm1
-       vmovdqa %ymm0, -48(%rbp)
        vmovdqa a(%rip), %ymm0
-       vpermd  %ymm0, %ymm1, %ymm1
-       vpshufb .LC8(%rip), %ymm0, %ymm3
-       vpshufb .LC10(%rip), %ymm0, %ymm0
-       vmovdqa %ymm1, c(%rip)
-       vmovdqa b(%rip), %ymm1
-       vpermq  $78, %ymm3, %ymm3
-       vpshufb .LC9(%rip), %ymm1, %ymm2
-       vpshufb .LC11(%rip), %ymm1, %ymm1
-       vpor    %ymm3, %ymm0, %ymm0
-       vpermq  $78, %ymm2, %ymm2
-       vpor    %ymm2, %ymm1, %ymm1
-       vpor    %ymm1, %ymm0, %ymm0
+       vmovdqa .LC7(%rip), %ymm2
+       vmovdqa .LC6(%rip), %ymm1
+       vpermd  %ymm0, %ymm2, %ymm2
+       vpermd  b(%rip), %ymm1, %ymm3
+       vmovdqa %ymm1, -48(%rbp)
+       vmovdqa %ymm2, c(%rip)
+       vpermd  %ymm0, %ymm1, %ymm0
+       vmovdqa .LC8(%rip), %ymm2
+       vpand   %ymm2, %ymm1, %ymm1
+       vpcmpeqd        %ymm2, %ymm1, %ymm1
+       vpblendvb       %ymm1, %ymm3, %ymm0, %ymm0
vshuf-v8si.c -mavx512f test_2, another win?
-       vmovdqa a(%rip), %ymm0
-       vperm2i128      $0, %ymm0, %ymm0, %ymm0
+       vpermq  $68, a(%rip), %ymm0

The above does not list all the changes; I have often ignored further changes in a file when, say, one change adds or removes a .LC* constant and everything else just gets renumbered, and it sometimes does not cover cases where the same or a similar change appears with multiple ISAs.  So the results are clearly mixed.

Perhaps I should just try doing this at the end of expand_vec_perm_1 (i.e. if we (most likely) couldn't get a single insn normally, see if we would get one otherwise), and at the end of ix86_expand_vec_perm_const_1 (as the fallback after all the other sequences have been tried).  It won't catch some beneficial one-insn-to-one-insn changes (e.g. where in the original case the insn needs a constant operand in memory), though.
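For the record, a minimal standalone sketch of the widening check (not the actual GCC code; the function name and interface are made up): a permutation of nelt narrow elements can be re-expressed with elements twice as wide iff every pair of mask entries selects an aligned, consecutive pair of source elements, in which case the wider mask entry is just the pair index.

#include <stdbool.h>

/* Sketch only.  Try to rewrite a permutation mask PERM of NELT narrow
   elements as a mask WIDE_PERM of NELT/2 elements of twice the width.
   Returns false if the mask cannot be widened.  */
static bool
widen_perm (const unsigned char *perm, unsigned nelt,
            unsigned char *wide_perm)
{
  for (unsigned i = 0; i < nelt; i += 2)
    {
      /* Each pair must be an aligned, consecutive pair of sources.  */
      if ((perm[i] & 1) != 0 || perm[i + 1] != perm[i] + 1)
        return false;
      wide_perm[i / 2] = perm[i] / 2;
    }
  return true;
}

Applied twice to the V8HI mask { 0, 1, 2, 3, 0, 1, 2, 3 } from the earlier example, this yields the V2DI mask { 0, 0 }, which is exactly punpcklqdq %xmm0, %xmm0.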
