On Wed, Oct 1, 2014 at 2:56 PM, Jakub Jelinek <[email protected]> wrote:
> On Wed, Oct 01, 2014 at 02:25:01PM +0200, Uros Bizjak wrote:
>> OK.
>
> And now the expand_vec_perm_palignr improvement, tested
> with GCC_TEST_RUN_EXPENSIVE=1 make check-gcc \
> RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> E.g.
> typedef unsigned long long V __attribute__ ((vector_size (32)));
> extern void abort (void);
> V a, b, c, d;
> void test_14 (void)
> {
>   V mask = { 6, 1, 3, 4 };
>   int i;
>   c = __builtin_shuffle (a, mask);
>   d = __builtin_shuffle (a, b, mask);
> }
> (distilled from test 15 in vshuf-v4di.c) results in:
> -       vmovdqa a(%rip), %ymm0
> -       vpermq  $54, %ymm0, %ymm1
> -       vpshufb .LC1(%rip), %ymm0, %ymm0
> -       vmovdqa %ymm1, c(%rip)
> -       vmovdqa b(%rip), %ymm1
> -       vpshufb .LC0(%rip), %ymm1, %ymm1
> -       vpermq  $78, %ymm1, %ymm1
> -       vpor    %ymm1, %ymm0, %ymm0
> +       vmovdqa a(%rip), %ymm1
> +       vpermq  $54, %ymm1, %ymm0
> +       vmovdqa %ymm0, c(%rip)
> +       vmovdqa b(%rip), %ymm0
> +       vpalignr        $8, %ymm1, %ymm0, %ymm0
> +       vpermq  $99, %ymm0, %ymm0
>         vmovdqa %ymm0, d(%rip)
>         vzeroupper
>         ret
> change (and two fewer .rodata constants).
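For readers following along, here is a small scalar sketch of why the
vpalignr $8 + vpermq $99 pair computes the same result as the two-operand
shuffle with mask { 6, 1, 3, 4 }, i.e. { b[2], a[1], a[3], b[0] }. It is
only a model on arrays of four 64-bit elements, not the code GCC uses; the
helper names (shuffle2, palignr8, permq, palignr_sequence_matches) and the
sample element values are made up for illustration.

```c
#include <stdint.h>

/* Model of the two-operand __builtin_shuffle on 4 x 64-bit elements:
   indices 0..3 select from a, indices 4..7 select from b.  */
static void shuffle2(const uint64_t a[4], const uint64_t b[4],
                     const unsigned mask[4], uint64_t out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned idx = mask[i] & 7;
        out[i] = idx < 4 ? a[idx] : b[idx - 4];
    }
}

/* Model of vpalignr $8 on 64-bit elements.  Within each 128-bit lane the
   instruction concatenates hi_src:lo_src and shifts right by 8 bytes, so
   the lane becomes { upper element of lo_src, lower element of hi_src }.  */
static void palignr8(const uint64_t hi_src[4], const uint64_t lo_src[4],
                     uint64_t out[4])
{
    for (int lane = 0; lane < 2; lane++) {
        out[2 * lane]     = lo_src[2 * lane + 1];
        out[2 * lane + 1] = hi_src[2 * lane];
    }
}

/* Model of vpermq: result element i is src[(imm >> 2*i) & 3].  */
static void permq(const uint64_t src[4], unsigned imm, uint64_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = src[(imm >> (2 * i)) & 3];
}

/* Returns 1 if the modelled vpalignr/vpermq sequence matches the
   modelled shuffle for the mask from the example, 0 otherwise.  */
static int palignr_sequence_matches(void)
{
    const uint64_t a[4] = { 10, 11, 12, 13 };   /* arbitrary sample data */
    const uint64_t b[4] = { 20, 21, 22, 23 };
    const unsigned mask[4] = { 6, 1, 3, 4 };
    uint64_t expect[4], tmp[4], got[4];

    shuffle2(a, b, mask, expect);
    palignr8(b, a, tmp);    /* vpalignr $8, %ymm1 (a), %ymm0 (b), %ymm0 */
    permq(tmp, 99, got);    /* vpermq   $99, %ymm0, %ymm0 */

    for (int i = 0; i < 4; i++)
        if (expect[i] != got[i])
            return 0;
    return 1;
}
```

The same permq model with immediate 54 reproduces the one-operand shuffle
stored to c, since the mask indices are then taken modulo 4.
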

On a related note, I would like to point out that
gcc.target/i386/pr61403.c also fails to generate a blend insn with
-mavx2. The new insn sequence includes lots of new vpshufb insns with
memory access.

Uros.
