Ping.
On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote: > On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <r...@redhat.com> wrote: >> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote: >>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d) >>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num) >> >> insn_num might as well be "bool avx2", since it's only ever set to two >> values. > > Agree. However: > after the alignment, one operand permutation could be just move and > take 2 instructions for AVX2 as well > for AVX2 there could be other cases when the scheme takes 4 or 5 instructions > we can leave it for potential avx512 extension > >> >>> - /* Even with AVX, palignr only operates on 128-bit vectors. */ >>> - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16) >>> + /* SSSE3 is required to apply PALIGNR on 16 bytes operands. */ >>> + if (GET_MODE_SIZE (d->vmode) == 16) >>> + { >>> + if (!TARGET_SSSE3) >>> + return false; >>> + } >>> + /* AVX2 is required to apply PALIGNR on 32 bytes operands. */ >>> + else if (GET_MODE_SIZE (d->vmode) == 32) >>> + { >>> + if (!TARGET_AVX2) >>> + return false; >>> + } >>> + /* Other sizes are not supported. */ >>> + else >>> return false; >> >> And you'd better check it up here because... >> > > Correct. The following should resolve the issue: > /* For AVX2 we need more than 2 instructions when the alignment > by itself does not produce the desired permutation. */ > if (TARGET_AVX2 && insn_num <= 2) > return false; > >>> + /* For SSSE3 we need 1 instruction for palignr plus 1 for one >>> + operand permutaoin. */ >>> + if (insn_num == 2) >>> + { >>> + ok = expand_vec_perm_1 (&dcopy); >>> + gcc_assert (ok); >>> + } >>> + /* For AVX2 we need 2 instructions for the shift: vpalignr and >>> + vperm plus 4 instructions for one operand permutation. */ >>> + else if (insn_num == 6) >>> + { >>> + ok = expand_vec_perm_vpshufb2_vpermq (&dcopy); >>> + gcc_assert (ok); >>> + } >>> + else >>> + ok = false; >>> return ok; >> >> ... down here you'll simply ICE from the gcc_assert. > > > > >> >> >> r~