On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <[email protected]> wrote:
> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>
> insn_num might as well be "bool avx2", since it's only ever set to two values.
Agree. However:
after the alignment, one operand permutation could be just move and
take 2 instructions for AVX2 as well
for AVX2 there could be other cases when the scheme takes 4 or 5 instructions
we can leave it for potential avx512 extension
>
>> - /* Even with AVX, palignr only operates on 128-bit vectors. */
>> - if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>> + /* SSSE3 is required to apply PALIGNR on 16 bytes operands. */
>> + if (GET_MODE_SIZE (d->vmode) == 16)
>> + {
>> + if (!TARGET_SSSE3)
>> + return false;
>> + }
>> + /* AVX2 is required to apply PALIGNR on 32 bytes operands. */
>> + else if (GET_MODE_SIZE (d->vmode) == 32)
>> + {
>> + if (!TARGET_AVX2)
>> + return false;
>> + }
>> + /* Other sizes are not supported. */
>> + else
>> return false;
>
> And you'd better check it up here because...
>
Correct. The following should resolve the issue:
/* For AVX2 we need more than 2 instructions when the alignment
by itself does not produce the desired permutation. */
if (TARGET_AVX2 && insn_num <= 2)
return false;
>> + /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>> + operand permutaoin. */
>> + if (insn_num == 2)
>> + {
>> + ok = expand_vec_perm_1 (&dcopy);
>> + gcc_assert (ok);
>> + }
>> + /* For AVX2 we need 2 instructions for the shift: vpalignr and
>> + vperm plus 4 instructions for one operand permutation. */
>> + else if (insn_num == 6)
>> + {
>> + ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>> + gcc_assert (ok);
>> + }
>> + else
>> + ok = false;
>> return ok;
>
> ... down here you'll simply ICE from the gcc_assert.
>
>
> r~