Ping.

On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <r...@redhat.com> wrote:
>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>
>> insn_num might as well be "bool avx2", since it's only ever set to two 
>> values.
>
> Agreed. However:
>  after the alignment, the one-operand permutation could be just a move,
> so the scheme could take 2 instructions for AVX2 as well
>  for AVX2 there could be other cases where the scheme takes 4 or 5
> instructions
>  we can keep it for a potential AVX-512 extension
>
>>
>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>> +  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
>>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>>> +    {
>>> +      if (!TARGET_SSSE3)
>>> +       return false;
>>> +    }
>>> +  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
>>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>>> +    {
>>> +      if (!TARGET_AVX2)
>>> +       return false;
>>> +    }
>>> +  /* Other sizes are not supported.  */
>>> +  else
>>>      return false;
>>
>> And you'd better check it up here because...
>>
>
> Correct. The following should resolve the issue:
>   /* For AVX2 we need more than 2 instructions when the alignment
>      by itself does not produce the desired permutation.  */
>   if (TARGET_AVX2 && insn_num <= 2)
>     return false;
>
>>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>>> +     operand permutation.  */
>>> +  if (insn_num == 2)
>>> +    {
>>> +      ok = expand_vec_perm_1 (&dcopy);
>>> +      gcc_assert (ok);
>>> +    }
>>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>>> +     vperm plus 4 instructions for one operand permutation.  */
>>> +  else if (insn_num == 6)
>>> +    {
>>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>>> +      gcc_assert (ok);
>>> +    }
>>> +  else
>>> +    ok = false;
>>>    return ok;
>>
>> ... down here you'll simply ICE from the gcc_assert.
>
>
>
>
>>
>>
>> r~
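
To make the review point above concrete, here is a minimal standalone sketch of the "reject early, assert late" pattern. It is not the GCC code: `struct perm_request`, `expand_one_operand` and `try_palignr` are hypothetical stand-ins, and the accepted (size, insn_num) pairs are only a simplified model of the cases discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target flags and vector mode;
   not the real GCC declarations.  */
struct perm_request
{
  int vec_bytes;      /* vector size in bytes */
  bool have_ssse3;    /* models TARGET_SSSE3 */
  bool have_avx2;     /* models TARGET_AVX2 */
};

/* Toy follow-up expander: in this model it succeeds only for the
   two schemes named in the patch comments (2 insns for 16 bytes,
   6 insns for 32 bytes).  */
static bool
expand_one_operand (int vec_bytes, int insn_num)
{
  return (vec_bytes == 16 && insn_num == 2)
         || (vec_bytes == 32 && insn_num == 6);
}

/* Reject every unsupported combination up front, so the assert at
   the end can only fire on a genuine internal inconsistency.  */
static bool
try_palignr (const struct perm_request *r, int insn_num)
{
  if (r->vec_bytes == 16)
    {
      if (!r->have_ssse3 || insn_num != 2)
        return false;
    }
  else if (r->vec_bytes == 32)
    {
      if (!r->have_avx2 || insn_num != 6)
        return false;
    }
  else
    return false;

  /* All preconditions hold here, so failure would be a bug.  */
  bool ok = expand_one_operand (r->vec_bytes, insn_num);
  assert (ok);
  return ok;
}
```

With the checks placed first, a call such as `try_palignr (&req, 6)` on a 16-byte request simply returns false and lets the caller try another strategy, instead of reaching the assertion.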
