> I think this now conflicts a bit with what I just pushed (sorry).
>
>>        && loop_vinfo)
>>      {
>> +      unsigned i, j;
>> +      bool simple_perm_series = true;
>> +      FOR_EACH_VEC_ELT (SLP_TREE_LOAD_PERMUTATION (slp_node), i, j)
>> +       if (i != j)
>> +         simple_perm_series = false;
>
> In particular I disallow all load permutes, since we are going to elide it.

The load permutation in the x264 case is simple, {0, 1, 2, 3}, {4, 5, 6, 7}, 
etc.  If we're disallowing these as well here there is no point in having the 
new function ;)

So we're not allowing load permutations here because we would elide (=not 
generate code for) them via 
>   /* ???  The following checks should really be part of
>      get_load_store_type.  */
>   if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
even though perm_ok = true?  

Isn't this all single-element, single-lane anyway?

Can I re-allow simple permutations (as they are nops anyway)?  Or rather add a 
vect_transform_slp_perm_load in the mat_gather_scatter_p part of 
vectorizable_load?

The case I was trying to avoid (for now) when disabling load permutations
is

  for (int i = 0; i < 512; ++i)
    {
      x[2*i] = y[1023 - (2*i)];
      x[2*i+1] = y[1023 - (2*i+1)];
    }

(load permutation {1, 0}).

> Re-doing this after the above is a bit ugly.  Likewise having pun_vectype.
> It's also an opportunity to move the VMAT_STRIDED_SLP punning and
> alignment (re-)computation code here.  OTOH this is complicated enough
> already.

I can try but I'm not sure it will be any less complex.

> I believe you now have to change
>
>   /* ???  The following checks should really be part of
>      get_load_store_type.  */
>   if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
>       && !((memory_access_type == VMAT_ELEMENTWISE
>             || mat_gather_scatter_p (memory_access_type))
>            && SLP_TREE_LANES (slp_node) == 1
>            && (!grouped_load
>                || !DR_GROUP_NEXT_ELEMENT (first_stmt_info))))
>     {
>
> as otherwise you'll get slp_perm set but the permute will be silently
> elided.  This is all a bit ugly already :/

So we're setting slp_perm but know that we don't act on it in VMAT_ELEMENTWISE?

>
> Now I wonder if/how we handle
>
>  for (i = 0; i < n; ++i)
>    {
>       int j = offset[i];
>       sum += data[2*i] + data[2*i+1];
>    }
>
> aka a gather grouped load.  Maybe there's an opportunity to handle
> the "punning" higher up, representing this as a single-element group
> in the first place.  Hmm.
>
> Anyway, I think the general direction of the patch is OK.  You'll have to
> figure what I just broke though and I'm still somewhat missing the
> "beauty" moment when thinking of how the VMAT_STRIDED_SLP
>
>       /* ???  Modify local copies of alignment_support_scheme and
>          misalignment, but this part of analysis should be done
>          earlier and remembered, likewise the chosen load mode.  */
>
> parts resolve themselves into happiness with this ... ;)

I'm not sure I'm capable of producing a beauty moment in the vectorizer ;)
but I'll try how ugly it gets when consolidating the type punning.

-- 
Regards
 Robin

Reply via email to