> I think this now conflicts a bit with what I just pushed (sorry).
>
>> && loop_vinfo)
>> {
>> + unsigned i, j;
>> + bool simple_perm_series = true;
>> + FOR_EACH_VEC_ELT (SLP_TREE_LOAD_PERMUTATION (slp_node), i, j)
>> + if (i != j)
>> + simple_perm_series = false;
>
> In particular I disallow all load permutes, since we are going to elide it.
The load permutation in the x264 case is simple, {0, 1, 2, 3}, {4, 5, 6, 7},
etc. If we're disallowing these as well here there is no point in having the
new function ;)
So we're not allowing load permutations here because we would elide (=not
generate code for) them via
> /* ??? The following checks should really be part of
> get_load_store_type. */
> if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
even though perm_ok = true?
Isn't this all single-element, single-lane anyway?
Can I re-allow simple permutations (as they are nops anyway)? Or rather add a
vect_transform_slp_perm_load in the mat_gather_scatter_p part of
vectorizable_load?
The case I was trying to avoid (for now) when disabling load permutations
is
for (int i = 0; i < 512; ++i)
{
x[2*i] = y[1023 - (2*i)];
x[2*i+1] = y[1023 - (2*i+1)];
}
(load permutation {1, 0}).
> Re-doing this after the above is a bit ugly. Likewise having pun_vectype.
> It's also an opportunity to move the VMAT_STRIDED_SLP punning and
> alignment (re-)computation code here. OTOH this is complicated enough
> already.
I can try but I'm not sure it will be any less complex.
> I believe you now have to change
>
> /* ??? The following checks should really be part of
> get_load_store_type. */
> if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
> && !((memory_access_type == VMAT_ELEMENTWISE
> || mat_gather_scatter_p (memory_access_type))
> && SLP_TREE_LANES (slp_node) == 1
> && (!grouped_load
> || !DR_GROUP_NEXT_ELEMENT (first_stmt_info))))
> {
>
> as otherwise you'll get slp_perm set but the permute will be silently
> elided. This is all a bit ugly already :/
So we're setting slp_perm but know that we don't act on it in VMAT_ELEMENTWISE?
>
> Now I wonder if/how we handle
>
> for (i = 0; i < n; ++i)
> {
> int j = offset[i];
> sum += data[2*i] + data[2*i+1];
> }
>
> aka a gather grouped load. Maybe there's an opportunity to handle
> the "punning" higher up, representing this as a single-element group
> in the first place. Hmm.
>
> Anyway, I think the general direction of the patch is OK. You'll have to
> figure what I just broke though and I'm still somewhat missing the
> "beauty" moment when thinking of how the VMAT_STRIDED_SLP
>
> /* ??? Modify local copies of alignment_support_scheme and
> misalignment, but this part of analysis should be done
> earlier and remembered, likewise the chosen load mode. */
>
> parts resolve themselves into happiness with this ... ;)
I'm not sure I'm capable of producing a beauty moment in the vectorizer ;)
but I'll try how ugly it gets when consolidating the type punning.
--
Regards
Robin