https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116583

--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> Another example this shows is for gcc.dg/vect/slp-42.c - we definitely can
> do the interleaving scheme as non-SLP vectorization shows.
> 
> gcc.dg/vect/slp-42.c also shows we're not yet "lowering" all SLP load
> permutes.
> The original SLP attempt still has
> 
>    node 0x45d5050 (max_nunits=4, refcnt=2) vector([4,4]) int
>    op template: _2 = q[_1];
>         stmt 0 _2 = q[_1];
>         stmt 1 _8 = q[_7];
>         stmt 2 _14 = q[_13];
>         stmt 3 _20 = q[_19];
>         load permutation { 0 1 2 3 }
>    node 0x45d50e8 (max_nunits=4, refcnt=2) vector([4,4]) int
>    op template: _4 = q[_3];
>         stmt 0 _4 = q[_3];
>         stmt 1 _10 = q[_9];
>         stmt 2 _16 = q[_15];
>         stmt 3 _22 = q[_21];
>         load permutation { 4 5 6 7 }
> 
> instead of a single contiguous load and two VEC_PERM_EXPR nodes to extract
> the lo/hi parts (which is also extract even/odd, but with a larger mode
> encompassing 4 elements).
> 
> I'd say for VLA operation this is one of the major blockers for all-SLP.

I'll take a look, if Richard hasn't already, once I finish the early break
transition :)
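
For reference, the lowering described in the quoted comment (one contiguous
load of the whole group, then two permutes that select the lo and hi halves)
can be modelled in plain scalar C. This is only an illustrative sketch, not
what the vectorizer emits: permute_lanes and load_group_lo_hi are made-up
names, the group size of 8 and the 4 lanes per node are read off the dump
above, and a real VLA implementation would operate on scalable vectors rather
than fixed-size arrays.

```c
#include <stddef.h>

/* Assumed sizes, taken from the two 4-lane nodes in the dump above. */
enum { GROUP = 8, LANES = 4 };

/* Scalar model of a single-input lane permute: out[i] = in[sel[i]]. */
static void permute_lanes(const int *in, const unsigned *sel,
                          int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[sel[i]];
}

/* One contiguous load of the group, then two permutes for lo and hi. */
static void load_group_lo_hi(const int *q, int lo[LANES], int hi[LANES])
{
    /* Same lane indices as the two load permutations in the dump. */
    static const unsigned sel_lo[LANES] = { 0, 1, 2, 3 };
    static const unsigned sel_hi[LANES] = { 4, 5, 6, 7 };
    int group[GROUP];

    /* The single contiguous load of all eight elements. */
    for (size_t i = 0; i < GROUP; i++)
        group[i] = q[i];

    /* Two permutes of the loaded group replace the two permuted loads. */
    permute_lanes(group, sel_lo, lo, LANES);
    permute_lanes(group, sel_hi, hi, LANES);
}
```

Both halves come from the same loaded group, which is the point of the
lowering: the memory access happens once and the selection is expressed as
permutes on the result.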
