https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93771

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 17 Feb 2020, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93771
> 
> --- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #2)
> > Confirmed.  I'm not sure if we should try to "fix" SLP here or rather
> > appropriately optimize
> > 
> >   v2df tem1 = *(v2df *)&t[0];
> >   v2df tem2 = *(v2df *)&t[2];
> >   __builtin_shuffle (tem1, tem2, (v2di) { 0, 3 });
> > 
> > which the user could write themselves.  forwprop does some related transforms
> > splitting loads in "Rewrite loads used only in BIT_FIELD_REF extractions to
> > component-wise loads."
> 
> I was thinking about originally filing the bug that way but I decided against
> it; though I don't remember my reasoning besides I saw SLP not doing it for
> unrelated loads.

The vectorizer classifies the two loads from t as a group at a point
where it doesn't yet know the vectorization factor.  The general handling
of non-contiguous loads then emits the permutation.  The structure of
that code makes it quite hard to do what you want, and changing the
decision of whether it's a group or not "late" is also going to hurt.

There are pending changes (in my mind only ... :/) that would make such
a change much more straightforward, of course.

So I think an ad-hoc solution in forwprop is better for now.
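
For illustration, here is a minimal, self-contained sketch of the two
forms under discussion, using GCC's generic vector extensions.  It is
not the testcase from the bug and says nothing about how forwprop
would implement the transform; it only shows that the load-plus-shuffle
form and the component-wise loads compute the same value.  The function
names are hypothetical, and memcpy is used in place of the pointer casts
from the quoted snippet so that the example makes no alignment or
aliasing assumptions.

```c
#include <assert.h>
#include <string.h>

/* GCC generic vector types: two doubles, and two 64-bit integers
   used as the shuffle mask.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Two full-width vector loads followed by a two-input permute.
   (Hypothetical name; memcpy stands in for *(v2df *)&t[i].)  */
static v2df
load_then_shuffle (const double *t)
{
  v2df tem1, tem2;
  memcpy (&tem1, &t[0], sizeof tem1);   /* { t[0], t[1] } */
  memcpy (&tem2, &t[2], sizeof tem2);   /* { t[2], t[3] } */
  /* Mask indices 0..1 select from tem1, 2..3 select from tem2.  */
  return __builtin_shuffle (tem1, tem2, (v2di) { 0, 3 });
}

/* Component-wise loads: only the two elements that are used.
   (Hypothetical name.)  */
static v2df
load_componentwise (const double *t)
{
  return (v2df) { t[0], t[3] };
}

/* Returns nonzero when both forms produce the same vector for T.  */
static int
forms_agree (const double *t)
{
  v2df a = load_then_shuffle (t);
  v2df b = load_componentwise (t);
  return a[0] == b[0] && a[1] == b[1];
}

/* Small usage example.  */
static void
check_forms_agree (void)
{
  const double t[4] = { 1.0, 2.0, 3.0, 4.0 };
  assert (forms_agree (t));
}
```

Note that __builtin_shuffle is GCC-specific, so this sketch assumes it
is compiled with GCC.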
