[Bug tree-optimization/116463] [15 Regression] complex multiply vectorizer detection failures after r15-3087-gb07f8a301158e5

tnfchris at gcc dot gnu.org via Gcc-bugs Sun, 24 Nov 2024 07:07:08 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463


--- Comment #24 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #23)
> > On 23.11.2024 at 13:20, tnfchris at gcc dot gnu.org 
> > <gcc-bugzi...@gcc.gnu.org> wrote:
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
> > 
> > --- Comment #22 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > Ok, so the problem with the ones on trunk isn't necessarily the
> > canonicalization itself but that our externals handling is a bit shallow.
> > 
> > On externals we determine that we have no information on the DF and return
> > TOP. This is because DR analysis doesn't try to handle externals since
> > they're not part of the loop.
> > 
> > However all we need to know for complex numbers is whether the externals are
> > loaded from the same place and the order of them.
> > 
> > Concretely, the loop pre-header is:
> > 
> >  <bb 2> [local count: 10737416]:
> >  b$real_11 = REALPART_EXPR <b_15(D)>;
> >  b$imag_10 = IMAGPART_EXPR <b_15(D)>;
> >  _53 = -b$imag_10;
> > 
> > and the loop body:
> > 
> >  <bb 3> [local count: 1063004408]:
> >  ...
> > 
> >  _23 = REALPART_EXPR <*_5>;
> >  _24 = IMAGPART_EXPR <*_5>;
> >  _27 = _24 * _53;
> >  _28 = _23 * _53;
> > 
> > codegen before:
> > 
> > {_24, _23} * { _53, _53 }
> > 
> > and after
> > 
> > { _24, _24 } * { _53, b$real_11 }
> > 
> > Before we were able to easily tell that the order for the multiply would be
> > IMAG, REAL.
> > In the after (GCC 15) case that information is there, but requires us to
> > follow the externals.
> > 
> > Richi, what do you think about extending externals handling in
> > linear_loads_p to follow all external ops and, if they load from the same
> > memref, to figure out the "virtual lane permute"?
> 
> Externs do not have a permute as we build them from scalars.  So any permute
> can be trivially imposed on them - rather than TOP they should be BOTTOM. 
> Of course there’s also no advantage of imposing a permute on them.
> 

But the scalars can be loads from memory locations that we can identify.

My point with the above was that it doesn't make sense to me that we know that
{a[0],a[1]} reads a linearly but that with 

a1 = a[0]
a2 = a[1]

{a1,a2} we say "sorry we know nothing about you". 

Yes, they're externals, but they have a defined order of use in the SLP tree.
This isn't about imposing a permute. I said virtual permute since linear_loads_p
uses the lane permutes on loads to determine the memory access order.

We DO already impose any order on them, but the other operand is oddodd, so the
overall order ends up being oddodd because any known permute overrides unknown
ones.
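
To make the override concrete, here is a toy model of how two lane classifications might be merged. This is a sketch for illustration only, not GCC's code: the names `Perm` and `combine`, and the `CONFLICT` value, are hypothetical. `TOP` stands for "no information", which is what the externals currently get.

```python
from enum import Enum

class Perm(Enum):
    """Hypothetical lane-order classifications for a two-lane operand."""
    TOP = "top"            # no information: compatible with any order
    EVENODD = "evenodd"    # {real, imag}
    ODDEVEN = "oddeven"    # {imag, real}
    EVENEVEN = "eveneven"  # {real, real}
    ODDODD = "oddodd"      # {imag, imag}
    CONFLICT = "conflict"  # two known orders that disagree

def combine(a, b):
    """Merge the classifications of two operands of one operation."""
    if a is Perm.TOP:
        return b           # the known side wins over "no information"
    if b is Perm.TOP:
        return a
    return a if a is b else Perm.CONFLICT

# The external operand is TOP and the load operand is oddodd,
# so the multiply as a whole is classified as oddodd.
result = combine(Perm.TOP, Perm.ODDODD)
```

In this model the externals contribute nothing to the result, which is why the IMAG, REAL order of the multiply is no longer visible.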

So the question is: can we not follow the externals in a constructor to figure
out whether, as used, they all read from the same base, and in which order?
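
A rough sketch of what following the externals could look like, using the pre-header from comment #22 as data. Everything here is an assumption for illustration: the `defs` table, the `trace_to_load` and `external_order` helpers, and the choice to look through a unary negation are hypothetical and do not reflect GCC's data structures.

```python
def trace_to_load(name, defs):
    """Follow unary negations from `name` back to a component read.

    Returns (base, index), or None if the chain does not end in a read."""
    seen = set()
    while name in defs and name not in seen:
        seen.add(name)
        kind, *args = defs[name]
        if kind == "load":
            return args[0], args[1]
        if kind == "neg":
            name = args[0]       # look through the negation
        else:
            return None
    return None

def external_order(lanes, defs):
    """Return (base, [index per lane]) if all lanes read the same base."""
    traced = [trace_to_load(lane, defs) for lane in lanes]
    if not traced or any(t is None for t in traced):
        return None
    bases = {base for base, _ in traced}
    if len(bases) != 1:
        return None
    return bases.pop(), [index for _, index in traced]

# Pre-header from comment #22; index 0 = real part, 1 = imaginary part.
defs = {
    "b$real_11": ("load", "b_15", 0),   # REALPART_EXPR <b_15(D)>
    "b$imag_10": ("load", "b_15", 1),   # IMAGPART_EXPR <b_15(D)>
    "_53": ("neg", "b$imag_10"),        # _53 = -b$imag_10
}

# GCC 15 external operand { _53, b$real_11 }: same base, order IMAG, REAL.
after = external_order(["_53", "b$real_11"], defs)
```

Under these assumptions the external `{ _53, b$real_11 }` resolves to base `b_15` with lane order `[1, 0]`, i.e. IMAG, REAL, which is the information the pattern matcher could previously read directly off the loads.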
