[Bug tree-optimization/116463] [15 Regression] complex multiply vectorizer detection failures after r15-3087-gb07f8a301158e5

tnfchris at gcc dot gnu.org via Gcc-bugs Sat, 23 Nov 2024 04:21:03 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463


--- Comment #22 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Ok, so the problem with the ones on trunk isn't necessarily the
canonicalization itself but that our externals handling is a bit shallow.

On externals we determine that we have no information on the DF and return TOP.
This is because DR analysis doesn't try to handle externals since they're not
part of the loop.

However, all we need to know for complex numbers is whether the externals are
loaded from the same place, and in what order.

Concretely, the loop pre-header is:

  <bb 2> [local count: 10737416]:
  b$real_11 = REALPART_EXPR <b_15(D)>;
  b$imag_10 = IMAGPART_EXPR <b_15(D)>;
  _53 = -b$imag_10;

and the loop body:

  <bb 3> [local count: 1063004408]:
  ...

  _23 = REALPART_EXPR <*_5>;
  _24 = IMAGPART_EXPR <*_5>;
  _27 = _24 * _53;
  _28 = _23 * _53;

codegen before:

{_24, _23} * { _53, _53 }

and after

{ _24, _24 } * { _53, b$real_11 }

Before, we could easily tell that the order for the multiply would be
IMAG, REAL.
In the after (GCC 15) case that information is still there, but recovering it
requires us to follow the externals.

Richi, what do you think about extending the externals handling in
linear_loads_p to follow all external ops and, if they load from the same
memref, figure out the "virtual lane permute"?

We could store the info in a new externals cache (to avoid re-walking externals
we have already walked, since perm_cache stores SLP nodes) and store the permute
for the node in the perm_cache, as we do for any cached lookup today.
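
To make the idea concrete, here is a minimal standalone sketch of "follow the
externals to a common memref and read off the lane order". It is not GCC code:
Part, Def, Origin, ExternalsWalker and example_walker are all hypothetical
names invented for illustration, and the definitions are modelled as a plain
map from SSA name to either a REALPART/IMAGPART extraction or a unary op (such
as the negate producing _53) on another external. It assumes the definitions
are acyclic, as SSA defs are.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are GCC types.
enum class Part { Real, Imag };

// Where an external ultimately comes from: which memref, which half.
struct Origin {
  std::string base;
  Part part;
};

// A pre-header definition: either a REALPART/IMAGPART extraction from
// `source`, or a unary op (e.g. negate) applied to the external `source`.
struct Def {
  bool is_extract;
  std::string source;
  Part part;  // only meaningful when is_extract is true
};

class ExternalsWalker {
public:
  explicit ExternalsWalker(std::map<std::string, Def> defs)
      : defs_(std::move(defs)) {}

  // Follow unary ops back to the extraction, caching every name visited
  // so an external is never walked twice.
  std::optional<Origin> resolve(const std::string &name) {
    auto cached = cache_.find(name);
    if (cached != cache_.end())
      return cached->second;
    std::optional<Origin> result;
    auto d = defs_.find(name);
    if (d != defs_.end()) {
      assert(!d->second.source.empty());
      if (d->second.is_extract)
        result = Origin{d->second.source, d->second.part};
      else
        result = resolve(d->second.source);
    }
    cache_[name] = result;
    return result;
  }

  // Lane order of an external vector, or nullopt if any lane is unknown
  // or the lanes do not all come from the same base.
  std::optional<std::vector<Part>>
  lane_order(const std::vector<std::string> &lanes) {
    std::vector<Part> order;
    std::optional<std::string> base;
    for (const auto &name : lanes) {
      auto origin = resolve(name);
      if (!origin || (base && *base != origin->base))
        return std::nullopt;
      base = origin->base;
      order.push_back(origin->part);
    }
    return order;
  }

private:
  std::map<std::string, Def> defs_;
  std::map<std::string, std::optional<Origin>> cache_;
};

// The pre-header from the dump above, expressed in this toy model.
ExternalsWalker example_walker() {
  return ExternalsWalker({
      {"b$real_11", Def{true, "b_15(D)", Part::Real}},
      {"b$imag_10", Def{true, "b_15(D)", Part::Imag}},
      {"_53", Def{false, "b$imag_10", Part::Real}},  // _53 = -b$imag_10
  });
}
```

In this model, asking for the lane order of { _53, b$real_11 } gives IMAG, REAL,
which is the information the GCC 14 form exposed directly. A real
implementation would have to decide which intermediate ops are safe to look
through; the sketch looks through any unary op, which is an assumption.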

This would also fix the other tests Andrew added in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463#c4
