[Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 29 Jan 2025 23:56:12 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324


--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
With -Ofast there are still grouped stores for which two-lane SLP discovery
fails, possibly due to re-association.  For example

  _54 = REALPART_EXPR <(*a_19(D))[_27]>;
  _55 = IMAGPART_EXPR <(*a_19(D))[_27]>;

  _56 = REALPART_EXPR <(*a_19(D))[_30]>;
  _57 = IMAGPART_EXPR <(*a_19(D))[_30]>;

  _60 = _54 - _56;
  _8 = _57 - _55;

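cannot be discovered: the two subtractions take their operands in opposite
order (_27 minus _30 in one lane, _30 minus _27 in the other), and we fail to
consider swapping the operands of a minus by negating it in the parent (of
its parent)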

  b2$real_63 = _8 * ci$imag_10;
  b2$imag_64 = ci$imag_10 * _60;

  _71 = a2$real_52 - b2$real_63;
  _72 = a2$imag_53 - b2$imag_64;

the parent (of its parent) would then turn into a plusminus operation.
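
As a rough scalar model of that rewrite (a sketch only, not GCC code): the
function names and the sample values below are made up for illustration, and
the tuples stand in for two-lane vectors.  It computes _71/_72 once as in the
GIMPLE above and once with a uniform subtract feeding a plus/minus parent.

```python
def original(a27, a30, a2, ci_imag):
    # Mirrors the GIMPLE: the two lanes subtract in opposite operand order.
    _60 = a27.real - a30.real
    _8 = a30.imag - a27.imag
    b2_real = _8 * ci_imag
    b2_imag = ci_imag * _60
    return (a2.real - b2_real, a2.imag - b2_imag)


def swapped(a27, a30, a2, ci_imag):
    # Uniform two-lane subtract: both lanes compute a27 - a30.
    d = (a27.real - a30.real, a27.imag - a30.imag)
    # Lane swap, then a uniform multiply by ci_imag.
    t = (d[1] * ci_imag, d[0] * ci_imag)
    # The negation of lane 0 is absorbed here: plus in lane 0, minus in lane 1.
    return (a2.real + t[0], a2.imag - t[1])


if __name__ == "__main__":
    args = (complex(3.0, 5.0), complex(1.0, 2.0), complex(10.0, 20.0), 2.0)
    print(original(*args), swapped(*args))
```

Both variants agree on the sample input; signed zeros can differ in corner
cases, which -Ofast does not care about.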

The less efficient way to swap operands in the {_60, _8} compute would
be to insert a conditional negate, either via negate + merge or via
multiplication by { 1, -1 }.  For FP, a lane-specific negate can be
implemented as XOR of the sign bit.  This might be more efficient for
targets without a native plusminus op.
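
To make the two conditional-negate variants concrete, here is a small model
on IEEE double lanes (a sketch with hypothetical helper names; lists stand in
for vector registers).  The XOR variant flips only the sign bit of the
selected lanes, the other multiplies by a { 1, -1 } style constant.

```python
import struct

SIGN_BIT = 1 << 63  # sign bit of an IEEE 754 binary64 value


def xor_bits(x, mask):
    # Reinterpret the double as a 64-bit integer, XOR, and convert back.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (out,) = struct.unpack("<d", struct.pack("<Q", bits ^ mask))
    return out


def negate_lanes_xor(vec, negate):
    # Per-lane mask: sign bit where the lane is to be negated, else zero.
    return [xor_bits(x, SIGN_BIT if n else 0) for x, n in zip(vec, negate)]


def negate_lanes_mul(vec, negate):
    # Same effect via multiplication by a { 1, -1 } style constant vector.
    return [x * (-1.0 if n else 1.0) for x, n in zip(vec, negate)]


if __name__ == "__main__":
    print(negate_lanes_xor([2.0, -3.0], [False, True]))
    print(negate_lanes_mul([2.0, -3.0], [False, True]))
```

The XOR form needs no FP arithmetic at all, only a constant mask.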

Even simply not giving up when running into two different DR groups
would be possible - we can insert an interleaving operation.
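
One possible reading of this, as a sketch: load each DR group as its own
vector and build the mixed operand vectors with a lane select.  The names
blend and lanes_60_8 are hypothetical, and whether the vectorizer would use a
blend or a different permute here is an assumption.

```python
def blend(select_from_b, va, vb):
    # Per-lane select between two already loaded vectors.
    return [b if s else a for s, a, b in zip(select_from_b, va, vb)]


def lanes_60_8(va, vb):
    # va = {re, im} of (*a)[_27], vb = {re, im} of (*a)[_30].
    lhs = blend([False, True], va, vb)  # {re27, im30}
    rhs = blend([True, False], va, vb)  # {re30, im27}
    # A single uniform two-lane subtract now yields {_60, _8}.
    return [l - r for l, r in zip(lhs, rhs)]


if __name__ == "__main__":
    print(lanes_60_8([3.0, 5.0], [1.0, 2.0]))
```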


That said, I think we have accumulated some duplicates around SLP discovery
issues for _Complex ops after re-association.
