https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
With -Ofast there are still grouped stores for which we fail two-lane SLP
discovery, possibly due to re-association.  In particular, for example

  _54 = REALPART_EXPR <(*a_19(D))[_27]>;
  _55 = IMAGPART_EXPR <(*a_19(D))[_27]>;
  _56 = REALPART_EXPR <(*a_19(D))[_30]>;
  _57 = IMAGPART_EXPR <(*a_19(D))[_30]>;
  _60 = _54 - _56;
  _8 = _57 - _55;

cannot be discovered: the two lanes subtract in opposite operand order, and
we fail to consider swapping the operands of a minus by negating it in the
parent (of its parent)

  b2$real_63 = _8 * ci$imag_10;
  b2$imag_64 = ci$imag_10 * _60;
  _71 = a2$real_52 - b2$real_63;
  _72 = a2$imag_53 - b2$imag_64;

The parent (of its parent) would then turn into a plusminus operation.

The less efficient way to swap operands in the {_60, _8} compute would be
to insert a conditional negate - either via negate + merge or via
multiplication by { 1, -1 }; for FP a lane-specific negate can be
implemented as XOR.  This might be more efficient for targets without a
native plusminus op.

Even simply not giving up when running into two different DR groups would
be possible - we can insert an interleaving operation.

That said, I think we have accumulated some duplicates around SLP discovery
issues for _Complex ops after re-association.
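
To make the two rewrites concrete, here is a minimal scalar sketch in C. It is not vectorizer code: the function names (scalar_form, plusminus_form, conditional_negate_form, xor_negate) are hypothetical, each array index stands for one SLP lane (0 = real, 1 = imag), and it assumes double is IEEE-754 binary64. It only illustrates that pushing the negate into the final operation, or negating one lane of a uniform subtract, computes the same values as the GIMPLE above (up to the sign of zero, which -Ofast ignores).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: flip the sign of x when negate is nonzero by
   XORing the sign bit (assumes IEEE-754 binary64 double). */
static double xor_negate(double x, int negate)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    bits ^= (uint64_t)(negate != 0) << 63;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* The computation as written in the GIMPLE: the two lanes subtract
   in opposite operand order. */
static void scalar_form(const double a27[2], const double a30[2],
                        const double a2[2], double ci_imag, double out[2])
{
    double t60 = a27[0] - a30[0];   /* _60 = _54 - _56 */
    double t8  = a30[1] - a27[1];   /* _8  = _57 - _55 */
    out[0] = a2[0] - t8 * ci_imag;  /* _71 */
    out[1] = a2[1] - ci_imag * t60; /* _72 */
}

/* Rewrite 1: uniform subtract in both lanes; the negation of lane 1 is
   absorbed two levels up, turning { -, - } into a plusminus { +, - }. */
static void plusminus_form(const double a27[2], const double a30[2],
                           const double a2[2], double ci_imag, double out[2])
{
    double d[2], m[2];
    for (int lane = 0; lane < 2; lane++)
        d[lane] = a27[lane] - a30[lane];  /* lane 1 is now -_8 */
    for (int lane = 0; lane < 2; lane++)
        m[lane] = d[lane] * ci_imag;
    out[0] = a2[0] + m[1];                /* minus became plus */
    out[1] = a2[1] - m[0];
}

/* Rewrite 2: uniform subtract followed by a lane-specific negate of
   lane 1; multiplying d by { 1, -1 } would have the same effect. */
static void conditional_negate_form(const double a27[2], const double a30[2],
                                    const double a2[2], double ci_imag,
                                    double out[2])
{
    double d[2];
    for (int lane = 0; lane < 2; lane++)
        d[lane] = xor_negate(a27[lane] - a30[lane], lane == 1);
    out[0] = a2[0] - d[1] * ci_imag;
    out[1] = a2[1] - ci_imag * d[0];
}
```

For instance, with a[_27] = 5 + 2i, a[_30] = 3 + 7i, a2 = 10 + 20i and ci$imag = 0.5, all three functions produce { 7.5, 19.0 }.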