https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119209
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2025-03-11
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that the lane-combining pattern recognitions are restricted to
loop reductions because lane order isn't preserved (or even well-defined).
The decision to recognize the SLP instance as a BB reduction is only made
after this pattern recognition has run.

The fix is probably to apply the reduction restriction only during SLP build
and vectorizable_* checking.

Nailing down which lanes are combined for V16QI->V4SI in the optab would also
allow using dot_prod in non-reduction cases (when the V4SI intermediate
result isn't reduced to a single lane in the end).

There is a related PR about this, but IIRC it is about the SAD patterns.
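A scalar C sketch of why lane order only matters outside reductions. It assumes two hypothetical groupings for a V16QI->V4SI dot product, since the comment says the actual grouping is not nailed down. The names bb_dot16, dot_prod_blocked, dot_prod_strided and hsum4 are made up for illustration and are not GCC internals; bb_dot16 is a hypothetical straight-line sum of the kind a BB reduction would cover, not the testcase from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical straight-line (non-loop) sum of 16 byte products: the
   shape of code that a basic-block SLP reduction would vectorize.  */
int32_t
bb_dot16 (const int8_t *a, const int8_t *b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
         + a[4] * b[4] + a[5] * b[5] + a[6] * b[6] + a[7] * b[7]
         + a[8] * b[8] + a[9] * b[9] + a[10] * b[10] + a[11] * b[11]
         + a[12] * b[12] + a[13] * b[13] + a[14] * b[14] + a[15] * b[15];
}

/* Hypothetical grouping A: output lane i accumulates bytes 4*i .. 4*i+3.  */
void
dot_prod_blocked (const int8_t a[16], const int8_t b[16],
                  const int32_t acc[4], int32_t out[4])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = acc[i];
      for (int j = 0; j < 4; j++)
        out[i] += a[4 * i + j] * b[4 * i + j];
    }
}

/* Hypothetical grouping B: output lane i accumulates bytes i, i+4, i+8, i+12.  */
void
dot_prod_strided (const int8_t a[16], const int8_t b[16],
                  const int32_t acc[4], int32_t out[4])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = acc[i];
      for (int j = 0; j < 4; j++)
        out[i] += a[i + 4 * j] * b[i + 4 * j];
    }
}

/* Final reduction of the V4SI intermediate to a single lane.  */
int32_t
hsum4 (const int32_t v[4])
{
  return v[0] + v[1] + v[2] + v[3];
}
```

With a[i] = i, b[i] = 1 and a zero accumulator, the blocked model yields lanes {6, 22, 38, 54} and the strided model {24, 28, 32, 36}. Both reduce to 120, matching bb_dot16, so either grouping is fine when the V4SI result is summed down to one lane, but the two disagree as soon as the intermediate lanes are used directly.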