https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71992
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-07-25
            Version|tree-ssa                    |7.0
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. I think doing it as

  [a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a]

would be "optimal" (not factoring in the cost of vector construction, of course).

The issue is how SLP construction works and how many operand swaps and builds from scalars it does. One issue is that we even try a group size of 5. Fixing that doesn't fix it, though, because we do not consider building a vector from scalars until we have tried swapping the operands of the parent op (and if that fails, we don't go back to building the children from scalars). Only trying a group size of 4 would also regress the case where we would have split after the first element.

That said, the whole SLP discovery needs a different algorithmic approach to fix cases like this.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
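To make the lane layout of the "optimal" form in Comment #1 concrete, here is a small sketch using GCC's generic vector extension (`__attribute__((vector_size(32)))`, four doubles per vector). The comment does not quote the testcase, so the scalar statements below are an assumption: they are simply read off the lanes of [a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a], and the fifth statement of the original group of 5 is unknown and left out. The names `v4df`, `scalar_form`, `vector_form` and `forms_agree` are hypothetical.

```c
/* Four doubles packed into one 32-byte vector (GCC/Clang vector extension). */
typedef double v4df __attribute__((vector_size(32)));

/* Hypothetical scalar group, inferred from the lanes of the proposed vector
   form; the real testcase (and its fifth statement) is not shown in the bug. */
static void scalar_form(double a, double b, double c, double out[4])
{
    out[0] = a * a + 3.;
    out[1] = b * b + c;
    out[2] = b * 3. + a;
    out[3] = b * 3. + a;
}

/* The same four results as one vector multiply plus one vector add:
   [a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a].
   Building x, y and z from scalars is the construction cost the comment
   explicitly leaves out. */
static v4df vector_form(double a, double b, double c)
{
    v4df x = {a, b, b, b};
    v4df y = {a, b, 3., 3.};
    v4df z = {3., c, a, a};
    return x * y + z;
}

/* Returns 1 when every lane of the vector form equals the scalar result. */
static int forms_agree(double a, double b, double c)
{
    double s[4];
    v4df v = vector_form(a, b, c);
    scalar_form(a, b, c, s);
    for (int i = 0; i < 4; i++)
        if (v[i] != s[i])
            return 0;
    return 1;
}
```

The example inputs in a quick check should be small exact values (for instance a = 1, b = 2, c = 5), so that floating-point contraction into FMA, which GCC may apply differently to the scalar and vector code, cannot change any lane's rounding.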