https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84512
Richard Biener <rguenth at gcc dot gnu.org> changed:

            What|Removed                      |Added
----------------------------------------------------------------------------
          Status|UNCONFIRMED                  |ASSIGNED
         Version|tree-ssa                     |8.0
        Keywords|                             |missed-optimization
Last reconfirmed|                             |2018-02-27
          Blocks|                             |53947
        Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
  Ever confirmed|0                            |1
         Summary|Missed optimization: should  |[8 Regression] Missed
                |be precalculated in          |optimization: should be
                |compile-time                 |precalculated in
                |                             |compile-time
Target Milestone|---                          |8.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  This is another case where we vectorize one loop but not the
other, and DOM doesn't handle removing vector loads against scalar stores.
Later, store-merging aggregates the stores, but nothing performs CSE after it.

The vectorizer decides that vectorizing the reduction is profitable while
vectorizing the init is not:

t.c:4:3: note: Cost model analysis:
  Vector inside of loop cost: 68
  Vector prologue cost: 8
  Vector epilogue cost: 128
  Scalar iteration cost: 16
  Scalar outside cost: 0
  Vector outside cost: 136
  prologue iterations: 0
  epilogue iterations: 2
t.c:4:3: note: cost model: the vector iteration cost = 68 divided by the scalar iteration cost = 16 is greater or equal to the vectorization factor = 4.
t.c:4:3: note: not vectorized: vectorization not profitable.

With -fno-vect-cost-model we vectorize both loops and optimize the function
like clang does.

The issue with the cost model here is that for the scalar iteration cost
we end up using builtin_vectorization_cost (), while for the vector cost we
use add_stmt_cost.  Only the latter distinguishes between the different
kinds of operations.

I have a patch.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
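
The arithmetic behind the "not profitable" note in the dump above can be replayed with a small sketch. This is not GCC's implementation: the helper names vector_iteration_unprofitable and vector_outside_cost are hypothetical, and the comparison is only inferred from the wording of the dump message (vector iteration cost divided by scalar iteration cost is greater than or equal to the vectorization factor), rewritten as a multiplication to stay in integer arithmetic.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from GCC.  Mirrors the dump message:
   the loop is rejected when one vector iteration costs at least as much
   as VF scalar iterations, i.e. vec_inside / scalar_iter >= vf.  The
   multiplied form avoids integer-division rounding.  */
static bool
vector_iteration_unprofitable (int vec_inside_cost, int scalar_iter_cost,
                               int vf)
{
  return scalar_iter_cost * vf <= vec_inside_cost;
}

/* Hypothetical helper.  In the dump above, the "Vector outside cost"
   figure equals the prologue cost plus the epilogue cost.  */
static int
vector_outside_cost (int prologue_cost, int epilogue_cost)
{
  return prologue_cost + epilogue_cost;
}
```

With the numbers from the dump, vector_iteration_unprofitable (68, 16, 4) is true because 16 * 4 = 64 does not exceed 68, and vector_outside_cost (8, 128) gives 136. If the scalar iteration cost were 18 rather than 16, the same vector cost of 68 would pass this check, which suggests how sensitive the decision is to the way the scalar side is costed.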