https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-09-17

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we tame PRE because it tends to inhibit vectorization:

      /* Inhibit the use of an inserted PHI on a loop header when
         the address of the memory reference is a simple induction
         variable.  In other cases the vectorizer won't do anything
         anyway (either it's loop invariant or a complicated
         expression).  */
      if (sprime
          && TREE_CODE (sprime) == SSA_NAME
          && do_pre
          && (flag_tree_loop_vectorize || flag_tree_parallelize_loops > 1)
          && loop_outer (b->loop_father)
          && has_zero_uses (sprime)
          && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime))
          && gimple_assign_load_p (stmt))

The heuristic would either need to become much more elaborate (doing more
checks on whether vectorization is likely), or we could make the behavior
depend on the cost model as well, for example excluding very-cheap here.
That might have an influence on the performance benefit seen from -O2
default vectorization, though.

IIRC we suggested enabling predictive commoning at -O2 but avoiding unroll
factors > 1 when it was not explicitly enabled.

Note that the issue for this testcase is that without PRE, predcom behaves
differently (but the testcase comment suggests that we'd have to undo PRE).
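As an illustration of the pattern the heuristic is about (this is not the
PR's testcase, just a minimal hypothetical example with a made-up function
name): in a loop like the one below, the value loaded as a[i] in one
iteration is the a[i - 1] of the next, so PRE can replace the a[i - 1]
load with a PHI carried on the loop header, and that loop-carried scalar
dependence can in turn keep the loop vectorizer from handling the loop.

/* Hypothetical illustration, not the testcase from this PR.  PRE can
   common a[i - 1] with the previous iteration's a[i] via a PHI on the
   loop header; the address of the commoned load is a simple induction
   variable -- the case the heuristic quoted above declines to transform
   when loop vectorization is enabled.  */
double
sum_pairs (const double *a, int n)
{
  double s = 0.0;
  for (int i = 1; i < n; i++)
    s += a[i - 1] + a[i];
  return s;
}

On such a loop, whether PRE performed the replacement can be checked in the
-fdump-tree-pre-details dump, and whether the vectorizer handled the loop
with -fopt-info-vec.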