https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383
Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> The issue is that we tame PRE because it tends to inhibit vectorization.
>
>   /* Inhibit the use of an inserted PHI on a loop header when
>      the address of the memory reference is a simple induction
>      variable.  In other cases the vectorizer won't do anything
>      anyway (either it's loop invariant or a complicated
>      expression).  */
>   if (sprime
>       && TREE_CODE (sprime) == SSA_NAME
>       && do_pre
>       && (flag_tree_loop_vectorize || flag_tree_parallelize_loops > 1)
>       && loop_outer (b->loop_father)
>       && has_zero_uses (sprime)
>       && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime))
>       && gimple_assign_load_p (stmt))
>
> the heuristic would either need to become much more elaborate (do more
> checks on whether vectorization is likely), or we could make the behavior
> depend on the cost model as well, for example excluding very-cheap here.
> That might have an influence on the performance benefit seen from
> -O2 default vectorization, though.
>
> IIRC we suggested enabling predictive commoning at -O2 but avoiding
> unroll factors > 1 when it was not explicitly enabled.

Yeah, that's PR100794.  I also collected some data on different approaches
at that time.  Recently I opened another issue, PR102054, which likewise
stems from restricting PRE for the sake of loop vectorization.
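For a concrete picture of what the heuristic guards against, here is a
hypothetical example (not the testcase of this PR; the function is made up
for illustration).  The loads a[i] and a[i + 1] overlap between consecutive
iterations, so PRE can insert a PHI on the loop header that forwards the
previous iteration's a[i + 1] value to the current iteration's a[i]; that
saves one load per iteration, but the resulting cross-iteration dependence
keeps the loop vectorizer from handling the loop:

  double
  dot_shifted (const double *a, int n)
  {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
      /* a[i] here equals a[i + 1] from the previous iteration, and
         its address is a simple induction variable, so PRE would
         replace the load with a loop-header PHI carrying that value,
         which is exactly what the guard above inhibits when loop
         vectorization is enabled.  */
      sum += a[i] * a[i + 1];
    return sum;
  }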
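As for the cost-model idea in comment #2, a minimal sketch, assuming the
guard can simply consult flag_vect_cost_model and ignoring per-loop
overrides (e.g. #pragma omp simd forcing a different model), would be one
extra condition in the quoted check:

      && (flag_tree_loop_vectorize || flag_tree_parallelize_loops > 1)
      /* Sketch only: keep PRE unrestricted when the vectorizer runs
         with the very-cheap cost model (the -O2 default), where giving
         up the PRE opportunity is least likely to pay off.  */
      && flag_vect_cost_model != VECT_COST_MODEL_VERY_CHEAP

Whether that costs some of the benefit seen from -O2 default vectorization,
as comment #2 notes, would need to be measured.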