https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)

Thanks for the comments!

> There's predictive commoning which can do similar transforms and runs after
> vectorization. It might be it doesn't handle these "simple" cases or that
> loop dependence info is not up to the task there.

pcom does fix this problem, but it's only enabled by default at -O3. Could it be considered to run at -O2? Or could it be enabled at -O2 under some conditions, for example only for loops where the loop-carried optimization was skipped and which are not vectorized afterwards?

> Another option is to avoid the PRE guard with the (very) cheap cost model
> at the expense of not vectorizing affected loops.

OK, I will benchmark this to see its impact. The particular loops in 554.roms_r can be vectorized with the cheap cost model, and this benchmark improved (slightly) with the cheap cost model on both Power8 and Power9. So I will only test the impact with the very cheap cost model.
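For readers less familiar with predictive commoning, here is a minimal C sketch of the kind of cross-iteration reuse it performs, applied by hand. The functions `sum_adjacent_naive` and `sum_adjacent_pcom` are hypothetical illustrations, not the 554.roms_r loops, and the real GCC pass operates on GIMPLE rather than on source code. The sketch assumes `a` and `b` do not alias (hence `restrict`), since otherwise reusing the previously loaded value would not be valid.

```c
#include <stddef.h>

/* Original form: each iteration loads both b[i] and b[i + 1], so every
   interior element of b is loaded twice across the loop. */
void sum_adjacent_naive(double *restrict a, const double *restrict b,
                        size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        a[i] = b[i] + b[i + 1];
}

/* Hand-applied predictive commoning (illustrative only): the value loaded
   as b[i + 1] in one iteration is kept in the scalar `prev` and reused as
   b[i] in the next iteration, so each element of b is loaded once. */
void sum_adjacent_pcom(double *restrict a, const double *restrict b,
                       size_t n)
{
    if (n < 2)
        return;
    double prev = b[0];
    for (size_t i = 0; i + 1 < n; i++) {
        double next = b[i + 1];
        a[i] = prev + next;
        prev = next; /* value carried into the next iteration */
    }
}
```

Note that the scalar `prev` introduces a loop-carried dependence; a dependence of this kind can get in the way of vectorizing the loop, which is consistent with running such a transform after the vectorizer.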