https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104935
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So in 554.roms_r I see cases (like for mod_grid.F90:allocate_grid) where we now
vectorize more V4DI stores from a CTOR of scalars, which reduces code size, so
jump-threading now goes wild (from DOM threading), threading across the long
repetition of

  if (div == 0) ; else ... = ... / div;
  <vectorized blob>
  if (div == 0) ; else ... = ... / div;

where the vectorized blob is now smaller than the threading threshold.

For extract_sta.F90 we now vectorize two more loops with low VF (high VF is
not profitable) but using only strided loads (they are reductions), which has
extra size cost on the scalar epilogues, plus we are vectorizing conditional
reductions here. It doesn't look overly bad here.

There's also a TU with a size win btw, but overall we vectorize more.
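
The repeated `if (div == 0)` pattern described above can be sketched in C. This is only an illustration, not code from 554.roms_r: the names `store4`, `unthreaded`, `threaded` and `check_equivalent` are hypothetical, and `store4` merely stands in for the block of scalar stores that may become a single V4DI store. The second function shows by hand roughly what threading the repeated test amounts to: the test is done once and each path carries its own copy of the blobs, which is where the size growth would come from.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the "<vectorized blob>": four adjacent
   stores built from scalars, a candidate for one vector store.  */
static void store4(long long *dst, long long a, long long b,
                   long long c, long long d)
{
    dst[0] = a; dst[1] = b; dst[2] = c; dst[3] = d;
}

/* Shape before threading: the same test on div is repeated,
   separated by the store blobs.  */
void unthreaded(long long *out, long long *q,
                long long x, long long y, long long div)
{
    if (div == 0) { /* nothing */ } else q[0] = x / div;
    store4(out, x, y, x, y);
    if (div == 0) { /* nothing */ } else q[1] = y / div;
    store4(out + 4, y, x, y, x);
    if (div == 0) { /* nothing */ } else q[2] = (x + y) / div;
}

/* Hand-written equivalent of the threaded shape: div is tested once
   and the blobs are duplicated on both paths.  */
void threaded(long long *out, long long *q,
              long long x, long long y, long long div)
{
    if (div == 0) {
        store4(out, x, y, x, y);
        store4(out + 4, y, x, y, x);
    } else {
        assert(div != 0);   /* known on this path, so no re-test needed */
        q[0] = x / div;
        store4(out, x, y, x, y);
        q[1] = y / div;
        store4(out + 4, y, x, y, x);
        q[2] = (x + y) / div;
    }
}

/* Returns 1 if both shapes leave identical results in out and q.  */
int check_equivalent(long long x, long long y, long long div)
{
    long long out_a[8] = {0}, out_b[8] = {0};
    long long q_a[3] = {-1, -1, -1}, q_b[3] = {-1, -1, -1};

    unthreaded(out_a, q_a, x, y, div);
    threaded(out_b, q_b, x, y, div);
    return memcmp(out_a, out_b, sizeof out_a) == 0
        && memcmp(q_a, q_b, sizeof q_a) == 0;
}
```

With only two blobs the duplication is small; in a long repetition of the pattern, every blob that fits under the threading threshold gets copied, so a smaller blob can lead to more threading and a larger function overall.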