https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104935

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So in 554.roms_r I see cases (like mod_grid.F90:allocate_grid) where we now
vectorize more V4DI stores from a CTOR of scalars. That reduces code size, so
jump threading (from DOM) now goes wild, threading across the long repetition
of

  if (div == 0)
    ;
  else
    ... = ... / div;


  <vectorized blob>
  if (div == 0)
    ;
  else
    ... = ... / div;

where the vectorized blob is now smaller than the threading threshold.
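To make the pattern concrete, here is a hypothetical C sketch. It is not the roms source (which is Fortran) and not GCC output. The "vectorized blob" is modeled as one V4DI store built from a constructor of four scalars, written with the GCC/Clang `vector_size` extension. `store_ctor`, `guarded_original`, `guarded_threaded` and `check_equivalent` are illustrative names. `guarded_threaded` is a hand-written picture of what threading the second `div == 0` test through the first roughly amounts to: the test is decided once and the blob is copied onto both paths. The smaller the blob, the more easily such copies stay under the threading size limit.

```c
#include <assert.h>
#include <string.h>

/* Four 64-bit lanes, i.e. GCC's V4DI mode (GCC/Clang vector extension). */
typedef long long v4di __attribute__((vector_size(32)));

/* The "vectorized blob": one 256-bit store from a CTOR of scalars,
   standing in for four adjacent scalar stores dst[0..3] = a, b, c, d. */
static void store_ctor(long long *dst, long long a, long long b,
                       long long c, long long d)
{
  v4di v = {a, b, c, d};
  memcpy(dst, &v, sizeof v);   /* lane 0 lands at dst[0] */
}

/* Source shape: the same div == 0 test repeated around the blob. */
void guarded_original(long long *out, long long *q, long long x, long long div)
{
  if (div == 0)
    ;
  else
    q[0] = x / div;
  store_ctor(out, 1, 2, 3, 4);
  if (div == 0)
    ;
  else
    q[1] = (x + 1) / div;
}

/* Hand-threaded equivalent: on each path the outcome of the first test is
   known, so the second test disappears, but the blob is now duplicated. */
void guarded_threaded(long long *out, long long *q, long long x, long long div)
{
  if (div == 0)
    {
      store_ctor(out, 1, 2, 3, 4);   /* copy 1 of the blob */
    }
  else
    {
      q[0] = x / div;
      store_ctor(out, 1, 2, 3, 4);   /* copy 2 of the blob */
      q[1] = (x + 1) / div;
    }
}

/* Self-check: both shapes must leave identical memory behind. */
void check_equivalent(long long x, long long div)
{
  long long o1[4] = {0}, o2[4] = {0}, q1[2] = {0}, q2[2] = {0};
  guarded_original(o1, q1, x, div);
  guarded_threaded(o2, q2, x, div);
  assert(memcmp(o1, o2, sizeof o1) == 0);
  assert(memcmp(q1, q2, sizeof q1) == 0);
}
```

With N repetitions of the test, each of the two threaded paths carries every blob, so the code growth scales with the blob size.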

For extract_sta.F90 we now vectorize two more loops with a low VF (a high VF
is not profitable), but only using strided loads (they are reductions). That
has an extra size cost for the scalar epilogues, plus we are vectorizing
conditional reductions here.  It doesn't look overly bad, though.
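A hypothetical C sketch of the kind of loop described above (not the extract_sta.F90 source and not GCC's actual output): a conditional reduction over strided elements, hand-vectorized with VF = 2. Because the elements are not adjacent, each lane is loaded separately; the `if` becomes a select; and the leftover iteration when `n` is odd runs in a scalar epilogue, which is where the extra code size comes from. `cond_sum_scalar`, `cond_sum_vf2` and `check_cond_sum` are illustrative names. Integers are used so the reordered sum is exactly equal to the scalar one.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: sum a[i*stride] over the elements above thresh. */
long long cond_sum_scalar(const long long *a, size_t n, size_t stride,
                          long long thresh)
{
  long long sum = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i * stride] > thresh)
      sum += a[i * stride];
  return sum;
}

/* Hand-written VF = 2 version: two partial sums (lanes), selects instead
   of the branch, then a scalar epilogue for a leftover iteration. */
long long cond_sum_vf2(const long long *a, size_t n, size_t stride,
                       long long thresh)
{
  long long lane0 = 0, lane1 = 0;
  size_t i = 0;
  for (; i + 2 <= n; i += 2)
    {
      /* Strided access: the two lanes come from non-adjacent elements. */
      long long x0 = a[i * stride];
      long long x1 = a[(i + 1) * stride];
      /* Conditional reduction: add the element or 0 depending on the mask. */
      lane0 += x0 > thresh ? x0 : 0;
      lane1 += x1 > thresh ? x1 : 0;
    }
  long long sum = lane0 + lane1;   /* reduce the lanes */
  for (; i < n; i++)               /* scalar epilogue */
    if (a[i * stride] > thresh)
      sum += a[i * stride];
  return sum;
}

/* Self-check: the vectorized shape must match the scalar reference. */
void check_cond_sum(const long long *a, size_t n, size_t stride,
                    long long thresh)
{
  assert(cond_sum_scalar(a, n, stride, thresh)
         == cond_sum_vf2(a, n, stride, thresh));
}
```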

There's also a TU with a size win btw, but overall we vectorize more.
