https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 43289
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43289&action=edit
patch limiting growth

So I played with a simple hack limiting the amount of growth in a vectorized
loop based on our existing unroller parameters.  This causes AVX256
vectorization to be rejected on that round with an estimated scalar loop body
size of 43 and a vector loop body size of 289 (with SLP and AVX256) or 259
(without SLP and AVX256).  But we then still accept vectorization with SSE
(vector size with SLP 147).

I simply counted the number of stmts costed for the estimate and added

  if (vect_body_size / scalar_size > PARAM_VALUE (PARAM_MAX_UNROLL_TIMES)
      || ((vect_body_size / scalar_size > 1)
          && (vect_body_size > PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS))))
    {
      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                       "not vectorized: loop grows too much.\n");
      return -1;
    }

where PARAM_MAX_UNROLLED_INSNS is 200 at the moment.  With SSE vectorization
the LSD.UOPS counter still doesn't trigger on the loop so it's still too big
(it's 958 bytes, the AVX2 variant is 1365 bytes, the scalar loop is already 363
bytes).

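So to fix the regression we'd need to lower the unroll parameter
(PARAM_MAX_UNROLLED_INSNS) to 146, i.e. just below the SSE vector body size
of 147, so that the SSE variant is rejected as well.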
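
For illustration, the check can be pulled out into a standalone function and
fed the sizes quoted above.  This is only a sketch: the function and
parameter names are made up for the example, the limits are passed in
explicitly instead of coming from PARAM_VALUE, and the value 8 used for the
unroll-times limit in the note below is an assumption, not a number taken
from this comment.

```c
#include <stdbool.h>

/* Hypothetical standalone version of the growth check quoted above.
   The names are illustrative; in the patch the two limits come from
   PARAM_VALUE (...).  Assumes scalar_size > 0.  */
static bool
loop_grows_too_much (int vect_body_size, int scalar_size,
                     int max_unroll_times, int max_unrolled_insns)
{
  /* Integer division, as in the patch.  */
  int growth = vect_body_size / scalar_size;

  return growth > max_unroll_times
         || (growth > 1 && vect_body_size > max_unrolled_insns);
}
```

With a scalar size of 43 the growth factors are 289 / 43 = 6, 259 / 43 = 6
and 147 / 43 = 3.  Assuming an unroll-times limit of 8, the first clause never
fires, so the decision rests on the insn limit: at 200 the two AVX256 bodies
(289 and 259) are rejected while the SSE body (147) is accepted, and at 146
the SSE body is rejected too.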
