https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 43289 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43289&action=edit
patch limiting growth

So I played with a simple hack that limits the amount of growth in a vectorized loop, based on our existing unroller parameters. This causes AVX256 vectorization to be rejected on that round: the estimated scalar loop body size is 43, while the vector loop body size is 289 with SLP and AVX256, and 259 without SLP and with AVX256. But we then still accept vectorization with SSE (vector body size 147 with SLP).

I simply count the number of stmts costed for the estimate and added

  /* Reject if the vector body is more than max-unroll-times the scalar
     body, or if it at least doubles the body and exceeds the
     max-unrolled-insns budget.  */
  if (vect_body_size / scalar_size > PARAM_VALUE (PARAM_MAX_UNROLL_TIMES)
      || ((vect_body_size / scalar_size > 1)
          && (vect_body_size > PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS))))
    {
      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                       "not vectorized: loop grows too much.\n");
      return -1;
    }

where PARAM_MAX_UNROLLED_INSNS is 200 at the moment.

With SSE vectorization the LSD.UOPS counter still doesn't trigger on the loop, so it's still too big for the loop stream detector (it's 958 bytes; the AVX2 variant is 1365 bytes, and the scalar loop is already 363 bytes). So to fix the regression we'd need to lower the unroll parameter (PARAM_MAX_UNROLLED_INSNS) to 146, just below the SSE vector body size of 147.