https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #20)
> (In reply to Jiu Fu Guo from comment #18)
> > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > if the loop has multi-exits or upbound is not a fixed number, we may not do
> > 'complete unroll' for the loop, except -funroll-all-loops is specified.
>
> That doesn't make much sense (-funroll-all-loops is RTL unroller only).
>
> I think the growth limits are simply too large unless we compute a "win"
> which we in this case do not.  So I'd say the growth limits should scale
> with win ^ (1/new param) thus if we estimate to eliminate 20% of the
> loop stmts due to unrolling then the limit to apply is
> limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.
>
> I'd only apply this new limit for peeling (peeling is when the loop count
> is not constant and thus we keep the exit tests).
>
> Of course people want more peeling (hello POWER people!)

Btw, the issue with the rs6000 code at present is that it uses
unroll_only_small_loops, but that only affects the RTL unroller, while
the enablement of -funroll-loops at -O2 affects GIMPLE as well, and
there it is unconstrained (it runs with the -O3 params).  For the main
unroll pass (not cunrolli) this triggers code size growth:

  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
                                                   || flag_peel_loops
                                                   || optimize >= 3, true);

The "original" patch also adjusted parameters.

If the intent is to only affect the RTL unroller then we need a separate
flag controlling it (yeah, using the same flags as the heuristic trigger
was probably bad).