https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #25 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #23)
> (In reply to Richard Biener from comment #20)
> > (In reply to Jiu Fu Guo from comment #18)
> > > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > > if the loop has multi-exits or upbound is not a fixed number, we may not
> > > do 'complete unroll' for the loop, except -funroll-all-loops is specified.
> >
> > That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> >
> > I think the growth limits are simply too large unless we compute a "win"
> > which we in this case do not.  So I'd say the growth limits should scale
> > with win ^ (1/new param) thus if we estimate to eliminate 20% of the
> > loop stmts due to unrolling then the limit to apply is
> > limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.
> >
> > I'd only apply this new limit for peeling (peeling is when the loop count
> > is not constant and thus we keep the exit tests).
> >
> > Of course people want more peeling (hello POWER people!)
>
> Btw, the issue with the rs6000 code at present is that it uses
> unroll_only_small_loops but that only affects the RTL unroller
> while the enablement of -funroll-loops at -O2 affects GIMPLE
> as well but unconstrained (with -O3 params).  For the main
> unroll pass (not cunrolli) this triggers code size growth:
>
>   unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>                                                    || flag_peel_loops
>                                                    || optimize >= 3, true);
>
> the "original" patch also adjusted parameters.  If the intent is to only
> affect the RTL unroller then we need a separate flag controlling it
> (yeah, using the same flags as heuristic trigger was probably bad).

Yes, the patch controls the RTL unroller for small loops, and it also enables
cunroll (through flag_unroll_loops).  As you explained, this is why cunroll
may increase code size.