[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

segher at gcc dot gnu.org via Gcc-bugs Wed, 15 Jan 2025 06:54:06 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825


--- Comment #24 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> > Okay, two insns, there's an add insn as well.  But *not* unrolling this
> > likely makes bigger code already!
> 
> This is because
> 
>           /* If there is call on a hot path through the loop, then
>              there is most probably not much to optimize.  */ 
>           else if (size.num_non_pure_calls_on_hot_path) 
>             {
>               if (dump_file && (dump_flags & TDF_DETAILS))
>                 fprintf (dump_file, "Not unrolling loop %d: "
>                          "contains call and code would grow.\n",
>                          loop->num);
>               return false;
>             }
>           /* If there is pure/const call in the function, then we can
>              still optimize the unrolled loop body if it contains some
>              other interesting code than the calls and code storing or
>              cumulating the return value.  */
>           else if (size.num_pure_calls_on_hot_path
>                    /* One IV increment, one test, one ivtmp store and
>                       one useful stmt.  That is about minimal loop
>                       doing pure call.  */
>                    && (size.non_call_stmts_on_hot_path
>                        <= 3 + size.num_pure_calls_on_hot_path))
>             {
>               if (dump_file && (dump_flags & TDF_DETAILS))
>                 fprintf (dump_file, "Not unrolling loop %d: "
>                          "contains just pure calls and code would grow.\n",
>                          loop->num);
>               return false;
>             }

There are no calls at all here.  Or, do builtins count as calls?  That
makes no sense.  Most builtins end up as simple single insns.
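
To make the quoted conditions concrete, here is a minimal standalone sketch of
the two tests.  It is a simplified model, not the GCC source: the struct
size_model and the function refuses_unroll_when_growing are hypothetical names,
and only the three fields the quoted code reads are modelled.  It makes no claim
about whether builtins are actually counted as calls; it only shows that a
single statement counted as a non-pure call on the hot path is enough to hit
the first branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the size estimate read by the quoted code.
   Only the three fields used in the quoted conditions are modelled.  */
struct size_model
{
  int num_non_pure_calls_on_hot_path;
  int num_pure_calls_on_hot_path;
  int non_call_stmts_on_hot_path;
};

/* Return true if the quoted conditions would refuse to unroll a loop
   whose unrolled form is estimated to grow.  */
bool
refuses_unroll_when_growing (const struct size_model *size)
{
  assert (size != 0);

  /* First quoted branch: any non-pure call on the hot path.  */
  if (size->num_non_pure_calls_on_hot_path)
    return true;

  /* Second quoted branch: pure calls and little else in the body.  */
  if (size->num_pure_calls_on_hot_path
      && (size->non_call_stmts_on_hot_path
          <= 3 + size->num_pure_calls_on_hot_path))
    return true;

  return false;
}
```

In this model a loop with no statements counted as calls never reaches either
refusal, which is the behaviour one would expect for a loop body made only of
simple insns.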

> this _only_ gets pre-empted if the loop appears to shrink (from the
> removal of 1/3rd of stmts or other heuristics).  Not estimating the
> calls to go away now makes the loop not shrinking and hit the call
> exception (otherwise we allow some growth).

Completely unrolling this loop gets rid of all the loop setup etc. insns.
Statically, that is three setup insns and one insn to do the actual looping;
dynamically, it is also the loop insn executed on each iteration.  Completely
unrolling by 4x is always a win.

> Not sure if DARN is properly marked pure (aka not writing to memory),
> probably not, since otherwise it would not have been considered
> having side-effects.

It does not touch memory at all.  "pure" and "const" do not mean anything
for insns that do not touch memory.

> So - fix your builtins!

How, what, where?

> Not sure what the point about this
> testcase is - a sum of 4 random numbers should be replaceable
> with one random number & ~0x3.  But sure, darn() probably
> cannot be pure (otherwise we'd CSE multiple calls).

The point of the testcase is ensuring that for every instance of the builtin
a darn insn is generated, instead of doing only one and multiplying by four,
as happened before (when it was an unspec instead of an unspecv).
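
A portable sketch of why "one and multiply by four" is wrong for such an
operation.  This is not the testcase itself: fake_darn is a hypothetical
stand-in (a small linear congruential generator, not a real random source)
for anything that returns a new value on every invocation, and fake_darn_seed
and sum_of_darns are likewise made-up helper names.

```c
#include <assert.h>

/* Hypothetical stand-in for the darn builtin: every call returns a new
   value, so separate calls cannot be merged into one.  */
static unsigned int fake_darn_state = 1u;

/* Reset the stand-in generator so runs are repeatable.  */
void
fake_darn_seed (unsigned int seed)
{
  fake_darn_state = seed;
}

unsigned int
fake_darn (void)
{
  fake_darn_state = fake_darn_state * 1103515245u + 12345u;
  return (fake_darn_state >> 16) & 0x7fffu;
}

/* What the testcase wants: N separate invocations, summed.  */
unsigned int
sum_of_darns (int n)
{
  assert (n >= 0);
  unsigned int sum = 0;
  for (int i = 0; i < n; i++)
    sum += fake_darn ();
  return sum;
}
```

Starting from the same seed, sum_of_darns (4) and 4 * fake_darn () give
different results, so replacing the four invocations by one changes what the
code computes.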

> So - bad testcase.  Do you have another one to show?

Lots and lots and lots.  But if you cannot agree that what we did before was
good and what we do now is bad, i.e. this is a regression, for *this* testcase,
I don't know how to ever convince you of anything :-(


Segher

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os