https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760

Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson <wilson at gcc dot gnu.org> ---
You are compiling with -Os.  I get the expected result if I compile with -O2.

Looking at tree dumps, I see the first difference between -O2 and -Os dumps is
in the ch2 (copy loop header 2) pass, which explicitly disables loop header
copying when -Os is used.  Note the optimize_loop_for_size_p check in
should_duplicate_loop_header_p in tree-ssa-loop-ch.c.  You can see the
difference if you add -fdump-tree-ch2-all.  In the -O2 ch2 dump file, I see

Loop 1 is not do-while loop: latch is not empty.
    Will duplicate bb 7
  Not duplicating bb 3: it is single succ.
Duplicating header of the loop 1 up to edge 7->3, 4 insns.
Loop 1 is do-while loop
Loop 1 is now do-while loop.

and in the -Os ch2 dump file, I see

Loop 1 is not do-while loop: latch is not empty.
  Not duplicating bb 7: optimizing for size.

The difference in loop optimization here then affects the later ivopts pass.
Normally, duplicating basic blocks will make code bigger.  But in this case the
duplicated blocks enable better loop optimization which results in smaller code
at the end.  This kind of thing is hard to handle with the heuristics.  We
would have to optimize both ways and check to see which one is smaller at the
end to get this right every time, and the compiler doesn't work that way
currently.
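
For anyone unfamiliar with loop header copying, here is a rough hand-written
C sketch of what the transformation does.  This is not the test case from
this report, and the function names sum_while and sum_rotated are made up
for illustration.  The pass works on GIMPLE basic blocks, not on source, so
treat this only as an approximation of the shape change.

```c
#include <stddef.h>

/* Original shape: the exit test lives in the loop header, so the loop
   is not a do-while loop. */
long sum_while(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
    while (i < n) {
        sum += a[i];
        i++;
    }
    return sum;
}

/* Approximate shape after header copying: the test is duplicated once
   in front of the loop as a guard, and the loop itself becomes a
   do-while loop with the test at the bottom. */
long sum_rotated(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
    if (i < n) {
        do {
            sum += a[i];
            i++;
        } while (i < n);
    }
    return sum;
}
```

The second form contains the test twice, which is why the pass is treated as
a size increase and skipped at -Os, even though the do-while shape can let
later loop passes produce smaller code overall.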

I haven't checked older sources to see if/when a heuristic changed.

This isn't RISC-V specific.  I see the same issue with x86_64.
