https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760
Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson <wilson at gcc dot gnu.org> ---
You are compiling with -Os.  I get the expected result if I compile with -O2.

Looking at the tree dumps, the first difference between the -O2 and -Os dumps
is in the ch2 (copy loop header 2) pass, which explicitly disables loop header
copying when -Os is used.  Note the optimize_loop_for_size_p check in
should_duplicate_loop_header_p in tree-ssa-loop-ch.c.  You can see the
difference if you add -fdump-tree-ch2-all.

In the -O2 ch2 dump file, I see

  Loop 1 is not do-while loop: latch is not empty.
  Will duplicate bb 7
  Not duplicating bb 3: it is single succ.
  Duplicating header of the loop 1 up to edge 7->3, 4 insns.
  Loop 1 is do-while loop
  Loop 1 is now do-while loop.

and in the -Os ch2 dump file, I see

  Loop 1 is not do-while loop: latch is not empty.
  Not duplicating bb 7: optimizing for size.

This difference in loop optimization then affects the later ivopts pass.

Normally, duplicating basic blocks makes code bigger.  But in this case the
duplicated blocks enable better loop optimization, which results in smaller
code at the end.  This kind of thing is hard to handle with heuristics.  To
get it right every time, we would have to optimize both ways and check which
result is smaller at the end, and the compiler doesn't currently work that
way.

I haven't checked older sources to see if/when a heuristic changed.

This isn't RISC-V specific.  I see the same issue with x86_64.
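
For readers unfamiliar with loop header copying, the transformation the ch2
pass performs can be sketched at the source level roughly as follows.  This is
only an illustration under assumptions: the function names sum_while and
sum_rotated are hypothetical, this is not the testcase from the bug report,
and the real pass works on GIMPLE basic blocks rather than on C source.

```c
#include <stddef.h>

/* Original shape: the exit test sits in the loop header, so it is
   evaluated before every iteration, including the first one.
   (Hypothetical example, not the code from the bug report.) */
long sum_while(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Roughly the shape after header copying: the exit test is duplicated
   once in front of the loop as a guard, and the loop itself becomes a
   do-while loop with the test at the bottom.  The duplicated test is
   the extra code that -Os tries to avoid. */
long sum_rotated(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    if (i < n) {
        do {
            s += a[i];
            i++;
        } while (i < n);
    }
    return s;
}
```

Both functions compute the same result; only the loop shape differs.
Compiling such a file once with -O2 -fdump-tree-ch2-all and once with
-Os -fdump-tree-ch2-all should let you compare the two ch2 dumps, although
whether this particular loop shows the same messages as quoted above depends
on the GCC version and has not been checked here.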