https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112508
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---

Loop store-motion is a difficult thing to cost - it's a critical enabler for
many of our loop optimizations, including scalar evolution analysis. Now, this
might not hold true so much for the cases where we end up using an extra flag
to avoid store data races, and this example also shows we're doing a bad job
in trying to unify flags for variables stored in the same blocks (we don't try
to do this at all ...). Value-numbering has difficulties getting from zero
flags to "same flags"; it only manages to elide one flag (but maybe that's all
we can do - I didn't exactly analyze).

Conditionally set vars (conditionally within a loop, not so much conditionally
executed subloops) are at least less likely to help SCEV, so cost modeling
(aka estimating register pressure in a simplistic way, like counting the
number of IVs) of store-motion of those might be a way to combat this. Or, for
example, disable conditional store-motion for -Os entirely.

For targets where -Os matters, -fallow-store-data-races would likely be a way
to rescue this. With that I get on x86_64

main1:
.LFB1:
        .cfi_startproc
        movb    h(%rip), %sil
        movl    d(%rip), %edx
        movl    g(%rip), %edi
        movl    e(%rip), %ecx
        movl    f(%rip), %eax
.L2:
        testb   %sil, %sil
        je      .L5
        movl    %eax, %ecx
.L6:
        movl    %ecx, %eax
        cmpl    $9, %ecx
        jg      .L9
        testl   %edx, %edx
        je      .L3
        xorl    %edi, %edi
.L3:
        incl    %ecx
        jmp     .L6
.L9:
        decl    %esi
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        jmp     .L2
.L5:
        movb    $0, h(%rip)
        movl    %eax, f(%rip)
        movl    %ecx, e(%rip)
        movl    %edi, g(%rip)
        movl    %edx, d(%rip)
        ret

Actionable items:

 a) disable flag store motion for cold loops (or stores only happening in
    cold parts of the loop)
 b) optimize flag variable allocation (try to use the same flag for multiple
    vars)
 c) some kind of register pressure estimation, possibly only for
    non-innermost loops
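
For readers unfamiliar with the "extra flag" mentioned above, here is a minimal hand-written C sketch of what flag-guarded store motion amounts to at the source level. It is not the testcase from this PR and not GCC's actual output: the function names, the flag g_stored and the temporary g_tmp are hypothetical, and the globals d and g are only loosely modelled on the symbols in the listing. loop_flagged keeps g in a local and stores it back only if the loop wrote to it; loop_unguarded stores back unconditionally, which is roughly what allowing store data races permits.

```c
#include <assert.h>

/* Hypothetical globals, loosely modelled on d and g from the listing. */
int d, g;

/* Source form: g is stored only on iterations where d is non-zero. */
void loop_original(int n)
{
    for (int i = 0; i < n; i++)
        if (d)
            g = 0;
}

/* Sketch of flag-guarded store motion: g is kept in a local during the
   loop and written back afterwards only if the loop actually stored. */
void loop_flagged(int n)
{
    int g_tmp = g;       /* load placed before the loop */
    _Bool g_stored = 0;  /* the extra flag, one per moved variable */

    for (int i = 0; i < n; i++)
        if (d) {
            g_tmp = 0;
            g_stored = 1;
        }
    if (g_stored)
        g = g_tmp;       /* store placed after the loop, guarded */
}

/* Sketch of the same motion when store data races are allowed: no flag,
   but g is written even on paths where the source never stores to it. */
void loop_unguarded(int n)
{
    int g_tmp = g;

    for (int i = 0; i < n; i++)
        if (d)
            g_tmp = 0;
    g = g_tmp;           /* unconditional store */
}

/* Run one variant from a given starting state and return the final g. */
int final_g(void (*loop)(int), int d0, int g0, int n)
{
    d = d0;
    g = g0;
    loop(n);
    return g;
}

/* Single-threaded check that all three variants compute the same g. */
void check_variants_agree(void)
{
    for (int d0 = 0; d0 <= 1; d0++) {
        int expect = final_g(loop_original, d0, 7, 10);
        assert(final_g(loop_flagged, d0, 7, 10) == expect);
        assert(final_g(loop_unguarded, d0, 7, 10) == expect);
    }
}
```

In a single thread the three variants are indistinguishable, which is all check_variants_agree tests. The difference only shows when another thread accesses g while d is zero: loop_unguarded then introduces a store the source never performed. Each variable moved in the guarded style carries its own flag unless flags are shared, which is the register pressure concern behind items b) and c).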