https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112508
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Loop store-motion is a difficult thing to cost - it's a critical enabler for
many of our loop optimizations, including scalar evolution analysis.
Now, this holds less for the cases where we end up using an extra flag
to avoid store data races. This example also shows that we're doing a
bad job of unifying flags for variables stored in the same blocks (we
don't try to do this at all ...).
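For reference, a minimal hand-written C sketch of what flag-based store
motion amounts to. The names (g, g_tmp, g_stored, cond) are hypothetical
and this is only an illustration of the idea, not the code the pass
actually emits:

```c
#include <stdbool.h>

int g;  /* hypothetical global that is stored conditionally in the loop */

/* Original form: the store to g only happens when cond[i] is nonzero. */
void store_in_loop(const int *cond, int n)
{
    for (int i = 0; i < n; i++)
        if (cond[i])
            g = 0;
}

/* Sketch of the loop after store motion with a flag: g lives in a
   temporary inside the loop, and the flag records whether the original
   loop would have stored at all, so no store is introduced on paths
   that had none.  */
void store_moved_with_flag(const int *cond, int n)
{
    int g_tmp = g;
    bool g_stored = false;

    for (int i = 0; i < n; i++)
        if (cond[i]) {
            g_tmp = 0;
            g_stored = true;
        }

    if (g_stored)
        g = g_tmp;
}
```

Every variable handled this way costs a temporary plus a flag, which is
where the extra register pressure comes from.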
Value-numbering has difficulties getting from zero flags to "same flags";
it only manages to elide one flag (but maybe that's all we can do - I
didn't analyze this exactly).
Variables that are set conditionally (conditionally within a loop, not so
much in conditionally executed subloops) are at least less likely to help
SCEV, so cost modeling of store-motion for those (i.e. estimating register
pressure in a simplistic way, like counting the number of IVs) might be a
way to combat this.
Or, for example, disable conditional store-motion for -Os entirely.
For targets where -Os matters, -fallow-store-data-races is likely a way
to rescue this. With that I get on x86_64:
main1:
.LFB1:
        .cfi_startproc
        movb    h(%rip), %sil
        movl    d(%rip), %edx
        movl    g(%rip), %edi
        movl    e(%rip), %ecx
        movl    f(%rip), %eax
.L2:
        testb   %sil, %sil
        je      .L5
        movl    %eax, %ecx
.L6:
        movl    %ecx, %eax
        cmpl    $9, %ecx
        jg      .L9
        testl   %edx, %edx
        je      .L3
        xorl    %edi, %edi
.L3:
        incl    %ecx
        jmp     .L6
.L9:
        decl    %esi
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        jmp     .L2
.L5:
        movb    $0, h(%rip)
        movl    %eax, f(%rip)
        movl    %ecx, e(%rip)
        movl    %edi, g(%rip)
        movl    %edx, d(%rip)
        ret
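In the listing above all five globals are loaded into registers on entry
and written back unconditionally at .L5, with no flags involved. In C
terms that corresponds roughly to the following sketch (hypothetical
names v, v_tmp, cond; not the testcase from this PR):

```c
int v;  /* hypothetical global that is stored conditionally in the loop */

/* Store motion when store data races are allowed: the value is kept in
   a temporary for the whole loop and written back unconditionally on
   exit.  The store now happens even if the loop body never stored,
   which is exactly the data race the flag variant avoids.  */
void store_moved_no_flag(const int *cond, int n)
{
    int v_tmp = v;

    for (int i = 0; i < n; i++)
        if (cond[i])
            v_tmp = 0;

    v = v_tmp;  /* unconditional store, no flag needed */
}
```

Single-threaded results are unchanged; only the set of stores another
thread could observe differs.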
Actionable items:
a) disable flag store motion for cold loops (or stores only happening in
cold parts of the loop)
b) optimize flag variable allocation (try to use the same flag for multiple
vars)
c) some kind of register pressure estimation, possibly only for non-innermost
loops
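To illustrate b), a hand-written C sketch of two variables that are
always stored in the same block, first with one flag each and then with
a single shared flag. All names here (x_var, y_var, stored, cond) are
hypothetical and this only shows the intended end result, not how the
pass would get there:

```c
#include <stdbool.h>

int x_var, y_var;  /* hypothetical globals, always stored together */

/* One flag per variable: both flags are set in the same block and so
   always hold the same value.  */
void per_variable_flags(const int *cond, int n)
{
    int x_tmp = x_var, y_tmp = y_var;
    bool x_stored = false, y_stored = false;

    for (int i = 0; i < n; i++)
        if (cond[i]) {
            x_tmp = i;
            x_stored = true;
            y_tmp = 0;
            y_stored = true;
        }

    if (x_stored)
        x_var = x_tmp;
    if (y_stored)
        y_var = y_tmp;
}

/* Shared flag: one flag guards both stores, saving a register and a
   test on loop exit.  */
void shared_flag(const int *cond, int n)
{
    int x_tmp = x_var, y_tmp = y_var;
    bool stored = false;

    for (int i = 0; i < n; i++)
        if (cond[i]) {
            x_tmp = i;
            y_tmp = 0;
            stored = true;
        }

    if (stored) {
        x_var = x_tmp;
        y_var = y_tmp;
    }
}
```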