[Bug tree-optimization/112508] [14 Regression] Size regression when using -Os starting with r14-4089-gd45ddc2c04e

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 16 Feb 2024 00:11:18 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112508


--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Loop store-motion is a difficult thing to cost - it's a critical enabler for
many of our loop optimizations, including scalar evolution analysis.

Now, this might not hold true so much for the cases where we end up
using an extra flag to avoid store data races.  This example also shows
we're doing a bad job of unifying flags for variables stored in the
same blocks (we don't try to do this at all ...).
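
To illustrate the flag variant, here is a minimal hand-written sketch.  It
is not the testcase from this PR and not actual GCC output; the names g,
g_tmp and g_flag are made up for illustration.  It only shows the shape of
the transform: the store is moved out of the loop and guarded by a flag so
that no store is introduced on a path where the source had none.

```c
/* Hypothetical example, not the testcase from the PR.  */
int g;   /* global that the loop stores to conditionally */

/* Source form: the store to g happens only on some iterations.  */
void loop_source(const int *a, int n)
{
    for (int i = 0; i < n; i++)
        if (a[i] < 0)
            g = i;
}

/* Rough shape after store motion when store data races must be avoided:
   the value lives in a temporary during the loop and a flag records
   whether the source would have stored at all.  */
void loop_flagged(const int *a, int n)
{
    int g_tmp = 0;    /* only meaningful once g_flag is set */
    int g_flag = 0;   /* the extra flag, costing one more register */

    for (int i = 0; i < n; i++)
        if (a[i] < 0) {
            g_tmp = i;
            g_flag = 1;
        }

    if (g_flag)       /* store only if the original loop stored */
        g = g_tmp;
}
```

Each variable handled this way needs its own temporary and, without any
unification, its own flag, which is where the size and register cost
comes from.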

Value-numbering has difficulties getting from zero flags to "same flags";
it only manages to elide one flag (but maybe that's all we can do - I
didn't analyze this in detail).

Store-motion of conditionally set vars (conditionally within a loop, not
so much in conditionally executed subloops) is at least less likely to
help SCEV, so cost modeling of store-motion for those (aka estimating
register pressure in a simplistic way, like counting the number of IVs)
might be a way to combat this.

Or, for example, disable conditional store-motion for -Os entirely.

For targets where -Os matters, -fallow-store-data-races would likely be
a way to rescue this.  With that I get on x86_64:

main1:
.LFB1:
        .cfi_startproc
        movb    h(%rip), %sil
        movl    d(%rip), %edx
        movl    g(%rip), %edi
        movl    e(%rip), %ecx
        movl    f(%rip), %eax
.L2:
        testb   %sil, %sil
        je      .L5
        movl    %eax, %ecx
.L6:
        movl    %ecx, %eax
        cmpl    $9, %ecx
        jg      .L9
        testl   %edx, %edx
        je      .L3
        xorl    %edi, %edi
.L3:
        incl    %ecx
        jmp     .L6
.L9:
        decl    %esi
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        jmp     .L2
.L5:
        movb    $0, h(%rip)
        movl    %eax, f(%rip)
        movl    %ecx, e(%rip)
        movl    %edi, g(%rip)
        movl    %edx, d(%rip)
        ret
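
The shape of the code above (all loads hoisted before the loop, all stores
done unconditionally at .L5) can be sketched in C as follows.  This is a
hand-written illustration with made-up names (v, v_tmp), not the PR
testcase: once introducing a store on a path that had none is allowed, the
flag is no longer needed.

```c
/* Hypothetical example, not the testcase from the PR.  */
int v;   /* global that the source loop stores to conditionally */

/* Rough shape of store motion when store data races are allowed:
   load once before the loop, store once after it, no flag.  */
void loop_unflagged(const int *a, int n)
{
    int v_tmp = v;    /* load hoisted out of the loop */

    for (int i = 0; i < n; i++)
        if (a[i] < 0)
            v_tmp = i;

    v = v_tmp;        /* unconditional store, even if the source
                         loop would never have written v */
}
```

In a single-threaded program this is equivalent to the conditional store in
the loop; the unconditional write-back is what could race with another
thread writing v, which is why it is only done when the option permits it.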

Actionable items:

 a) disable flag store motion for cold loops (or stores only happening in
    cold parts of the loop)
 b) optimize flag variable allocation (try to use the same flag for multiple
    vars)
 c) some kind of register pressure estimation, possibly only for non-innermost
    loops
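
As an illustration of item b), here is a hand-written sketch of what
sharing a flag could look like when two variables are always stored in the
same block.  The names (p, q, stored) are hypothetical and this is not what
GCC currently emits; it only shows why one flag is enough in that case.

```c
/* Hypothetical example, not the testcase from the PR.  */
int p, q;   /* globals that are always stored in the same block */

/* One flag per variable, as with no flag unification at all.  */
void two_flags(const int *a, int n)
{
    int p_tmp = 0, q_tmp = 0;
    int p_flag = 0, q_flag = 0;

    for (int i = 0; i < n; i++)
        if (a[i] < 0) {
            p_tmp = i;    p_flag = 1;
            q_tmp = a[i]; q_flag = 1;
        }

    if (p_flag) p = p_tmp;
    if (q_flag) q = q_tmp;
}

/* Both stores sit in the same block, so the two flags always have
   the same value and a single shared flag is sufficient.  */
void one_flag(const int *a, int n)
{
    int p_tmp = 0, q_tmp = 0;
    int stored = 0;

    for (int i = 0; i < n; i++)
        if (a[i] < 0) {
            p_tmp = i;
            q_tmp = a[i];
            stored = 1;
        }

    if (stored) {
        p = p_tmp;
        q = q_tmp;
    }
}
```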
