https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119839
Bug ID: 119839
Summary: RISC-V gobmk performance regression with Node clones share order patch (bad LTO partitioning)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: anton at ozlabs dot org
CC: mjires at gcc dot gnu.org, pinskia at gcc dot gnu.org
Target Milestone: ---
We are seeing a performance regression on RISC-V when building gobmk from
SPEC CPU2006 with LTO. Victor Ying narrowed it down to this loop in popgo,
where change_stack_pointer (a static variable) is loaded and stored on every
iteration:
15b42: ff87a783 lw a5,-8(a5)
15b46: c31c sw a5,0(a4)
15b48: 3581b783 ld a5,856(gp) # 3868b0 <change_stack_pointer.lto_priv.0>
15b4c: ff078713 addi a4,a5,-16
15b50: 34e1bc23 sd a4,856(gp) # 3868b0 <change_stack_pointer.lto_priv.0>
15b54: ff07b703 ld a4,-16(a5)
15b58: f76d bnez a4,15b42 <popgo+0x2e>
With commit
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0895aef01c64c317b489811dbe4ac55f9c13aab3
reverted, we see the expected behaviour, with the change_stack_pointer
accesses moved out of the loop:
17f90: ff078693 addi a3,a5,-16
17f94: ff07b703 ld a4,-16(a5)
17f98: 3ad1b423 sd a3,936(gp) # 381900 <change_stack_pointer>
17f9c: 1781 addi a5,a5,-32
17f9e: cb09 beqz a4,17fb0 <popgo+0x4c>
17fa0: 4f94 lw a3,24(a5)
17fa2: 863e mv a2,a5
17fa4: 17c1 addi a5,a5,-16
17fa6: c314 sw a3,0(a4)
17fa8: 6b98 ld a4,16(a5)
17faa: fb7d bnez a4,17fa0 <popgo+0x3c>
This looks to be an issue with LTO partitioning, because change_stack_pointer
was promoted to change_stack_pointer.lto_priv.0. This issue goes away if we use
-flto-partition=one, which seems to confirm this.
I'm not sure if this is just bad luck, but the patch definitely changes how
LTO partitions things.
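For reference, the loop has roughly the following shape. This is a sketch reconstructed from the disassembly above, not the gobmk source: the struct layout is inferred from the 16-byte stride and the -8/-16 offsets, and every name other than change_stack_pointer (the struct, the array, the helper functions) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical undo-stack entry: pointer at offset 0, int at offset 8,
   16 bytes per entry on RV64 (inferred from the disassembly). */
struct change_stack_entry {
    int *address;
    int value;
};

/* File-local state, as in the report: change_stack_pointer is static. */
static struct change_stack_entry change_stack[64];
static struct change_stack_entry *change_stack_pointer = change_stack;

/* Push a NULL sentinel that marks where one batch of changes begins. */
void begin_change_record(void)
{
    change_stack_pointer->address = NULL;
    change_stack_pointer++;
}

/* Remember the current value of *addr so it can be restored later. */
void push_change(int *addr)
{
    change_stack_pointer->address = addr;
    change_stack_pointer->value = *addr;
    change_stack_pointer++;
}

/* The loop of interest: walk back to the sentinel, restoring each value.
   In the slow code, change_stack_pointer is reloaded and stored back to
   memory on every iteration; in the fast code it stays in a register. */
void pop_changes(void)
{
    while ((--change_stack_pointer)->address != NULL)
        *change_stack_pointer->address = change_stack_pointer->value;
}
```

Building a reduced case like this as part of a multi-file LTO link, with and without -flto-partition=one, may help check whether the .lto_priv promotion alone is enough to reproduce the reload.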