https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119839
Bug ID: 119839
Summary: RISC-V gobmk performance regression with Node clones share order patch (bad LTO partitioning)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: anton at ozlabs dot org
CC: mjires at gcc dot gnu.org, pinskia at gcc dot gnu.org
Target Milestone: ---

We are seeing a performance regression on RISC-V when building gobmk from cpu2006 with LTO. Victor Ying narrowed it down to this loop, where we continually load and store change_stack_pointer (a static variable):

   15b42: ff87a783   lw    a5,-8(a5)
   15b46: c31c       sw    a5,0(a4)
   15b48: 3581b783   ld    a5,856(gp) # 3868b0 <change_stack_pointer.lto_priv.0>
   15b4c: ff078713   addi  a4,a5,-16
   15b50: 34e1bc23   sd    a4,856(gp) # 3868b0 <change_stack_pointer.lto_priv.0>
   15b54: ff07b703   ld    a4,-16(a5)
   15b58: f76d       bnez  a4,15b42 <popgo+0x2e>

With https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0895aef01c64c317b489811dbe4ac55f9c13aab3 reverted we see the expected behaviour, with change_stack_pointer moved out of the loop:

   17f90: ff078693   addi  a3,a5,-16
   17f94: ff07b703   ld    a4,-16(a5)
   17f98: 3ad1b423   sd    a3,936(gp) # 381900 <change_stack_pointer>
   17f9c: 1781       addi  a5,a5,-32
   17f9e: cb09       beqz  a4,17fb0 <popgo+0x4c>
   17fa0: 4f94       lw    a3,24(a5)
   17fa2: 863e       mv    a2,a5
   17fa4: 17c1       addi  a5,a5,-16
   17fa6: c314       sw    a3,0(a4)
   17fa8: 6b98       ld    a4,16(a5)
   17faa: fb7d       bnez  a4,17fa0 <popgo+0x3c>

This looks to be an issue with LTO partitioning, because change_stack_pointer was promoted to change_stack_pointer.lto_priv.0. The issue goes away if we use -flto-partition=one, which seems to confirm this. I'm not sure if this is just bad luck, but the patch definitely changes how LTO partitions things.
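
For readers without the benchmark source, here is a rough C sketch of the loop shape the disassembly suggests. It is not the gobmk code: the struct layout (an 8-byte pointer followed by a 4-byte value in a 16-byte entry) is inferred from the offsets above, and change_entry, change_stack, push, pop_via_static, pop_via_local and run_demo are hypothetical names. pop_via_static walks the stack through the static pointer itself, so the pointer is loaded and stored on every iteration unless the compiler keeps it in a register. pop_via_local does that by hand and writes the static back once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the undo-stack entries; not the gobmk source. */
struct change_entry {
    int *address;   /* location to restore; NULL marks the bottom of a frame */
    int  value;     /* value to write back */
};

#define STACK_SIZE 16

static struct change_entry change_stack[STACK_SIZE];
static struct change_entry *change_stack_pointer = change_stack;

/* Record the current value at `address` (or a NULL frame marker). */
static void push(int *address)
{
    assert(change_stack_pointer < change_stack + STACK_SIZE);
    change_stack_pointer->address = address;
    change_stack_pointer->value = address ? *address : 0;
    change_stack_pointer++;
}

/* Variant A: every iteration goes through the static pointer. */
static void pop_via_static(void)
{
    while ((--change_stack_pointer)->address)
        *change_stack_pointer->address = change_stack_pointer->value;
}

/* Variant B: work on a local copy and store the static once at the end. */
static void pop_via_local(void)
{
    struct change_entry *sp = change_stack_pointer;
    while ((--sp)->address)
        *sp->address = sp->value;
    change_stack_pointer = sp;
}

static int x, y;

/* Returns 1 if both variables were restored and the stack is empty again. */
int run_demo(int use_local)
{
    change_stack_pointer = change_stack;
    x = 1;
    y = 2;
    push(NULL);         /* frame marker */
    push(&x); x = 10;
    push(&y); y = 20;
    if (use_local)
        pop_via_local();
    else
        pop_via_static();
    return x == 1 && y == 2 && change_stack_pointer == change_stack;
}
```

Both variants produce the same result (run_demo(0) and run_demo(1) each return 1). The sketch only shows the source-level pattern; it does not by itself reproduce the partitioning behaviour described above.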