On Mon, Nov 19, 2012 at 9:34 PM, Steven Bosscher wrote: > On Mon, Nov 19, 2012 at 12:24 PM, Eric Botcazou wrote: >>> Yes, I'll be looking into this soon. >> >> We have a related regression on SPARC: >> >> FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer -funroll-loops >> execution test >> FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer >> -funroll-all-loops >> -finline-functions execution test >> FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer -funroll-loops >> execution test >> FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer >> -funroll-all-loops >> -finline-functions execution test >> >> whose failure mode is exactly the same as for Honza's testcase. I even have >> a >> more reduced testcase (6 lines, attached) in case you'd prefer working on it. >> >> Reproducible with a cross to sparc-linux with -mcpu=v8 -O3 -funroll-loops and >> grepping for mem:SF (const_int 0 [0]) in the RTL dumps. > > Thanks for this reduced test case, that's saving me a lot of work! > > Can you please try and see if the following C test case also fails? > > It looks like the memcpy is expanded to a loop that is unrolled, and > then mis-optimized. I haven't found out yet why...
It's another case of an invalid REG_EQUAL note. Shown below is an excerpt from the C test case .loop2_done dump (after calling df_analyze to update liveness). The note on insn 172 uses (reg:SI 132) which is not live on entry to basic block 25. ;; basic block 25, loop depth 0, count 0, freq 2485, maybe hot ;; prev block 20, next block 26, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 11 [50.5%] ;; bb 25 artificial_defs: { } ;; bb 25 artificial_uses: { u-1(14){ }u-1(30){ }u-1(101){ }} ;; lr in 14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 138 158 165 ;; lr use 14 [%sp] 30 [%fp] 101 [%sfp] 138 158 ;; lr def 131 133 ;; live in 14 [%sp] 101 [%sfp] 138 158 165 ;; live gen 131 133 ;; live kill (code_label 179 149 173 25 22 "" [1 uses]) (note 173 179 171 25 [bb 25] NOTE_INSN_BASIC_BLOCK) (insn 171 173 172 25 (set (reg/v:SI 133 [ idsD.1386 ]) (reg/v:SI 138 [ idsD.1386 ])) 40 {*movsi_insn} (expr_list:REG_UNUSED (reg/v:SI 133 [ idsD.1386 ]) (nil))) (insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ]) (reg:SF 158 [ limit3D.1388 ])) t.c:62 -1 (expr_list:REG_EQUAL (mem:SF (reg:SI 132 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ]) (nil)))) ;; lr out 14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 131 133 138 165 ;; live out 14 [%sp] 101 [%sfp] 131 133 138 165 The use of the dead reg is renamed in the webizer: (insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ]) (reg:SF 172 [ limit3D.1388 ])) t.c:62 66 {*movsf_insn} (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ]) (nil)))) Next, cse2 thinks it's a good idea to actually use the REG_EQUAL note, turning the EQ-use of the uninitialized reg 174 into a real use: (insn 172 171 176 17 (set (reg/v:SF 131 [ limit3D.1388 ]) (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn} (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ]) (nil)))) Then init-regs concludes that reg 174 is used ininitialized: adding initialization in ga4076 of reg 174 at in block 16 for insn 172. (insn 206 171 172 16 (set (reg:SI 174 [ ivtmp.7D.1419 ]) (const_int 0 [0])) t.c:62 -1 (nil)) (insn 172 206 176 16 (set (reg/v:SF 131 [ limit3D.1388 ]) (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn} (expr_list:REG_DEAD (reg:SI 174 [ ivtmp.7D.1419 ]) (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (nil)))) And finally, combine merges the above to yield the final result: Trying 206 -> 172: Successfully matched this instruction: (set (reg/v:SF 131 [ limit3D.1388 ]) (mem:SF (const_int 0 [0]) [2 MEM[base: _23, offset: 0B]+0 S4 A32])) The root cause is the bad REG_EQUAL note. I think the most robust solution is to make the webizer re-compute notes before renaming. Patch for that is attached. Ciao! Steven PR rtl-optimization/55006 * web.c (web_main): Add the DF_NOTE problem, and explain whatfor. Index: web.c =================================================================== --- web.c (revision 193454) +++ web.c (working copy) @@ -335,9 +335,16 @@ web_main (void) unsigned int uses_num = 0; rtx insn; + /* Add the flags and problems we need. The DF_EQ_NOTES flag is set so + that uses of registers in REG_EQUAL and REG_EQUIV notes are included + in the web that their DEF belongs to, so that these uses are also + properly renamed. The DF_NOTE problem is added to make sure that + all notes are up-to-date and valid: Re-computing the notes problem + also cleans up all dead REG_EQUAL notes. */ df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_chain_add_problem (DF_UD_CHAIN); + df_note_add_problem (); df_analyze (); df_set_flags (DF_DEFER_INSN_RESCAN);