On Tue, 2 Apr 2019, Richard Sandiford wrote: > Richard Sandiford <richard.sandif...@arm.com> writes: > > Richard Biener <rguent...@suse.de> writes: > >> After the fix to RPO VN in loop header copying DF via RTL invariant > >> motion takes 50% of the compile-time for the second testcase in > >> PR46590. This is caused by the DF live problem iterating over > >> all dirty blocks for the local problem which is all blocks of > >> the function because while loop-invariant uses df_analyze_loop > >> it always marks all blocks of the function as dirty via > >> df_live_set_all_dirty. One of the possible fixes is to add > >> a df_live_set_loop_dirty () function which causes the problem > >> to only iterate over one loop blocks. > >> > >> It turns out the LIVE problem is always present so we cannot > >> remove it which in turn means we have to mark blocks possibly > >> not visited by invaraint motion as dirty (thus the new > >> df_live_set_all_dirty at the end of move_loop_invariants). > >> > >> Anyhow - I'm not really into how DF should ideally work here > > > > Me neither, but as discussed on IRC, here's alternative that seems > > to give a similar speed-up. > > > > - df_live is already present at -O2, so we only need to add it and > > mark all blocks dirty for -O > > > > - df_process_deferred_rescans should be enough to force a rescan of > > blocks affected by moving invariants, but calling it in find_defs > > means that we don't do any rescans for the final loop > > > > Still testing, but posting for comments. > > > > Thanks, > > Richard > > > > > > 2019-04-01 Richard Sandiford <richard.sandif...@arm.com> > > > > gcc/ > > PR rtl-optimization/46950 > > * loop-invariant.c (find_defs): Move df_remove_problem and > > df_process_deferred_rescans to move_invariants. > > Move df_live_add_problem and df_live_set_all_dirty calls > > to move_invariants. > > (move_invariants): Likewise. > > (move_loop_invariants): Likewise, making the df_live calls > > conditional on -O. Remove the problem again if we added it > > locally. > > > > Index: gcc/loop-invariant.c > > =================================================================== > > --- gcc/loop-invariant.c 2019-03-08 18:15:33.704751739 +0000 > > +++ gcc/loop-invariant.c 2019-04-01 14:09:13.553206345 +0000 > > @@ -681,11 +681,7 @@ find_defs (struct loop *loop) > > loop->num); > > } > > > > - df_remove_problem (df_chain); > > - df_process_deferred_rescans (); > > df_chain_add_problem (DF_UD_CHAIN); > > - df_live_add_problem (); > > - df_live_set_all_dirty (); > > df_set_flags (DF_RD_PRUNE_DEAD_DEFS); > > df_analyze_loop (loop); > > check_invariant_table_size (); > > @@ -1891,6 +1887,10 @@ move_invariants (struct loop *loop) > > GENERAL_REGS, NO_REGS, GENERAL_REGS); > > } > > } > > + /* Remove the DF_UD_CHAIN problem added in find_defs before rescanning, > > + to save a bit of compile time. */ > > + df_remove_problem (df_chain); > > + df_process_deferred_rescans (); > > } > > > > /* Initializes invariant motion data. */ > > @@ -2254,6 +2254,11 @@ move_loop_invariants (void) > > { > > struct loop *loop; > > > > + if (optimize == 1) > > + { > > + df_live_add_problem (); > > + df_live_set_all_dirty (); > > + } > > Gah, this is a mess. The patch as posted regresses gcc.dg/pr65779.c, > which is an -O2 -fcompare-debug test. This is because the DF_INSN_LUIDs > are out-of-date on entry to the pass, and without the df_live_set_all_dirty, > they stay out-of-date after a full df_analyze, even though the comment > above luid suggests otherwise: > > /* The logical uid of the insn in the basic block. This is valid > after any call to df_analyze but may rot after insns are added, > deleted or moved. */ > int luid; > > This affects this condition in can_move_invariant_reg: > > /* Don't move if a use is not dominated by def in insn. */ > if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID (use_insn)) > return false; > > After hacking up some extra testing, it seems that it's often the > case that DF_INSN_LUID is out-of-date even after df_analyze. > E.g. the LUIDs are not in-oder after sched1 and the df_analyze > in early-remat.c doesn't fix them up. There are also cases in > other passes. > > Yet it seems several passes (e.g. combine) do rely on valid LUIDs, > without doing anything obvious to ensure that they're valid. > So whether LUIDs are valid after the initial df_analyze seems > to be a pass-specific property (whether it was meant to be or not). > > In gcc.dg/pr65779.c, the insn with an invalid LUID is a debug insn > that's actually *generated* by df_analyze (in regcprop, as a side-effect > of DF_LR_RUN_DCE). AFAIK, LUIDs for debug insns should be just as > reliable as LUIDs for ordinary insns, but we're less likely to > rescan the block and recompute the luids due to: > > if (!DEBUG_INSN_P (insn)) > df_set_bb_dirty (bb); > > in df_insn_rescan. > > So one "fix" would be to check !NONDEBUG_INSN_P in can_move_invariant_reg. > In one sense that's valid, since debug insns mustn't affect register > numbering. But doing it for this reason feels like a hack. If we do > skip debug insns in can_move_invariant_reg, we would in principle need > to update or invalidate any that are not dominated by the definition. > But in this case the debug insn *is* dominated by the definition, > so any fix-up code would be wrong/unnecessary. > > Another fix would be to call df_recompute_luids on the definition block > before using its LUIDs. > > Alternatively, we could just make the df_live_set_all_dirty > unconditional. Maybe that's best at this stage, since it's > closer to what we did before.
I guess that's what is appropriate at this state. Thus OK. Thanks, Richard. > WDYT? > > Richard > > > 2019-04-02 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > PR rtl-optimization/46590 > * loop-invariant.c (find_defs): Move df_remove_problem and > df_process_deferred_rescans to move_invariants. > Move df_live_add_problem and df_live_set_all_dirty calls > to move_invariants. > (move_invariants): Likewise. > (move_loop_invariants): Likewise, making the df_live calls > conditional on -O. Remove the problem again if we added it > locally. > > Index: gcc/loop-invariant.c > =================================================================== > --- gcc/loop-invariant.c 2019-04-02 16:09:31.000000000 +0000 > +++ gcc/loop-invariant.c 2019-04-02 16:09:55.969718043 +0000 > @@ -681,11 +681,7 @@ find_defs (struct loop *loop) > loop->num); > } > > - df_remove_problem (df_chain); > - df_process_deferred_rescans (); > df_chain_add_problem (DF_UD_CHAIN); > - df_live_add_problem (); > - df_live_set_all_dirty (); > df_set_flags (DF_RD_PRUNE_DEAD_DEFS); > df_analyze_loop (loop); > check_invariant_table_size (); > @@ -1891,6 +1887,10 @@ move_invariants (struct loop *loop) > GENERAL_REGS, NO_REGS, GENERAL_REGS); > } > } > + /* Remove the DF_UD_CHAIN problem added in find_defs before rescanning, > + to save a bit of compile time. */ > + df_remove_problem (df_chain); > + df_process_deferred_rescans (); > } > > /* Initializes invariant motion data. */ > @@ -2254,6 +2254,14 @@ move_loop_invariants (void) > { > struct loop *loop; > > + if (optimize == 1) > + df_live_add_problem (); > + /* ??? This is a hack. We should only need to call df_live_set_all_dirty > + for optimize == 1, but can_move_invariant_reg relies on DF_INSN_LUID > + being up-to-date. That isn't always true (even after df_analyze) > + because df_process_deferred_rescans doesn't necessarily cause > + blocks to be rescanned. */ > + df_live_set_all_dirty (); > if (flag_ira_loop_pressure) > { > df_analyze (); > @@ -2286,5 +2294,8 @@ move_loop_invariants (void) > invariant_table = NULL; > invariant_table_size = 0; > > + if (optimize == 1) > + df_remove_problem (df_live); > + > checking_verify_flow_info (); > } > -- Richard Biener <rguent...@suse.de> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)