Richard Sandiford <richard.sandif...@arm.com> writes: > Richard Biener <rguent...@suse.de> writes: >> After the fix to RPO VN in loop header copying DF via RTL invariant >> motion takes 50% of the compile-time for the second testcase in >> PR46590. This is caused by the DF live problem iterating over >> all dirty blocks for the local problem which is all blocks of >> the function because while loop-invariant uses df_analyze_loop >> it always marks all blocks of the function as dirty via >> df_live_set_all_dirty. One of the possible fixes is to add >> a df_live_set_loop_dirty () function which causes the problem >> to only iterate over one loop blocks. >> >> It turns out the LIVE problem is always present so we cannot >> remove it which in turn means we have to mark blocks possibly >> not visited by invaraint motion as dirty (thus the new >> df_live_set_all_dirty at the end of move_loop_invariants). >> >> Anyhow - I'm not really into how DF should ideally work here > > Me neither, but as discussed on IRC, here's alternative that seems > to give a similar speed-up. > > - df_live is already present at -O2, so we only need to add it and > mark all blocks dirty for -O > > - df_process_deferred_rescans should be enough to force a rescan of > blocks affected by moving invariants, but calling it in find_defs > means that we don't do any rescans for the final loop > > Still testing, but posting for comments. > > Thanks, > Richard > > > 2019-04-01 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > PR rtl-optimization/46950 > * loop-invariant.c (find_defs): Move df_remove_problem and > df_process_deferred_rescans to move_invariants. > Move df_live_add_problem and df_live_set_all_dirty calls > to move_invariants. > (move_invariants): Likewise. > (move_loop_invariants): Likewise, making the df_live calls > conditional on -O. Remove the problem again if we added it > locally. > > Index: gcc/loop-invariant.c > =================================================================== > --- gcc/loop-invariant.c 2019-03-08 18:15:33.704751739 +0000 > +++ gcc/loop-invariant.c 2019-04-01 14:09:13.553206345 +0000 > @@ -681,11 +681,7 @@ find_defs (struct loop *loop) > loop->num); > } > > - df_remove_problem (df_chain); > - df_process_deferred_rescans (); > df_chain_add_problem (DF_UD_CHAIN); > - df_live_add_problem (); > - df_live_set_all_dirty (); > df_set_flags (DF_RD_PRUNE_DEAD_DEFS); > df_analyze_loop (loop); > check_invariant_table_size (); > @@ -1891,6 +1887,10 @@ move_invariants (struct loop *loop) > GENERAL_REGS, NO_REGS, GENERAL_REGS); > } > } > + /* Remove the DF_UD_CHAIN problem added in find_defs before rescanning, > + to save a bit of compile time. */ > + df_remove_problem (df_chain); > + df_process_deferred_rescans (); > } > > /* Initializes invariant motion data. */ > @@ -2254,6 +2254,11 @@ move_loop_invariants (void) > { > struct loop *loop; > > + if (optimize == 1) > + { > + df_live_add_problem (); > + df_live_set_all_dirty (); > + }
Gah, this is a mess. The patch as posted regresses gcc.dg/pr65779.c, which is an -O2 -fcompare-debug test. This is because the DF_INSN_LUIDs are out-of-date on entry to the pass, and without the df_live_set_all_dirty, they stay out-of-date after a full df_analyze, even though the comment above luid suggests otherwise: /* The logical uid of the insn in the basic block. This is valid after any call to df_analyze but may rot after insns are added, deleted or moved. */ int luid; This affects this condition in can_move_invariant_reg: /* Don't move if a use is not dominated by def in insn. */ if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID (use_insn)) return false; After hacking up some extra testing, it seems that it's often the case that DF_INSN_LUID is out-of-date even after df_analyze. E.g. the LUIDs are not in-oder after sched1 and the df_analyze in early-remat.c doesn't fix them up. There are also cases in other passes. Yet it seems several passes (e.g. combine) do rely on valid LUIDs, without doing anything obvious to ensure that they're valid. So whether LUIDs are valid after the initial df_analyze seems to be a pass-specific property (whether it was meant to be or not). In gcc.dg/pr65779.c, the insn with an invalid LUID is a debug insn that's actually *generated* by df_analyze (in regcprop, as a side-effect of DF_LR_RUN_DCE). AFAIK, LUIDs for debug insns should be just as reliable as LUIDs for ordinary insns, but we're less likely to rescan the block and recompute the luids due to: if (!DEBUG_INSN_P (insn)) df_set_bb_dirty (bb); in df_insn_rescan. So one "fix" would be to check !NONDEBUG_INSN_P in can_move_invariant_reg. In one sense that's valid, since debug insns mustn't affect register numbering. But doing it for this reason feels like a hack. If we do skip debug insns in can_move_invariant_reg, we would in principle need to update or invalidate any that are not dominated by the definition. But in this case the debug insn *is* dominated by the definition, so any fix-up code would be wrong/unnecessary. Another fix would be to call df_recompute_luids on the definition block before using its LUIDs. Alternatively, we could just make the df_live_set_all_dirty unconditional. Maybe that's best at this stage, since it's closer to what we did before. WDYT? Richard 2019-04-02 Richard Sandiford <richard.sandif...@arm.com> gcc/ PR rtl-optimization/46590 * loop-invariant.c (find_defs): Move df_remove_problem and df_process_deferred_rescans to move_invariants. Move df_live_add_problem and df_live_set_all_dirty calls to move_invariants. (move_invariants): Likewise. (move_loop_invariants): Likewise, making the df_live calls conditional on -O. Remove the problem again if we added it locally. Index: gcc/loop-invariant.c =================================================================== --- gcc/loop-invariant.c 2019-04-02 16:09:31.000000000 +0000 +++ gcc/loop-invariant.c 2019-04-02 16:09:55.969718043 +0000 @@ -681,11 +681,7 @@ find_defs (struct loop *loop) loop->num); } - df_remove_problem (df_chain); - df_process_deferred_rescans (); df_chain_add_problem (DF_UD_CHAIN); - df_live_add_problem (); - df_live_set_all_dirty (); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_analyze_loop (loop); check_invariant_table_size (); @@ -1891,6 +1887,10 @@ move_invariants (struct loop *loop) GENERAL_REGS, NO_REGS, GENERAL_REGS); } } + /* Remove the DF_UD_CHAIN problem added in find_defs before rescanning, + to save a bit of compile time. */ + df_remove_problem (df_chain); + df_process_deferred_rescans (); } /* Initializes invariant motion data. */ @@ -2254,6 +2254,14 @@ move_loop_invariants (void) { struct loop *loop; + if (optimize == 1) + df_live_add_problem (); + /* ??? This is a hack. We should only need to call df_live_set_all_dirty + for optimize == 1, but can_move_invariant_reg relies on DF_INSN_LUID + being up-to-date. That isn't always true (even after df_analyze) + because df_process_deferred_rescans doesn't necessarily cause + blocks to be rescanned. */ + df_live_set_all_dirty (); if (flag_ira_loop_pressure) { df_analyze (); @@ -2286,5 +2294,8 @@ move_loop_invariants (void) invariant_table = NULL; invariant_table_size = 0; + if (optimize == 1) + df_remove_problem (df_live); + checking_verify_flow_info (); }