On Tue, Nov 23, 2021 at 4:20 PM Martin Liška <mli...@suse.cz> wrote: > > On 11/23/21 14:58, Richard Biener wrote: > > On Mon, Nov 22, 2021 at 4:07 PM Martin Liška <mli...@suse.cz> wrote: > >> > >> On 11/19/21 11:00, Richard Biener wrote: > >>> On Tue, Nov 16, 2021 at 3:40 PM Martin Liška <mli...@suse.cz> wrote: > >>>> > >>>> On 11/11/21 08:15, Richard Biener wrote: > >>>>> So I'd try to do no functional change first, improving the costing and > >>>>> setting up the transform to simply pick up the stmts to "fold" as > >>>>> discovered > >>>>> during analysis (as I hinted you possibly can use gimple_uid to mark > >>>>> the stmts that simplify, IIRC gimple_uid is preserved during copying. > >>>>> gimple_uid would also scale better than gimple_plf in case we do > >>>>> the analysis for all candidates at once). > >>>> > >>>> Thinking about the analysis. Am I correct that we want to properly > >>>> calculate > >>>> loop size for true and false edge of a potential gcond before the > >>>> actually unswitching? > >>> > >>> Yes. > >>> > >>>> We can do that by finding a first gcond candidate, evaluate (symbolic + > >>>> irange approache) > >>>> all other gcond in the loop body and use BB_REACHABLE discovery. > >>>> Similarly to what we do now > >>>> at lines 378-446. Then tree_num_loop_insns can be adjusted for only > >>>> these reachable blocks. > >>>> Having that, we can calculate # of insns that will live in true/false > >>>> loops. > >>> > >>> So whatever we do here we should record as "this control stmt folds to > >>> {true,false}" (or {true,unknown}, > >>> or in future, "this control stmt will lead to edge {e,unknown}"), > >>> recording the simplification > >>> on the true/false loop version in a way we can apply it after the > >>> transform. > >>> > >>>> Then we can call tree_unswitch_loop and make the gcond folding as we do > >>>> in the versioned loops. > >>>> > >>>> Is it a step in good direction? Having that we can then extend it to > >>>> gswitch statements. > >>> > >>> One issue I see is that BB_REACHABLE is there only once but you could use > >>> auto_bb_flag reachable_true, reachable_false to distinguish the > >>> true/false loop version > >>> copies. > >>> > >>> So yes, I think that sounds reasonable. At the point we want to > >>> evaluate different > >>> (first) unswitching opportunities against each other storing this only > >>> as BB flag is > >>> likely to hit limits. When we want to evaluate multiple levels of > >>> unswitching before > >>> doing any transforms even more so (if there are 3 opportunities there'd be > >>> many cases to be considered when going to level 3 ;)). I _think_ that a > >>> sparse > >>> lattice of stmt UID -> edge might do the trick if we change > >>> tree_num_loop_insns > >>> do to a DFS walk from the loop header, ignoring not taken edges by > >>> consulting the > >>> lattice. Or, for speed reason, pre-compute tree_num_loop_insns for each > >>> BB > >>> so we just have to sum a different set of BBs rather than walking all > >>> stmts again. > >>> > >>> That said, the second step would definitely be to choose the "best" > >>> opportunity > >>> on the current level. > >>> > >>> Richard. > >>> > >>>> Cheers, > >>>> Martin > >> > >> Hello. > >> > >> I'm sending a new version where I changed: > >> 1) all unswitch_predicates are find for a loop > >> 2) context sensitive costing happens based on an unswitch_predicate and BB > >> reachability > >> is implemented > >> 3) folding happens in recursive invocation once we decide to unswitch > >> 4) the patch folds both symbolic gcond predicates and irange provided by > >> ranger > >> 5) debug counter was added > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> Plus, I tested it > >> on SPEC2006 and SPEC2017 with -size=ref. > > > > Meh, diff made a mess out if this ;) Random comments, I'm walking > > myself the optimizations > > flow. > > Sure. > > > > > tree_unswitch_single_loop: > > > > + unswitch_predicate *predicate = NULL; > > + if (num > param_max_unswitch_level) > > + { > > + if (dump_file > > + && (dump_flags & TDF_DETAILS)) > > + fprintf (dump_file, ";; Not unswitching anymore, hit max level\n"); > > + goto exit; > > + } > > > > this looks like we can do this check before find_all_unswitching_predicates? > > Makes sense. > > > > > + for (auto pred: candidates) > > + { > > + unsigned cost > > + = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred); > > ... > > > > so this searches for the first candidate that fits in > > param_max_unswitch_insns, it doesn't > > yet try to find the cheapest / best one. Please add a comment to say > > that. After we > > found one candidate we apply unswitching to such one candidate (and throw > > the > > others away). I guess that's OK - it's what the old code did - what > > you did for this > > intermediate step is actually gather all unswitching predicates > > upfront. Hopefully > > we'll be able to share some of the work done for the recursive invocations. > > > > + fprintf (dump_file, ";; Unswitching loop with condition: "); > > > > "on condition" > > > > + fprintf (dump_file, ";; Not unswitching condition, loop too big " > > + "(%d insns): ", cost); > > > > "cost too big"? I assume 'cost' is the number of stmts we'll add, > > loop-size - true-eliminated - false-eliminated? > > I'm going to adjust this. > > > > > +exit: > > + for (auto predicate: candidates) > > + delete predicate; > > > > Some refactoring should get rid of the goto ... > > You don't like it? It seems to me quite logical as one does not have to repeat > a clean up code before each return statement. > > > > > +static unsigned > > +evaluate_insns (class loop *loop, basic_block *bbs, > > + unswitch_predicate *candidate, bool true_edge, > > + gimple_ranger *ranger, auto_bb_flag &reachable_flag) > > +{ > > ... > > + unswitch_predicate *predicate > > + = tree_may_unswitch_on (bb, loop, ranger); > > + if (predicate != NULL) > > + { > > + tree folded > > + = simplify_using_entry_checks (predicate, candidate, > > + true_edge); > > > > so just looking at the implementation it seems that if you assign UIDs > > to all last_stmt() in a loop you can keep tree_may_unswitch_on () > > in a dense array and you do not have to re-compute it? UIDs should > > survive copying so even for recursive unswitchings the predicates should > > be shareable (pointing to the wrong stmt then, but then just do not record > > the stmt in there - it should be available from the caller somehow). > > I like the idea, lemme implement it. > > > > > But then I wonder how this translates to gswitch handling where the > > result is a taken edge / a set of known not taken edges. I suggest > > to try wiring in gcond support in a separate commit on top of this to > > s/gcond/gswitch, right? > > > see whether some refactoring is needed. > > Sure, so for e.g. case 1 ... 5 we would need to create a new > unswitch_predicate > with 1 <= index && index <= 5 tree predicate (and the corresponding irange > range). > Later once we unswitch on it, we should use a special unreachable_flag that > will > be used for marking of dead edges (similarly how we fold gconds to > boolean_{false/true}_node. > Does it make sense? > > > > > Looking at evaluate_insns it also seems that 'reachable_flag' > > is unused - computing the sizes could be done once you process > > It is used: > > if (dest->loop_father == loop > && !(dest->flags & reachable_flag) > && !(e->flags & flags)) > { > worklist.safe_push (dest); > dest->flags |= reachable_flag; > } > > Where's problem?
Ah, OK - it's used as 'visited' as well. Fine then. > > > a block, no? Previously I suggested to compute estimate_num_insns > > for each BB of the original loop once so you can re-use that and > > just need to sum up BB costs. > > Yep, makes sense. I could suggest to use bb->aux ... (maybe that's even copied by the versioning so that recursions do not need to recompute stuff - no, it doesn't seem so, we only copy edge->aux it seems looking at duplicate_block!?) Anyway - just compute it for each "original" loop then, in the very far end we hopefully evaluate the whole recursion before doing the transforms. > > > > > > +static bool > > +find_all_unswitching_predicates (class loop *loop, basic_block *bbs, > > + bool true_edge, > > + unswitch_predicate *parent_predicate, > > + gimple_ranger *ranger, > > + vec<unswitch_predicate *> &candidates) > > +{ > > ... > > > > so you re-do the simplification step here, in fact this function seems > > to not only find unswitching predicates but apply simplifications > > from the previous unswitching step. (yes, that's what the old code did ...) > > The costing step already did all the work though, it should be > > possible to keep a vector of stmts to simplify and the simplification > > from that analysis time. > > Makes sense. > > > > > You can get at the "other loop copy" stmt via > > > > last_stmt (get_bb_copy (gimple_bb (cond))) > > > > if you do that before free_original_copy_tables. I think it would be > > nice to have the simplification step explicit before recursing > > even if that means some duplicate walking for now > > (I do hope we can share the find_all_unswitching_predicates > > analysis step) > > Nods. > > > > > You are applying param_max_unswitch_insns with > > > > + unsigned cost > > + = evaluate_loop_insns_for_predicate (loop, bbs, ranger, pred); > > > > - cond = simplify_using_entry_checks (loop, cond); > > - stmt = last_stmt (bbs[i]); > > - if (integer_nonzerop (cond)) > > + if (cost <= (unsigned)param_max_unswitch_insns) > > > > where cost is the sum of the simplified sizes of both loop copies after > > unswitching. By its very nature this will match for all recursive > > invocations > > (loops only will get smaller) which is why we need param_max_unswitch_level. > > The idea was to have one param_max_unswitch_insns budget for each > > _original_ loop the recursive invocations will eat from, removing the need > > for param_max_unswitch_level. I'd re-interpret param_max_unswitch_insns > > as the number of insns we may add, so with cost calculated as above we'd > > have > > > > if (cost <= budget) > > ... > > > > and > > > > budget -= cost; > > > > before the recursion (and pass by reference so that the two recursions > > share the budget). > > > > Initialize by > > > > budget = original_loop_size + param_max_unswitch_insns; > > Ah, ok, lemme change that. > > > > > Hope this wasn't too many comments ;) > > It was, but they are all very useful. > > Thanks, > Martin > > > > > Thanks, > > Richard. > > > >> Thoughts? > >> Thanks, > >> Martin > >> >