On Wed, 19 Sep 2018, Richard Biener wrote: > > The second testcase in the above PR runs into our O(N) bitmap element > search limitation and spends 8s (60%) of the compile-time in the SSA > propagator > engine (when optimizing). The patch improves that to 0.9s (15%). For the > first testcase it isn't that bad but still the patch improves CCP from 36% to > 14%. > > The "solution" is to use sbitmap instead of a bitmap to avoid > the linearity when doing add_ssa_edge. We pay for that (but not > actually with the testcases) with a linear-time bitmap_first_set_bit > in process_ssa_edge_worklist. I do not (yet...) have a testcase > that overall gets slower with this approach. I suppose using > std::set<int> would "solve" the complexity issue but we'd pay > back with horribly inefficient memory use. Similarly with > our sparse bitmap implementation which lacks an ordered > first_set_bit (it only can get any set bit fast, breaking optimal > iteration order). > > If we'd only had a O(log n) search sparse bitmap implementation ... > (Steven posted patches to switch bitmap from/to such one but IIRC > that at least lacked bitmap_first_set_bit). > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK for trunk?
So it turns out that while the bitmap data structure isn't optimal the issue with the testcase is that we end up with a full-set-universe for the SSA worklist mostly because we are queueing PHIs via uses on backedges that are not yet executable (so we'd re-simulate the PHI anyway once that became excutable - no need to set the individual bits). So I'm testing the following instead. Bootstrap & regtest running on x86_64-unknown-linux-gnu. Richard. 2018-09-20 Richard Biener <rguent...@suse.de> PR tree-optimization/63155 * tree-ssa-propagate.c (add_ssa_edge): Avoid adding PHIs to the worklist when the edge of the respective argument isn't executable. Index: gcc/tree-ssa-propagate.c =================================================================== --- gcc/tree-ssa-propagate.c (revision 264438) +++ gcc/tree-ssa-propagate.c (working copy) @@ -168,10 +168,18 @@ add_ssa_edge (tree var) FOR_EACH_IMM_USE_FAST (use_p, iter, var) { gimple *use_stmt = USE_STMT (use_p); + basic_block use_bb = gimple_bb (use_stmt); /* If we did not yet simulate the block wait for this to happen and do not add the stmt to the SSA edge worklist. */ - if (! (gimple_bb (use_stmt)->flags & BB_VISITED)) + if (! (use_bb->flags & BB_VISITED)) + continue; + + /* If this is a use on a not yet executable edge do not bother to + queue it. */ + if (gimple_code (use_stmt) == GIMPLE_PHI + && !(EDGE_PRED (use_bb, PHI_ARG_INDEX_FROM_USE (use_p))->flags + & EDGE_EXECUTABLE)) continue; if (prop_simulate_again_p (use_stmt)