On Mon, Dec 6, 2021 at 7:39 PM Andrew MacLeod <amacl...@redhat.com> wrote:
>
> On 12/6/21 02:27, Richard Biener wrote:
> > On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >> When a request is made for the range of an ssa_name at some location,
> >> the first thing we do is invoke range_of_stmt() to ensure we have looked
> >> at the definition and have an evaluation for the name at a global
> >> level.  I recently added a patch which dramatically reduces the call
> >> stack requirements for that call.
> >>
> >> Once we have confirmed the definition range has been set, a call is made
> >> for the range-on-entry to the block of the use.  This is performed by
> >> the cache, which proceeds to walk the CFG predecessors looking for where
> >> ranges are created (exported), existing range-on-entry cache hits, or
> >> the definition block.  Once this list of predecessors has been created,
> >> a forward walk is done, pushing the ranges through the successor edges
> >> of all the blocks that were visited in the initial walk.
> >>
> >> If the use is far from the definition, we end up filling in a lot of
> >> the same value along these paths.  Also, uses which are far from a
> >> range-modifying statement push the same value into a lot of blocks.
> >>
> >> This patch tries to address at least some of these inefficiencies.  It
> >> recognizes two things.
> >>
> >> First, if there is no range-modifying stmt between this use and the
> >> last range we saw in a dominating block, we can just use the value from
> >> the dominating block and not fill in all the cache entries between here
> >> and there.  This is the biggest win.
> >>
> >> Second, if there is a range-modifying statement at the end of some
> >> block, we will have to do the appropriate cache walk to this point, but
> >> it's possible the range-on-entry to THAT block might be able to use a
> >> dominating range, and we can prevent the walk from going any further
> >> than this block.
> >>
> >> Combined, this should prevent a lot of unnecessary ranges from being
> >> plugged into the cache.
> >>
> >> i.e., to visualize:
> >>
> >> bb4:
> >>   a = foo()
> >> <..>
> >> bb60:
> >>   if (a < 30)
> >> <...>
> >> bb110:
> >>   g = a + 10
> >>
> >> If the first place we ask for a is in bb110, we walk the CFG from bb110
> >> all the way back to bb4, on all paths leading back, then fill all those
> >> cache entries.
> >> With this patch:
> >> a) if bb60 does not dominate bb110, the request will scan the
> >> dominators, arrive at the definition block having seen no range
> >> modifier, and simply set the range-on-entry for bb110 to the range of
> >> a.  Done.
> >> b) if bb60 does dominate bb110, we have no idea which edge out of bb60
> >> dominates it, so we revert to the existing cache algorithm.  Before
> >> doing so, it checks and determines that there are no modifiers between
> >> bb60 and the def in bb4, and so sets the range-on-entry cache for bb60
> >> to the range of a.  Now when we do the cache-fill walk, it only has to
> >> go back as far as bb60 instead of all the way to bb4.
> >>
> >> Otherwise (or if dominators are not available) we just revert to what
> >> we do now.  I have yet to see a case where we miss something we used to
> >> get, but that does not mean there isn't one :-).
> >>
> >> The cumulative performance impact of this, compiling a set of 390 GCC
> >> source files at -O2 (measured via callgrind), is pretty decent.
> >> Negative numbers are a compile-time decrease; thus -10% is 10% faster
> >> compilation time.
> >>
> >> EVRP                   : %change from trunk is -26.31% (!)
> >> VRP2                   : %change from trunk is -9.53%
> >> thread_jumps_full      : %change from trunk is -15.8%
> >> Total compilation time : %change from trunk is -1.06%
> >>
> >> So it's not insignificant.
> >>
> >> Risk would be very low, unless dominators are screwed up mid-pass...
> >> but then the relation code would be screwed up too.
> >>
> >> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?
> >
> > OK.
>
> Committed.
>
> >
> > Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
> > I wonder if you can add a counter to visualize places we end up using
> > this path.
>
> Well, it only does the fill now when there is range info located on an
> outgoing edge of the dominator.  It's still used, just on a much smaller
> portion of the graph.
>
> We could do even better if we knew whether only one edge was involved, i.e.:
>
> bb3:
>   a = foo()
>   if (a > 4)
>     blah1;      bb4
>   else
>     blah2;      bb5
> bb6:
>   = a;
>
> The use of a in bb6 will look to bb3 as the dominator, but if it knew
> that both edges are used in that dominance relation (i.e., neither
> outgoing edge dominates bb6), it wouldn't have to calculate a range for
> 'a'.
>
> But as it stands, it can't really tell the above situation from:
>
> bb3:
>   a = foo()
>   if (a > 4)
>     = a         bb4
>
> In this case, only the one edge is used, and we DO need to calculate a
> range for a.  The edge from bb3->bb4 does dominate bb4.
>
> In both cases, bb3 is the dominator, and bb3 can generate an outgoing
> range, but only in one case do we need to calculate a range.
>
> What would be really useful would be to be able to tell if an edge
> dominates a block :-)
>
> I have thoughts on this for the next release, and may overhaul the
> cache... but I don't think there is any such facility in the dominator
> subsystem?
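[To make the shortcut from the quoted patch description concrete, here is a
rough sketch in GCC-internal style of the dominator walk it implies.  It is
not the actual ranger cache code; exports_range_p() is a hypothetical helper
standing in for "an outgoing edge of BB modifies NAME's range", and the
sketch assumes the dominator tree has been computed.]

  /* Hypothetical sketch, not the real ranger code: walk the immediate
     dominators of USE_BB.  If the definition block is reached without
     seeing a block that exports a range for NAME on an outgoing edge,
     the dominating (definition) range can be reused directly; otherwise
     the cache-fill walk only needs to go back as far as that block.  */

  static basic_block
  find_cache_fill_start (tree name, basic_block use_bb)
  {
    basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
    basic_block bb = use_bb;

    while ((bb = get_immediate_dominator (CDI_DOMINATORS, bb)) != NULL)
      {
	if (exports_range_p (bb, name))	  /* hypothetical helper */
	  /* Case (b): fill the cache back to BB only.  */
	  return bb;
	if (bb == def_bb)
	  /* Case (a): nothing modifies NAME between def and use, so the
	     range-on-entry for USE_BB is just NAME's definition range.  */
	  return def_bb;
      }
    /* Reached the entry block (e.g. NAME is a default definition).  */
    return NULL;
  }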
Well, dominance of an edge is dominance of the edge destination if the
destination has only a single incoming non-backedge.  If the destination
has more than one incoming non-backedge then the edge does not dominate
anything.  So to generally answer dominance queries for edges you have to
have backedges marked.

Richard.

>
> Andrew
>
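[A minimal sketch of that rule in GCC-internal style; this is not an
existing GCC API.  It assumes the dominator tree is available and that
mark_dfs_back_edges () has been called so EDGE_DFS_BACK is up to date.]

  /* Does edge E dominate block BB?  E dominates its destination only if
     every other edge into E->dest is a back edge; in that case E dominates
     exactly the blocks E->dest dominates.  */

  static bool
  edge_dominates_block_p (edge e, basic_block bb)
  {
    edge pred;
    edge_iterator ei;

    /* If any other non-back edge enters E->dest, then E->dest can be
       reached without traversing E, so E dominates nothing.  */
    FOR_EACH_EDGE (pred, ei, e->dest->preds)
      if (pred != e && !(pred->flags & EDGE_DFS_BACK))
	return false;

    /* Otherwise dominance of E reduces to dominance of its destination.  */
    return dominated_by_p (CDI_DOMINATORS, bb, e->dest);
  }

[Applied to the two examples above: in the first, neither outgoing edge of
bb3 dominates bb6, so no range for 'a' would be needed; in the second, the
bb3->bb4 edge does dominate bb4, so a range is required.]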