On 9/21/22 06:13, Richard Biener wrote:
On Mon, 19 Sep 2022, Andrew MacLeod wrote:
It looks like you created a fur_source to manually adjust PHIs within the
fold_stmt query to ignore edges that are not marked executable.
Yes, and use the current values from the VN lattice when looking at
statement operands.
Yes, that is exactly how it's intended to be used.
That would then just leave you with the stale cache state to deal with? And if we can resolve that, would it all just work, at least in theory?
In theory, yes. Besides that, the use-def walking of the cache is not wired up with fur_*.
Well, yes. Hmm, you want to set cache values based on the VN lattice as well. Yes. OK, let me do a bit of cache explanation since I haven't done that yet. It does not need a fur_source of any kind, and I'll explain why.
The cache has 2 primary functions:
1) maintain the global definition table (used to decide if a name has been processed). This is local, and not the one the rest of GCC uses. And
2) maintain the range-on-entry cache, and resolve queries to it efficiently.
The cache does not actually create any NEW information. This is one of its key features in preventing any kind of cascading cyclic updates. All it does is propagate existing information from the definition table, with values extracted from the global value table. So your example is not good for this, as there isn't much in the cache for it. So let's tweak it and add another block:
    n_2 = 1
    i_4 = 0
    val_5 = 0

<bb 3>:
    # i_1 = PHI <i_4(2), i_7(3)>
    # val_2 = PHI <val_5(2), val_6(3)>
    val_6 = val_2 + 1
    i_7 = i_1 + 1
    if (i_7 > 22)
      goto <bb 12>
    else
      goto <bb 7>

<bb 7>:
    if (i_7 < n_3)
      goto <bb 3>
    else
      goto <bb 4>

<bb 4>:
    _8 = val_6
    return _8
For the sake of simplicity, let's also assume bb2 and bb3 have been looked at, and all the ssa-names defined in those blocks have an entry in ranger's definition table.
Moving to <bb 7>, if we ask for the range of "if (i_7 < n_3)" to be evaluated, it checks that i_7 and n_3 have been evaluated before it proceeds. Both have entries, which means the next task is to get their values at this location. range_of_expr is called on each one, and as they are not defined in this block, ranger asks the cache for the value of i_7 on entry to bb7 (likewise, when it gets an answer back, it will do so for n_3 as well).
The cache walks back the dominators until it finds either:
a) the block with the definition of i_7, or
b) a block which has an on-entry cache value for i_7 already set.
During its walk, it tags any block which has i_7 in the export list, meaning an outgoing edge from that block may change the value of i_7.
There are additional complexities, but the fundamental operation is to
now take the value it saw from a) or b) as the starting value, and
supply that to GORI at every intervening outgoing edge i_7 was exported
from. Whenever the value changes along the way, we write a cache update
at the end of the edge to facilitate future queries. At the end, the
answer has been calculated and is stored as the on-entry value for this
block.
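The walk just described can be sketched roughly like this. This is a toy model, not ranger's actual code: the names are mine, ranges are plain integer intervals, and the dominator chain is simplified to consecutive block indices.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Toy interval standing in for an irange.
struct interval { long lo, hi; };

// One block in a simplified, purely linear dominator chain.
struct block {
  bool has_def = false;       // the name is defined here
  bool exports_name = false;  // an outgoing edge may refine the name
  // The refinement GORI would compute on the edge, e.g. !(i_7 > 22).
  std::function<interval(interval)> refine = [](interval r){ return r; };
};

// Range-on-entry cache: block index -> cached interval.
using entry_cache = std::map<int, interval>;

// Walk back from 'bb' until a def or a cached on-entry value, then
// propagate forward, applying the edge refinement at every block that
// exports the name, and store the result as bb's on-entry value.
interval range_on_entry (std::vector<block> &cfg, entry_cache &cache,
                         int bb, interval def_range)
{
  int start = bb - 1;
  while (start > 0 && !cfg[start].has_def && !cache.count (start))
    start--;
  interval r = cache.count (start) ? cache[start] : def_range;
  for (int b = start; b < bb; b++)
    if (cfg[b].exports_name)
      r = cfg[b].refine (r);  // GORI adjusts the value on the edge
  cache[bb] = r;              // cache the on-entry value for 'bb'
  return r;
}
```

Mirroring the example: if i_7 is VARYING in bb3 and bb3 exports it through !(i_7 > 22), the query for bb7's on-entry value clamps the range on that edge and caches the result at bb7.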
So returning to the example: assume i_7 was set to VARYING in bb3. GORI would apply !(i_7 > 22) to the value, and we would end up in <bb 7> with a range-on-entry of [0, 21], which would be stored in bb7.
In your example, if you have disabled that back edge, you would have a value of [1,1] for i_7. GORI would not have changed that value since it's already < 22, and we would store [1,1] as the range-on-entry to <bb 7>.
Likewise, we do something similar for n_3. The point is, the cache has not gone and created any new information. Its *only* purpose is to propagate known values thru the CFG, adjusting them for any outgoing edges that are encountered. It uses a temporal marking in an attempt to identify when a global value has been changed, meaning it may need to go and repopulate something, but the bottom line is it never contains anything beyond "reductions" in the ranges of values in the global table. And it only ever works on one name at a time.
The bottom line: Ranger effectively only ever changes values via the global table, and the cache simply propagates those values around, adjusting them with GORI as appropriate.
So there are multiple approaches. We could simply kill the global table and cache line for any ssa_name we want to change the value of. That gets a little trickier for on-entry values of secondary effects (ie, those used in the calculation of the primary names). It would probably work, but something unforeseen could show up.
More advanced would be to "layer" the cache. ie, we use the cache, and at some point, you issue a "push". The push creates a new cache, and all queries look first to the new cache; if they can't be answered there, they look "down" thru the previous caches. This resolves all queries as if the cache layers are all "one" thing. Sets would always go to the latest layer. When we "pop", we delete the latest layer; the layers underneath should all reflect the exact state at the time of the push.
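The push/pop idea above amounts to something like the following. Again just a sketch with names of my own invention, not ranger's implementation: sets always go to the newest layer, lookups fall through to older layers, and pop() discards the newest layer, restoring the prior state exactly.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <vector>

// A stack of key->value maps acting as "one" cache.
template <typename K, typename V>
class layered_cache {
  std::vector<std::map<K, V>> layers;
public:
  layered_cache () { layers.emplace_back (); }     // the base layer
  void push () { layers.emplace_back (); }         // start a new layer
  void pop ()  { if (layers.size () > 1) layers.pop_back (); }
  // Sets always go to the latest layer.
  void set (const K &k, const V &v) { layers.back ()[k] = v; }
  // Newest layer wins; otherwise fall through to older layers.
  std::optional<V> get (const K &k) const {
    for (auto it = layers.rbegin (); it != layers.rend (); ++it) {
      auto f = it->find (k);
      if (f != it->end ())
        return f->second;
    }
    return std::nullopt;
  }
};
```

A pushed layer costs nothing until something is actually set in it, which matches the deferred-allocation point below.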
All in theory of course :-) There would be some expense to it, but possibly not as much as one thinks, since most components defer allocations and such until something is actually set. It seems like it might not be a hard experiment; I will try it once I get thru the bits I'm on now.
That would have the drawback of losing any information that had been calculated earlier, and possibly trigger some additional calculations, as that value will now be stale elsewhere in the IL. We could make it more efficient if you also provided a bitmap over the basic blocks in the region (or some way of determining that). Then we would just kill the entries in those blocks for those ssa-names.
As long as you weren't making any ranger queries from outside the region
during this time, that might work.
Before I go any further, are we on the same page, or are you looking for
something different than this?
You had mentioned the path_ranger model also, which effectively uses ranger for the values on entry to the path/region, and then uses overridden API entry points to choose whether to use the local cache vector or go to ranger's on-entry values.
This approach could also work, and one way would be to implement the mark/reset API. The new class would maybe instantiate an additional local cache, from which values are set/picked up first for any queries within the region blocks, and when you are done, it throws away the cache for the region. This would require some tweaking of ranger's primary cache lookup/set routines, but perhaps there is some benefit to generalizing this model and integrating it.. ie, a region_ranger which builds this facility on top and allows you to reset regions. Presumably you want to be able to "stack" these marks so you can push and pop regions? We would also have to figure out how to make the relational oracle work with it.
Yeah, so my thinking was that I'd keep my own range cache for in-region, like path_ranger does, and track changes similar to how I do for VN, so I can undo things efficiently. That way I also get to decide how far to do the caching use-def walks.
Of course the disadvantage is duplicating quite some ranger functionality that's already there without too many changes, besides my own cache plus ensuring use of the VN-computed values and edge executability. Maybe it isn't too much work though.
Let me give the layered cache a shot.. I don't think that will be much work (ha. famous last words), and we can see if it works, and how expensive it is. There might be alternatives along this line, especially if you are only querying a few things here and there. The cache is designed to be able to plug in alternative mechanisms on a name-by-name basis (there are already a couple of variations depending on CFG size and name density usage), so it might be that we can do something really cheap for this kind of pushed layer.
I'm still mulling around options.
I can think of a couple of possible approaches for that, some more efficient than others. I suppose the path_ranger could even be completely replaced with that model. Huh. It would just call mark() at the start of a path, and reset() when it's done with the path. Might not be as efficient, however. It would be a good proof-of-correctness test, I guess, at least.
Anyway, am I somewhere in the ballpark of what you are looking for? Or am I over-engineering something? :-)
No, you basically identified the problems.
Note that when not iterating the proposed change should already work
optimally, but it's of course bad to be "better" in the weaker setting ...
The second thing would be to look at how to use conditional equivalences: what's required to make ranger aware of them, and what's required to query them, since for them the actual simplification would happen in the folding done by value-numbering, as with
if (a == b)
c = a - b;
which EVRP gets simplified, for example, but VN fails at.
There are proposed enhancements to VN that might catch this as well, but I'm not too happy about the implementation. Mainly because equivalences are a bitch.
The relational oracle currently snags this one by putting a and b in an equivalence set on that outgoing edge. The fold_stmt mechanism you are using automatically queries the relation oracle (if the fur_source supplies one) for a relation between the 2 use operands of a statement, and passes that to the range-ops fold routine. So it will ultimately invoke operator_minus with
fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)
Range-ops for operator_minus defines the routine "op1_op2_relation_effect()", which is automatically applied to any value calculated by fold_range() after the fact. That routine has:
  // == and != produce [0,0] and ~[0,0] regardless of wrapping.
  if (rel == VREL_EQ)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
  else if (rel == VREL_NE)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
			      VR_ANTI_RANGE);
So as long as the relation was registered, fold_stmt will automatically get a range of [0,0] for c, all through range-ops for operator_minus.
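The effect can be boiled down to a few lines. This is a simplified interval model of my own, not GCC's int_range: the relation override happens after the plain interval fold, exactly as op1_op2_relation_effect does for operator_minus (the ~[0,0] anti-range for VREL_NE is omitted since this toy model cannot represent anti-ranges).

```cpp
#include <cassert>

// Toy stand-ins for GCC's relation kinds and irange.
enum relation { VREL_NONE, VREL_EQ };
struct interval { long lo, hi; };

// Fold a - b over operand intervals, then let a known relation
// override the result: a == b means a - b is exactly [0,0].
interval fold_minus (interval a, interval b, relation rel)
{
  if (rel == VREL_EQ)
    return {0, 0};
  // Plain interval subtraction when no useful relation is known.
  return {a.lo - b.hi, a.hi - b.lo};
}
```

So even with both operands VARYING, registering VREL_EQ on the edge is enough for the fold of c = a - b to collapse to [0,0].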
If the fur_source used when fold_stmt is called supplies the routine
"register_relation" (which I think only a fur_depend currently does),
then relations are also automatically registered with the oracle
supplied by the fur_depend.
There's a lot of this that can happen auto-magically. I just need to make sure we can reset some of the underlying cache entries (and the oracle as well.. it might need to be layered so that we don't pick up invalid relations. That should also be relatively simple to adjust, I think).
Andrew