On 9/21/22 06:13, Richard Biener wrote:
On Mon, 19 Sep 2022, Andrew MacLeod wrote:
It looks like you created a fur_source to manually adjust PHIs within the
fold_stmt query to ignore edges that are not marked executable.
Yes, and use the current values from the VN lattice when looking at
statement operands.
Yes, that is exactly how it's intended to be used.
That would then just leave you with the stale cache state to deal with? And if we can resolve that, would it all just work, at least in theory?
In theory, yes. Besides that, the use-def walking of the cache is not wired up with fur_*.
Well, yes. Hmm, you want to set cache values based on the VN lattice as well. Yes. OK, let me do a bit of cache explanation since I haven't done that yet. It does not need a fur_source of any kind, and I'll explain why.
The cache has 2 primary functions:
1) maintain the global definition table (used to decide if a name has been processed). This is local, and not the one the rest of GCC uses. And
2) maintain the range-on-entry cache, and resolve queries to it efficiently.
The cache does not actually create any NEW information. This is one of its key features in preventing any kind of cascading cyclic updates. All it does is propagate existing information from the definition table, with values extracted from the global value table. So your example is not good for this, as there isn't much in the cache for it. So let's tweak it and add another block:
    n_2 = 1
    i_4 = 0
    val_5 = 0

<bb 3>:
    # i_1 = PHI <i_4(2), i_7(3)>
    # val_2 = PHI <val_5(2), val_6(3)>
    val_6 = val_2 + 1
    i_7 = i_1 + 1
    if (i_7 > 22)
      goto <bb 12>
    else
      goto <bb 7>

<bb 7>:
    if (i_7 < n_3)
      goto <bb 3>
    else
      goto <bb 4>

<bb 4>:
    _8 = val_6
    return _8
For the sake of simplicity, let's also assume bb2 and bb3 have been looked at, and all the ssa-names defined in those blocks have an entry in ranger's definition table.
Moving to <bb 7>, if we ask for the range of "if (i_7 < n_3)" to be evaluated, it checks that i_7 and n_3 have been evaluated before it proceeds. Both have entries, which means the next task is to get their values at this location. range_of_expr is called on each one, and as they are not defined in this block, ranger asks the cache for the value of i_7 on entry to bb7 (likewise, when it gets an answer back, it will do so for n_3 as well).
The cache walks back the dominators until it finds either:
a) the block with the definition of i_7, or
b) a block which has an on-entry cache value for i_7 already set.
During its walk, it tags any block which has i_7 in the export list, meaning an outgoing edge from that block may change the value of i_7.
There are additional complexities, but the fundamental operation is to
now take the value it saw from a) or b) as the starting value, and
supply that to GORI at every intervening outgoing edge i_7 was exported
from. Whenever the value changes along the way, we write a cache update
at the end of the edge to facilitate future queries. At the end, the
answer has been calculated and is stored as the on-entry value for this
block.
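The walk just described can be sketched roughly like this. This is a toy model, not ranger's actual code: the names are mine, ranges are plain integer intervals, and the dominator chain is simplified to consecutive block indices.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Toy interval standing in for an irange.
struct interval { long lo, hi; };

// One block in a simplified, purely linear dominator chain.
struct block {
  bool has_def = false;       // the name is defined here
  bool exports_name = false;  // an outgoing edge may refine the name
  // The refinement GORI would compute on the edge, e.g. !(i_7 > 22).
  std::function<interval(interval)> refine = [](interval r){ return r; };
};

// Range-on-entry cache: block index -> cached interval.
using entry_cache = std::map<int, interval>;

// Walk back from 'bb' until a def or a cached on-entry value, then
// propagate forward, applying the edge refinement at every block that
// exports the name, and store the result as bb's on-entry value.
interval range_on_entry (std::vector<block> &cfg, entry_cache &cache,
                         int bb, interval def_range)
{
  int start = bb - 1;
  while (start > 0 && !cfg[start].has_def && !cache.count (start))
    start--;
  interval r = cache.count (start) ? cache[start] : def_range;
  for (int b = start; b < bb; b++)
    if (cfg[b].exports_name)
      r = cfg[b].refine (r);  // GORI adjusts the value on the edge
  cache[bb] = r;              // cache the on-entry value for 'bb'
  return r;
}
```

Mirroring the example: if i_7 is VARYING in bb3 and bb3 exports it through !(i_7 > 22), the query for bb7's on-entry value clamps the range on that edge and caches the result at bb7.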
So returning to the example: assume i_7 was set to VARYING in bb3. GORI would apply !(i_7 > 22) to the value, and we would end up in <bb 7> with a range-on-entry of [0, 21], which would be stored in bb7.
In your example, if you have disabled that back edge, you would have a value of [1,1] for i_7. GORI would not have changed that value since it's already < 22, and we would store [1,1] as the range-on-entry to <bb 7>.
Likewise, we do something similar for n_3. The point is, the cache has not gone and created any new information. Its *only* purpose is to propagate known values thru the CFG, adjusting them for any outgoing edges that are encountered. It uses a temporal marking in an attempt to identify when a global value has been changed, meaning it may need to go and repopulate something, but the bottom line is it never contains anything beyond "reductions" in the ranges of values in the global table. And it only ever works on one name at a time.
The bottom line: Ranger effectively only ever changes values via the global table, and the cache simply propagates those values around, adjusting them with GORI as appropriate.
So there are multiple approaches. We could simply kill the global table and cache line for any ssa_name we want to change the value of. That gets a little trickier for on-entry values of secondary effects (ie, those used in the calculation of the primary names). It would probably work, but something unforeseen could show up.
More advanced would be to "layer" the cache. ie, we use the cache, and at some point, you issue a "push". The push creates a new cache, and all queries look first to the new cache; if they can't be answered there, they look "down" thru the previous caches. This resolves all queries as if the cache layers are all "one" thing. Sets would always go to the latest layer. When we "pop", we delete the latest layer; the layers underneath should all reflect the exact state at the time of the push.
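The push/pop idea above amounts to something like the following. Again just a sketch with names of my own invention, not ranger's implementation: sets always go to the newest layer, lookups fall through to older layers, and pop() discards the newest layer, restoring the prior state exactly.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <vector>

// A stack of key->value maps acting as "one" cache.
template <typename K, typename V>
class layered_cache {
  std::vector<std::map<K, V>> layers;
public:
  layered_cache () { layers.emplace_back (); }     // the base layer
  void push () { layers.emplace_back (); }         // start a new layer
  void pop ()  { if (layers.size () > 1) layers.pop_back (); }
  // Sets always go to the latest layer.
  void set (const K &k, const V &v) { layers.back ()[k] = v; }
  // Newest layer wins; otherwise fall through to older layers.
  std::optional<V> get (const K &k) const {
    for (auto it = layers.rbegin (); it != layers.rend (); ++it) {
      auto f = it->find (k);
      if (f != it->end ())
        return f->second;
    }
    return std::nullopt;
  }
};
```

A pushed layer costs nothing until something is actually set in it, which matches the deferred-allocation point below.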
All in theory of course :-) There would be some expense to it, but possibly not as much as one thinks, since most components defer allocations and such until something is actually set. It seems like it might not be a hard experiment; I will try it once I get thru the bits I'm on now.
That would have the drawback of losing any information that had been calculated earlier, and possibly trigger some additional calculations, as that value will now be stale elsewhere in the IL. We could make it more efficient if you also provided a bitmap over the basic blocks in the region (or some way of determining that). Then we would just kill the entries in those blocks for those ssa-names.
As long as you weren't making any ranger queries from outside the region
during this time, that might work.
Before I go any further, are we on the same page, or are you looking for
something different than this?
You had mentioned the path_ranger model also, which effectively uses ranger for the values on entry to the path/region, and then uses overridden API entry points to choose whether to use the local cache vector or go to ranger's on-entry values.
This approach could also work, and one way would be to implement the mark/reset API. The new class would maybe instantiate an additional local cache, from which values are set/picked up first for any queries within the region blocks, and when you are done, it throws away the cache for the region. This would require some tweaking of ranger's primary cache lookup/set routines, but perhaps there is some benefit to generalizing this model and integrating it.. ie, a region_ranger which builds this facility on top and allows you to reset regions. Presumably you want to be able to "stack" these marks so you can push and pop regions? We would also have to figure out how to make the relational oracle work with it.
Yeah, so my thinking was that I'd keep my own range cache for in-region, like path_ranger does, and track changes similar to how I do for VN, so I can undo things efficiently. That way I also get to decide how far to do the caching use-def walks.
Of course the disadvantage is duplicating quite some ranger functionality that's already there without too many changes, besides my own cache plus ensuring use of the VN-computed values and edge executability. Maybe it isn't too much work though.
Let me give the layered cache a shot.. I don't think that will be much work (ha. famous last words), and we can see if it works, and how expensive it is. There might be alternatives along this line, especially if you are only querying a few things here and there. The cache is designed to be able to plug in alternative mechanisms on a name-by-name basis (there are already a couple of variations depending on CFG size and name density usage), so it might be that we can do something really cheap for this kind of pushed layer.
I'm still mulling around options.
I can think of a couple of possible approaches for that, some more efficient than others. I suppose the path_ranger could even be completely replaced with that model. Huh. It would just call mark() at the start of a path, and reset() when it's done with the path. Might not be as efficient, however. It would be a good proof-of-correctness test, I guess, at least.
Anyway, am I somewhere in the ballpark of what you are looking for? Or am I over-engineering something? :-)
No, you basically identified the problems.
Note that when not iterating the proposed change should already work
optimally, but it's of course bad to be "better" in the weaker setting ...
The second thing would be to look at how to use conditional equivalences: what's required to make ranger aware of them, and what's required to query them, since for them the actual simplification would happen in the folding done by value-numbering, as with
if (a == b)
c = a - b;
which EVRP gets simplified, for example, but VN fails at.
There are proposed enhancements to VN that might catch this as well, but I'm not too happy about the implementation. Mainly because equivalences are a bitch.
The relational oracle currently snags this one by putting a and b in an equivalence set on that outgoing edge. The fold_stmt mechanism you are using automatically queries the relation oracle (if the fur_source supplies one) for a relation between the 2 use operands of a statement, and passes that to the range-ops fold routine. So it will ultimately invoke operator_minus with
fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)
Range-ops for operator_minus defines the routine "op1_op2_relation_effect()", which is automatically applied to any value calculated by fold_range() after the fact. That routine has:
  // == and != produce [0,0] and ~[0,0] regardless of wrapping.
  if (rel == VREL_EQ)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
  else if (rel == VREL_NE)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
			      VR_ANTI_RANGE);
So as long as the relation was registered, fold_stmt will automatically get a range of [0,0] for c, all through range-ops for operator_minus.
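The effect can be boiled down to a few lines. This is a simplified interval model of my own, not GCC's int_range: the relation override happens after the plain interval fold, exactly as op1_op2_relation_effect does for operator_minus (the ~[0,0] anti-range for VREL_NE is omitted since this toy model cannot represent anti-ranges).

```cpp
#include <cassert>

// Toy stand-ins for GCC's relation kinds and irange.
enum relation { VREL_NONE, VREL_EQ };
struct interval { long lo, hi; };

// Fold a - b over operand intervals, then let a known relation
// override the result: a == b means a - b is exactly [0,0].
interval fold_minus (interval a, interval b, relation rel)
{
  if (rel == VREL_EQ)
    return {0, 0};
  // Plain interval subtraction when no useful relation is known.
  return {a.lo - b.hi, a.hi - b.lo};
}
```

So even with both operands VARYING, registering VREL_EQ on the edge is enough for the fold of c = a - b to collapse to [0,0].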
If the fur_source used when fold_stmt is called supplies the routine
"register_relation" (which I think only a fur_depend currently does),
then relations are also automatically registered with the oracle
supplied by the fur_depend.
There's a lot of this that can happen auto-magically. I just need to make sure we can reset some of the underlying cache entries (and the oracle as well.. it might need to be layered so that we don't pick up invalid relations. That should also be relatively simple to adjust, I think).
Andrew