On Tue, Feb 22, 2022 at 8:19 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On 2/22/22 13:07, Jeff Law wrote:
> >
> >
> > On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:
> >> On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
> >>>> That is EH, then there are calls that might not return because they
> >>>> leave in some other way (e.g. longjmp), or might loop forever,
> >>>> might exit, might abort, trap etc.
> >>> Generally speaking, calls which do not return should not now be a
> >>> problem... as long as they do not transfer control to somewhere
> >>> else in the current function.
> >> I thought all of those cases are very relevant to PR104530.
> >> If we have:
> >>    _1 = ptr_2(D) == 0;
> >>    // unrelated code in the same bb
> >>    _3 = *ptr_2(D);
> >> then in light of PR104288, we can optimize ptr_2(D) == 0 into false
> >> only if there are no calls inside of "// unrelated code in the same
> >> bb", or if all calls in "// unrelated code in the same bb" are
> >> guaranteed to return exactly once.  Because, if there is a call in
> >> there which could exit (that is the PR104288 testcase), or abort, or
> >> trap, or loop forever, or throw externally, or longjmp, or in any
> >> other non-UB way cause the _1 = ptr_2(D) == 0; stmt to be invoked at
> >> runtime but _3 = *ptr_2(D) not to be invoked, then we can't optimize
> >> the earlier comparison, because ptr_2(D) could be NULL in a valid
> >> program.  While if there are no calls (and no problematic inline
> >> asms) and no trapping insns in between, we can, and PR104530 is
> >> asking that we continue to optimize that.
> > Right.  This is similar to some of the restrictions we deal with in
> > the path isolation pass.  Essentially we have a path that, when
> > traversed, would result in a *0.  We would like to be able to find
> > the edge upon which the *0 is control dependent and optimize the test
> > so that control always goes to the valid path rather than the *0 path.
> >
> > The problem is there may be observable side effects on the *0 path
> > between the test and the actual *0 -- including calls to nonreturning
> > functions, setjmp/longjmp, things that could trap, etc.  This case is
> > similar.  We can't back-propagate the non-null status through any
> > statements with observable side effects.
> >
> > Jeff
> >
> We can't back propagate, but we can alter our forward view.  Any
> ssa-name defined before the observable side effect can be recalculated
> using the updated values, and all uses of those names after the
> side effect would then appear to be "up-to-date".
>
> This does not actually change anything before the side-effect statement,
> but the lazy re-evaluation ranger employs makes it appear as if we do a
> new computation when _1 is used afterwards, i.e.:
>
>     _1 = ptr_2(D) == 0;
>     // unrelated code in the same bb
>     _3 = *ptr_2(D);
>     _4 = ptr_2(D) == 0;      // ptr_2 is known to be [+1, +INF] now.
> And we use _4 everywhere _1 was used.  This is the effect.
>
> So we do not actually change anything in the unrelated code, just the
> observable effects afterwards.  We already do these recalculations on
> outgoing edges in other blocks, just not within the definition block,
> because non-null wasn't visible within the def block.
>
> Additionally, in the testcase there is a store to c before the side
> effect.  These patches get rid of the branch, and thus the call, in the
> testcase as requested, but we still have to compute _3 in order to
> store it into the global c, since it occurs pre-side-effect.
>
>      b.0_1 = b;
>      _2 = b.0_1 == 0B;
>      _3 = (int) _2;
>      c = _3;
>      _5 = *b.0_1;
>
> No matter how you look at it, you are going to need to process a block
> twice in order to handle any code pre-side-effect.  Whether it be
> assigning stmt uids, or what have you.

Yes.  I thought that is what ranger already does when it discovers new
ranges from edges.  Say we have

  _1 = 10 / _2;
  if (_2 == 1)
    {
       _3 = _1 + 1;

then when evaluating _1 + 1 we re-evaluate 10 / _2 using _2 == 1 and
can compute _3 to [11, 11]?
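
In C terms, a sketch of that situation (hypothetical source, just to
illustrate the recomputation) would be:

```c
/* Hypothetical source for the GIMPLE above.  On the edge where
   d == 1 holds, re-evaluating 10 / d with d in [1, 1] gives
   [10, 10], so the addition folds to [11, 11].  */
int
foo (int d)
{
  int q = 10 / d;   /* _1 = 10 / _2 */
  if (d == 1)       /* _2 has range [1, 1] on the taken edge */
    return q + 1;   /* _3 = _1 + 1, computable as [11, 11] */
  return q;
}
```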

That obviously extends to any stmt-level ranges we discover for uses
(not defs, because a def is never used before its definition).  And
doing that is _not_ affected by any function/BB terminating calls or EH
or whatnot, as long as the updated ranges are only affecting stmts
dominating the current one.

What complicates all this reasoning is that it is straightforward when
you work with a traditional IL-walking pass, but it gets hard (and easy
to get wrong) with on-demand processing and caching, because everything
you cache becomes context dependent (valid only starting after stmt X
and only for stmts dominated by it).
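
A concrete way to see that context dependence (hypothetical C, with the
dereference standing in for the side effect):

```c
/* Hypothetical example of the context dependence: the non-null
   range for p derived from the dereference is valid only for uses
   dominated by it, so the two textually identical comparisons
   cannot share one cached range.  */
int
bar (int *p)
{
  int a = (p == 0);  /* p may still be null here; cannot fold */
  int x = *p;        /* from here on, p is known to be non-null */
  int b = (p == 0);  /* this use can be folded to 0 */
  return a + b + x;
}
```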

> VRP could pre-process the block, and if it gets to the end of the
> block having seen at least one statement with a side effect and no
> calls which may not return, it could process the block again with all
> the side effects already active.  I'm not sure that buys as much as it
> costs, but it would change the value written to c to be 0, and it
> would change the global values exported for _2 and _3.
>
> Another option would be to flag the ssa-names instead of, or as well
> as, marking them as stale.  If we get to the end of the block and
> there were no non-returning calls or EH edges, then re-calculate and
> export those ssa-names using the latest values.  That would export
> [0,0] for _2 and _3.
>
> This would have no tangible impact during the first VRP pass, but the
> *next* VRP pass (or any other ranger pass) would pick up the new global
> ranges and do all the right things...  so we basically let a subsequent
> pass pick up the info and do the dirty work.
>
> Andrew