On Thu, Oct 26, 2023 at 2:29 AM Richard Biener
<[email protected]> wrote:
>
> On Wed, Oct 25, 2023 at 5:51 AM Andrew Pinski <[email protected]> wrote:
> >
> > I noticed we were missing optimizing `a / (1 << b)` when
> > we know that a is nonnegative but only due to ranger information.
> > This adds the use of the global ranger to tree_single_nonnegative_warnv_p
> > for SSA_NAME.
> > I didn't extend tree_single_nonnegative_warnv_p to use the ranger for
> > floating
> > point nor to use the local ranger since I am not 100% sure it is safe where
> > all of the uses tree_expr_nonnegative_p would be safe.
> >
> > Note pr80776-1.c testcase fails again due to vrp's bad handling of setting
> > global ranges from __builtin_unreachable. It just happened to be optimized
> > before due to global ranges not being used as much.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > PR tree-optimization/111959
> >
> > gcc/ChangeLog:
> >
> > * fold-const.cc (tree_single_nonnegative_warnv_p): Use
> > the global range to see if the SSA_NAME was nonnegative.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/forwprop-42.c: New test.
> > * gcc.dg/pr80776-1.c: xfail and update comment.
> > ---
> > gcc/fold-const.cc | 36 +++++++++++++++------
> > gcc/testsuite/gcc.dg/pr80776-1.c | 8 ++---
> > gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c | 15 +++++++++
> > 3 files changed, 46 insertions(+), 13 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 40767736389..2a2a90230f5 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -15047,15 +15047,33 @@ tree_single_nonnegative_warnv_p (tree t, bool
> > *strict_overflow_p, int depth)
> > return RECURSE (TREE_OPERAND (t, 1)) && RECURSE (TREE_OPERAND (t,
> > 2));
> >
> > case SSA_NAME:
> > - /* Limit the depth of recursion to avoid quadratic behavior.
> > - This is expected to catch almost all occurrences in practice.
> > - If this code misses important cases that unbounded recursion
> > - would not, passes that need this information could be revised
> > - to provide it through dataflow propagation. */
> > - return (!name_registered_for_update_p (t)
> > - && depth < param_max_ssa_name_query_depth
> > - && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
> > - strict_overflow_p,
> > depth));
> > + {
> > + /* For integral types, querry the global range if possible. */
>
> query
>
> > + if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
> > + {
> > + value_range vr;
> > + if (get_global_range_query ()->range_of_expr (vr, t)
> > + && !vr.varying_p () && !vr.undefined_p ())
> > + {
> > + /* If the range is nonnegative, return true. */
> > + if (vr.nonnegative_p ())
> > + return true;
> > +
> > + /* If the range is non-positive, then return false. */
> > + if (vr.nonpositive_p ())
> > + return false;
>
> That's testing for <= 0, nonnegative for >= 0. This means when
> vr.nonpositive_p () the value could still be zero (and nonnegative),
> possibly be figured out by the recursion below.
>
> Since we don't have negative_p () do we want to test
> nonpositive_p () && nonzero_p () instead?
I was thinking about that when I was writing the patch.
If the ranger figured out the value was zero, nonnegative_p would have
returned true.
So while yes nonpositive_p() would return true but we already checked
nonnegative_p beforehand and the nonzero_p would not matter.
Now the question is if after nonnegative_p we check if the range could
contain 0 still is that worth the recursion. The whole idea of
returning false was to remove the need from recursion as much.
Thanks,
Andrew
>
> OK with that change.
>
> Richard.
>
> > + }
> > + }
> > + /* Limit the depth of recursion to avoid quadratic behavior.
> > + This is expected to catch almost all occurrences in practice.
> > + If this code misses important cases that unbounded recursion
> > + would not, passes that need this information could be revised
> > + to provide it through dataflow propagation. */
> > + return (!name_registered_for_update_p (t)
> > + && depth < param_max_ssa_name_query_depth
> > + && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
> > + strict_overflow_p,
> > depth));
> > + }
> >
> > default:
> > return tree_simple_nonnegative_warnv_p (TREE_CODE (t), TREE_TYPE
> > (t));
> > diff --git a/gcc/testsuite/gcc.dg/pr80776-1.c
> > b/gcc/testsuite/gcc.dg/pr80776-1.c
> > index b9bce62d982..f3d47aeda36 100644
> > --- a/gcc/testsuite/gcc.dg/pr80776-1.c
> > +++ b/gcc/testsuite/gcc.dg/pr80776-1.c
> > @@ -18,14 +18,14 @@ Foo (void)
> > if (! (0 <= i && i <= 999999))
> > __builtin_unreachable ();
> >
> > - /* Legacy evrp sets the range of i to [0, MAX] *before* the first
> > conditional,
> > + /* vrp1 sets the range of i to [0, MAX] *before* the first conditional,
> > and to [0,999999] *before* the second conditional. This is because
> > both
> > - evrp and VRP use trickery to set global ranges when this particular
> > use of
> > + vrp use trickery to set global ranges when this particular use of
> > a __builtin_unreachable is in play (see uses of
> > assert_unreachable_fallthru_edge_p).
> >
> > - Setting these ranges at the definition site, causes VRP to remove the
> > + Setting these ranges at the definition site, causes other passes to
> > remove the
> > unreachable code altogether, leaving the following sprintf unguarded.
> > This
> > causes the bogus warning below. */
> > - sprintf (number, "%d", i); /* { dg-bogus "writing" "" } */
> > + sprintf (number, "%d", i); /* { dg-bogus "writing" "" { xfail *-*-* } }
> > */
> > }
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> > new file mode 100644
> > index 00000000000..4e5421ed4d4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +/* PR tree-optimization/111959 */
> > +
> > +int divbypow2(int a, int b)
> > +{
> > + if (a & ~0xff) __builtin_unreachable();
> > + return a / (1<<b);
> > +}
> > +
> > +/* divbypow2 should be able to optimize to just a/b as a is known to be
> > always positive. */
> > +/* { dg-final { scan-tree-dump-not " / " "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not " << " "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times " >> " 1 "optimized" } } */
> > +
> > --
> > 2.39.3
> >