Hi,

On Mon, Nov 03 2025, Richard Biener wrote:
> On Sun, Nov 2, 2025 at 10:24 AM Richard Biener
> <[email protected]> wrote:
>>
>> On Fri, Oct 31, 2025 at 10:47 PM Martin Jambor <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > this patch adds the ability to infer ranges from loads from global
>> > constant static aggregates which have static initializers.  Even when
>> > the load has an ARRAY_REF with an unknown index and thus we do not know
>> > the particular constant that is being loaded, we can traverse the
>> > corresponding elements of the initializer and see if we know in what
>> > range(s) the loaded value must fall - or for pointers we can sometimes
>> > infer that the value cannot be NULL.
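(To make the above concrete, here is a minimal sketch of the kind of code this
targets; the names are made up and it is not one of the new tests.)

```cpp
/* Constant static aggregate with a static initializer.  */
static const int lookup[8] = { 3, 5, 7, 11, 13, 17, 19, 23 };

int
load (int i)
{
  /* The index is unknown, but every element of the initializer lies in
     [3, 23], so the loaded value must too.  */
  return lookup[i & 7];
}
```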
>> >
>> > I thought this was similar to fold_using_range::range_of_address and
>> > so I decided to put my implementation alongside of it.  I hope that is
>> > the correct place.
>> >
>> > Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, bootstrap
>> > on aarch64-linux is underway.  OK for master if it passes?
>>
>> Comments below.
>>
>> > Thanks,
>> >
>> > Martin
>> >
>> >
>> > gcc/ChangeLog:
>> >
>> > 2025-10-31  Martin Jambor  <[email protected]>
>> >
>> >         * gimple-range-fold.h (class fold_using_range): New member
>> >         function range_from_readonly_var.
>> >         * gimple-range-fold.cc (fold_using_range::fold_stmt): Call
>> >         range_from_readonly_var on assignments.
>> >         (add_loaded_invariant_to_range): New function.
>> >         (range_from_readonly_load): Likewise.
>> >         (fold_using_range::range_from_readonly_var): Likewise.
>> >         * params.opt (param_vrp_cstload_limit): New.
>> >         * doc/invoke.texi (vrp-cstload-limit): Likewise.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > 2025-10-31  Martin Jambor  <[email protected]>
>> >
>> >         * gcc.dg/tree-ssa/vrp-from-cst-agg-1.c: New test.
>> >         * gcc.dg/tree-ssa/vrp-from-cst-agg-2.c: Likewise.
>> >         * gcc.dg/tree-ssa/vrp-from-cst-agg-3.c: Likewise.
>> >         * gcc.dg/tree-ssa/vrp-from-cst-agg-4.c: Likewise.
>> >         * gcc.dg/ipa/vrp-from-cst-agg-5.c: Likewise.
>> > ---
>> >  gcc/doc/invoke.texi                           |   3 +
>> >  gcc/gimple-range-fold.cc                      | 214 +++++++++++++++++-
>> >  gcc/gimple-range-fold.h                       |   1 +
>> >  gcc/params.opt                                |   4 +
>> >  gcc/testsuite/gcc.dg/ipa/vrp-from-cst-agg-5.c |  33 +++
>> >  .../gcc.dg/tree-ssa/vrp-from-cst-agg-1.c      |  36 +++
>> >  .../gcc.dg/tree-ssa/vrp-from-cst-agg-2.c      |  57 +++++
>> >  .../gcc.dg/tree-ssa/vrp-from-cst-agg-3.c      |  34 +++
>> >  .../gcc.dg/tree-ssa/vrp-from-cst-agg-4.c      |  34 +++
>> >  9 files changed, 412 insertions(+), 4 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.dg/ipa/vrp-from-cst-agg-5.c
>> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-from-cst-agg-1.c
>> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-from-cst-agg-2.c
>> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-from-cst-agg-3.c
>> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-from-cst-agg-4.c
>> >
>> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> > index b40fc892fa0..c2542f8b882 100644
>> > --- a/gcc/doc/invoke.texi
>> > +++ b/gcc/doc/invoke.texi
>> > @@ -17632,6 +17632,9 @@ Increasing the cost multiplier will make vector loops more profitable.
>> >  @item vrp-block-limit
>> >  Maximum number of basic blocks before VRP switches to a lower memory algorithm.
>> >
>> > +@item vrp-cstload-limit
>> > +Maximum number of VRP intervals inferred from a load from a constant aggregate.
>> > +
>> >  @item vrp-sparse-threshold
>> >  Maximum number of basic blocks before VRP uses a sparse bitmap cache.
>> >
>> > diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
>> > index 06c645f3d08..b455589428e 100644
>> > --- a/gcc/gimple-range-fold.cc
>> > +++ b/gcc/gimple-range-fold.cc
>> > @@ -639,10 +639,14 @@ fold_using_range::fold_stmt (vrange &r, gimple *s, fur_source &src, tree name)
>> >    if (!name)
>> >      name = gimple_get_lhs (s);
>> >
>> > -  // Process addresses.
>> > -  if (gimple_code (s) == GIMPLE_ASSIGN
>> > -      && gimple_assign_rhs_code (s) == ADDR_EXPR)
>> > -    return range_of_address (as_a <prange> (r), s, src);
>> > +  // Process addresses and loads from static constructors.
>> > +  if (gimple_code (s) == GIMPLE_ASSIGN)
>> > +    {
>> > +      if (gimple_assign_rhs_code (s) == ADDR_EXPR)
>> > +       return range_of_address (as_a <prange> (r), s, src);
>> > +      if (range_from_readonly_var (r, s))
>> > +       return true;
>> > +    }
>> >
>> >    gimple_range_op_handler handler (s);
>> >    if (handler)
>> > @@ -891,6 +895,208 @@ fold_using_range::range_of_address (prange &r, gimple *stmt, fur_source &src)
>> >    return true;
>> >  }
>> >
>> > +// VAL must be an interprocedural invariant.  If VAL is a pointer which
>> > +// cannot be zero, then set it to nonzero if it is undefined and return
>> > +// true.  If VAL is a pointer which can be zero return false.  If VAL is of
>> > +// another type, add the constant to the set of ranges in R and return true.
>> > +
>> > +static bool
>> > +add_loaded_invariant_to_range (vrange &r, tree val)
>> > +{
>> > +  if (POINTER_TYPE_P (TREE_TYPE (val)))
>> > +    {
>> > +      bool strict_overflow_p;
>> > +      if (tree_single_nonzero_warnv_p (val, &strict_overflow_p))
>> > +       {
>> > +         if (r.undefined_p ())
>> > +           r.set_nonzero (TREE_TYPE (val));
>> > +         return true;
>> > +       }
>> > +      else
>> > +       return false;
>> > +    }
>> > +  else
>> > +    {
>> > +      if (TREE_CODE (val) == REAL_CST
>> > +         && real_isnan (TREE_REAL_CST_PTR (val)))
>> > +       return false;
>> > +
>> > +      if (is_a <irange> (r))
>> > +       {
>> > +         irange &ir = as_a <irange> (r);
>> > +         if (!r.contains_p (val)
>
> The r.contains_p case could be an early out.

True :-) But as I am about to change this all to limit walking instead
of counting pairs, I am removing this code anyway.

> Why is this whole block not done for FP aka frange?

Because, AFAIU, FP ranges do not have multiple pairs; there is no
frange::num_pairs() method.

> I'd suggest to simply strip the patch from any non-irange handling for
> now and handle prange/frange as followups.

I think the code already works for pranges and can be useful, as
demonstrated by one of the added test-cases, so I'd like to keep them.
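For illustration, the pointer case looks roughly like this (a made-up sketch,
not the actual test): every element of the constant table is the address of a
static object, so the load can be inferred to be non-NULL even with an unknown
index.

```cpp
static const char a[] = "alpha", b[] = "beta", c[] = "gamma";

/* Every initializer element is the address of a static object, so a load
   from TABLE can never yield a null pointer.  */
static const char *const table[3] = { a, b, c };

const char *
pick (unsigned i)
{
  return table[i % 3];
}
```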

OTOH, a variant of vrp-from-cst-agg-3.c but with floating point numbers
does not work "out of the box" for reasons I cannot yet explain - except
that I verified that the code I'm adding does what it is supposed to.
So I am willing to handle just pranges and iranges.  I think that the
main simplification, however, will be that...

>
>> > +             && ir.num_pairs () >= (unsigned) param_vrp_cstload_limit)
>> > +           return false;
>> > +       }
>> > +
>> > +      value_range tmp (val, val, VR_RANGE);
>> > +      if (r.undefined_p ())
>> > +       r = tmp;
>> > +      else
>> > +       r.union_ (tmp);
>
> Possibly not the very most efficient way to handle incremental build
> of a range?

...I'll be able to rewrite this as

  r.union_(int_range<1> (type, val, val));

...which, AFAIU, is what the irange code currently does in similar
situations.

I think that attempting anything more efficient, such as extending the
closest range rather than adding another one if there are too many, would
mean extending the API of the various ranger classes, something I would
like to avoid at this stage.
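To sketch that trade-off in standalone code (a toy model with made-up names,
not the real irange API): singletons are unioned into a sorted vector of
pairs, bailing out once a pair limit akin to param_vrp_cstload_limit would be
exceeded.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy multi-range: sorted, disjoint [lo, hi] pairs.  This only models the
// idea behind the limit; it is not ranger code.
using pair_vec = std::vector<std::pair<int, int>>;

// Union the singleton [v, v] into PAIRS; give up (return false) if that
// would create more than LIMIT pairs, as the patch does.
bool
union_singleton (pair_vec &pairs, int v, unsigned limit)
{
  for (auto &p : pairs)
    {
      if (v >= p.first && v <= p.second)
	return true;			// already contained
      if (v == p.first - 1)
	{ p.first = v; return true; }	// extend downwards
      if (v == p.second + 1)
	{ p.second = v; return true; }	// extend upwards (no re-merge here)
    }
  if (pairs.size () >= limit)
    return false;			// too many pairs, bail out
  pairs.emplace_back (v, v);
  std::sort (pairs.begin (), pairs.end ());
  return true;
}
```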

>
>> > +      return true;
>> > +    }
>> > +}
>> > +
>> > +// One step of fold_using_range::range_from_readonly_var.  Process
>> > +// expressions in COMPS from index I to 0 according to the corresponding
>> > +// static initializer in CONSTRUCTOR, adding all possible loaded constants
>> > +// (which must be storable to TYPE) to R.  Return true if all the necessary
>> > +// values were extracted and added to R successfully.
>> > +
>> > +static bool
>> > +range_from_readonly_load (vrange &r, tree type, tree constructor,
>> > +                         const vec <tree> &comps, int i)
>> > +{
>> > +  unsigned ix;
>> > +  tree index, val;
>> > +  tree expr = comps[i];
>> > +
>> > +  gcc_assert (TREE_CODE (expr) == COMPONENT_REF
>> > +             || TREE_CODE (expr) == ARRAY_REF);
>> > +  tree op1 = TREE_OPERAND (expr, 1);
>>
>> Please use get_array_ctor_element_at_index for constant indexing into
>> an ARRAY_REF.  I think you should be able to use fold_const_aggregate_ref
>> to get you the CTOR for the "inner" part when you pass it the parent of the
>> single variable ref.  Then you can use the same function to fold you the
>> outer (also constant refs) by tacking the resulting CTOR as base to the
>> outer ref with the constant interesting index.
>>
>> IMO the recursion doesn't help readability.
>>
>> I think the param, instead of limiting the range complexity, needs to limit
>> the amount of CTOR walking done.  Otherwise we'd scan a multi-gigabyte
>> two-valued CTOR like {0, 1, 0, 1 ..} with arr[i] many many times.  Having a
>> CTOR pre-classified by IPA would be nice.
>>
>> > +  if (TREE_CODE (expr) == COMPONENT_REF
>> > +      || TREE_CODE (op1) == INTEGER_CST)
>> > +    {
>> > +       FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (constructor), ix, index, val)
>> > +       {
>> > +         if (TREE_CODE (index) == FIELD_DECL)
>> > +           {
>> > +             if (index != op1)
>> > +               continue;
>> > +           }
>> > +         else if (TREE_CODE (index) == INTEGER_CST
>> > +                  && TREE_CODE (op1) == INTEGER_CST)
>> > +           {
>> > +             if (!tree_int_cst_equal (index, op1))
>> > +               continue;
>> > +           }
>> > +         else
>> > +           gcc_unreachable ();
>> > +
>> > +         if (TREE_CODE (val) == CONSTRUCTOR)
>> > +           {
>> > +             if (i > 0)
>> > +               return range_from_readonly_load (r, type, val, comps, i - 1);
>> > +             else
>> > +               return false;
>> > +           }
>> > +         if (is_gimple_ip_invariant (val))
>> > +           {
>> > +             if (i != 0
>> > +                 || (!useless_type_conversion_p (type, TREE_TYPE (val))))
>> > +               return false;
>> > +
>> > +             return add_loaded_invariant_to_range (r, val);
>> > +           }
>> > +         return false;
>> > +       }
>> > +      return false;
>> > +    }
>> > +
>> > +  tree arr_type = TREE_TYPE (constructor);
>> > +  tree domain = TYPE_DOMAIN (arr_type);
>> > +  if (!TYPE_MIN_VALUE (domain)
>> > +      || !TYPE_MAX_VALUE (domain)
>> > +      || !tree_fits_uhwi_p (TYPE_MIN_VALUE (domain))
>> > +      || !tree_fits_uhwi_p (TYPE_MAX_VALUE (domain)))
>> > +    return false;
>> > +  unsigned HOST_WIDE_INT needed_count
>> > +    = (tree_to_uhwi (TYPE_MAX_VALUE (domain))
>> > +       - tree_to_uhwi (TYPE_MIN_VALUE (domain)) + 1);
>> > +  unsigned HOST_WIDE_INT count = 0;
>> > +
>> > +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (constructor), ix, index, val)
>> > +    {
>> > +      /* TODO: If the array index in the expr is an SSA_NAME with a known
>> > +        range, we could use just values loaded from the corresponding array
>> > +        elements.  */
>>
>> Until you do this, this really boils down to the complete static CTOR range
>> which should be easier to cache than the VR, CTOR pair you'd have to use
>> otherwise.
>>
>> > +      count++;
>> > +      if (TREE_CODE (val) == CONSTRUCTOR)
>> > +       {
>> > +         if (i <= 0
>> > +             || !range_from_readonly_load (r, type, val, comps, i - 1))
>> > +           return false;
>> > +       }
>> > +      else if (is_gimple_ip_invariant (val))
>> > +       {
>> > +         if (i != 0
>> > +             || (!useless_type_conversion_p (type, TREE_TYPE (val))))
>> > +           return false;
>> > +
>> > +         if (!add_loaded_invariant_to_range (r, val))
>> > +           return false;
>> > +       }
>> > +      else
>> > +       return false;
>> > +    }
>> > +
>> > +  if (count < needed_count)
>> > +    {
>
> Since you do not handle RANGE_EXPR or RAW_DATA_CST explicitly, a much faster
> way to reject incomplete CTORs would be to check against CONSTRUCTOR_NELTS
> before iterating over all elements.  I'll note that even missing aggregate
> initializers will just result in zero initialization down-path, so it seems
> that the "missing initializers" case could be just handled at the outermost
> recursion level by adding zero.  Likewise that supports_p (type) is then
> hopefully redundant.

Thanks a lot for the pointers.  I knew I was writing the code very
defensively; this will hopefully make it all a bit clearer and useful in
more situations.

Though the recursion and the "opposite" walking will stay.  Static
constructors and tree memory references unfortunately nest in "opposite"
orders.
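For instance (a made-up example, not from the patch): for a reference like
s.arr[i].f, the reference tree's outermost node is the COMPONENT_REF for .f
with the base s at the bottom, whereas the CONSTRUCTOR for the initializer
nests base-outward, hence the back-to-front walk over the collected
components.

```cpp
struct inner { int f; };
struct outer { struct inner arr[2]; };

/* The CONSTRUCTOR nests base-outward: the initializer for S contains one
   for ARR, which in turn contains one for each struct inner.  */
static const struct outer s = { { { 4 }, { 8 } } };

int
load (int i)
{
  /* The reference tree nests the other way round: the COMPONENT_REF for
     .f is outermost and the base S sits at the bottom.  */
  return s.arr[i & 1].f;
}
```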

Martin
