On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Aug 29, 2023 at 05:09:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Aug 29, 2023 at 11:42:48AM +0100, Richard Sandiford wrote:
> > > > I'll note tree-ssa-loop-niter.cc also uses GMP in some cases,
> > > > widest_int is really trying to be poor-man's GMP by limiting the
> > > > maximum precision.
> > > 
> > > I'd characterise widest_int as "a wide_int that is big enough to hold
> > > all supported integer types, without losing sign information".  It's
> > > not big enough to do arbitrary arithmetic without losing precision
> > > (in the way that GMP is).
> > > 
> > > If the new limit on integer sizes is 65535 bits for all targets,
> > > then I think that means that widest_int needs to become a 65536-bit
> > > type.  (But not with all bits represented all the time, of course.)
> > 
> > If the widest_int storage would be dependent on the len rather than
> > the precision for how it is stored, then I think we'd need a new method
> > which would be called at the start of filling the limbs where we'd
> > tell how many limbs there will be (i.e. what set_len will be called
> > with later on), and do nothing for all storages but the new
> > widest_int_storage.
> 
> So, I've spent some time on this.  While wide_int in the patch is a
> fixed/variable number of limbs (aka len) storage depending on the
> precision (precision > WIDE_INT_MAX_PRECISION means a heap-allocated
> limb array, otherwise it is inline), widest_int always has a very large
> precision (WIDEST_INT_MAX_PRECISION, currently defined to the
> INTEGER_CST-imposed limitation of 255 64-bit limbs) but uses an inline
> array for lengths corresponding to up to WIDE_INT_MAX_PRECISION bits
> and, for larger ones, uses a heap-allocated array of limbs similarly to
> wide_int.
> These changes make both wide_int and widest_int obviously non-POD: not
> trivially default constructible, nor trivially copy constructible,
> trivially destructible or trivially copyable, so not a good fit for GC
> and some vec operations.
> One common use of wide_int in GC structures was in dwarf2out.{h,cc};
> but as large _BitInt constants don't appear in RTL, we really don't
> need such large precisions there.
> So, for wide_int the patch introduces rwide_int, a restricted wide_int,
> which acts like the old wide_int (except that it is now trivially
> default constructible and has assertions that the precision isn't set
> above WIDE_INT_MAX_PRECISION).
> For widest_int, the nastiness is that because it always has the huge
> precision of 16320 right now,
> a) we need to be told upfront in wide-int.h, before calling the
>    large-value internal functions in wide-int.cc, how many elements
>    we'll need for the result (some reasonable upper estimate is fine)
> b) various of the wide-int.cc functions were lazy and assumed the
>    precision is small enough, and often used up to that many elements,
>    which is undesirable; so, the patch now tries to decrease that and
>    use xi.len etc. based estimates instead if possible (sometimes only
>    if the precision is above WIDE_INT_MAX_PRECISION)
> c) with the higher precision, behavior changes for lrshift (-1, 2) etc.
>    or unsigned division with the dividend having the most significant
>    bit set in widest_int - while such values used to be considered
>    above or equal to 1 << (WIDE_INT_MAX_PRECISION - 2), now they are so
>    relative to WIDEST_INT_MAX_PRECISION and thus much larger; but
>    lrshift on widest_int is I think only done in ccp and I'd strongly
>    hope that we treat the values as unsigned and so usually with a much
>    smaller length; so it is just when we call wi::lrshift (-1, 2) or
>    similar that results change.
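To make (c) concrete, the same width dependence can be seen with ordinary
fixed-width unsigned types; this is only an analogy, not the widest_int
code itself:

  #include <cstdint>
  #include <cstdio>

  int
  main ()
  {
    /* A logical right shift of "all ones" depends only on the width of
       the type, so widening the precision changes the result; the
       widest_int case of wi::lrshift (-1, 2) is analogous, growing with
       the precision.  */
    uint32_t v32 = (uint32_t) -1 >> 2;  /* 0x3fffffff, i.e. 2^30 - 1  */
    uint64_t v64 = (uint64_t) -1 >> 2;  /* 0x3fffffffffffffff, 2^62 - 1  */
    std::printf ("%#x\n%#llx\n", (unsigned) v32, (unsigned long long) v64);
    return 0;
  }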
> I've noticed that for wide_int or widest_int references even simple
> operations like eq_p liked to allocate and immediately free huge
> buffers, which was caused by wide_int doing an allocation on creation
> with a particular precision and e.g. get_binary_precision running into
> that.  So, I've duplicated that to avoid the allocations when all we
> need is just a precision.
> 
> The patch below doesn't actually build anymore because of the vec.h
> asserts (which point to useful stuff though), so temporarily I've
> applied it also with
> --- gcc/vec.h.xx	2023-09-28 12:56:09.055786055 +0200
> +++ gcc/vec.h	2023-09-28 13:15:31.760487111 +0200
> @@ -1197,7 +1197,7 @@ template<typename T, typename A>
>  inline void
>  vec<T, A, vl_embed>::qsort (int (*cmp) (const void *, const void *))
>  {
> -  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
> +//  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
>    if (length () > 1)
>      gcc_qsort (address (), length (), sizeof (T), cmp);
>  }
> @@ -1422,7 +1422,7 @@ template<typename T>
>  void
>  gt_ggc_mx (vec<T, va_gc> *v)
>  {
> -  static_assert (std::is_trivially_destructible <T>::value, "");
> +//  static_assert (std::is_trivially_destructible <T>::value, "");
>    extern void gt_ggc_mx (T &);
>    for (unsigned i = 0; i < v->length (); i++)
>      gt_ggc_mx ((*v)[i]);
> hack.  The two spots that trigger are tree-ssa-loop-niter.cc doing qsort
> on a widest_int vector (to be exact, swapping elements in the vector of

For this (besides choosing a fixed smaller widest_int as indicated in the
other mail) the sorting could be done indirectly, by sorting a
[0, 1, 2, ..., n-1] index vector instead.
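A minimal standalone sketch of that indirect-sort idea (generic C++, not
the actual niter code; the vector name and element type here are made up
for illustration):

  #include <algorithm>
  #include <cstdio>
  #include <numeric>
  #include <vector>

  int
  main ()
  {
    /* Stand-in for the vec<widest_int> of bounds; the element type only
       needs to be comparable, it is never swapped itself.  */
    std::vector<long> bounds = { 30, 10, 20 };

    /* Sort the indices [0, 1, ..., n-1] instead of the (potentially
       non-trivially-copyable) elements; only small integers get moved.  */
    std::vector<unsigned> order (bounds.size ());
    std::iota (order.begin (), order.end (), 0u);
    std::sort (order.begin (), order.end (),
               [&] (unsigned a, unsigned b) { return bounds[a] < bounds[b]; });

    /* Visit the elements in sorted order through the index vector.  */
    for (unsigned i : order)
      std::printf ("%ld\n", bounds[i]);
    return 0;
  }

The resulting permutation vector lets the consumer read the bounds in
sorted order without ever memcpy-ing or swapping the elements themselves.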
> And, now the question is what to do about this.  I guess for omp_general
> I could just use generic_wide_int <fixed_wide_int_storage <1024> > or
> something similar; after all, widest_int wasn't really great when it had
> a maximum precision of WIDE_INT_MAX_PRECISION, with different values on
> different targets, it has very few uses and is easy to change (thinking
> about this makes me wonder what we do for offloading if the offload host
> has a different WIDE_INT_MAX_PRECISION from the offload target).
> 
> But the more important question is what to do about loop/niters
> analysis.  I think for number-of-iterations analysis it might be ok to
> punt somehow (if there is a way to tell that the number of iterations is
> unknown) if we get some bound which is too large to be expressible in
> some reasonably small fixed precision (whether that is
> WIDE_INT_MAX_PRECISION or something different is a question).  We could
> either introduce yet another widest_int-like storage which would still
> have WIDEST_INT_MAX_PRECISION precision, but would ICE if the length is
> set to something above its fixed width.  One problem is that the
> write_val estimations are often just conservatively larger and could
> trigger even if the value fits in the end.  Or we could use
> generic_wide_int <fixed_wide_int_storage <WIDE_INT_MAX_PRECISION> >
> (perhaps call that rwidest_int); the drawback would be that it would be
> slightly harder to use as it has a different precision from widest_int,
> we'd need to do some from () on it or the like.  Plus I really don't
> know the niters code well enough to know how to punt.

I think when widest_int is no longer bound by something like the largest
integer mode but now has to cater for arbitrarily large _BitInt, we either
have to get rid of widest_int or we have to make it variable-precision and
reallocate it like auto_vec<T, n>.  For GC we can have the storage still
heap allocated, but of course CTOR/DTOR is going to be a pain (so better
not use widest_int in GC).

> ipa_bits is even worse, because unlike niter analysis, I think it is
> very much desirable to support IPA VRP of all supported _BitInt sizes.
> Shall we perhaps use trailing_wide_int storage in there, or
> conditionally rwidest_int vs. INTEGER_CSTs for stuff that doesn't fit,
> or something else?

trailing_wide_int storage is the way to go here.

> What about slsr?  This is after bitint lowering, so it shouldn't be
> performing opts on larger BITINT_TYPEs and so could also go with the
> rwidest_int.

Just to say I don't really like adding another "widest" int, but slsr
shouldn't need to GC any of that, so widest_int should be fine?

Richard.
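The trailing-storage idea above can be illustrated with a minimal
standalone analogue (this is not GCC's actual trailing_wide_int API; the
struct and function names here are made up): the limbs live directly
after a small header in one flat allocation whose size is fixed by the
precision chosen up front, which keeps the whole object in a single,
GC-friendly block.

  #include <cstdint>
  #include <cstdio>
  #include <cstdlib>

  /* Simplified stand-alone analogue of trailing storage, not the actual
     GCC trailing_wide_int API: header followed directly by the limbs,
     with the number of limbs decided once, at allocation time.  */
  struct bits_storage
  {
    unsigned precision;   /* precision in bits */
    unsigned len;         /* number of 64-bit limbs */
    uint64_t limbs[1];    /* trailing storage, over-allocated below */
  };

  static bits_storage *
  alloc_bits_storage (unsigned precision)
  {
    unsigned limbs = (precision + 63) / 64;
    size_t size = sizeof (bits_storage) + (limbs - 1) * sizeof (uint64_t);
    bits_storage *p = static_cast<bits_storage *> (std::calloc (1, size));
    p->precision = precision;
    p->len = limbs;
    return p;
  }

  int
  main ()
  {
    /* A 575-bit value needs 9 limbs, all stored inline after the header.  */
    bits_storage *b = alloc_bits_storage (575);
    b->limbs[b->len - 1] = 1;
    std::printf ("precision=%u len=%u\n", b->precision, b->len);
    std::free (b);
    return 0;
  }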