On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Aug 29, 2023 at 05:09:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Aug 29, 2023 at 11:42:48AM +0100, Richard Sandiford wrote:
> > > > I'll note tree-ssa-loop-niter.cc also uses GMP in some cases,
> > > > widest_int is really trying to be poor-man's GMP by limiting the
> > > > maximum precision.
> > > 
> > > I'd characterise widest_int as "a wide_int that is big enough to hold
> > > all supported integer types, without losing sign information".  It's
> > > not big enough to do arbitrary arithmetic without losing precision
> > > (in the way that GMP is).
> > > 
> > > If the new limit on integer sizes is 65535 bits for all targets,
> > > then I think that means that widest_int needs to become a 65536-bit
> > > type.  (But not with all bits represented all the time, of course.)
> > 
> > If the widest_int storage would be dependent on the len rather than
> > the precision for how it is stored, then I think we'd need a new method
> > which would be called at the start of filling the limbs where we'd
> > tell how many limbs there will be (i.e. what set_len will be called
> > with later on), and do nothing for all storages but the new
> > widest_int_storage.
> 
> So, I've spent some time on this.  While wide_int in the patch is a
> fixed/variable number of limbs (aka len) storage depending on the
> precision (precision > WIDE_INT_MAX_PRECISION means a heap-allocated
> limb array, otherwise it is inline), widest_int always has a very large
> precision (WIDEST_INT_MAX_PRECISION, currently defined to the
> INTEGER_CST-imposed limitation of 255 64-bit limbs) but uses an inline
> array for lengths corresponding to up to WIDE_INT_MAX_PRECISION bits
> and, for larger ones, uses a heap-allocated array of limbs similarly to
> wide_int.
> These changes make both wide_int and widest_int obviously non-POD: not
> trivially default constructible, nor trivially copy constructible,
> trivially destructible or trivially copyable, so not a good fit for GC
> and some vec operations.
> One common use of wide_int in GC structures was in dwarf2out.{h,cc};
> but as large _BitInt constants don't appear in RTL, we really don't
> need such large precisions there.
> So, for wide_int the patch introduces rwide_int, a restricted wide_int,
> which acts like the old wide_int (except that it is now trivially
> default constructible and has assertions that the precision isn't set
> above WIDE_INT_MAX_PRECISION).
> For widest_int, the nastiness is that because it always has the huge
> precision of 16320 right now,
> a) we need to be told upfront in wide-int.h, before calling the
>    large-value internal functions in wide-int.cc, how many elements
>    we'll need for the result (some reasonable upper estimate is fine)
> b) various of the wide-int.cc functions were lazy and assumed the
>    precision is small enough, and often used up to that many elements,
>    which is undesirable; so, the patch now tries to decrease that and
>    use xi.len etc. based estimates instead if possible (sometimes only
>    if the precision is above WIDE_INT_MAX_PRECISION)
> c) with the higher precision, behavior changes for lrshift (-1, 2) etc.
>    or unsigned division with the dividend having the most significant
>    bit set in widest_int - while such values used to be considered
>    above or equal to 1 << (WIDE_INT_MAX_PRECISION - 2), now they are so
>    relative to WIDEST_INT_MAX_PRECISION and thus much larger; but
>    lrshift on widest_int is I think only done in ccp and I'd strongly
>    hope that we treat the values as unsigned and so usually with a much
>    smaller length; so it is just when we call wi::lrshift (-1, 2) or
>    similar that results change.
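To make (c) concrete, the same width dependence can be seen with ordinary
fixed-width unsigned types; this is only an analogy, not the widest_int
code itself:

  #include <cstdint>
  #include <cstdio>

  int
  main ()
  {
    /* A logical right shift of "all ones" depends only on the width of
       the type, so widening the precision changes the result; the
       widest_int case of wi::lrshift (-1, 2) is analogous, growing with
       the precision.  */
    uint32_t v32 = (uint32_t) -1 >> 2;  /* 0x3fffffff, i.e. 2^30 - 1  */
    uint64_t v64 = (uint64_t) -1 >> 2;  /* 0x3fffffffffffffff, 2^62 - 1  */
    std::printf ("%#x\n%#llx\n", (unsigned) v32, (unsigned long long) v64);
    return 0;
  }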
> I've noticed that for wide_int or widest_int references even simple
> operations like eq_p liked to allocate and immediately free huge
> buffers, which was caused by wide_int doing an allocation on creation
> with a particular precision and e.g. get_binary_precision running into
> that.  So, I've duplicated that to avoid the allocations when all we
> need is just a precision.
> 
> The patch below doesn't actually build anymore because of the vec.h
> asserts (which point to useful stuff though), so temporarily I've
> applied it also with
> --- gcc/vec.h.xx	2023-09-28 12:56:09.055786055 +0200
> +++ gcc/vec.h	2023-09-28 13:15:31.760487111 +0200
> @@ -1197,7 +1197,7 @@ template<typename T, typename A>
>  inline void
>  vec<T, A, vl_embed>::qsort (int (*cmp) (const void *, const void *))
>  {
> -  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
> +//  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
>    if (length () > 1)
>      gcc_qsort (address (), length (), sizeof (T), cmp);
>  }
> @@ -1422,7 +1422,7 @@ template<typename T>
>  void
>  gt_ggc_mx (vec<T, va_gc> *v)
>  {
> -  static_assert (std::is_trivially_destructible <T>::value, "");
> +//  static_assert (std::is_trivially_destructible <T>::value, "");
>    extern void gt_ggc_mx (T &);
>    for (unsigned i = 0; i < v->length (); i++)
>      gt_ggc_mx ((*v)[i]);
> hack.  The two spots that trigger are tree-ssa-loop-niter.cc doing qsort
> on a widest_int vector (to be exact, swapping elements in the vector of

For this (besides choosing a fixed smaller widest_int as indicated in the
other mail) the sorting could be done indirectly, by sorting a
[0, 1, 2, ..., n-1] index vector instead.
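A minimal standalone sketch of that indirect-sort idea (generic C++, not
the actual niter code; the vector name and element type here are made up
for illustration):

  #include <algorithm>
  #include <cstdio>
  #include <numeric>
  #include <vector>

  int
  main ()
  {
    /* Stand-in for the vec<widest_int> of bounds; the element type only
       needs to be comparable, it is never swapped itself.  */
    std::vector<long> bounds = { 30, 10, 20 };

    /* Sort the indices [0, 1, ..., n-1] instead of the (potentially
       non-trivially-copyable) elements; only small integers get moved.  */
    std::vector<unsigned> order (bounds.size ());
    std::iota (order.begin (), order.end (), 0u);
    std::sort (order.begin (), order.end (),
               [&] (unsigned a, unsigned b) { return bounds[a] < bounds[b]; });

    /* Visit the elements in sorted order through the index vector.  */
    for (unsigned i : order)
      std::printf ("%ld\n", bounds[i]);
    return 0;
  }

The resulting permutation vector lets the consumer read the bounds in
sorted order without ever memcpy-ing or swapping the elements themselves.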
> And, now the question is what to do about this.  I guess for omp_general
> I could just use generic_wide_int <fixed_wide_int_storage <1024> > or
> something similar; after all, widest_int wasn't really great when it had
> a maximum precision of WIDE_INT_MAX_PRECISION, with different values on
> different targets, it has very few uses and is easy to change (thinking
> about this makes me wonder what we do for offloading if the offload host
> has a different WIDE_INT_MAX_PRECISION from the offload target).
> 
> But the more important question is what to do about loop/niters
> analysis.  I think for number-of-iterations analysis it might be ok to
> punt somehow (if there is a way to tell that the number of iterations is
> unknown) if we get some bound which is too large to be expressible in
> some reasonably small fixed precision (whether that is
> WIDE_INT_MAX_PRECISION or something different is a question).  We could
> either introduce yet another widest_int-like storage which would still
> have WIDEST_INT_MAX_PRECISION precision, but would ICE if the length is
> set to something above its fixed width.  One problem is that the
> write_val estimations are often just conservatively larger and could
> trigger even if the value fits in the end.  Or we could use
> generic_wide_int <fixed_wide_int_storage <WIDE_INT_MAX_PRECISION> >
> (perhaps call that rwidest_int); the drawback would be that it would be
> slightly harder to use as it has a different precision from widest_int,
> we'd need to do some from () on it or the like.  Plus I really don't
> know the niters code well enough to know how to punt.

I think when widest_int is no longer bound by something like the largest
integer mode but now has to cater for arbitrarily large _BitInt, we either
have to get rid of widest_int or we have to make it variable-precision and
reallocate it like auto_vec<T, n>.  For GC we can have the storage still
heap allocated, but of course CTOR/DTOR is going to be a pain (so better
not use widest_int in GC).

> ipa_bits is even worse, because unlike niter analysis, I think it is
> very much desirable to support IPA VRP of all supported _BitInt sizes.
> Shall we perhaps use trailing_wide_int storage in there, or
> conditionally rwidest_int vs. INTEGER_CSTs for stuff that doesn't fit,
> or something else?

trailing_wide_int storage is the way to go here.

> What about slsr?  This is after bitint lowering, so it shouldn't be
> performing opts on larger BITINT_TYPEs and so could also go with the
> rwidest_int.

Just to say I don't really like adding another "widest" int, but slsr
shouldn't need to GC any of that, so widest_int should be fine?

Richard.
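The trailing-storage idea above can be illustrated with a minimal
standalone analogue (this is not GCC's actual trailing_wide_int API; the
struct and function names here are made up): the limbs live directly
after a small header in one flat allocation whose size is fixed by the
precision chosen up front, which keeps the whole object in a single,
GC-friendly block.

  #include <cstdint>
  #include <cstdio>
  #include <cstdlib>

  /* Simplified stand-alone analogue of trailing storage, not the actual
     GCC trailing_wide_int API: header followed directly by the limbs,
     with the number of limbs decided once, at allocation time.  */
  struct bits_storage
  {
    unsigned precision;   /* precision in bits */
    unsigned len;         /* number of 64-bit limbs */
    uint64_t limbs[1];    /* trailing storage, over-allocated below */
  };

  static bits_storage *
  alloc_bits_storage (unsigned precision)
  {
    unsigned limbs = (precision + 63) / 64;
    size_t size = sizeof (bits_storage) + (limbs - 1) * sizeof (uint64_t);
    bits_storage *p = static_cast<bits_storage *> (std::calloc (1, size));
    p->precision = precision;
    p->len = limbs;
    return p;
  }

  int
  main ()
  {
    /* A 575-bit value needs 9 limbs, all stored inline after the header.  */
    bits_storage *b = alloc_bits_storage (575);
    b->limbs[b->len - 1] = 1;
    std::printf ("precision=%u len=%u\n", b->precision, b->len);
    std::free (b);
    return 0;
  }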