On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez <al...@redhat.com> wrote: > > <tldr> > We can now have int_range<N, RESIZABLE=false> for automatically > resizable ranges. int_range_max is now int_range<3, true> > for a 69X reduction in size from current trunk, and 6.9X reduction from > GCC12. This incurs a 5% performance penalty for VRP that is more than > covered by our > 13% improvements recently. > </tldr> > > int_range_max is the temporary range object we use in the ranger for > integers. With the conversion to wide_int, this structure bloated up > significantly because wide_ints are huge (80 bytes a piece) and are > about 10 times as big as a plain tree. Since the temporary object > requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word. > This means the structure grew from 4112 bytes to 40912 bytes. > > This patch adds the ability to resize ranges as needed, defaulting to > no resizing, while int_range_max now defaults to 3 sub-ranges (instead > of 255) and grows to 255 when the range being calculated does not fit. > > For example: > > int_range<1> foo; // 1 sub-range with no resizing. > int_range<5> foo; // 5 sub-ranges with no resizing. > int_range<5, true> foo; // 5 sub-ranges with resizing. > > I ran some tests and found that 3 sub-ranges cover 99% of cases, so > I've set the int_range_max default to that: > > typedef int_range<3, /*RESIZABLE=*/true> int_range_max; > > We don't bother growing incrementally, since the default covers most > cases and we have a 255 hard-limit. This hard limit could be reduced > to 128, since my tests never saw a range needing more than 124, but we > could do that as a follow-up if needed. > > With 3-subranges, int_range_max is now 592 bytes versus 40912 for > trunk, and versus 4112 bytes for GCC12! The penalty is 5.04% for VRP > and 3.02% for threading, with no noticeable change in overall > compilation (0.27%). This is more than covered by our 13.26% > improvements for the legacy removal + wide_int conversion.
Thanks for doing this. > I think this approach is a good alternative, while providing us with > flexibility going forward. For example, we could try defaulting to a > 8 sub-ranges for a noticeable improvement in VRP. We could also use > large sub-ranges for switch analysis to avoid resizing. > > Another approach I tried was always resizing. With this, we could > drop the whole int_range<N> nonsense, and have irange just hold a > resizable range. This simplified things, but incurred a 7% penalty on > ipa_cp. This was hard to pinpoint, and I'm not entirely convinced > this wasn't some artifact of valgrind. However, until we're sure, > let's avoid massive changes, especially since IPA changes are coming > up. > > For the curious, a particular hot spot for IPA in this area was: > > ipcp_vr_lattice::meet_with_1 (const value_range *other_vr) > { > ... > ... > value_range save (m_vr); > m_vr.union_ (*other_vr); > return m_vr != save; > } > > The problem isn't the resizing (since we do that at most once) but the > fact that for some functions with lots of callers we end up a huge > range that gets copied and compared for every meet operation. Maybe > the IPA algorithm could be adjusted somehow??. Well, the above just wants to know whether the union_ operation changed the range. I suppose that would be an interesting (and easy to compute?) secondary output of union_ and it seems it already computes that (but maybe not correctly?). So I suggest to change the above to bool res; if (flag_checking) { value_range save (m_vr); res = m_vr.union_ (*other_vr); gcc_assert (res == (m_vr != save)); } else res = m_vr.union (*other_vr); return res; Btw, why's there a trailing underscore for union but not intersect? Richard. > Anywhooo... for now there is nothing to worry about, since value_range > still has 2 subranges and is not resizable. But we should probably > think what if anything we want to do here, as I envision IPA using > infinite ranges here (well, int_range_max) and handling frange's, etc. > > I'll hold off a day or two, as I'd appreciate feedback here. > > gcc/ChangeLog: > > PR tree-optimization/109695 > * value-range.cc (irange::operator=): Resize range. > (irange::union_): Same. > (irange::intersect): Same. > (irange::invert): Same. > (int_range_max): Default to 3 sub-ranges and resize as needed. > * value-range.h (irange::maybe_resize): New. > (~int_range): New. > (int_range::int_range): Adjust for resizing. > (int_range::operator=): Same. > --- > gcc/value-range.cc | 14 +++++++ > gcc/value-range.h | 98 ++++++++++++++++++++++++++++++++-------------- > 2 files changed, 82 insertions(+), 30 deletions(-) > > diff --git a/gcc/value-range.cc b/gcc/value-range.cc > index def9299dc0e..cea4ff59254 100644 > --- a/gcc/value-range.cc > +++ b/gcc/value-range.cc > @@ -901,6 +901,9 @@ frange::set_nonnegative (tree type) > irange & > irange::operator= (const irange &src) > { > + int needed = src.num_pairs (); > + maybe_resize (needed); > + > unsigned x; > unsigned lim = src.m_num_ranges; > if (lim > m_max_ranges) > @@ -1340,6 +1343,7 @@ irange::union_ (const vrange &v) > // Now it simply needs to be copied, and if there are too many > // ranges, merge some. We wont do any analysis as to what the > // "best" merges are, simply combine the final ranges into one. > + maybe_resize (i / 2); > if (i > m_max_ranges * 2) > { > res[m_max_ranges * 2 - 1] = res[i - 1]; > @@ -1439,6 +1443,11 @@ irange::intersect (const vrange &v) > if (r.irange_contains_p (*this)) > return intersect_nonzero_bits (r); > > + // ?? We could probably come up with something smarter than the > + // worst case scenario here. > + int needed = num_pairs () + r.num_pairs (); > + maybe_resize (needed); > + > signop sign = TYPE_SIGN (m_type); > unsigned bld_pair = 0; > unsigned bld_lim = m_max_ranges; > @@ -1646,6 +1655,11 @@ irange::invert () > m_num_ranges = 1; > return; > } > + > + // At this point, we need one extra sub-range to represent the > + // inverse. > + maybe_resize (m_num_ranges + 1); > + > // The algorithm is as follows. To calculate INVERT ([a,b][c,d]), we > // generate [-MIN, a-1][b+1, c-1][d+1, MAX]. > // > diff --git a/gcc/value-range.h b/gcc/value-range.h > index 22b0250b11b..0da2a42764a 100644 > --- a/gcc/value-range.h > +++ b/gcc/value-range.h > @@ -167,9 +167,10 @@ public: > void set_nonzero_bits (const wide_int &bits); > > protected: > + void maybe_resize (int needed); > virtual void set (tree, tree, value_range_kind = VR_RANGE) override; > virtual bool contains_p (tree cst) const override; > - irange (wide_int *, unsigned); > + irange (wide_int *, unsigned nranges, bool resizable); > > // In-place operators. > bool irange_contains_p (const irange &) const; > @@ -179,6 +180,8 @@ protected: > > void verify_range (); > > + // Hard limit on max ranges allowed. > + static const int HARD_MAX_RANGES = 255; > private: > friend void gt_ggc_mx (irange *); > friend void gt_pch_nx (irange *); > @@ -192,16 +195,22 @@ private: > > bool intersect (const wide_int& lb, const wide_int& ub); > unsigned char m_num_ranges; > - const unsigned char m_max_ranges; > + bool m_resizable; > + unsigned char m_max_ranges; > tree m_type; > wide_int m_nonzero_mask; > +protected: > wide_int *m_base; > }; > > // Here we describe an irange with N pairs of ranges. The storage for > // the pairs is embedded in the class as an array. > +// > +// If RESIZABLE is true, the storage will be resized on the heap when > +// the number of ranges needed goes past N up to a max of > +// HARD_MAX_RANGES. This new storage is freed upon destruction. > > -template<unsigned N> > +template<unsigned N, bool RESIZABLE = false> > class GTY((user)) int_range : public irange > { > public: > @@ -211,7 +220,7 @@ public: > int_range (tree type); > int_range (const int_range &); > int_range (const irange &); > - virtual ~int_range () = default; > + virtual ~int_range (); > int_range& operator= (const int_range &); > protected: > int_range (tree, tree, value_range_kind = VR_RANGE); > @@ -451,6 +460,38 @@ is_a <frange> (vrange &v) > return v.m_discriminator == VR_FRANGE; > } > > +// For resizable ranges, resize the range up to HARD_MAX_RANGES if the > +// NEEDED pairs is greater than the current capacity of the range. > + > +inline void > +irange::maybe_resize (int needed) > +{ > + if (!m_resizable || m_max_ranges == HARD_MAX_RANGES) > + return; > + > + if (needed > m_max_ranges) > + { > + m_max_ranges = HARD_MAX_RANGES; > + wide_int *newmem = new wide_int[m_max_ranges * 2]; > + memcpy (newmem, m_base, sizeof (wide_int) * num_pairs () * 2); > + m_base = newmem; > + } > +} > + > +template<unsigned N, bool RESIZABLE> > +inline > +int_range<N, RESIZABLE>::~int_range () > +{ > + if (RESIZABLE && m_base != m_ranges) > + delete m_base; > +} > + > +// This is an "infinite" precision irange for use in temporary > +// calculations. It starts with a sensible default covering 99% of > +// uses, and goes up to HARD_MAX_RANGES when needed. Any allocated > +// storage is freed upon destruction. > +typedef int_range<3, /*RESIZABLE=*/true> int_range_max; > + > class vrange_visitor > { > public: > @@ -461,10 +502,6 @@ public: > > typedef int_range<2> value_range; > > -// This is an "infinite" precision irange for use in temporary > -// calculations. > -typedef int_range<255> int_range_max; > - > // This is an "infinite" precision range object for use in temporary > // calculations for any of the handled types. The object can be > // transparently used as a vrange. > @@ -757,8 +794,9 @@ gt_pch_nx (int_range<N> *x, gt_pointer_operator op, void > *cookie) > // Constructors for irange > > inline > -irange::irange (wide_int *base, unsigned nranges) > +irange::irange (wide_int *base, unsigned nranges, bool resizable) > : vrange (VR_IRANGE), > + m_resizable (resizable), > m_max_ranges (nranges) > { > m_base = base; > @@ -767,52 +805,52 @@ irange::irange (wide_int *base, unsigned nranges) > > // Constructors for int_range<>. > > -template<unsigned N> > +template<unsigned N, bool RESIZABLE> > inline > -int_range<N>::int_range () > - : irange (m_ranges, N) > +int_range<N, RESIZABLE>::int_range () > + : irange (m_ranges, N, RESIZABLE) > { > } > > -template<unsigned N> > -int_range<N>::int_range (const int_range &other) > - : irange (m_ranges, N) > +template<unsigned N, bool RESIZABLE> > +int_range<N, RESIZABLE>::int_range (const int_range &other) > + : irange (m_ranges, N, RESIZABLE) > { > irange::operator= (other); > } > > -template<unsigned N> > -int_range<N>::int_range (tree min, tree max, value_range_kind kind) > - : irange (m_ranges, N) > +template<unsigned N, bool RESIZABLE> > +int_range<N, RESIZABLE>::int_range (tree min, tree max, value_range_kind > kind) > + : irange (m_ranges, N, RESIZABLE) > { > irange::set (min, max, kind); > } > > -template<unsigned N> > -int_range<N>::int_range (tree type) > - : irange (m_ranges, N) > +template<unsigned N, bool RESIZABLE> > +int_range<N, RESIZABLE>::int_range (tree type) > + : irange (m_ranges, N, RESIZABLE) > { > set_varying (type); > } > > -template<unsigned N> > -int_range<N>::int_range (tree type, const wide_int &wmin, const wide_int > &wmax, > +template<unsigned N, bool RESIZABLE> > +int_range<N, RESIZABLE>::int_range (tree type, const wide_int &wmin, const > wide_int &wmax, > value_range_kind kind) > - : irange (m_ranges, N) > + : irange (m_ranges, N, RESIZABLE) > { > set (type, wmin, wmax, kind); > } > > -template<unsigned N> > -int_range<N>::int_range (const irange &other) > - : irange (m_ranges, N) > +template<unsigned N, bool RESIZABLE> > +int_range<N, RESIZABLE>::int_range (const irange &other) > + : irange (m_ranges, N, RESIZABLE) > { > irange::operator= (other); > } > > -template<unsigned N> > -int_range<N>& > -int_range<N>::operator= (const int_range &src) > +template<unsigned N, bool RESIZABLE> > +int_range<N, RESIZABLE>& > +int_range<N, RESIZABLE>::operator= (const int_range &src) > { > irange::operator= (src); > return *this; > -- > 2.40.0 >