> -----Original Message-----
> From: Richard Sandiford <[email protected]>
> Sent: Monday, July 29, 2024 9:43 PM
> To: Richard Biener <[email protected]>
> Cc: Prathamesh Kulkarni <[email protected]>; gcc-
> [email protected]
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
>
> External email: Use caution opening links or attachments
>
>
> Richard Biener <[email protected]> writes:
> > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> >
> >> Hi Richard,
> >> Thanks for your suggestions on RFC email, the attached patch adds
> support for streaming of poly_int when it's degree <= accel's
> NUM_POLY_INT_COEFFS.
> >> The patch changes streaming of poly_int as follows:
> >>
> >> Streaming out poly_int:
> >>
> >> degree = poly_int.degree();
> >> stream out degree;
> >> for (i = 0; i < degree; i++)
> >> stream out poly_int.coeffs[i];
> >>
> >> Streaming in poly_int:
> >>
> >> stream in degree;
> >> if (degree > NUM_POLY_INT_COEFFS)
> >> fatal_error();
> >> stream in coeffs;
> >> // Set remaining coeffs to zero in case degree < accel's
> >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> >> poly_int.coeffs[i] = 0;
> >>
> >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> linux-gnu.
> >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> >>
> >> I am not quite sure how to test it for offloading since currently
> it's (entirely) broken for aarch64->nvptx.
> >> I can give a try with x86_64->nvptx offloading if required (altho I
> >> guess LTO bootstrap should test streaming changes ?)
> >
> > + unsigned degree
> > + = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> > HOST_WIDE_INT));
> >
> > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> > I'd say we should constrain it to a reasonable upper bound, like 2?
> > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> in
> > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> >
> > The patch looks reasonable over all, but Richard S. should have a
> say
> > about the abstraction you chose and the poly-int adjustment.
>
> Sorry if this has been discussed already, but could we instead stream
> NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> It's a target invariant, and poly_int has wormed its way into lots of
> things by now :)
Hi Richard,
The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int
(and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
During streaming-in, it reads back the degree (and streamed coeffs upto degree)
and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
(as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If
degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
(similar to zero-extension). I posted more details in RFC:
https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to
represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by
any target,
and uses that for packing/unpacking degree of poly_int to/from bitstream, which
should make it independent of the type used for representing
NUM_POLY_INT_COEFFS by
the target.
Bootstrap+test and LTO bootstrap+test in progress on aarch64-linux-gnu.
Does the patch look OK ?
Signed-off-by: Prathamesh Kulkarni <[email protected]>
Thanks,
Prathamesh
>
> Thanks,
> Richard
Partially support streaming of poly_int for offloading.
Support streaming of poly_int for offloading when it's degree doesn't exceed
accel's NUM_POLY_INT_COEFFS.
The patch changes streaming of poly_int as follows:
Streaming out poly_int:
degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
stream out poly_int.coeffs[i];
Streaming in poly_int (for accelerator):
stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
poly_int.coeffs[i] = 0;
The patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent
number of bits needed to represent value of max NUM_POLY_INT_COEFFS for any
target, which should make packing/unpacking degree of poly_int to/from
bitstream independent of the type used to represent NUM_POLY_INT_COEFF by the
target.
gcc/ChangeLog:
* data-streamer-in.cc (streamer_read_poly_uint64): Stream in poly_int
degree and call poly_int_read_common.
(streamer_read_poly_int64): Likewise.
* data-streamer-out.cc (streamer_write_poly_uint64): Stream out poly_int
degree.
(streamer_write_poly_int64): Likewise.
* data-streamer.h (bp_pack_poly_value): Likewise.
(poly_int_read_common): New function template.
(bp_unpack_poly_value): Stream in poly_int degree and call
poly_int_read_common.
* poly-int.h (poly_int::degree): New method.
(MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Stream in
POLY_INT_CST degree, issue a fatal_error if degree is greater than
NUM_POLY_INT_COEFFS, and zero out remaining coeffs.
* tree-streamer-out.cc (write_ts_poly_tree_pointers): Calculate and
stream out POLY_INT_CST degree.
Signed-off-by: Prathamesh Kulkarni <[email protected]>
diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..91cece39b05 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,11 @@ streamer_read_hwi (class lto_input_block *ib)
poly_uint64
streamer_read_poly_uint64 (class lto_input_block *ib)
{
+ unsigned degree = streamer_read_uhwi (ib);
poly_uint64 res;
- for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ for (unsigned int i = 0; i < degree; ++i)
res.coeffs[i] = streamer_read_uhwi (ib);
- return res;
+ return poly_int_read_common (res, degree);
}
/* Read a poly_int64 from IB. */
@@ -193,10 +194,11 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
poly_int64
streamer_read_poly_int64 (class lto_input_block *ib)
{
+ unsigned degree = streamer_read_uhwi (ib);
poly_int64 res;
- for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ for (unsigned int i = 0; i < degree; ++i)
res.coeffs[i] = streamer_read_hwi (ib);
- return res;
+ return poly_int_read_common (res, degree);
}
/* Read gcov_type value from IB. */
diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc
index c237e30f704..b0fb9dedb24 100644
--- a/gcc/data-streamer-out.cc
+++ b/gcc/data-streamer-out.cc
@@ -227,7 +227,9 @@ streamer_write_hwi (struct output_block *ob, HOST_WIDE_INT
work)
void
streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
{
- for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ unsigned degree = work.degree ();
+ streamer_write_uhwi_stream (ob->main_stream, degree);
+ for (unsigned i = 0; i < degree; ++i)
streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]);
}
@@ -236,7 +238,9 @@ streamer_write_poly_uint64 (struct output_block *ob,
poly_uint64 work)
void
streamer_write_poly_int64 (struct output_block *ob, poly_int64 work)
{
- for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ unsigned degree = work.degree ();
+ streamer_write_uhwi_stream (ob->main_stream, degree);
+ for (unsigned i = 0; i < degree; ++i)
streamer_write_hwi_stream (ob->main_stream, work.coeffs[i]);
}
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..ad676cc9287 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -142,7 +142,9 @@ bp_pack_poly_value (struct bitpack_d *bp,
const poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> &val,
unsigned nbits)
{
- for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ unsigned degree = val.degree ();
+ bp_pack_value (bp, degree, MAX_NUM_POLY_INT_COEFFS_BITS);
+ for (unsigned i = 0; i < degree; ++i)
bp_pack_value (bp, val.coeffs[i], nbits);
}
@@ -194,15 +196,33 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
return val & mask;
}
+template<unsigned N, typename C>
+inline poly_int<N, C>
+poly_int_read_common (poly_int<N, C> x, unsigned degree)
+{
+ if (degree > NUM_POLY_INT_COEFFS)
+ fatal_error (input_location,
+ "%<poly_int%> degree (%u) exceeds value of "
+ "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+ NUM_POLY_INT_COEFFS);
+ for (unsigned i = degree; i < NUM_POLY_INT_COEFFS; i++)
+ x.coeffs[i] = 0;
+ return x;
+}
+
/* Unpacks a polynomial value from the bit-packing context BP in which each
coefficient has NBITS bits. */
inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
{
+ unsigned degree
+ = bp_unpack_value (bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+
poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
- for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ for (unsigned i = 0; i < degree; i++)
x.coeffs[i] = bp_unpack_value (bp, nbits);
- return x;
+
+ return poly_int_read_common (x, degree);
}
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..5f2272313ed 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
? (void) ((RES).coeffs[I] = VALUE) \
: (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
+/* Number of bits needed to represent maximum value of
+ NUM_POLY_INT_COEFFS defined by any target. */
+#define MAX_NUM_POLY_INT_COEFFS_BITS (2)
+
/* poly_int_full and poly_int_hungry are used internally within poly_int
for delegated initializers. poly_int_full indicates that a parameter
pack has enough elements to initialize every coefficient. poly_int_hungry
@@ -422,6 +426,8 @@ public:
poly_int<N, HOST_WIDE_INT> force_shwi () const;
poly_int<N, unsigned HOST_WIDE_INT> force_uhwi () const;
+ unsigned degree (void) const;
+
#if POLY_INT_CONVERSION
operator C () const;
#endif
@@ -678,6 +684,18 @@ poly_int<N, C>::force_uhwi () const
return r;
}
+/* Find leading non-zero coeff. In case all coeffs are zero,
+ treat it as degree-1 poly_int. */
+
+template<unsigned N, typename C>
+inline unsigned poly_int<N, C>::degree () const
+{
+ unsigned i;
+ for (i = NUM_POLY_INT_COEFFS - 1; i > 0 && !this->coeffs[i]; i--)
+ ;
+ return i + 1;
+}
+
#if POLY_INT_CONVERSION
/* Provide a conversion operator to constants. */
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..2394ac209f5 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,20 @@ static void
lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
class data_in *data_in, tree expr)
{
- for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ unsigned degree = streamer_read_uhwi (ib);
+ if (degree > NUM_POLY_INT_COEFFS)
+ fatal_error (input_location,
+ "%<poly_int%> degree (%u) exceeds value of "
+ "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+ NUM_POLY_INT_COEFFS);
+
+ unsigned i;
+ for (i = 0; i < degree; ++i)
POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+ tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+ for (; i < NUM_POLY_INT_COEFFS; ++i)
+ POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
}
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index b7205287ffb..e28616b9a7a 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -576,7 +576,14 @@ write_ts_vector_tree_pointers (struct output_block *ob,
tree expr)
static void
write_ts_poly_tree_pointers (struct output_block *ob, tree expr)
{
- for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+ unsigned i;
+ for (i = NUM_POLY_INT_COEFFS - 1;
+ i > 0 && integer_zerop (POLY_INT_CST_COEFF (expr, i));
+ i--)
+ ;
+ unsigned degree = i + 1;
+ streamer_write_uhwi_stream (ob->main_stream, degree);
+ for (i = 0; i < degree; ++i)
stream_write_tree_ref (ob, POLY_INT_CST_COEFF (expr, i));
}