On Tue, 30 Jul 2024, Richard Sandiford wrote: > Richard Biener <[email protected]> writes: > > On Tue, 30 Jul 2024, Richard Sandiford wrote: > > > >> Richard Biener <[email protected]> writes: > >> > On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote: > >> > > >> >> > >> >> > >> >> > -----Original Message----- > >> >> > From: Richard Sandiford <[email protected]> > >> >> > Sent: Monday, July 29, 2024 9:43 PM > >> >> > To: Richard Biener <[email protected]> > >> >> > Cc: Prathamesh Kulkarni <[email protected]>; gcc- > >> >> > [email protected] > >> >> > Subject: Re: Support streaming of poly_int for offloading when it's > >> >> > degree <= accel's NUM_POLY_INT_COEFFS > >> >> > > >> >> > External email: Use caution opening links or attachments > >> >> > > >> >> > > >> >> > Richard Biener <[email protected]> writes: > >> >> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote: > >> >> > > > >> >> > >> Hi Richard, > >> >> > >> Thanks for your suggestions on RFC email, the attached patch adds > >> >> > support for streaming of poly_int when it's degree <= accel's > >> >> > NUM_POLY_INT_COEFFS. > >> >> > >> The patch changes streaming of poly_int as follows: > >> >> > >> > >> >> > >> Streaming out poly_int: > >> >> > >> > >> >> > >> degree = poly_int.degree(); > >> >> > >> stream out degree; > >> >> > >> for (i = 0; i < degree; i++) > >> >> > >> stream out poly_int.coeffs[i]; > >> >> > >> > >> >> > >> Streaming in poly_int: > >> >> > >> > >> >> > >> stream in degree; > >> >> > >> if (degree > NUM_POLY_INT_COEFFS) > >> >> > >> fatal_error(); > >> >> > >> stream in coeffs; > >> >> > >> // Set remaining coeffs to zero in case degree < accel's > >> >> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++) > >> >> > >> poly_int.coeffs[i] = 0; > >> >> > >> > >> >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64- > >> >> > linux-gnu. > >> >> > >> LTO bootstrap+test on x86_64-linux-gnu in progress. > >> >> > >> > >> >> > >> I am not quite sure how to test it for offloading since currently > >> >> > it's (entirely) broken for aarch64->nvptx. > >> >> > >> I can give a try with x86_64->nvptx offloading if required (altho I > >> >> > >> guess LTO bootstrap should test streaming changes ?) > >> >> > > > >> >> > > + unsigned degree > >> >> > > + = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned > >> >> > > HOST_WIDE_INT)); > >> >> > > > >> >> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained > >> >> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary. > >> >> > > I'd say we should constrain it to a reasonable upper bound, like 2? > >> >> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS > >> >> > in > >> >> > > poly-int.h and constrain NUM_POLY_INT_COEFFS. > >> >> > > > >> >> > > The patch looks reasonable over all, but Richard S. should have a > >> >> > say > >> >> > > about the abstraction you chose and the poly-int adjustment. > >> >> > > >> >> > Sorry if this has been discussed already, but could we instead stream > >> >> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int? > >> >> > It's a target invariant, and poly_int has wormed its way into lots of > >> >> > things by now :) > >> >> Hi Richard, > >> >> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of > >> >> poly_int (and streams-out coeffs only up to degree, ignoring the higher > >> >> zero coeffs). > >> >> During streaming-in, it reads back the degree (and streamed coeffs upto > >> >> degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, > >> >> since we can't > >> >> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS > >> >> < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are > >> >> set to 0 > >> >> (similar to zero-extension). I posted more details in RFC: > >> >> https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html > >> > >> It's not clear to me what the plan is for VLA host + VLS offloading. > >> Is the streamed data guaranteed to be "clean" of any host-only > >> VLA stuff? E.g. if code does: > >> > >> #include <arm_sve.h> > >> > >> svint32_t *ptr: > >> void foo(svint32_t); > >> > >> #pragma GCC target "+nosve" > >> > >> ...offloading... > >> > >> is there a guarantee that the offload target won't see the definition > >> of ptr and foo? > > > > No. If it sees any unsupported poly-* the offload compilation will fail. > > Could it fail even if the offloading code doesn't refer to ptr and foo > directly? Or is only "relevant" stuff streamed?
Only "relevant" stuff should be streamed - the offload code and all trees refered to. > > I think all current issues are because of poly-* leaking in for cases > > where a non-poly would have worked fine, but I have not had a look > > myself. > > One of the cases that Prathamesh mentions is streaming the mode sizes. > Are those modes "offload target modes" or "host modes"? It seems like > it shouldn't be an error for the host to have VLA modes per se. It's > just that those modes can't be used in the host/offload interface. There's a requirement that a mode mapping exists from the host to target enum machine_mode. I don't remember exactly how we compute that mapping and whether streaming of some data (and thus poly-int) are part of this. Richard. > Thanks, > Richard > > -- Richard Biener <[email protected]> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
