> -----Original Message-----
> From: Gcc On Behalf Of Prathamesh Kulkarni via Gcc
> Sent: 21 January 2025 17:05
> To: Jakub Jelinek
> Cc: Andrew Stubbs ; Richard Biener
> ; Richard Biener ;
> gcc@gcc.gnu.org; Thomas Schwinge
> Subject: RE: [RFC] Enabling SVE with offloading to nvptx
>
> External email: Use caution opening links or attachments
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni
> > Sent: 08 January 2025 15:22
> > To: Prathamesh Kulkarni ; Jakub Jelinek
> >
> > Cc: Andrew Stubbs ; Richard Biener
> > ; Richard Biener ;
> > gcc@gcc.gnu.org; Thomas Schwinge
> > Subject: RE: [RFC] Enabling SVE with offloading to nvptx
> >
> >
> >
> > > -----Original Message-----
> > > From: Gcc On Behalf Of Prathamesh Kulkarni via Gcc
> > > Sent: 27 December 2024 18:00
> > > To: Jakub Jelinek
> > > Cc: Andrew Stubbs ; Richard Biener
> > > ; Richard Biener ;
> > > gcc@gcc.gnu.org; Thomas Schwinge
> > > Subject: RE: [RFC] Enabling SVE with offloading to nvptx
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jakub Jelinek
> > > > Sent: 17 December 2024 19:09
> > > > To: Prathamesh Kulkarni
> > > > Cc: Andrew Stubbs ; Richard Biener ; Richard Biener ;
> > > > gcc@gcc.gnu.org; Thomas Schwinge
> > > > Subject: Re: [RFC] Enabling SVE with offloading to nvptx
> > > >
> > > >
> > > >
> > > > > On Mon, Dec 02, 2024 at 11:17:08AM +, Prathamesh Kulkarni wrote:
> > > > > --- a/gcc/cfgloop.h
> > > > > +++ b/gcc/cfgloop.h
> > > > > @@ -233,6 +233,12 @@ public:
> > > > > >       flag_finite_loops or similar pragmas state.  */
> > > > > >    unsigned finite_p : 1;
> > > > > >
> > > > > > +  /* True if SIMD loop needs delayed lowering of artefacts like
> > > > > > +     safelen and length of omp simd arrays that depend on target's
> > > > > > +     max_vf.  This is true for offloading, when max_vf is computed after
> > > > > > +     streaming out to device.  */
> > > > > > +  unsigned needs_max_vf_lowering: 1;
> > > >
> > > > Consistency, finite_p above uses space before :, the above line
> > > > doesn't.
> > > >
> > > > > --- a/gcc/omp-expand.cc
> > > > > +++ b/gcc/omp-expand.cc
> > > > > @@ -7170,6 +7170,10 @@ expand_omp_simd (struct omp_region
> > *region,
> > > > struct omp_for_data *fd)
> > > > >loop->latch = cont_bb;
> > > > >add_loop (loop, l1_bb->loop_father);
> > > > >loop->safelen = safelen_int;
> > > > > + loop->needs_max_vf_lowering = is_in_offload_region
> > > (region);
> > > > > + if (loop->needs_max_vf_lowering)
> > > > > + cfun->curr_properties &= ~PROP_gimple_lomp_dev;
> > > >
> > > > Do you really need this for non-SVE arches?
> > > > I mean, could you not set loop->needs_max_vf_lowering if maximum
> > > > > number of poly_int coeffs is 1? Or if omp_max_vf returns constant or
> > > > > something similar?
> > > > Well, I guess the issue is not really about VLA vectors, but about the
> > > > host and device having different max_vf. Selecting an optimal max_vf is
> > > > not really possible during omp-low/omp-expand, since we don't have the
> > > > device's target info available at that point. Andrew's recent patch
> > > > works around this limitation by searching for "amdgcn" in
> > > > OFFLOAD_TARGET_NAMES in omp_max_vf, but I guess a more general
> > > > solution would be to delay lowering max_vf until after streaming out to
> > > > the device, irrespective of VLA/VLS vectors?
> > > > For AArch64/nvptx offloading with SVE, where the host is VLA and the
> > > > device is VLS, the issue is more pronounced (failing to compile) than
> > > > when offloading from a VLS host to a VLS device (selecting a
> > > > sub-optimal max_vf).
> > > >
> > > > > --- a/gcc/omp-offload.cc
> > > > > +++ b/gcc/omp-offload.cc
> > > > > @@ -2617,6 +2617,77 @@ find_simtpriv_var_op (tree *tp, int
> > > > *walk_subtrees, void *)
> > > > >return NULL_TREE;
> > > > > }
> > > > >
> > > > > > +/* Compute max_vf for target, and accordingly set loop->safelen and length
> > > > > > +   of omp simd arrays.  */
> > > > > +
> > > > > +static void
> > > > > +adjust_max_vf (function *fun)
> > > > > +{
> > > > > > +  if (!fun->has_simduid_loops)
> > > > > > +    return;
> > > > > > +
> > > > > > +  poly_uint64 max_vf = omp_max_vf (false);
> > > > > +
> > > > > > +  /* Since loop->safelen has to be an integer, it's not always possible
> > > > > > +     to compare against poly_int.  For eg 32 and 16+16x are not comparable at
> > > > > > +     compile-time because 16+16x <= 32 for x < 2, but 16+16x > 32 for x >= 2.
> > > > > > +     Even if we could get runtime VL based on -mcpu/-march, that would not be
> > > > > > +     portable across other SVE archs.
> > > > > > +
> > > > > > +     For now, use constant_lower_bound (max_vf), as a "safer approximation" to
> > > > > + max_vf that avoids these