> -----Original Message-----
> From: Andrew Stubbs <a...@baylibre.com>
> Sent: 02 January 2025 17:21
> To: Prathamesh Kulkarni <prathame...@nvidia.com>; Jakub Jelinek
> <ja...@redhat.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener
> <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge
> <tschwi...@baylibre.com>
> Subject: Re: [RFC] Enabling SVE with offloading to nvptx
> 
> External email: Use caution opening links or attachments
> 
> 
> On 27/12/2024 12:29, Prathamesh Kulkarni wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jakub Jelinek <ja...@redhat.com>
> >> Sent: 17 December 2024 19:09
> >> To: Prathamesh Kulkarni <prathame...@nvidia.com>
> >> Cc: Andrew Stubbs <a...@baylibre.com>; Richard Biener
> >> <richard.guent...@gmail.com>; Richard Biener <rguent...@suse.de>;
> >> gcc@gcc.gnu.org; Thomas Schwinge <tschwi...@baylibre.com>
> >> Subject: Re: [RFC] Enabling SVE with offloading to nvptx
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> On Mon, Dec 02, 2024 at 11:17:08AM +0000, Prathamesh Kulkarni
> wrote:
> >>> --- a/gcc/cfgloop.h
> >>> +++ b/gcc/cfgloop.h
> >>> @@ -233,6 +233,12 @@ public:
> >>>        flag_finite_loops or similar pragmas state.  */
> >>>     unsigned finite_p : 1;
> >>>
> >>> +  /* True if SIMD loop needs delayed lowering of artefacts like
> >>> +     safelen and length of omp simd arrays that depend on
> target's
> >>> +     max_vf.  This is true for offloading, when max_vf is
> computed
> >> after
> >>> +     streaming out to device.  */
> >>> +  unsigned needs_max_vf_lowering: 1;
> >>
> >> Consistency, finite_p above uses space before :, the above line
> >> doesn't.
> >>
> >>> --- a/gcc/omp-expand.cc
> >>> +++ b/gcc/omp-expand.cc
> >>> @@ -7170,6 +7170,10 @@ expand_omp_simd (struct omp_region *region,
> >> struct omp_for_data *fd)
> >>>         loop->latch = cont_bb;
> >>>         add_loop (loop, l1_bb->loop_father);
> >>>         loop->safelen = safelen_int;
> >>> +      loop->needs_max_vf_lowering = is_in_offload_region
> (region);
> >>> +      if (loop->needs_max_vf_lowering)
> >>> +     cfun->curr_properties &= ~PROP_gimple_lomp_dev;
> >>
> >> Do you really need this for non-SVE arches?
> >> I mean, could you not set loop->needs_max_vf_lowering if maximum
> >> number of poly_int coeffs is 1?  Or if omp_max_vf returns constant
> or
> >> something similar?
> > Well, I guess the issue is not really about VLA vectors but when
> host
> > and device have different max_vf, and selecting optimal max_vf is
> not
> > really possible during omp-low/omp-expand, since we don't have
> > device's target info available at this point. Andrew's recent patch
> > works around this limitation by searching for "amdgcn" in
> OFFLOAD_TARGET_NAMES in omp_max_vf, but I guess a more general
> solution would be to delay lowering max_vf after streaming-out to
> device irrespective of VLA/VLS vectors ?
> > For AArch64/nvptx offloading with SVE, where host is VLA and device
> is
> > VLS, the issue is more pronounced (failing to compile), compared to
> offloading from VLS host to VLS device (selecting sub-optimal max_vf).
> 
> That patch fixed a couple of cases. The name matching was only used
> for the case where an oversized VF was harmless. The other case where
> making the VF too large would reserve excess memory was deferred to
> the device compiler.
Ah OK, thanks for clarifying!
> 
> In general, deferring decisions is probably a good idea, but it's not
> always possible, or optimal, and in the above case it certainly wasn't
> the easy option. There's already precedent for doing the name match in
> the SIMT VF code (for NVPTX), so it was easier and sufficient to do
> the same.
Currently, I have left it in the patch to delay lowering of max_vf by default.
If that isn't really useful, I'd restrict delaying of max_vf only for VLA hosts.

Thanks,
Prathamesh
> 
> Andrew

Reply via email to