> -----Original Message----- > From: Andrew Stubbs <a...@baylibre.com> > Sent: 02 January 2025 17:21 > To: Prathamesh Kulkarni <prathame...@nvidia.com>; Jakub Jelinek > <ja...@redhat.com> > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Biener > <rguent...@suse.de>; gcc@gcc.gnu.org; Thomas Schwinge > <tschwi...@baylibre.com> > Subject: Re: [RFC] Enabling SVE with offloading to nvptx > > External email: Use caution opening links or attachments > > > On 27/12/2024 12:29, Prathamesh Kulkarni wrote: > > > > > >> -----Original Message----- > >> From: Jakub Jelinek <ja...@redhat.com> > >> Sent: 17 December 2024 19:09 > >> To: Prathamesh Kulkarni <prathame...@nvidia.com> > >> Cc: Andrew Stubbs <a...@baylibre.com>; Richard Biener > >> <richard.guent...@gmail.com>; Richard Biener <rguent...@suse.de>; > >> gcc@gcc.gnu.org; Thomas Schwinge <tschwi...@baylibre.com> > >> Subject: Re: [RFC] Enabling SVE with offloading to nvptx > >> > >> External email: Use caution opening links or attachments > >> > >> > >> On Mon, Dec 02, 2024 at 11:17:08AM +0000, Prathamesh Kulkarni > wrote: > >>> --- a/gcc/cfgloop.h > >>> +++ b/gcc/cfgloop.h > >>> @@ -233,6 +233,12 @@ public: > >>> flag_finite_loops or similar pragmas state. */ > >>> unsigned finite_p : 1; > >>> > >>> + /* True if SIMD loop needs delayed lowering of artefacts like > >>> + safelen and length of omp simd arrays that depend on > target's > >>> + max_vf. This is true for offloading, when max_vf is > computed > >> after > >>> + streaming out to device. */ > >>> + unsigned needs_max_vf_lowering: 1; > >> > >> Consistency, finite_p above uses space before :, the above line > >> doesn't. > >> > >>> --- a/gcc/omp-expand.cc > >>> +++ b/gcc/omp-expand.cc > >>> @@ -7170,6 +7170,10 @@ expand_omp_simd (struct omp_region *region, > >> struct omp_for_data *fd) > >>> loop->latch = cont_bb; > >>> add_loop (loop, l1_bb->loop_father); > >>> loop->safelen = safelen_int; > >>> + loop->needs_max_vf_lowering = is_in_offload_region > (region); > >>> + if (loop->needs_max_vf_lowering) > >>> + cfun->curr_properties &= ~PROP_gimple_lomp_dev; > >> > >> Do you really need this for non-SVE arches? > >> I mean, could you not set loop->needs_max_vf_lowering if maximum > >> number of poly_int coeffs is 1? Or if omp_max_vf returns constant > or > >> something similar? > > Well, I guess the issue is not really about VLA vectors but when > host > > and device have different max_vf, and selecting optimal max_vf is > not > > really possible during omp-low/omp-expand, since we don't have > > device's target info available at this point. Andrew's recent patch > > works around this limitation by searching for "amdgcn" in > OFFLOAD_TARGET_NAMES in omp_max_vf, but I guess a more general > solution would be to delay lowering max_vf after streaming-out to > device irrespective of VLA/VLS vectors ? > > For AArch64/nvptx offloading with SVE, where host is VLA and device > is > > VLS, the issue is more pronounced (failing to compile), compared to > offloading from VLS host to VLS device (selecting sub-optimal max_vf). > > That patch fixed a couple of cases. The name matching was only used > for the case where an oversized VF was harmless. The other case where > making the VF too large would reserve excess memory was deferred to > the device compiler. Ah OK, thanks for clarifying! > > In general, deferring decisions is probably a good idea, but it's not > always possible, or optimal, and in the above case it certainly wasn't > the easy option. There's already precedent for doing the name match in > the SIMT VF code (for NVPTX), so it was easier and sufficient to do > the same. Currently, I have left it in the patch to delay lowering of max_vf by default. If that isn't really useful, I'd restrict delaying of max_vf only for VLA hosts.
Thanks, Prathamesh > > Andrew