Re: [RFC] Enabling SVE with offloading to nvptx

Jakub Jelinek via Gcc Thu, 28 Nov 2024 04:16:26 -0800

On Thu, Nov 14, 2024 at 08:29:26AM +0000, Prathamesh Kulkarni wrote:
> +  /* True if SIMD loop needs delayed lowering of artefacts like
> +     safelen and length of omp simd arrays that depend on target's
> +     max_vf. This is true for offloading, when max_vf is computed after


2 spaces after ., not just one.

> +     streaming out to device.  */
> +  unsigned needs_max_vf_lowering: 1;
> +
>    /* The number of times to unroll the loop.  0 means no information given,
>       just do what we always do.  A value of 1 means do not unroll the loop.
>       A value of USHRT_MAX means unroll with no specific unrolling factor.
> diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
> index 80fb1843445..c9581e3d92e 100644
> --- a/gcc/omp-expand.cc
> +++ b/gcc/omp-expand.cc
> @@ -7171,6 +7171,10 @@ expand_omp_simd (struct omp_region *region, struct 
> omp_for_data *fd)
>        loop->latch = cont_bb;
>        add_loop (loop, l1_bb->loop_father);
>        loop->safelen = safelen_int;
> +      loop->needs_max_vf_lowering = is_in_offload_region (region);
> +      if (loop->needs_max_vf_lowering)
> +     cfun->curr_properties &= ~PROP_gimple_lomp_dev;
> +
>        if (simduid)
>       {
>         loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
> diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
> index 70a2c108fbc..0f68876a2bc 100644
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -4660,7 +4660,16 @@ lower_rec_simd_input_clauses (tree new_var, 
> omp_context *ctx,
>      }
>    else
>      {
> -      tree atype = build_array_type_nelts (TREE_TYPE (new_var), 
> sctx->max_vf);
> +      poly_uint64 nelts;
> +      /* FIXME: If offloading is enabled, and max_vf is poly_int, avoid
> +      using it as length of omp simd array, and use a placeholder value
> +      instead to avoid streaming out poly_int to device. The arrays
> +      will be set to correct length later in omp_device_lower pass.  */

Likewise.  I really don't like introducing FIXME comments.  Describe what
you want to say, but not with FIXME.

> +      if (omp_maybe_offloaded_ctx (ctx) && !sctx->max_vf.is_constant ())
> +     nelts = INT_MAX;

I'm not convinced INT_MAX is the right placeholder.
On 32-bit arches whenever TREE_TYPE (new_var) is 2 or more bytes the array
won't fit into address space anymore.
Use some magic value like 64 and say in the comment it really doesn't matter
much, as we'll change it later.
1 probably wouldn't be best, then I think some undesirable optimizations
couild trigger (e.g. ignoring the exact value of the index).

> +      else
> +     nelts = sctx->max_vf;
> +      tree atype = build_array_type_nelts (TREE_TYPE (new_var), nelts);
>        tree avar = create_tmp_var_raw (atype);
>        if (TREE_ADDRESSABLE (new_var))
>       TREE_ADDRESSABLE (avar) = 1;
> diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
> index 372b019f9d6..9d6467cc6a6 100644
> --- a/gcc/omp-offload.cc
> +++ b/gcc/omp-offload.cc
> @@ -2618,6 +2618,75 @@ find_simtpriv_var_op (tree *tp, int *walk_subtrees, 
> void *)
>    return NULL_TREE;
>  }
>  
> +/* Compute max_vf for target, and accordingly set loop->safelen and length
> +   of omp simd arrays.  */
> +
> +static void
> +adjust_max_vf (function *fun)
> +{
> +  if (!fun->has_simduid_loops)
> +    return;
> +
> +  poly_uint64 max_vf = omp_max_vf (false);
> +
> +  /* FIXME: Since loop->safelen has to be an integer, it's not always 
> possible
> +     to compare against poly_int. For eg 32 and 16+16x are not comparable at
> +     compile-time because 16+16x <= 32 for x < 2, but 16+16x > 32 for x >= 2.
> +     Even if we could get runtime VL based on -mcpu/-march, that would not be
> +     portable across other SVE archs.
> +
> +     For now, use constant_lower_bound (max_vf), as a "safer approximation" 
> to
> +     max_vf that avoids these issues, with the downside that it will be
> +     suboptimal max_vf for SVE archs implementing SIMD width > 128 bits.  */

Again, I don't like FIXMEs and . should be followed by 2 spaces.

> +                 tree assign_stmt_lhs = gimple_assign_lhs (assign_stmt);
> +                 if (TREE_CODE (assign_stmt_lhs) == ARRAY_REF)
> +                   {
> +                     tree array = TREE_OPERAND (assign_stmt_lhs, 0);
> +                     tree& max = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE 
> (array)));
> +                     max = size_int (loop->safelen - 1);
> +                     relayout_decl (array);

This is definitely wrong, please don't do that.
ARRAY_TYPEs are shared, see build_array_type_1, so by forcefully changing
the type in place, you don't change just one particular array (e.g. if it is
int[INT_MAX]), but all user arrays as well.
You need to create a new array type instead, using built_array_type_nelts.
See what shrink_simd_arrays does.

Why do you need that calculate_dominance_info (CDI_DOMINATORS); ?
I wonder if it wouldn't be better to reuse tree-vectorizer.cc infrastructure
here, note_simd_array_uses in particular, perhaps add a helper function
in tree-vectorizer.cc for that.

> @@ -2755,7 +2826,10 @@ execute_omp_device_lower ()
>           rhs = build_int_cst (type, vf);
>           break;
>         case IFN_GOMP_MAX_VF:
> -         rhs = build_int_cst (type, omp_max_vf (false));
> +         /* Use constant_lower_bound, to keep max_vf consistent with
> +            adjut_max_vf above.  */

s/adjut/adjust/

> +         rhs = build_int_cst (type,
> +                              constant_lower_bound (omp_max_vf (false)));
>           break;
>         case IFN_GOMP_SIMT_ORDERED_PRED:
>           rhs = vf == 1 ? integer_zero_node : NULL_TREE;


        Jakub

Re: [RFC] Enabling SVE with offloading to nvptx

Reply via email to