On Wed, 25 Jun 2025, Richard Biener wrote:

> On Wed, 25 Jun 2025, Robin Dapp wrote:
> 
> > Hi,
> > 
> > this patch adds simple misalignment checks for gather/scatter
> > operations.  Previously, we assumed that those perform element accesses
> > internally, so alignment does not matter.  The riscv vector spec,
> > however, explicitly states that vector operations are allowed to fault on
> > element-misaligned accesses.  Reasonable uarchs won't, but...
> > 
> > For gather/scatter we have two paths in the vectorizer:
> > 
> > (1) Regular analysis based on datarefs.  Here we can also create
> >     strided loads.
> > (2) Non-affine access where each gather index is relative to the
> >     initial address.
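> > 
> > Roughly, the two in (simplified, made-up) C:
> > 
> >   /* (1) Strided access, analyzable as a dataref.  */
> >   for (int i = 0; i < n; i++)
> >     sum += a[4 * i];
> > 
> >   /* (2) Non-affine access, each element relative to the initial
> >      address.  */
> >   for (int i = 0; i < n; i++)
> >     sum += a[idx[i]];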
> > 
> > The assumption this patch works off of is that once the alignment of
> > the first scalar access is correct, all others fall in line, as each
> > index is a multiple of the element size.
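> > 
> > For example, with 4-byte elements each access is at BASE + 4 * idx[i],
> > so a 4-byte aligned BASE keeps every element access 4-byte aligned.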
> > 
> > For (1) we have a dataref and can check it for alignment as in other
> > cases.  For (2) this patch checks the object alignment of BASE and
> > compares it against the natural alignment of the current vectype's unit.
> > 
> > The patch also adds a pointer argument to the gather/scatter IFNs that
> > contains the necessary alignment.  Most of the patch is thus mechanical
> > in that it merely adjusts indices.
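> > 
> > For example, a masked gather's argument layout becomes (cf. the hunks
> > below):
> > 
> >   IFN_MASK_GATHER_LOAD (base, align_ptr, offset, scale, zero, mask, else)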
> > 
> > I tested the riscv version with a custom qemu version that faults on
> > element-misaligned vector accesses.  With this patch applied, there is
> > just a single fault left, which is due to PR120782 and which will be
> > addressed separately.
> > 
> > Is the general approach reasonable or do we need to do something else
> > entirely?  Bootstrap and regtest on aarch64 went fine.
> > 
> > I couldn't bootstrap/regtest on x86 as my regular cfarm machines
> > (420-422) are currently down.  Issues are expected, though, as the patch
> > doesn't touch x86's old-style gathers/scatters at all yet.  I still
> > wanted to get this initial version out there to get feedback.
> > 
> > I can still split off the two riscv-specific changes, obviously.
> > Also, I couldn't help but do tiny refactoring in some spots :)  This
> > could also go if requested.
> > 
> > I noticed one early-break failure with the changes where we would give
> > up on a load_permutation of {0}.  It looks latent and probably
> > unintended, but I didn't investigate for now and just allowed this
> > specific permutation.
> 
> This change reminds me that we lack documentation about arguments
> of most of the "complicated" internal functions ...
> 
> We miss internal_fn_gatherscatter_{offset,scale}_index and possibly
> an internal_fn_ldst_ptr_index (always zero?) and
> internal_fn_ldst_alias_align_index (always one, if supported?).
> 
>   if (elsvals && icode != CODE_FOR_nothing)
>      get_supported_else_vals
> -      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals);
> +      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);
> 
> these "fixes" seem to be independent?
> 
> +  /* TODO: Is IS_PACKED necessary/useful here or does get_object_alignment
> +     suffice?  */
> +  bool is_packed = not_size_aligned (DR_REF (dr));
> +  info->align_ptr = build_int_cst
> +    (reference_alias_ptr_type (DR_REF (dr)),
> +     is_packed ? 1 : get_object_alignment (DR_REF (dr)));
> 
> I think get_object_alignment should be sufficient.
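> 
> I.e. just something like (untested sketch):
> 
>   info->align_ptr
>     = build_int_cst (reference_alias_ptr_type (DR_REF (dr)),
>                      get_object_alignment (DR_REF (dr)));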
> 
> +      gs_info->align_ptr = build_int_cst
> +       (reference_alias_ptr_type (DR_REF (dr)), DR_BASE_ALIGNMENT (dr));
> 
> why's this?  If DR_BASE_ALIGNMENT is bigger than the element alignment
> it could possibly not apply to all loads forming the gather?  E.g. with
> a 16-byte aligned base and 4-byte elements, the element at base + 4 is
> only 4-byte aligned.
> 
> @@ -2411,8 +2413,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
>        || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
>      *poffset = neg_ldst_offset;
> 
> -  if (*memory_access_type == VMAT_GATHER_SCATTER
> -      || *memory_access_type == VMAT_ELEMENTWISE
> +  if (*memory_access_type == VMAT_ELEMENTWISE
> 
> this probably needs some refactoring with the adjustments you
> do in get_load_store_type, given that a few lines above we can end up
> classifying a load/store as VMAT_GATHER_SCATTER if
> vect_use_strided_gather_scatters_p.  But then you'd use the
> wrong alignment analysis going forward.

It also occurs to me that whether we need the alignment check depends
on the gather strategy used - in particular with emulated gather/scatter
we use actual scalar loads, thus the existing checks are sufficient.
So you'd want to gate this on gs_info.ifn != IFN_LAST (gs_info.decl
is only used by x86's legacy builtin gathers, and x86 is fine with
misaligned element accesses).
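
Untested sketch of the gating I have in mind (using the names from the
patch):

  if (gs_info.ifn != IFN_LAST)
    {
      /* IFN-based gather/scatter can fault on element-misaligned
         accesses, so check the scalar alignment as below.  */
      ...
    }
  else
    /* Emulated or legacy builtin gather/scatter uses scalar loads,
       the existing checks are sufficient.  */
    *alignment_support_scheme = dr_unaligned_supported;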

Richard.

> +      bool is_misaligned = scalar_align < inner_vectype_sz;
> +      bool is_packed = scalar_align > 1 && is_misaligned;
> +
> +      *misalignment = !is_misaligned ? 0 : inner_vectype_sz - scalar_align;
> +
> +      if (targetm.vectorize.support_vector_misalignment
> +         (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed))
> 
> the misalignment argument is meaningless, I think you want to
> pass DR_MISALIGNMENT_UNKNOWN for this and just pass is_packed
> if the scalar accesses are not at least size aligned.
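> 
> I.e. (sketch):
> 
>   bool is_packed = scalar_align < inner_vectype_sz;
>   if (targetm.vectorize.support_vector_misalignment
>       (TYPE_MODE (vectype), inner_vectype, DR_MISALIGNMENT_UNKNOWN,
>        is_packed))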
> 
> Note the hook really doesn't know whether you ask it for gather/scatter
> or a contiguous vector load, so I wonder whether the above fits
> constraints on other platforms where scalar accesses might be
> allowed to be packed but all unaligned vector accesses would need
> to be element aligned?
> 
> +  /* The alignment_ptr of the base.  */
> 
> The TBAA alias pointer type where the value determines the alignment
> of the scalar accesses.
> 
> +  tree align_ptr;
> 
> in particular it shouldn't be the alignment of the base, because
> that might be larger than the vector element alignment.
> 
> That's my comments so far.
> 
> Thanks for tackling this.
> Richard.
> 
> 
> > Regards
> > Robin
> > 
> > gcc/ChangeLog:
> > 
> >     * config/riscv/riscv.cc (riscv_support_vector_misalignment):
> >     Always support known aligned types.
> >     * internal-fn.cc (expand_scatter_store_optab_fn): Change
> >     argument numbers.
> >     (expand_gather_load_optab_fn): Ditto.
> >     (internal_fn_len_index): Ditto.
> >     (internal_fn_else_index): Ditto.
> >     (internal_fn_mask_index): Ditto.
> >     (internal_fn_stored_value_index): Ditto.
> >     (internal_gather_scatter_fn_supported_p): Ditto.
> >     * optabs-query.cc (supports_vec_gather_load_p): Ditto.
> >     * tree-vect-data-refs.cc (vect_describe_gather_scatter_call):
> >     Handle align_ptr.
> >     (vect_check_gather_scatter): Compute and set align_ptr.
> >     * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern):
> >     Ditto.
> >     * tree-vect-slp.cc (GATHER_SCATTER_OFFSET): Define.
> >     (vect_get_and_check_slp_defs): Use define.
> >     * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset):
> >     Set align_ptr.
> >     (get_group_load_store_type): Do not special-case gather/scatter.
> >     (get_load_store_type): Compute misalignment.
> >     (vectorizable_store): Remove alignment assert for
> >     scatter/gather.
> >     (vectorizable_load): Ditto.
> >     * tree-vectorizer.h (struct gather_scatter_info): Add align_ptr.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >     * lib/target-supports.exp: Fix riscv misalign supported check.
> > ---
> > gcc/config/riscv/riscv.cc             | 24 ++++++--
> > gcc/internal-fn.cc                    | 21 ++++---
> > gcc/optabs-query.cc                   |  2 +-
> > gcc/testsuite/lib/target-supports.exp |  2 +-
> > gcc/tree-vect-data-refs.cc            | 13 ++++-
> > gcc/tree-vect-patterns.cc             | 17 +++---
> > gcc/tree-vect-slp.cc                  | 20 ++++---
> > gcc/tree-vect-stmts.cc                | 83 ++++++++++++++++++++-------
> > gcc/tree-vectorizer.h                 |  3 +
> > 9 files changed, 130 insertions(+), 55 deletions(-)
> > 
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 8fdc5b21484..02637ee5a5b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -12069,11 +12069,27 @@ riscv_estimated_poly_value (poly_int64 val,
> >    target.  */
> > bool
> > riscv_support_vector_misalignment (machine_mode mode,
> > -                              const_tree type ATTRIBUTE_UNUSED,
> > +                              const_tree type,
> >                                int misalignment,
> > -                              bool is_packed ATTRIBUTE_UNUSED)
> > -{
> > -  /* Depend on movmisalign pattern.  */
> > +                              bool is_packed)
> > +{
> > +  /* IS_PACKED is true if the corresponding scalar element is not naturally
> > +     aligned.  In that case defer to the default hook which will check
> > +     if movmisalign is present.  Movmisalign, in turn, depends on
> > +     TARGET_VECTOR_MISALIGN_SUPPORTED.  */
> > +  if (is_packed)
> > +    return default_builtin_support_vector_misalignment (mode, type,
> > +                                                   misalignment,
> > +                                                   is_packed);
> > +
> > +  /* If we know that misalignment is a multiple of the element size, we're
> > +     good.  */
> > +  if (misalignment % TYPE_ALIGN_UNIT (type) == 0)
> > +    return true;
> > +
> > +  /* TODO: misalignment == -1.  Give up?  */
> > +
> > +  /* Otherwise fall back to movmisalign again.  */
> >   return default_builtin_support_vector_misalignment (mode, type,
> >                                                        misalignment,
> >                                                        is_packed);
> > }
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 7b44fabc408..2f066aea460 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -3654,8 +3654,8 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
> >   internal_fn ifn = gimple_call_internal_fn (stmt);
> >   int rhs_index = internal_fn_stored_value_index (ifn);
> >   tree base = gimple_call_arg (stmt, 0);
> > -  tree offset = gimple_call_arg (stmt, 1);
> > -  tree scale = gimple_call_arg (stmt, 2);
> > +  tree offset = gimple_call_arg (stmt, 2);
> > +  tree scale = gimple_call_arg (stmt, 3);
> >   tree rhs = gimple_call_arg (stmt, rhs_index);
> > 
> >   rtx base_rtx = expand_normal (base);
> > @@ -3684,8 +3684,8 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
> > {
> >   tree lhs = gimple_call_lhs (stmt);
> >   tree base = gimple_call_arg (stmt, 0);
> > -  tree offset = gimple_call_arg (stmt, 1);
> > -  tree scale = gimple_call_arg (stmt, 2);
> > +  tree offset = gimple_call_arg (stmt, 2);
> > +  tree scale = gimple_call_arg (stmt, 3);
> > 
> >   rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >   rtx base_rtx = expand_normal (base);
> > @@ -4936,11 +4936,13 @@ internal_fn_len_index (internal_fn fn)
> >       return 2;
> > 
> >     case IFN_MASK_LEN_SCATTER_STORE:
> > +      return 6;
> > +
> >     case IFN_MASK_LEN_STRIDED_LOAD:
> >       return 5;
> > 
> >     case IFN_MASK_LEN_GATHER_LOAD:
> > -      return 6;
> > +      return 7;
> > 
> >     case IFN_COND_LEN_FMA:
> >     case IFN_COND_LEN_FMS:
> > @@ -5044,7 +5046,7 @@ internal_fn_else_index (internal_fn fn)
> > 
> >     case IFN_MASK_GATHER_LOAD:
> >     case IFN_MASK_LEN_GATHER_LOAD:
> > -      return 5;
> > +      return 6;
> > 
> >     default:
> >       return -1;
> > @@ -5079,7 +5081,7 @@ internal_fn_mask_index (internal_fn fn)
> >     case IFN_MASK_SCATTER_STORE:
> >     case IFN_MASK_LEN_GATHER_LOAD:
> >     case IFN_MASK_LEN_SCATTER_STORE:
> > -      return 4;
> > +      return 5;
> > 
> >     case IFN_VCOND_MASK:
> >     case IFN_VCOND_MASK_LEN:
> > @@ -5104,10 +5106,11 @@ internal_fn_stored_value_index (internal_fn fn)
> > 
> >     case IFN_MASK_STORE:
> >     case IFN_MASK_STORE_LANES:
> > +      return 3;
> >     case IFN_SCATTER_STORE:
> >     case IFN_MASK_SCATTER_STORE:
> >     case IFN_MASK_LEN_SCATTER_STORE:
> > -      return 3;
> > +      return 4;
> > 
> >     case IFN_LEN_STORE:
> >       return 4;
> > @@ -5205,7 +5208,7 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
> >      */
> >   if (ok && elsvals)
> >     get_supported_else_vals
> > -      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals);
> > +      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);
> > 
> >   return ok;
> > }
> > diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
> > index f5ca98da818..ac9d7106aee 100644
> > --- a/gcc/optabs-query.cc
> > +++ b/gcc/optabs-query.cc
> > @@ -725,7 +725,7 @@ supports_vec_gather_load_p (machine_mode mode, vec<int> *elsvals)
> >      */
> >   if (elsvals && icode != CODE_FOR_nothing)
> >     get_supported_else_vals
> > -      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals);
> > +      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);
> > 
> >   return this_fn_optabs->supports_vec_gather_load[mode] > 0;
> > }
> > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > index dfffe3adfbd..ab127cb8f8b 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -2428,7 +2428,7 @@ proc check_effective_target_riscv_v_misalign_ok { } {
> >             = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
> >           asm ("vsetivli zero,7,e8,m1,ta,ma");
> >           asm ("addi a7,%0,1" : : "r" (a) : "a7" );
> > -         asm ("vle8.v v8,0(a7)" : : : "v8");
> > +         asm ("vle16.v v8,0(a7)" : : : "v8");
> >           return 0; } } "-march=${gcc_march}"] } {
> >             return 1
> >     }
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index 036903a948f..087c717b8e9 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -4441,10 +4441,11 @@ vect_describe_gather_scatter_call (stmt_vec_info stmt_info,
> >   info->ifn = gimple_call_internal_fn (call);
> >   info->decl = NULL_TREE;
> >   info->base = gimple_call_arg (call, 0);
> > -  info->offset = gimple_call_arg (call, 1);
> > +  info->align_ptr = gimple_call_arg (call, 1);
> > +  info->offset = gimple_call_arg (call, 2);
> >   info->offset_dt = vect_unknown_def_type;
> >   info->offset_vectype = NULL_TREE;
> > -  info->scale = TREE_INT_CST_LOW (gimple_call_arg (call, 2));
> > +  info->scale = TREE_INT_CST_LOW (gimple_call_arg (call, 3));
> >   info->element_type = TREE_TYPE (vectype);
> >   info->memory_type = TREE_TYPE (DR_REF (dr));
> > }
> > @@ -4769,6 +4770,14 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> >   info->ifn = ifn;
> >   info->decl = decl;
> >   info->base = base;
> > +
> > +  /* TODO: Is IS_PACKED necessary/useful here or does get_object_alignment
> > +     suffice?  */
> > +  bool is_packed = not_size_aligned (DR_REF (dr));
> > +  info->align_ptr = build_int_cst
> > +    (reference_alias_ptr_type (DR_REF (dr)),
> > +     is_packed ? 1 : get_object_alignment (DR_REF (dr)));
> > +
> >   info->offset = off;
> >   info->offset_dt = vect_unknown_def_type;
> >   info->offset_vectype = offset_vectype;
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 0f6d6b77ea1..e0035ed845a 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -6042,12 +6042,14 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
> > 
> >       tree vec_els
> >         = vect_get_mask_load_else (elsval, TREE_TYPE (gs_vectype));
> > -     pattern_stmt = gimple_build_call_internal (gs_info.ifn, 6, base,
> > +     pattern_stmt = gimple_build_call_internal (gs_info.ifn, 7, base,
> > +                                                gs_info.align_ptr,
> >                                                  offset, scale, zero,
> >                                                  mask,
> >                                                  vec_els);
> >             }
> >       else
> > -   pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base,
> > +   pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5, base,
> > +                                              gs_info.align_ptr,
> >                                                offset, scale, zero);
> >       tree lhs = gimple_get_lhs (stmt_info->stmt);
> >       tree load_lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > @@ -6057,12 +6059,13 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
> >     {
> >       tree rhs = vect_get_store_rhs (stmt_info);
> >       if (mask != NULL)
> > -   pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5,
> > -                                              base, offset, scale, rhs,
> > -                                              mask);
> > +   pattern_stmt = gimple_build_call_internal (gs_info.ifn, 6,
> > +                                              base, gs_info.align_ptr,
> > +                                              offset, scale, rhs, mask);
> >       else
> > -   pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4,
> > -                                              base, offset, scale, rhs);
> > +   pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5,
> > +                                              base, gs_info.align_ptr,
> > +                                              offset, scale, rhs);
> >     }
> >   gimple_call_set_nothrow (pattern_stmt, true);
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index dc89da3bf17..b0d417d0309 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -507,6 +507,8 @@ vect_def_types_match (enum vect_def_type dta, enum vect_def_type dtb)
> >           && (dtb == vect_external_def || dtb == vect_constant_def)));
> > }
> > 
> > +#define GATHER_SCATTER_OFFSET (-3)
> > +
> > static const int cond_expr_maps[3][5] = {
> >   { 4, -1, -2, 1, 2 },
> >   { 4, -2, -1, 1, 2 },
> > @@ -514,17 +516,17 @@ static const int cond_expr_maps[3][5] = {
> > };
> > static const int no_arg_map[] = { 0 };
> > static const int arg0_map[] = { 1, 0 };
> > -static const int arg1_map[] = { 1, 1 };
> > +static const int arg1_map[] = { 1, 2 };
> > static const int arg2_arg3_map[] = { 2, 2, 3 };
> > -static const int arg1_arg3_map[] = { 2, 1, 3 };
> > -static const int arg1_arg4_arg5_map[] = { 3, 1, 4, 5 };
> > -static const int arg1_arg3_arg4_map[] = { 3, 1, 3, 4 };
> > +static const int arg1_arg3_map[] = { 2, 2, 4 };
> > +static const int arg1_arg4_arg5_map[] = { 3, 2, 5, 6 };
> > +static const int arg1_arg3_arg4_map[] = { 3, 2, 4, 5 };
> > static const int arg3_arg2_map[] = { 2, 3, 2 };
> > static const int op1_op0_map[] = { 2, 1, 0 };
> > -static const int off_map[] = { 1, -3 };
> > -static const int off_op0_map[] = { 2, -3, 0 };
> > -static const int off_arg2_arg3_map[] = { 3, -3, 2, 3 };
> > -static const int off_arg3_arg2_map[] = { 3, -3, 3, 2 };
> > +static const int off_map[] = { 1, GATHER_SCATTER_OFFSET };
> > +static const int off_op0_map[] = { 2, GATHER_SCATTER_OFFSET, 0 };
> > +static const int off_arg2_arg3_map[] = { 3, GATHER_SCATTER_OFFSET, 2, 3 };
> > +static const int off_arg3_arg2_map[] = { 3, GATHER_SCATTER_OFFSET, 3, 2 };
> > static const int mask_call_maps[6][7] = {
> >   { 1, 1, },
> >   { 2, 1, 2, },
> > @@ -696,7 +698,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> >     {
> >       oprnd_info = (*oprnds_info)[i];
> >       int opno = map ? map[i] : int (i);
> > -      if (opno == -3)
> > +      if (opno == GATHER_SCATTER_OFFSET)
> >     {
> >       gcc_assert (STMT_VINFO_GATHER_SCATTER_P (stmt_info));
> >       if (!is_a <loop_vec_info> (vinfo)
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 02a12ab20c2..3c7861c3fd9 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -1803,6 +1803,8 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info,
> >       /* Logically the sum of DR_BASE_ADDRESS, DR_INIT and DR_OFFSET,
> >              but we don't need to store that here.  */
> >       gs_info->base = NULL_TREE;
> > +      gs_info->align_ptr = build_int_cst
> > +   (reference_alias_ptr_type (DR_REF (dr)), DR_BASE_ALIGNMENT (dr));
> >       gs_info->element_type = TREE_TYPE (vectype);
> >       gs_info->offset = fold_convert (offset_type, step);
> >       gs_info->offset_dt = vect_constant_def;
> > @@ -2411,8 +2413,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> >       || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
> >     *poffset = neg_ldst_offset;
> > 
> > -  if (*memory_access_type == VMAT_GATHER_SCATTER
> > -      || *memory_access_type == VMAT_ELEMENTWISE
> > +  if (*memory_access_type == VMAT_ELEMENTWISE
> >       || *memory_access_type == VMAT_STRIDED_SLP
> >       || *memory_access_type == VMAT_INVARIANT)
> >     {
> > @@ -2543,9 +2544,36 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
> >           return false;
> >         }
> >     }
> > -      /* Gather-scatter accesses perform only component accesses, alignment
> > -    is irrelevant for them.  */
> > -      *alignment_support_scheme = dr_unaligned_supported;
> > +
> > +      /* Gather-scatter accesses normally perform only component accesses so
> > +    alignment is irrelevant for them.  Targets like riscv do care about
> > +    scalar alignment in vector accesses, though, so check scalar
> > +    alignment here.  We determined the alias pointer as well as the base
> > +    alignment during pattern recognition and can re-use it here.
> > +
> > +    As we do not have a dataref we only know the alignment of the
> > +    base.  For now don't try harder to determine misalignment and just
> > +    assume it is unknown.  We consider the type packed if its scalar
> > +    alignment is lower than the natural alignment of a vector
> > +    element's type.  */
> > +
> > +      tree inner_vectype = TREE_TYPE (vectype);
> > +
> > +      unsigned HOST_WIDE_INT scalar_align
> > +   = tree_to_uhwi (gs_info->align_ptr);
> > +      unsigned HOST_WIDE_INT inner_vectype_sz
> > +   = tree_to_uhwi (TYPE_SIZE (inner_vectype));
> > +
> > +      bool is_misaligned = scalar_align < inner_vectype_sz;
> > +      bool is_packed = scalar_align > 1 && is_misaligned;
> > +
> > +      *misalignment = !is_misaligned ? 0 : inner_vectype_sz - scalar_align;
> > +
> > +      if (targetm.vectorize.support_vector_misalignment
> > +     (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed))
> > +   *alignment_support_scheme = dr_unaligned_supported;
> > +      else
> > +   *alignment_support_scheme = dr_unaligned_unsupported;
> >     }
> >   else if (!get_group_load_store_type (vinfo, stmt_info, vectype, slp_node,
> >                                    masked_p,
> > @@ -2586,10 +2614,10 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
> >                        "alignment. With non-contiguous memory vectorization"
> >                        " could read out of bounds at %G ",
> >                        STMT_VINFO_STMT (stmt_info));
> > -   if (inbounds)
> > -     LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > -   else
> > -     return false;
> > +      if (inbounds)
> > +   LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > +      else
> > +   return false;
> >     }
> > 
> >   /* If this DR needs alignment for correctness, we must ensure the target
> > @@ -2677,7 +2705,9 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
> >      such only the first load in the group is aligned, the rest are not.
> >      Because of this the permutes may break the alignment requirements that
> >      have been set, and as such we should for now, reject them.  */
> > -      if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> > +      load_permutation_t lperm = SLP_TREE_LOAD_PERMUTATION (slp_node);
> > +      if (lperm.exists ()
> > +     && (lperm.length () > 1 || (lperm.length () && lperm[0] != 0)))
> >     {
> >       if (dump_enabled_p ())
> >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > @@ -9201,7 +9231,8 @@ vectorizable_store (vec_info *vinfo,
> >             {
> >               if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
> >                 call = gimple_build_call_internal (
> > -                       IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr,
> > +                       IFN_MASK_LEN_SCATTER_STORE, 8, dataref_ptr,
> > +                       gs_info.align_ptr,
> >                         vec_offset, scale, vec_oprnd, final_mask,
> >                         final_len,
> >                         bias);
> >               else
> > @@ -9214,11 +9245,14 @@ vectorizable_store (vec_info *vinfo,
> >             }
> >           else if (final_mask)
> >             call = gimple_build_call_internal
> > -                        (IFN_MASK_SCATTER_STORE, 5, dataref_ptr,
> > +                        (IFN_MASK_SCATTER_STORE, 6, dataref_ptr,
> > +                         gs_info.align_ptr,
> >                           vec_offset, scale, vec_oprnd, final_mask);
> >           else
> > -           call = gimple_build_call_internal (IFN_SCATTER_STORE, 4,
> > -                                              dataref_ptr, vec_offset,
> > +           call = gimple_build_call_internal (IFN_SCATTER_STORE, 5,
> > +                                              dataref_ptr,
> > +                                              gs_info.align_ptr,
> > +                                              vec_offset,
> >                                                scale, vec_oprnd);
> >           gimple_call_set_nothrow (call, true);
> >           vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
> > @@ -10869,7 +10903,6 @@ vectorizable_load (vec_info *vinfo,
> >             vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> >     }
> > 
> > -  gcc_assert (alignment_support_scheme);
> >   vec_loop_masks *loop_masks
> >     = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> >        ? &LOOP_VINFO_MASKS (loop_vinfo)
> > @@ -10889,10 +10922,12 @@ vectorizable_load (vec_info *vinfo,
> > 
> >   /* Targets with store-lane instructions must not require explicit
> >      realignment.  vect_supportable_dr_alignment always returns either
> > -     dr_aligned or dr_unaligned_supported for masked operations.  */
> > +     dr_aligned or dr_unaligned_supported for (non-length) masked
> > +     operations.  */
> >   gcc_assert ((memory_access_type != VMAT_LOAD_STORE_LANES
> >            && !mask
> >            && !loop_masks)
> > +         || memory_access_type == VMAT_GATHER_SCATTER
> >           || alignment_support_scheme == dr_aligned
> >           || alignment_support_scheme == dr_unaligned_supported);
> > 
> > @@ -11259,8 +11294,8 @@ vectorizable_load (vec_info *vinfo,
> > 
> >   if (memory_access_type == VMAT_GATHER_SCATTER)
> >     {
> > -      gcc_assert (alignment_support_scheme == dr_aligned
> > -             || alignment_support_scheme == dr_unaligned_supported);
> > +//      gcc_assert (alignment_support_scheme == dr_aligned
> > +//           || alignment_support_scheme == dr_unaligned_supported);
> >       gcc_assert (!grouped_load && !slp_perm);
> > 
> >       unsigned int inside_cost = 0, prologue_cost = 0;
> > @@ -11363,7 +11398,8 @@ vectorizable_load (vec_info *vinfo,
> >                 {
> >                   if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
> >                     call = gimple_build_call_internal (
> > -                     IFN_MASK_LEN_GATHER_LOAD, 8, dataref_ptr, vec_offset,
> > +                     IFN_MASK_LEN_GATHER_LOAD, 9, dataref_ptr,
> > +                     gs_info.align_ptr, vec_offset,
> >                       scale, zero, final_mask, vec_els, final_len, bias);
> >                   else
> >                     /* Non-vector offset indicates that prefer to take
> > @@ -11375,13 +11411,16 @@ vectorizable_load (vec_info *vinfo,
> >                 }
> >               else if (final_mask)
> >                 call = gimple_build_call_internal (IFN_MASK_GATHER_LOAD,
> > -                                                  6, dataref_ptr,
> > +                                                  7, dataref_ptr,
> > +                                                  gs_info.align_ptr,
> >                                                    vec_offset, scale,
> >                                                    zero, final_mask,
> >                                                    vec_els);
> >               else
> > -               call = gimple_build_call_internal (IFN_GATHER_LOAD, 4,
> > -                                                  dataref_ptr, vec_offset,
> > +               call = gimple_build_call_internal (IFN_GATHER_LOAD, 5,
> > +                                                  dataref_ptr,
> > +                                                  gs_info.align_ptr,
> > +                                                  vec_offset,
> >                                                    scale, zero);
> >               gimple_call_set_nothrow (call, true);
> >               new_stmt = call;
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 32c7e52a46e..42da0fa294b 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1545,6 +1545,9 @@ struct gather_scatter_info {
> >   /* The loop-invariant base value.  */
> >   tree base;
> > 
> > +  /* The alignment_ptr of the base.  */
> > +  tree align_ptr;
> > +
> >   /* The original scalar offset, which is a non-loop-invariant SSA_NAME.  */
> >   tree offset;
> > 
> > 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
