On Wed, 25 Jun 2025, Richard Biener wrote:

> On Wed, 25 Jun 2025, Robin Dapp wrote:
> 
> > Hi,
> > 
> > this patch adds simple misalignment checks for gather/scatter
> > operations.  Previously, we assumed that those perform element
> > accesses internally, so alignment does not matter.  The RISC-V vector
> > spec, however, explicitly states that vector operations are allowed
> > to fault on element-misaligned accesses.  Reasonable uarchs won't,
> > but...
> > 
> > For gather/scatter we have two paths in the vectorizer:
> > 
> >  (1) Regular analysis based on datarefs.  Here we can also create
> >      strided loads.
> >  (2) Non-affine accesses where each gather index is relative to the
> >      initial address.
> > 
> > The assumption this patch works off is that once the alignment for
> > the first scalar is correct, all others will fall in line, as the
> > index is always a multiple of the first element's size.
> > 
> > For (1) we have a dataref and can check it for alignment as in other
> > cases.  For (2) this patch checks the object alignment of BASE and
> > compares it against the natural alignment of the current vectype's
> > unit.
> > 
> > The patch also adds a pointer argument to the gather/scatter IFNs
> > that contains the necessary alignment.  Most of the patch is thus
> > mechanical in that it merely adjusts indices.
> > 
> > I tested the riscv version with a custom qemu build that faults on
> > element-misaligned vector accesses.  With this patch applied, there
> > is just a single fault left, which is due to PR120782 and which will
> > be addressed separately.
> > 
> > Is the general approach reasonable or do we need to do something else
> > entirely?  Bootstrap and regtest on aarch64 went fine.
> > 
> > I couldn't bootstrap/regtest on x86 as my regular cfarm machines
> > (420-422) are currently down.  Issues are expected, though, as the
> > patch doesn't touch x86's old-style gathers/scatters at all yet.  I
> > still wanted to get this initial version out there to get feedback.
> > 
> > The two riscv-specific changes I can still split off, obviously.
> > Also, I couldn't help but do tiny refactoring in some spots :)  This
> > could also go if requested.
> > 
> > I noticed one early-break failure with the changes where we would
> > give up on a load_permutation of {0}.  It looks latent and probably
> > unintended, but I didn't investigate for now and just allowed this
> > specific permutation.
> 
> This change reminds me that we lack documentation about the arguments
> of most of the "complicated" internal functions ...
> 
> We miss internal_fn_gatherscatter_{offset,scale}_index and possibly an
> internal_fn_ldst_ptr_index (always zero?) and an
> internal_fn_ldst_alias_align_index (always one, if supported?).
> 
>    if (elsvals && icode != CODE_FOR_nothing)
>      get_supported_else_vals
> -      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals);
> +      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);
> 
> these "fixes" seem to be independent?
> 
> +  /* TODO: Is IS_PACKED necessary/useful here or does get_obj_alignment
> +     suffice?  */
> +  bool is_packed = not_size_aligned (DR_REF (dr));
> +  info->align_ptr = build_int_cst
> +    (reference_alias_ptr_type (DR_REF (dr)),
> +     is_packed ? 1 : get_object_alignment (DR_REF (dr)));
> 
> I think get_object_alignment should be sufficient.
> 
> +  gs_info->align_ptr = build_int_cst
> +    (reference_alias_ptr_type (DR_REF (dr)), DR_BASE_ALIGNMENT (dr));
> 
> why's this?  If DR_BASE_ALIGNMENT is bigger than the element alignment
> it could possibly not apply to all loads forming the gather?
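> 
> To make that concrete (a hypothetical example, not taken from the
> patch): with a base known to be 16-byte aligned, only the 4-byte
> element alignment carries over to the individual gathered loads:
> 
>   void
>   gather (float *restrict dst, const float *src_, const int *idx, int n)
>   {
>     /* DR_BASE_ALIGNMENT-style knowledge: src is 16-byte aligned.  */
>     const float *src = __builtin_assume_aligned (src_, 16);
>     for (int i = 0; i < n; i++)
>       /* Each element load src[idx[i]] is only 4-byte aligned.  */
>       dst[i] = src[idx[i]];
>   }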
> 
> @@ -2411,8 +2413,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
>          || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
>        *poffset = neg_ldst_offset;
> 
> -  if (*memory_access_type == VMAT_GATHER_SCATTER
> -      || *memory_access_type == VMAT_ELEMENTWISE
> +  if (*memory_access_type == VMAT_ELEMENTWISE
> 
> this probably needs some refactoring with the adjustments you do in
> get_load_store_type, given that a few lines above we can end up
> classifying a load/store as VMAT_GATHER_SCATTER if
> vect_use_strided_gather_scatters_p.  But then you'd use the wrong
> alignment analysis going forward.
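> 
> For reference, a made-up loop of the kind that can take the
> vect_use_strided_gather_scatters_p path, i.e. a dataref whose step is
> only known at runtime:
> 
>   void
>   strided (double *restrict dst, const double *src, long stride, int n)
>   {
>     for (int i = 0; i < n; i++)
>       dst[i] = src[i * stride];   /* strided load / gather */
>   }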
It also occurs to me that whether we need the alignment check depends
on the gather strategy used - in particular with emulated gather/scatter
we use actual scalar loads, thus the existing checks are sufficient.
So you'd want to gate this on gs_info.ifn != IFN_LAST (gs_info.decl is
only used by x86, is legacy, and x86 is fine with misaligned element
accesses).

Richard.

> +      bool is_misaligned = scalar_align < inner_vectype_sz;
> +      bool is_packed = scalar_align > 1 && is_misaligned;
> 
> +      *misalignment = !is_misaligned ? 0 : inner_vectype_sz - scalar_align;
> 
> +      if (targetm.vectorize.support_vector_misalignment
> +          (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed))
> 
> the misalignment argument is meaningless, I think you want to pass
> DR_MISALIGNMENT_UNKNOWN for this and just pass is_packed if the scalar
> accesses are not at least size aligned.
> 
> Note the hook really doesn't know whether you ask it for gather/scatter
> or a contiguous vector load, so I wonder whether the above fits
> constraints on other platforms where scalar accesses might be allowed
> to be packed but all unaligned vector accesses would need to be
> element aligned?
> 
> +  /* The alignment_ptr of the base.  */
> 
> The TBAA alias pointer type where the value determines the alignment
> of the scalar accesses.
> 
> +  tree align_ptr;
> 
> in particular it shouldn't be the alignment of the base, because that
> might be larger than the vector element alignment.
> 
> Those are my comments so far.
> 
> Thanks for tackling this.
> Richard.
> 
> > Regards
> >  Robin
> > 
> > gcc/ChangeLog:
> > 
> >         * config/riscv/riscv.cc (riscv_support_vector_misalignment):
> >         Always support known aligned types.
> >         * internal-fn.cc (expand_scatter_store_optab_fn): Change
> >         argument numbers.
> >         (expand_gather_load_optab_fn): Ditto.
> >         (internal_fn_len_index): Ditto.
> >         (internal_fn_else_index): Ditto.
> >         (internal_fn_mask_index): Ditto.
> >         (internal_fn_stored_value_index): Ditto.
> >         (internal_gather_scatter_fn_supported_p): Ditto.
> >         * optabs-query.cc (supports_vec_gather_load_p): Ditto.
> >         * tree-vect-data-refs.cc (vect_describe_gather_scatter_call):
> >         Handle align_ptr.
> >         (vect_check_gather_scatter): Compute and set align_ptr.
> >         * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern):
> >         Ditto.
> >         * tree-vect-slp.cc (GATHER_SCATTER_OFFSET): Define.
> >         (vect_get_and_check_slp_defs): Use define.
> >         * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset):
> >         Set align_ptr.
> >         (get_group_load_store_type): Do not special-case gather/scatter.
> >         (get_load_store_type): Compute misalignment.
> >         (vectorizable_store): Remove alignment assert for
> >         scatter/gather.
> >         (vectorizable_load): Ditto.
> >         * tree-vectorizer.h (struct gather_scatter_info): Add align_ptr.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >         * lib/target-supports.exp: Fix riscv misalign supported check.
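> > 
> > For illustration (a sketch derived from the argument-index changes
> > below, which are authoritative), the gather IFNs now take the
> > alias/alignment pointer as the second argument:
> > 
> >   _1 = .GATHER_LOAD (base, align_ptr, offset, scale, zero);
> >   _2 = .MASK_GATHER_LOAD (base, align_ptr, offset, scale, zero,
> >                           mask, els);
> >   _3 = .MASK_LEN_GATHER_LOAD (base, align_ptr, offset, scale, zero,
> >                               mask, els, len, bias);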
> > ---
> >  gcc/config/riscv/riscv.cc             | 24 ++++++--
> >  gcc/internal-fn.cc                    | 21 ++++---
> >  gcc/optabs-query.cc                   |  2 +-
> >  gcc/testsuite/lib/target-supports.exp |  2 +-
> >  gcc/tree-vect-data-refs.cc            | 13 ++++-
> >  gcc/tree-vect-patterns.cc             | 17 +++---
> >  gcc/tree-vect-slp.cc                  | 20 ++++---
> >  gcc/tree-vect-stmts.cc                | 83 ++++++++++++++++++++-------
> >  gcc/tree-vectorizer.h                 |  3 +
> >  9 files changed, 130 insertions(+), 55 deletions(-)
> > 
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 8fdc5b21484..02637ee5a5b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -12069,11 +12069,27 @@ riscv_estimated_poly_value (poly_int64 val,
> >     target.  */
> >  bool
> >  riscv_support_vector_misalignment (machine_mode mode,
> > -                                   const_tree type ATTRIBUTE_UNUSED,
> > +                                   const_tree type,
> >                                     int misalignment,
> > -                                   bool is_packed ATTRIBUTE_UNUSED)
> > -{
> > -  /* Depend on movmisalign pattern.  */
> > +                                   bool is_packed)
> > +{
> > +  /* IS_PACKED is true if the corresponding scalar element is not naturally
> > +     aligned.  In that case defer to the default hook which will check
> > +     if movmisalign is present.  Movmisalign, in turn, depends on
> > +     TARGET_VECTOR_MISALIGN_SUPPORTED.  */
> > +  if (is_packed)
> > +    return default_builtin_support_vector_misalignment (mode, type,
> > +                                                        misalignment,
> > +                                                        is_packed);
> > +
> > +  /* If we know that misalignment is a multiple of the element size, we're
> > +     good.  */
> > +  if (misalignment % TYPE_ALIGN_UNIT (type) == 0)
> > +    return true;
> > +
> > +  /* TODO: misalignment == -1.  Give up?  */
> > +
> > +  /* Otherwise fall back to movmisalign again.  */
> >    return default_builtin_support_vector_misalignment (mode, type,
> >                                                        misalignment,
> >                                                        is_packed);
> >  }
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 7b44fabc408..2f066aea460 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -3654,8 +3654,8 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
> >    internal_fn ifn = gimple_call_internal_fn (stmt);
> >    int rhs_index = internal_fn_stored_value_index (ifn);
> >    tree base = gimple_call_arg (stmt, 0);
> > -  tree offset = gimple_call_arg (stmt, 1);
> > -  tree scale = gimple_call_arg (stmt, 2);
> > +  tree offset = gimple_call_arg (stmt, 2);
> > +  tree scale = gimple_call_arg (stmt, 3);
> >    tree rhs = gimple_call_arg (stmt, rhs_index);
> > 
> >    rtx base_rtx = expand_normal (base);
> > @@ -3684,8 +3684,8 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
> >  {
> >    tree lhs = gimple_call_lhs (stmt);
> >    tree base = gimple_call_arg (stmt, 0);
> > -  tree offset = gimple_call_arg (stmt, 1);
> > -  tree scale = gimple_call_arg (stmt, 2);
> > +  tree offset = gimple_call_arg (stmt, 2);
> > +  tree scale = gimple_call_arg (stmt, 3);
> > 
> >    rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >    rtx base_rtx = expand_normal (base);
> > @@ -4936,11 +4936,13 @@ internal_fn_len_index (internal_fn fn)
> >        return 2;
> > 
> >      case IFN_MASK_LEN_SCATTER_STORE:
> > +      return 6;
> > +
> >      case IFN_MASK_LEN_STRIDED_LOAD:
> >        return 5;
> > 
> >      case IFN_MASK_LEN_GATHER_LOAD:
> > -      return 6;
> > +      return 7;
> > 
> >      case IFN_COND_LEN_FMA:
> >      case IFN_COND_LEN_FMS:
> > @@ -5044,7 +5046,7 @@ internal_fn_else_index (internal_fn fn)
> > 
> >      case IFN_MASK_GATHER_LOAD:
> >      case IFN_MASK_LEN_GATHER_LOAD:
> > -      return 5;
> > +      return 6;
> > 
> >      default:
> >        return -1;
> > @@ -5079,7 +5081,7 @@ internal_fn_mask_index (internal_fn fn)
> >      case IFN_MASK_SCATTER_STORE:
> >      case IFN_MASK_LEN_GATHER_LOAD:
> >      case IFN_MASK_LEN_SCATTER_STORE:
> > -      return 4;
> > +      return 5;
> > 
> >      case IFN_VCOND_MASK:
> >      case IFN_VCOND_MASK_LEN:
> > @@ -5104,10 +5106,11 @@ internal_fn_stored_value_index (internal_fn fn)
> > 
> >      case IFN_MASK_STORE:
> >      case IFN_MASK_STORE_LANES:
> > +      return 3;
> >      case IFN_SCATTER_STORE:
> >      case IFN_MASK_SCATTER_STORE:
> >      case IFN_MASK_LEN_SCATTER_STORE:
> > -      return 3;
> > +      return 4;
> > 
> >      case IFN_LEN_STORE:
> >        return 4;
> > @@ -5205,7 +5208,7 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
> >     */
> >    if (ok && elsvals)
> >      get_supported_else_vals
> > -      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals);
> > +      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);
> > 
> >    return ok;
> >  }
> > diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
> > index f5ca98da818..ac9d7106aee 100644
> > --- a/gcc/optabs-query.cc
> > +++ b/gcc/optabs-query.cc
> > @@ -725,7 +725,7 @@ supports_vec_gather_load_p (machine_mode mode, vec<int> *elsvals)
> >     */
> >    if (elsvals && icode != CODE_FOR_nothing)
> >      get_supported_else_vals
> > -      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals);
> > +      (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals);
> > 
> >    return this_fn_optabs->supports_vec_gather_load[mode] > 0;
> >  }
> > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > index dfffe3adfbd..ab127cb8f8b 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -2428,7 +2428,7 @@ proc check_effective_target_riscv_v_misalign_ok { } {
> >                = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
> >            asm ("vsetivli zero,7,e8,m1,ta,ma");
> >            asm ("addi a7,%0,1" : : "r" (a) : "a7" );
> > -          asm ("vle8.v v8,0(a7)" : : : "v8");
> > +          asm ("vle16.v v8,0(a7)" : : : "v8");
> >            return 0; } } "-march=${gcc_march}"] } {
> >         return 1
> >      }
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index 036903a948f..087c717b8e9 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -4441,10 +4441,11 @@ vect_describe_gather_scatter_call (stmt_vec_info stmt_info,
> >    info->ifn = gimple_call_internal_fn (call);
> >    info->decl = NULL_TREE;
> >    info->base = gimple_call_arg (call, 0);
> > -  info->offset = gimple_call_arg (call, 1);
> > +  info->align_ptr = gimple_call_arg (call, 1);
> > +  info->offset = gimple_call_arg (call, 2);
> >    info->offset_dt = vect_unknown_def_type;
> >    info->offset_vectype = NULL_TREE;
> > -  info->scale = TREE_INT_CST_LOW (gimple_call_arg (call, 2));
> > +  info->scale = TREE_INT_CST_LOW (gimple_call_arg (call, 3));
> >    info->element_type = TREE_TYPE (vectype);
> >    info->memory_type = TREE_TYPE (DR_REF (dr));
> >  }
> > @@ -4769,6 +4770,14 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> >    info->ifn = ifn;
> >    info->decl = decl;
> >    info->base = base;
> > +
> > +  /* TODO: Is IS_PACKED necessary/useful here or does get_obj_alignment
> > +     suffice?  */
> > +  bool is_packed = not_size_aligned (DR_REF (dr));
> > +  info->align_ptr = build_int_cst
> > +    (reference_alias_ptr_type (DR_REF (dr)),
> > +     is_packed ? 1 : get_object_alignment (DR_REF (dr)));
> > +
> >    info->offset = off;
> >    info->offset_dt = vect_unknown_def_type;
> >    info->offset_vectype = offset_vectype;
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 0f6d6b77ea1..e0035ed845a 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -6042,12 +6042,14 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
> > 
> >        tree vec_els
> >          = vect_get_mask_load_else (elsval, TREE_TYPE (gs_vectype));
> > -      pattern_stmt = gimple_build_call_internal (gs_info.ifn, 6, base,
> > +      pattern_stmt = gimple_build_call_internal (gs_info.ifn, 7, base,
> > +                                                 gs_info.align_ptr,
> >                                                   offset, scale, zero, mask,
> >                                                   vec_els);
> >      }
> >    else
> > -    pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base,
> > +    pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5, base,
> > +                                               gs_info.align_ptr,
> >                                                 offset, scale, zero);
> >    tree lhs = gimple_get_lhs (stmt_info->stmt);
> >    tree load_lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > @@ -6057,12 +6059,13 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
> >      {
> >        tree rhs = vect_get_store_rhs (stmt_info);
> >        if (mask != NULL)
> > -       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5,
> > -                                                  base, offset, scale, rhs,
> > -                                                  mask);
> > +       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 6,
> > +                                                  base, gs_info.align_ptr,
> > +                                                  offset, scale, rhs, mask);
> >        else
> > -       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4,
> > -                                                  base, offset, scale, rhs);
> > +       pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5,
> > +                                                  base, gs_info.align_ptr,
> > +                                                  offset, scale, rhs);
> >      }
> >    gimple_call_set_nothrow (pattern_stmt, true);
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index dc89da3bf17..b0d417d0309 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -507,6 +507,8 @@ vect_def_types_match (enum vect_def_type dta, enum vect_def_type dtb)
> >               && (dtb == vect_external_def || dtb == vect_constant_def)));
> >  }
> > 
> > +#define GATHER_SCATTER_OFFSET (-3)
> > +
> >  static const int cond_expr_maps[3][5] = {
> >    { 4, -1, -2, 1, 2 },
> >    { 4, -2, -1, 1, 2 },
> > @@ -514,17 +516,17 @@ static const int cond_expr_maps[3][5] = {
> >  };
> >  static const int no_arg_map[] = { 0 };
> >  static const int arg0_map[] = { 1, 0 };
> > -static const int arg1_map[] = { 1, 1 };
> > +static const int arg1_map[] = { 1, 2 };
> >  static const int arg2_arg3_map[] = { 2, 2, 3 };
> > -static const int arg1_arg3_map[] = { 2, 1, 3 };
> > -static const int arg1_arg4_arg5_map[] = { 3, 1, 4, 5 };
> > -static const int arg1_arg3_arg4_map[] = { 3, 1, 3, 4 };
> > +static const int arg1_arg3_map[] = { 2, 2, 4 };
> > +static const int arg1_arg4_arg5_map[] = { 3, 2, 5, 6 };
> > +static const int arg1_arg3_arg4_map[] = { 3, 2, 4, 5 };
> >  static const int arg3_arg2_map[] = { 2, 3, 2 };
> >  static const int op1_op0_map[] = { 2, 1, 0 };
> > -static const int off_map[] = { 1, -3 };
> > -static const int off_op0_map[] = { 2, -3, 0 };
> > -static const int off_arg2_arg3_map[] = { 3, -3, 2, 3 };
> > -static const int off_arg3_arg2_map[] = { 3, -3, 3, 2 };
> > +static const int off_map[] = { 1, GATHER_SCATTER_OFFSET };
> > +static const int off_op0_map[] = { 2, GATHER_SCATTER_OFFSET, 0 };
> > +static const int off_arg2_arg3_map[] = { 3, GATHER_SCATTER_OFFSET, 2, 3 };
> > +static const int off_arg3_arg2_map[] = { 3, GATHER_SCATTER_OFFSET, 3, 2 };
> >  static const int mask_call_maps[6][7] = {
> >    { 1, 1, },
> >    { 2, 1, 2, },
> > @@ -696,7 +698,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> >      {
> >        oprnd_info = (*oprnds_info)[i];
> >        int opno = map ? map[i] : int (i);
> > -      if (opno == -3)
> > +      if (opno == GATHER_SCATTER_OFFSET)
> >          {
> >            gcc_assert (STMT_VINFO_GATHER_SCATTER_P (stmt_info));
> >            if (!is_a <loop_vec_info> (vinfo)
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 02a12ab20c2..3c7861c3fd9 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -1803,6 +1803,8 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info,
> >    /* Logically the sum of DR_BASE_ADDRESS, DR_INIT and DR_OFFSET,
> >       but we don't need to store that here.  */
> >    gs_info->base = NULL_TREE;
> > +  gs_info->align_ptr = build_int_cst
> > +    (reference_alias_ptr_type (DR_REF (dr)), DR_BASE_ALIGNMENT (dr));
> >    gs_info->element_type = TREE_TYPE (vectype);
> >    gs_info->offset = fold_convert (offset_type, step);
> >    gs_info->offset_dt = vect_constant_def;
> > @@ -2411,8 +2413,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> >           || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
> >         *poffset = neg_ldst_offset;
> > 
> > -  if (*memory_access_type == VMAT_GATHER_SCATTER
> > -      || *memory_access_type == VMAT_ELEMENTWISE
> > +  if (*memory_access_type == VMAT_ELEMENTWISE
> >        || *memory_access_type == VMAT_STRIDED_SLP
> >        || *memory_access_type == VMAT_INVARIANT)
> >      {
> > @@ -2543,9 +2544,36 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> >               return false;
> >             }
> >         }
> > -      /* Gather-scatter accesses perform only component accesses, alignment
> > -        is irrelevant for them.  */
> > -      *alignment_support_scheme = dr_unaligned_supported;
> > +
> > +      /* Gather-scatter accesses normally perform only component accesses so
> > +        alignment is irrelevant for them.  Targets like riscv do care about
> > +        scalar alignment in vector accesses, though, so check scalar
> > +        alignment here.  We determined the alias pointer as well as the base
> > +        alignment during pattern recognition and can re-use it here.
> > +
> > +        As we do not have a dataref we only know the alignment of the
> > +        base.  For now don't try harder to determine misalignment and just
> > +        assume it is unknown.  We consider the type packed if its scalar
> > +        alignment is lower than the natural alignment of a vector
> > +        element's type.  */
> > +
> > +      tree inner_vectype = TREE_TYPE (vectype);
> > +
> > +      unsigned HOST_WIDE_INT scalar_align
> > +       = tree_to_uhwi (gs_info->align_ptr);
> > +      unsigned HOST_WIDE_INT inner_vectype_sz
> > +       = tree_to_uhwi (TYPE_SIZE (inner_vectype));
> > +
> > +      bool is_misaligned = scalar_align < inner_vectype_sz;
> > +      bool is_packed = scalar_align > 1 && is_misaligned;
> > +
> > +      *misalignment = !is_misaligned ? 0 : inner_vectype_sz - scalar_align;
> > +
> > +      if (targetm.vectorize.support_vector_misalignment
> > +         (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed))
> > +       *alignment_support_scheme = dr_unaligned_supported;
> > +      else
> > +       *alignment_support_scheme = dr_unaligned_unsupported;
> >      }
> >    else if (!get_group_load_store_type (vinfo, stmt_info, vectype, slp_node,
> >                                         masked_p,
> > @@ -2586,10 +2614,10 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> >                                          "alignment.  With non-contiguous memory vectorization"
> >                                          " could read out of bounds at %G ",
> >                                          STMT_VINFO_STMT (stmt_info));
> > -         if (inbounds)
> > -           LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > -         else
> > -           return false;
> > +         if (inbounds)
> > +           LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > +         else
> > +           return false;
> >         }
> > 
> >    /* If this DR needs alignment for correctness, we must ensure the target
> > @@ -2677,7 +2705,9 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
> >          such only the first load in the group is aligned, the rest are not.
> >          Because of this the permutes may break the alignment requirements that
> >          have been set, and as such we should for now, reject them.  */
> > -      if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> > +      load_permutation_t lperm = SLP_TREE_LOAD_PERMUTATION (slp_node);
> > +      if (lperm.exists ()
> > +         && (lperm.length () > 1 || (lperm.length () && lperm[0] != 0)))
> >         {
> >           if (dump_enabled_p ())
> >             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > @@ -9201,7 +9231,8 @@ vectorizable_store (vec_info *vinfo,
> >                 {
> >                   if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
> >                     call = gimple_build_call_internal (
> > -                     IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr,
> > +                     IFN_MASK_LEN_SCATTER_STORE, 8, dataref_ptr,
> > +                     gs_info.align_ptr,
> >                       vec_offset, scale, vec_oprnd, final_mask, final_len,
> >                       bias);
> >                   else
> > @@ -9214,11 +9245,14 @@ vectorizable_store (vec_info *vinfo,
> >                 }
> >               else if (final_mask)
> >                 call = gimple_build_call_internal
> > -                 (IFN_MASK_SCATTER_STORE, 5, dataref_ptr,
> > +                 (IFN_MASK_SCATTER_STORE, 6, dataref_ptr,
> > +                  gs_info.align_ptr,
> >                    vec_offset, scale, vec_oprnd, final_mask);
> >               else
> > -               call = gimple_build_call_internal (IFN_SCATTER_STORE, 4,
> > -                                                  dataref_ptr, vec_offset,
> > +               call = gimple_build_call_internal (IFN_SCATTER_STORE, 5,
> > +                                                  dataref_ptr,
> > +                                                  gs_info.align_ptr,
> > +                                                  vec_offset,
> >                                                    scale, vec_oprnd);
> >               gimple_call_set_nothrow (call, true);
> >               vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
> > @@ -10869,7 +10903,6 @@ vectorizable_load (vec_info *vinfo,
> >        vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> >      }
> > 
> > -  gcc_assert (alignment_support_scheme);
> >    vec_loop_masks *loop_masks
> >      = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> >        ? &LOOP_VINFO_MASKS (loop_vinfo)
> > @@ -10889,10 +10922,12 @@ vectorizable_load (vec_info *vinfo,
> > 
> >    /* Targets with store-lane instructions must not require explicit
> >       realignment.  vect_supportable_dr_alignment always returns either
> > -     dr_aligned or dr_unaligned_supported for masked operations.  */
> > +     dr_aligned or dr_unaligned_supported for (non-length) masked
> > +     operations.  */
> >    gcc_assert ((memory_access_type != VMAT_LOAD_STORE_LANES
> >                && !mask
> >                && !loop_masks)
> > +             || memory_access_type == VMAT_GATHER_SCATTER
> >               || alignment_support_scheme == dr_aligned
> >               || alignment_support_scheme == dr_unaligned_supported);
> > 
> > @@ -11259,8 +11294,8 @@ vectorizable_load (vec_info *vinfo,
> > 
> >    if (memory_access_type == VMAT_GATHER_SCATTER)
> >      {
> > -      gcc_assert (alignment_support_scheme == dr_aligned
> > -                 || alignment_support_scheme == dr_unaligned_supported);
> > +//      gcc_assert (alignment_support_scheme == dr_aligned
> > +//                 || alignment_support_scheme == dr_unaligned_supported);
> >        gcc_assert (!grouped_load && !slp_perm);
> > 
> >        unsigned int inside_cost = 0, prologue_cost = 0;
> > @@ -11363,7 +11398,8 @@ vectorizable_load (vec_info *vinfo,
> >             {
> >               if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
> >                 call = gimple_build_call_internal (
> > -                 IFN_MASK_LEN_GATHER_LOAD, 8, dataref_ptr, vec_offset,
> > +                 IFN_MASK_LEN_GATHER_LOAD, 9, dataref_ptr,
> > +                 gs_info.align_ptr, vec_offset,
> >                   scale, zero, final_mask, vec_els, final_len, bias);
> >               else
> >                 /* Non-vector offset indicates that prefer to take
> > @@ -11375,13 +11411,16 @@ vectorizable_load (vec_info *vinfo,
> >             }
> >           else if (final_mask)
> >             call = gimple_build_call_internal (IFN_MASK_GATHER_LOAD,
> > -                                              6, dataref_ptr,
> > +                                              7, dataref_ptr,
> > +                                              gs_info.align_ptr,
> >                                                vec_offset, scale,
> >                                                zero, final_mask,
> >                                                vec_els);
> >           else
> > -           call = gimple_build_call_internal (IFN_GATHER_LOAD, 4,
> > -                                              dataref_ptr, vec_offset,
> > +           call = gimple_build_call_internal (IFN_GATHER_LOAD, 5,
> > +                                              dataref_ptr,
> > +                                              gs_info.align_ptr,
> > +                                              vec_offset,
> >                                                scale, zero);
> >           gimple_call_set_nothrow (call, true);
> >           new_stmt = call;
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 32c7e52a46e..42da0fa294b 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1545,6 +1545,9 @@ struct gather_scatter_info {
> >    /* The loop-invariant base value.  */
> >    tree base;
> > 
> > +  /* The alignment_ptr of the base.  */
> > +  tree align_ptr;
> > +
> >    /* The original scalar offset, which is a non-loop-invariant SSA_NAME.  */
> >    tree offset;
> > 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)