On Wed, 15 Sep 2021, Richard Sandiford wrote:
> Richard Biener <[email protected]> writes:
> > On Tue, 14 Sep 2021, Richard Sandiford wrote:
> >
> >> Richard Biener via Gcc-patches <[email protected]> writes:
> >> > This changes us to maintain and compute (mis-)alignment info for
> >> > the first element of a group only rather than for each DR when
> >> > doing interleaving and for the earliest, first, or first in the SLP
> >> > node (or any pair or all three of those) when SLP vectorizing.
> >> >
> >> > For this to work out the easiest way I have changed the accessors
> >> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
> >> > the first element rather than adjusting all callers.
> >> > dr_misalignment is moved out-of-line and I'm not too fond of the
> >> > poly-int dances there (any hints?), but basically we are now
> >> > adjusting the first elements misalignment based on the DR_INIT
> >> > difference.
> >> >
> >> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >> >
> >> > Richard.
> >> >
> >> > 2021-09-13 Richard Biener <[email protected]>
> >> >
> >> > * tree-vectorizer.h (dr_misalignment): Move out of line.
> >> > (dr_target_alignment): New.
> >> > (DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
> >> > (set_dr_target_alignment): New.
> >> > (SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
> >> > * tree-vect-data-refs.c (dr_misalignment): Compute and
> >> > return the group members misalignment.
> >> > (vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
> >> > (vect_analyze_data_refs_alignment): Compute alignment only
> >> > for the first element of a DR group.
> >> > (vect_slp_analyze_node_alignment): Likewise.
> >> > ---
> >> > gcc/tree-vect-data-refs.c | 65 ++++++++++++++++++++++++---------------
> >> > gcc/tree-vectorizer.h | 24 ++++++++++-----
> >> > 2 files changed, 57 insertions(+), 32 deletions(-)
> >> >
> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >> > index 66e76132d14..b53d6a0b3f1 100644
> >> > --- a/gcc/tree-vect-data-refs.c
> >> > +++ b/gcc/tree-vect-data-refs.c
> >> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info
> >> > *vinfo, slp_instance instance)
> >> > return res;
> >> > }
> >> >
> >> > +/* Return the misalignment of DR_INFO. */
> >> > +
> >> > +int
> >> > +dr_misalignment (dr_vec_info *dr_info)
> >> > +{
> >> > + if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >> > + {
> >> > + dr_vec_info *first_dr
> >> > + = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> >> > + int misalign = first_dr->misalignment;
> >> > + gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> >> > + if (misalign == DR_MISALIGNMENT_UNKNOWN)
> >> > + return misalign;
> >> > + poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
> >> > + - wi::to_poly_offset (DR_INIT
> >> > (first_dr->dr)));
> >> > + poly_int64 mispoly = misalign + diff.to_constant ().to_shwi ();
> >> > + bool res = known_misalignment (mispoly,
> >> > +
> >> > first_dr->target_alignment.to_constant (),
> >> > + &misalign);
> >> > + gcc_assert (res);
> >> > + return misalign;
> >>
> >> Yeah, not too keen on the to_constants here. The one on diff looks
> >> redundant -- you could just use diff.force_shwi () instead, and
> >> keep everything poly_int.
> >>
> >> For the known_misalignment I think we should use:
> >>
> >> if (!can_div_trunc_p (mispoly, first_dr->target_alignment,
> >> "ient, &misalign))
> >> misalign = DR_MISALIGNMENT_UNKNOWN;
> >> return misalign;
> >>
> >> There are then no to_constant assumptions.
> >
> > OK, note that group analysis does
> >
> > /* Check that the DR_INITs are compile-time constants. */
> > if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST
> > || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
> > break;
> >
> > /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */
> > HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
> > HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
> >
> > so I'm confident my variant was "correct", but it still was ugly.
>
> Ah, OK. In that case I don't mind the original version, but it would be
> good to have a comment above the to_constant saying where the condition
> is enforced.
>
> I'm just trying to avoid to_constant calls with no comment to explain
> them, and with no nearby is_constant call. Otherwise it could end up
> a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has
> been checked earlier (not always obvious where) and sometimes
> tree_to_uhwi is just used out of hope, to avoid having to think about
> the alternative.
>
> > There's also the issue that target_alignment is poly_uint64 but
> > misalignment is signed int.
> >
> > Note that can_div_trunc_p seems to require a poly_uint64 remainder,
> > I'm not sure what to do with that, so I used is_constant.
>
> Ah, yeah, forgot about that sorry. I guess in that case, using
> is_constant on first_dr->target_alignment and sticking with
> known_misalignment would make sense.
>
> > Btw, to what value do we want to align with variable sized vectors?
> > We are aligning globals according to target_alignment but only
> > support that for constant alignment. Most users only want to
> > distinguish between aligned or not aligned and the actual misalignment
> > is only used to seed the SSA alignment info, so I suppose being
> > too conservative in dr_misalignment for variable size vectors isn't
> > too bad if we correctly identify fully aligned accesses?
>
> In practice we don't require more than element alignment for VLA SVE
> as things stand. However, there's support for using fully-predicated
> loops in which the first loop iteration aligns by using leading
> inactive lanes where possible (e.g. start with 1 inactive lane
> for a misalignment of 1). In principle that would work for VLA too,
> we just haven't implemented it yet.
>
> > I'm now testing a variant that looks like
> >
> > int
> > dr_misalignment (dr_vec_info *dr_info)
> > {
> > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> > {
> > dr_vec_info *first_dr
> > = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> > int misalign = first_dr->misalignment;
> > gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > return misalign;
> > /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
> > INTEGER_CSTs and the first element in the group has the lowest
> > address. */
> > poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
> > - wi::to_poly_offset (DR_INIT
> > (first_dr->dr)));
> > gcc_assert (known_ge (diff, 0));
> > poly_uint64 mispoly = misalign + diff.force_uhwi ();
> > unsigned HOST_WIDE_INT quotient;
> > if (can_div_trunc_p (mispoly, first_dr->target_alignment, "ient,
> > &mispoly)
> > && mispoly.is_constant ())
> > return mispoly.to_constant ();
> > return DR_MISALIGNMENT_UNKNOWN;
> >
> > I wonder why a non-poly works for the quotient here. I suppose it's
> > because the runtime factor is the same for all poly-ints and thus
> > it cancels out? But then the remainder likely will never be constant
> > unless it is zero?
>
> It's not so much that the quotient is guaranteed to be constant,
> but that that particular overload is designed for the case in which
> the caller only cares about constant quotients, and wants can_div_trunc_p
> to fail otherwise. In other words, it's really just a way of cutting down
> on is_constant calls.
>
> poly-int.h doesn't provide every possible overload of can_div_trunc_p.
> In principle it could have a fully-general 4-poly_int version, but we
> haven't needed that yet. It could also have a new overload for the
> case we care about here, avoiding the separate is_constant.
Hmm. I guess I'll see to somehow unify this with the logic in
vect_compute_data_ref_alignment which at the moment forces
DR_MISALIGNMENT to unknown if DR_TARGET_ALIGNMENT of the vector type
isn't constant (so for SVE all accesses appear unaligned?). In the
end vect_compute_data_ref_alignment does
poly_int64 misalignment
= base_misalignment + wi::to_poly_offset (drb->init).force_shwi ();
/* If this is a backward running DR then first access in the larger
vectype actually is N-1 elements before the address in the DR.
Adjust misalign accordingly. */
if (tree_int_cst_sgn (drb->step) < 0)
/* PLUS because STEP is negative. */
misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1)
* -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE
(vectype))));
unsigned int const_misalignment;
if (!known_misalignment (misalignment, vect_align_c,
&const_misalignment))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"Non-constant misalignment for access: %T\n",
ref);
return;
}
SET_DR_MISALIGNMENT (dr_info, const_misalignment);
ignoring the negative step offset (where poly-int TYPE_VECTOR_SUBPARTS
sneaks in) this combines integer base_misaligned with DR_INIT
(known INTEGER_CST) and then with the known constant vect_align_c.
So the most simplest version is
if (misalign == DR_MISALIGNMENT_UNKNOWN)
return misalign;
/* vect_analyze_data_ref_accesses guarantees that DR_INIT are
INTEGER_CSTs and the first element in the group has the lowest
address. Likewise vect_compute_data_ref_alignment will
have ensured that target_alignment is constant and otherwise
set misalign to DR_MISALIGNMENT_UNKNOWN. */
HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
- TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
gcc_assert (diff >= 0);
unsigned HOST_WIDE_INT target_alignment_c
= first_dr->target_alignment.to_constant ();
return (misalign + diff) % target_alignment_c;
Note my intent is to lift the restriction of having only one
vector type for vectorizing a DR group and alignment is where
currently correctness issues pop up so in the end
dr_misalignment () would get a vectype parameter (and the
odd negative STEP handling would move to the caller, eventually
with another 'offset' parameter to dr_misalignment () / aligned_access_p
()).
So if you are fine with improving the situation for poly-ints
after all this then I'd go with the most simplest variant above.
OK?
Thanks,
Richard.