Richard Biener <rguent...@suse.de> writes: > The following fixes bad alignment computaton for epilog vectorization > when as in this case for 510.parest_r and masked epilog vectorization > with AVX512 we end up choosing AVX to vectorize the main loop and > masked AVX512 (sic!) to vectorize the epilog. In that case alignment > analysis for the epilog tries to force alignment of the base to 64, > but that cannot possibly help the epilog when the main loop had used > a vector mode with smaller alignment requirement. > > There's another issue, that the check whether the step preserves > alignment needs to consider possibly previously involved VFs > (here, the main loops smaller VF) as well. > > These might not be the only case with problems for such a mode mix > but at least there it seems wise to never use DR alignment forcing > when analyzing an epilog. > > We get to chose this mode setup because the iteration over epilog > modes doesn't prevent this, the maybe_ge (cached_vf_per_mode[0], > first_vinfo_vf) skip is conditional on !supports_partial_vectors > and it is also conditional on having a cached VF. Further nothing > in vect_analyze_loop_1 rejects this setup - it might be conceivable > that a target can do masking only for larger modes. There is a > second reason we end up with this mode setup, which is that > vect_need_peeling_or_partial_vectors_p says we do not need > peeling or partial vectors when analyzing the main loop with > AVX512 (if it would say so we'd have chosen a masked AVX512 > epilog-only vectorization). It does that because it looks at > LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so > always zero at this point), and compares max_niter (5) against > the VF (8), but not with equality as the comment says but with > greater. This also needs looking at, PR120939. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > PR tree-optimization/120927 > * tree-vect-data-refs.cc (vect_compute_data_ref_alignment): > Do not force a DRs base alignment when analyzing an > epilog loop. Check whether the step preserves alignment > for all VFs possibly involved sofar. > > * gcc.dg/vect/vect-pr120927.c: New testcase.
LGTM FWIW. Richard > --- > gcc/testsuite/gcc.dg/vect/vect-pr120927.c | 24 +++++++++++++++++++++++ > gcc/tree-vect-data-refs.cc | 16 +++++++++++---- > 2 files changed, 36 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr120927.c > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr120927.c > b/gcc/testsuite/gcc.dg/vect/vect-pr120927.c > new file mode 100644 > index 00000000000..793593f758f > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-pr120927.c > @@ -0,0 +1,24 @@ > +/* { dg-additional-options "--param vect-partial-vector-usage=1" } */ > +/* { dg-additional-options "-mavx512bw -mavx512vl" { target avx512f_runtime > } } */ > + > +#include "tree-vect.h" > + > +static const double a[] = { 1., 2., 3., 4., 5. }; > + > +void __attribute__((noipa)) > +foo (double *b, double *bp, double c, int n) > +{ > + for (int i = 0; i < n; ++i) > + b[i] = bp[i] = a[i] * c; > +} > + > +int main() > +{ > + double b[6], bp[6]; > + b[5] = bp[5] = 13.; > + check_vect (); > + foo (b, bp, 3., 5); > + if (b[5] != 13. || bp[5] != 13.) > + abort (); > + return 0; > +} > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > index ee040eb9888..c84cd29116e 100644 > --- a/gcc/tree-vect-data-refs.cc > +++ b/gcc/tree-vect-data-refs.cc > @@ -1501,10 +1501,17 @@ vect_compute_data_ref_alignment (vec_info *vinfo, > dr_vec_info *dr_info, > /* We can only use base and misalignment information relative to > an innermost loop if the misalignment stays the same throughout the > execution of the loop. As above, this is the case if the stride of > - the dataref evenly divides by the alignment. */ > - poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > - step_preserves_misalignment_p > - = multiple_p (drb->step_alignment * vf, vect_align_c); > + the dataref evenly divides by the alignment. Make sure to check > + previous epilogues and the main loop. */ > + step_preserves_misalignment_p = true; > + auto lvinfo = loop_vinfo; > + while (lvinfo) > + { > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (lvinfo); > + step_preserves_misalignment_p > + &= multiple_p (drb->step_alignment * vf, vect_align_c); > + lvinfo = LOOP_VINFO_ORIG_LOOP_INFO (lvinfo); > + } > > if (!step_preserves_misalignment_p && dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -1571,6 +1578,7 @@ vect_compute_data_ref_alignment (vec_info *vinfo, > dr_vec_info *dr_info, > unsigned int max_alignment; > tree base = get_base_for_alignment (drb->base_address, &max_alignment); > if (max_alignment < vect_align_c > + || (loop_vinfo && LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > || !vect_can_force_dr_alignment_p (base, > vect_align_c * BITS_PER_UNIT)) > {