Re: [PATCH] tree-optimization/120927 - 510.parest_r segfault with masked epilog

Richard Sandiford Thu, 03 Jul 2025 07:25:07 -0700

Richard Biener <rguent...@suse.de> writes:
> The following fixes bad alignment computaton for epilog vectorization
> when as in this case for 510.parest_r and masked epilog vectorization
> with AVX512 we end up choosing AVX to vectorize the main loop and
> masked AVX512 (sic!) to vectorize the epilog.  In that case alignment
> analysis for the epilog tries to force alignment of the base to 64,
> but that cannot possibly help the epilog when the main loop had used
> a vector mode with smaller alignment requirement.
>
> There's another issue, that the check whether the step preserves
> alignment needs to consider possibly previously involved VFs
> (here, the main loops smaller VF) as well.
>
> These might not be the only case with problems for such a mode mix
> but at least there it seems wise to never use DR alignment forcing
> when analyzing an epilog.
>
> We get to chose this mode setup because the iteration over epilog
> modes doesn't prevent this, the maybe_ge (cached_vf_per_mode[0],
> first_vinfo_vf) skip is conditional on !supports_partial_vectors
> and it is also conditional on having a cached VF.  Further nothing
> in vect_analyze_loop_1 rejects this setup - it might be conceivable
> that a target can do masking only for larger modes.  There is a
> second reason we end up with this mode setup, which is that
> vect_need_peeling_or_partial_vectors_p says we do not need
> peeling or partial vectors when analyzing the main loop with
> AVX512 (if it would say so we'd have chosen a masked AVX512
> epilog-only vectorization).  It does that because it looks at
> LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so
> always zero at this point), and compares max_niter (5) against
> the VF (8), but not with equality as the comment says but with
> greater.  This also needs looking at, PR120939.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
>       PR tree-optimization/120927
>       * tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
>       Do not force a DRs base alignment when analyzing an
>       epilog loop.  Check whether the step preserves alignment
>       for all VFs possibly involved sofar.
>
>       * gcc.dg/vect/vect-pr120927.c: New testcase.


LGTM FWIW.

Richard

> ---
>  gcc/testsuite/gcc.dg/vect/vect-pr120927.c | 24 +++++++++++++++++++++++
>  gcc/tree-vect-data-refs.cc                | 16 +++++++++++----
>  2 files changed, 36 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr120927.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr120927.c 
> b/gcc/testsuite/gcc.dg/vect/vect-pr120927.c
> new file mode 100644
> index 00000000000..793593f758f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-pr120927.c
> @@ -0,0 +1,24 @@
> +/* { dg-additional-options "--param vect-partial-vector-usage=1" } */
> +/* { dg-additional-options "-mavx512bw -mavx512vl" { target avx512f_runtime 
> } } */
> +
> +#include "tree-vect.h"
> +
> +static const double a[] = { 1., 2., 3., 4., 5. };
> +
> +void __attribute__((noipa))
> +foo (double *b, double *bp, double c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    b[i] = bp[i] = a[i] * c;
> +}
> +
> +int main()
> +{
> +  double b[6], bp[6];
> +  b[5] = bp[5] = 13.;
> +  check_vect ();
> +  foo (b, bp, 3., 5);
> +  if (b[5] != 13. || bp[5] != 13.)
> +    abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index ee040eb9888..c84cd29116e 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -1501,10 +1501,17 @@ vect_compute_data_ref_alignment (vec_info *vinfo, 
> dr_vec_info *dr_info,
>        /* We can only use base and misalignment information relative to
>        an innermost loop if the misalignment stays the same throughout the
>        execution of the loop.  As above, this is the case if the stride of
> -      the dataref evenly divides by the alignment.  */
> -      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -      step_preserves_misalignment_p
> -     = multiple_p (drb->step_alignment * vf, vect_align_c);
> +      the dataref evenly divides by the alignment.  Make sure to check
> +      previous epilogues and the main loop.  */
> +      step_preserves_misalignment_p = true;
> +      auto lvinfo = loop_vinfo;
> +      while (lvinfo)
> +     {
> +       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (lvinfo);
> +       step_preserves_misalignment_p
> +         &= multiple_p (drb->step_alignment * vf, vect_align_c);
> +       lvinfo = LOOP_VINFO_ORIG_LOOP_INFO (lvinfo);
> +     }
>  
>        if (!step_preserves_misalignment_p && dump_enabled_p ())
>       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -1571,6 +1578,7 @@ vect_compute_data_ref_alignment (vec_info *vinfo, 
> dr_vec_info *dr_info,
>        unsigned int max_alignment;
>        tree base = get_base_for_alignment (drb->base_address, &max_alignment);
>        if (max_alignment < vect_align_c
> +       || (loop_vinfo && LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>         || !vect_can_force_dr_alignment_p (base,
>                                            vect_align_c * BITS_PER_UNIT))
>       {

Re: [PATCH] tree-optimization/120927 - 510.parest_r segfault with masked epilog

Reply via email to