Richard Biener <rguent...@suse.de> writes:
> This adds a missing check for the availability of the intermediate vector
> types required to re-use the accumulator of a vectorized reduction
> in the vectorized epilogue.  For example, with SVE and
> -msve-vector-bits=512, going from VNx2DF to V2DF needs V4DF, which is
> not available.
>
> In addition, we have to verify that the reduction operation is
> supported on those types.  Otherwise, for example on i?86, we get vector
> code that is later decomposed again by vector lowering when trying to use
> a V2HI epilogue for a V8HI reduction on a target without
> TARGET_MMX_WITH_SSE.
>
> It might be that we want -Wvector-operation-performance for all vect.exp
> tests, but that seems to have existing regressions.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

LGTM.  The earlier patch also passed testing on SVE FWIW.

Thanks,
Richard

>
> Thanks,
> Richard.
>
> 2022-01-19  Richard Biener  <rguent...@suse.de>
>
>       PR tree-optimization/104112
>       * tree-vect-loop.cc (vect_find_reusable_accumulator): Check
>       for required intermediate vector types.
>
>       * gcc.dg/vect/pr104112-1.c: New testcase.
>       * gcc.dg/vect/pr104112-2.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/pr104112-1.c | 18 ++++++++++++++++++
>  gcc/testsuite/gcc.dg/vect/pr104112-2.c | 11 +++++++++++
>  gcc/tree-vect-loop.cc                  | 15 ++++++++++++++-
>  3 files changed, 43 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr104112-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr104112-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr104112-1.c b/gcc/testsuite/gcc.dg/vect/pr104112-1.c
> new file mode 100644
> index 00000000000..84e69b85170
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr104112-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -msve-vector-bits=512" { target aarch64-*-* } } */
> +
> +void
> +boom(int n, double *a, double *x)
> +{
> +  int i, j;
> +  double temp;
> +
> +  for (j = n; j >= 1; --j)
> +    {
> +      temp = x[j];
> +      for (i = j - 1; i >= 1; --i)
> +     temp += a[i + j] * x[i];
> +      x[j] = temp;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr104112-2.c b/gcc/testsuite/gcc.dg/vect/pr104112-2.c
> new file mode 100644
> index 00000000000..7469b3c5d84
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr104112-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* Diagnose vector ops that are later decomposed.  */
> +/* { dg-additional-options "-Wvector-operation-performance" } */
> +
> +unsigned short foo (unsigned short *a, int n)
> +{
> +  unsigned short sum = 0;
> +  for (int i = 0; i < n; ++i)
> +    sum += a[i];
> +  return sum;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 0fe3529b2d1..0b2785a5ed6 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4979,9 +4979,22 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
>    /* Handle the case where we can reduce wider vectors to narrower ones.  */
>    tree vectype = STMT_VINFO_VECTYPE (reduc_info);
>    tree old_vectype = TREE_TYPE (accumulator->reduc_input);
> +  unsigned HOST_WIDE_INT m;
>    if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
> -                         TYPE_VECTOR_SUBPARTS (vectype)))
> +                         TYPE_VECTOR_SUBPARTS (vectype), &m))
>      return false;
> +  /* Check the intermediate vector types are available.  */
> +  while (m > 2)
> +    {
> +      m /= 2;
> +      tree intermediate_vectype = get_related_vectype_for_scalar_type
> +     (TYPE_MODE (vectype), TREE_TYPE (vectype),
> +      exact_div (TYPE_VECTOR_SUBPARTS (old_vectype), m));
> +      if (!intermediate_vectype
> +       || !directly_supported_p (STMT_VINFO_REDUC_CODE (reduc_info),
> +                                 intermediate_vectype))
> +     return false;
> +    }
>  
>    /* Non-SLP reductions might apply an adjustment after the reduction
>       operation, in order to simplify the initialization of the accumulator.
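
For readers following along, the halving walk in the hunk above can be
modelled on plain lane counts.  This is only a sketch, not the GCC code:
lanes_available_p and can_reuse_accumulator_p are hypothetical names, the
availability table is made up to mirror the SVE example from the
description (V2DF and V8DF present, V4DF missing), and the separate
directly_supported_p test on the reduction code is folded into that one
predicate.  It also assumes the lane ratio is a power of two.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target queries: is a vector of double
   with LANES elements available and usable for the reduction operation?
   Assumption for illustration: 2 and 8 lanes exist, 4 lanes do not.  */
static bool
lanes_available_p (unsigned lanes)
{
  return lanes == 2 || lanes == 8;
}

/* Model of the check.  OLD_LANES is the lane count of the accumulator
   from the main loop, NEW_LANES the lane count used by the epilogue.
   Returns true if every intermediate size reached by repeated halving
   is available.  Assumes OLD_LANES / NEW_LANES is a power of two.  */
static bool
can_reuse_accumulator_p (unsigned old_lanes, unsigned new_lanes)
{
  if (new_lanes == 0 || old_lanes % new_lanes != 0)
    return false;

  unsigned m = old_lanes / new_lanes;
  while (m > 2)
    {
      m /= 2;
      /* For 8 -> 2 lanes this visits only the 4-lane type.  */
      unsigned intermediate_lanes = old_lanes / m;
      if (!lanes_available_p (intermediate_lanes))
        return false;
    }
  return true;
}
```

With this table, can_reuse_accumulator_p (8, 2) is false because the
4-lane step is missing, while can_reuse_accumulator_p (4, 2) is true
since a single halving needs no intermediate type.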
