On Tue, Jan 28, 2025 at 10:05 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> vectorizable_slp_permutation_1 has two ways of generating the
> permutations: one that looks for repeating patterns and one that
> calculates the permutation index for every output element individually.
> The former works for VLA and VLS whereas the latter only works for VLS.
>
> There are two justifications for using the repeating code for VLS:
> it gives more testing coverage, and it should reduce the analysis
> overhead for common cases.  This PR kind-of demonstrates both:
> the VLS coverage was showing a bug in the analysis shortcut.
>
> The bug seems to go back to g:ab7e60cec1a6, which added the
> repeating_p path.  It generated N copies of the permutation vector
> in the repeating case, but didn't multiply the number of permutation
> instructions for costing purposes by N.  So we seem to have been
> undercounting ncopies>1 permutations all this time...
>
> The problem became more visible with g:8157f3f2d211, which extended
> the repeating code to handle more cases.
>
> In the patch, I think noutputs is in practice always a multiple
> of unpack_factor, but it seemed more future-proof to handle the
> general case.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  Also tested by
> making sure that g:8157f3f2d211 with this patch added didn't
> change the imagick code for -O3 -mcpu=generic on aarch64
> compared to g:8157f3f2d211~.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         PR tree-optimization/117270
>         * tree-vect-slp.cc (vectorizable_slp_permutation_1): Make nperms
>         account for the number of times that each permutation will be used
>         during transformation.
> ---
>  gcc/tree-vect-slp.cc | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 115b7b73eee..ac1733004b6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -10884,9 +10884,15 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
> gimple_stmt_iterator *gsi,
>       vectors to check during analysis, but we need to generate NOUTPUTS
>       vectors during transformation.  */
>    unsigned total_nelts = olanes;
> -  if (repeating_p && gsi)
> -    total_nelts = (total_nelts / unpack_factor) * noutputs;
> -  for (unsigned i = 0; i < total_nelts; ++i)
> +  unsigned process_nelts = olanes;
> +  if (repeating_p)
> +    {
> +      total_nelts = (total_nelts / unpack_factor) * noutputs;
> +      if (gsi)
> +       process_nelts = total_nelts;
> +    }
> +  unsigned last_ei = (total_nelts - 1) % process_nelts;
> +  for (unsigned i = 0; i < process_nelts; ++i)
>      {
>        /* VI is the input vector index when generating code for REPEATING_P.  
> */
>        unsigned vi = i / olanes * (pack_p ? 2 : 1);
> @@ -10960,7 +10966,7 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
> gimple_stmt_iterator *gsi,
>             }
>
>           if (!identity_p)
> -           nperms++;
> +           nperms += CEIL (total_nelts, process_nelts) - (ei > last_ei);
>           if (gsi)
>             {
>               if (second_vec.first == -1U)
> --
> 2.25.1
>

Reply via email to