On Tue, Jan 28, 2025 at 10:05 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > vectorizable_slp_permutation_1 has two ways of generating the > permutations: one that looks for repeating patterns and one that > calculates the permutation index for every output element individually. > The former works for VLA and VLS whereas the latter only works for VLS. > > There are two justifications for using the repeating code for VLS: > it gives more testing coverage, and it should reduce the analysis > overhead for common cases. This PR kind-of demonstrates both: > the VLS coverage was showing a bug in the analysis shortcut. > > The bug seems to go back to g:ab7e60cec1a6, which added the > repeating_p path. It generated N copies of the permutation vector > in the repeating case, but didn't multiply the number of permutation > instructions for costing purposes by N. So we seem to have been > undercounting ncopies>1 permutations all this time... > > The problem became more visible with g:8157f3f2d211, which extended > the repeating code to handle more cases. > > In the patch, I think noutputs is in practice always a multiple > of unpack_factor, but it seemed more future-proof to handle the > general case. > > Tested on aarch64-linux-gnu & x86_64-linux-gnu. Also tested by > making sure that g:8157f3f2d211 with this patch added didn't > change the imagick code for -O3 -mcpu=generic on aarch64 > compared to g:8157f3f2d211~. OK to install?
OK. Thanks, Richard. > Richard > > > gcc/ > PR tree-optimization/117270 > * tree-vect-slp.cc (vectorizable_slp_permutation_1): Make nperms > account for the number of times that each permutation will be used > during transformation. > --- > gcc/tree-vect-slp.cc | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index 115b7b73eee..ac1733004b6 100644 > --- a/gcc/tree-vect-slp.cc > +++ b/gcc/tree-vect-slp.cc > @@ -10884,9 +10884,15 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, > gimple_stmt_iterator *gsi, > vectors to check during analysis, but we need to generate NOUTPUTS > vectors during transformation. */ > unsigned total_nelts = olanes; > - if (repeating_p && gsi) > - total_nelts = (total_nelts / unpack_factor) * noutputs; > - for (unsigned i = 0; i < total_nelts; ++i) > + unsigned process_nelts = olanes; > + if (repeating_p) > + { > + total_nelts = (total_nelts / unpack_factor) * noutputs; > + if (gsi) > + process_nelts = total_nelts; > + } > + unsigned last_ei = (total_nelts - 1) % process_nelts; > + for (unsigned i = 0; i < process_nelts; ++i) > { > /* VI is the input vector index when generating code for REPEATING_P. > */ > unsigned vi = i / olanes * (pack_p ? 2 : 1); > @@ -10960,7 +10966,7 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, > gimple_stmt_iterator *gsi, > } > > if (!identity_p) > - nperms++; > + nperms += CEIL (total_nelts, process_nelts) - (ei > last_ei); > if (gsi) > { > if (second_vec.first == -1U) > -- > 2.25.1 >