On Thu, Dec 14, 2017 at 12:43:11AM +0000, Jeff Law wrote:
> On 11/22/2017 11:10 AM, Richard Sandiford wrote:
> > Richard Sandiford <richard.sandif...@linaro.org> writes:
> >> Two things stopped us using SLP reductions with variable-length vectors:
> >>
> >> (1) We didn't have a way of constructing the initial vector.
> >>     This patch does it by creating a vector full of the neutral
> >>     identity value and then using a shift-and-insert function
> >>     to insert any non-identity inputs into the low-numbered elements.
> >>     (The non-identity values are needed for double reductions.)
> >>     Alternatively, for unchained MIN/MAX reductions that have no neutral
> >>     value, we instead use the same duplicate-and-interleave approach as
> >>     for SLP constant and external definitions (added by a previous
> >>     patch).
> >>
> >> (2) The epilogue for constant-length vectors would extract the vector
> >>     elements associated with each SLP statement and do scalar arithmetic
> >>     on these individual elements. For variable-length vectors, the patch
> >>     instead creates a reduction vector for each SLP statement, replacing
> >>     the elements for other SLP statements with the identity value.
> >>     It then uses a hardware reduction instruction on each vector.
> >>
> >> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> >> and powerpc64le-linux-gnu.
> >
> > Here's an updated version that applies on top of the recent
> > removal of REDUC_*_EXPR. Tested as before.
> >
> > Thanks,
> > Richard
> >
> >
> > 2017-11-22  Richard Sandiford  <richard.sandif...@linaro.org>
> >             Alan Hayward  <alan.hayw...@arm.com>
> >             David Sherwood  <david.sherw...@arm.com>
> >
> > gcc/
> >     * doc/md.texi (vec_shl_insert_@var{m}): New optab.
> >     * internal-fn.def (VEC_SHL_INSERT): New internal function.
> >     * optabs.def (vec_shl_insert_optab): New optab.
> >     * tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
> >     (duplicate_and_interleave): Likewise.
> >     * tree-vect-loop.c: Include internal-fn.h.
> >     (neutral_op_for_slp_reduction): New function, split out from
> >     get_initial_defs_for_reduction.
> >     (get_initial_def_for_reduction): Handle option 2 for variable-length
> >     vectors by loading the neutral value into a vector and then shifting
> >     the initial value into element 0.
> >     (get_initial_defs_for_reduction): Replace the code argument with
> >     the neutral value calculated by neutral_op_for_slp_reduction.
> >     Use gimple_build_vector for constant-length vectors.
> >     Use IFN_VEC_SHL_INSERT for variable-length vectors if all
> >     but the first group_size elements have a neutral value.
> >     Use duplicate_and_interleave otherwise.
> >     (vect_create_epilog_for_reduction): Take a neutral_op parameter.
> >     Update call to get_initial_defs_for_reduction. Handle SLP
> >     reductions for variable-length vectors by creating one vector
> >     result for each scalar result, with the elements associated
> >     with other scalar results stubbed out with the neutral value.
> >     (vectorizable_reduction): Call neutral_op_for_slp_reduction.
> >     Require IFN_VEC_SHL_INSERT for double reductions on
> >     variable-length vectors, or SLP reductions that have
> >     a neutral value. Require can_duplicate_and_interleave_p
> >     support for variable-length unchained SLP reductions if there
> >     is no neutral value, such as for MIN/MAX reductions. Also require
> >     the number of vector elements to be a multiple of the number of
> >     SLP statements when doing variable-length unchained SLP reductions.
> >     Update call to vect_create_epilog_for_reduction.
> >     * tree-vect-slp.c (can_duplicate_and_interleave_p): Make public
> >     and remove initial values.
> >     (duplicate_and_interleave): Use IFN_VEC_SHL_INSERT for
> >     variable-length vectors if all but the first group_size elements
> >     have a neutral value.
> >     * config/aarch64/aarch64.md (UNSPEC_INSR): New unspec.
> >     * config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn.
> >
> > gcc/testsuite/
> >     * gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
> >     * gcc.dg/vect/pr67790.c: Likewise.
> >     * gcc.dg/vect/slp-reduc-1.c: Likewise.
> >     * gcc.dg/vect/slp-reduc-2.c: Likewise.
> >     * gcc.dg/vect/slp-reduc-3.c: Likewise.
> >     * gcc.dg/vect/slp-reduc-5.c: Likewise.
> >     * gcc.target/aarch64/sve_slp_5.c: New test.
> >     * gcc.target/aarch64/sve_slp_5_run.c: Likewise.
> >     * gcc.target/aarch64/sve_slp_6.c: Likewise.
> >     * gcc.target/aarch64/sve_slp_6_run.c: Likewise.
> >     * gcc.target/aarch64/sve_slp_7.c: Likewise.
> >     * gcc.target/aarch64/sve_slp_7_run.c: Likewise.
> OK
> jeff

As you explicitly asked on another thread, this is OK from an AArch64
maintainer too.

James
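
For readers following the thread, here is a rough scalar model in C of points (1) and (2) from the quoted description. It is only a sketch and not the code in the patch: the helper names (shl_insert, build_initial_vector, reduce_for_stmt) are made up for illustration, the "vector" is a plain int array, addition stands in for the reduction operation, and a loop stands in for the hardware reduction instruction. It also assumes the SLP statements are interleaved across lanes, so that lane i belongs to statement i % group_size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one shift-and-insert step: move every element one
   lane away from lane 0 (the top lane is dropped) and put SCALAR in lane 0.  */
static void shl_insert(int *vec, size_t nelts, int scalar)
{
    if (nelts == 0)
        return;
    for (size_t i = nelts - 1; i > 0; --i)
        vec[i] = vec[i - 1];
    vec[0] = scalar;
}

/* Point (1): start from a vector full of the neutral value, then insert the
   GROUP_SIZE initial values in reverse order so that INIT[0] ends up in
   lane 0, INIT[1] in lane 1, and so on.  */
static void build_initial_vector(int *vec, size_t nelts, int neutral,
                                 const int *init, size_t group_size)
{
    for (size_t i = 0; i < nelts; ++i)
        vec[i] = neutral;
    for (size_t i = group_size; i-- > 0;)
        shl_insert(vec, nelts, init[i]);
}

/* Point (2): reduce the lanes that belong to SLP statement STMT.  Lanes for
   the other statements are replaced by the neutral value, so one
   whole-vector reduction (modelled here as a sum) gives the scalar result.  */
static int reduce_for_stmt(const int *vec, size_t nelts, size_t group_size,
                           size_t stmt, int neutral)
{
    /* The lane count is assumed to be a multiple of the group size.  */
    assert(group_size != 0 && nelts % group_size == 0);
    int sum = 0;
    for (size_t i = 0; i < nelts; ++i)
        sum += (i % group_size == stmt) ? vec[i] : neutral;
    return sum;
}
```

Calling build_initial_vector once and then reduce_for_stmt once per SLP statement mirrors the "one vector result for each scalar result" wording in the ChangeLog. The lane count is a runtime parameter here, which is the property that matters for variable-length vectors.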