On 11/17/2017 08:29 AM, Richard Sandiford wrote:
> This patch uses SVE CLASTB to optimise conditional reductions. It means
> that we no longer need to maintain a separate index vector to record
> the most recent valid value, and no longer need to worry about overflow
> cases.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu. OK to install?
>
> Richard
>
>
> 2017-11-17 Richard Sandiford <richard.sandif...@linaro.org>
>            Alan Hayward <alan.hayw...@arm.com>
>            David Sherwood <david.sherw...@arm.com>
>
> gcc/
>     * doc/md.texi (fold_extract_last_@var{m}): Document.
>     * doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
>     * optabs.def (fold_extract_last_optab): New optab.
>     * internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
>     * internal-fn.c (fold_extract_direct): New macro.
>     (expand_fold_extract_optab_fn): Likewise.
>     (direct_fold_extract_optab_supported_p): Likewise.
>     * tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
>     * tree-vect-loop.c (vect_model_reduction_cost): Handle
>     EXTRACT_LAST_REDUCTION.
>     (get_initial_def_for_reduction): Do not create an initial vector
>     for EXTRACT_LAST_REDUCTION reductions.
>     (vectorizable_reduction): Leave the scalar phi in place for
>     EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION
>     ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an
>     epilogue code for EXTRACT_LAST_REDUCTION and defer the
>     transform phase to vectorizable_condition.
>     * tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
>     split out from...
>     (vect_finish_stmt_generation): ...here.
>     (vect_finish_replace_stmt): New function.
>     (vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
>     * config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
>     pattern.
>     * config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.
>
> gcc/testsuite/
>     * lib/target-supports.exp
>     (check_effective_target_vect_fold_extract_last): New proc.
>     * gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup
>     for fold_extract_last.
>     * gcc.dg/vect/pr65947-2.c: Likewise.
>     * gcc.dg/vect/pr65947-3.c: Likewise.
>     * gcc.dg/vect/pr65947-4.c: Likewise.
>     * gcc.dg/vect/pr65947-5.c: Likewise.
>     * gcc.dg/vect/pr65947-6.c: Likewise.
>     * gcc.dg/vect/pr65947-9.c: Likewise.
>     * gcc.dg/vect/pr65947-10.c: Likewise.
>     * gcc.dg/vect/pr65947-12.c: Likewise.
>     * gcc.dg/vect/pr65947-13.c: Likewise.
>     * gcc.dg/vect/pr65947-14.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_1.c: New test.
>     * gcc.target/aarch64/sve_clastb_1_run.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_2.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_2_run.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_3.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_3_run.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_4.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_4_run.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_5.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_5_run.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_6.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_6_run.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_7.c: Likewise.
>     * gcc.target/aarch64/sve_clastb_7_run.c: Likewise.

Like some of the other patches, I focused just on the generic bits and
did not look at the aarch64 target bits. The generic bits are OK.

jeff
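
For readers following the thread, here is a rough scalar sketch of the kind of conditional reduction the quoted description refers to, and of how a "fold, extract last" step can carry the most recent matching value without a separate index vector. This is not code from the patch: the function names, the fixed chunk width VL, and the byte-array mask are all hypothetical, and the helper only models my understanding of the operation (return the element in the last active lane, or the incoming scalar if no lane is active).

```c
#include <stddef.h>

/* Hypothetical fixed chunk width for illustration only; a real SVE
   vector length is chosen by the hardware, not by a macro.  */
#define VL 4

/* Assumed scalar model of a "fold, extract last" step: return the
   element of VEC in the last lane whose MASK entry is nonzero, or
   FALLBACK if no lane is active.  */
static int
fold_extract_last_model (int fallback, const unsigned char *mask,
                         const int *vec, size_t lanes)
{
  int result = fallback;
  for (size_t i = 0; i < lanes; i++)
    if (mask[i])
      result = vec[i];
  return result;
}

/* The scalar conditional reduction: remember the most recent element
   that satisfies the condition.  */
int
last_below (const int *a, size_t n, int limit, int init)
{
  int last = init;
  for (size_t i = 0; i < n; i++)
    if (a[i] < limit)
      last = a[i];
  return last;
}

/* The same reduction processed one chunk at a time.  The running
   result stays a scalar and is fed back in as the fallback, so no
   index vector is needed to track where the last match occurred.  */
int
last_below_chunked (const int *a, size_t n, int limit, int init)
{
  int last = init;
  for (size_t base = 0; base < n; base += VL)
    {
      size_t lanes = n - base < VL ? n - base : VL;
      unsigned char mask[VL];
      for (size_t i = 0; i < lanes; i++)
        mask[i] = a[base + i] < limit;
      last = fold_extract_last_model (last, mask, a + base, lanes);
    }
  return last;
}
```

Both functions should agree for any input; the chunked form just makes it visible that each step needs only the previous scalar result, the current mask, and the current data.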