https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116901
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>: https://gcc.gnu.org/g:855b61b61e63b17cc9770cbe1c5387e4f59c1ffe commit r15-7985-g855b61b61e63b17cc9770cbe1c5387e4f59c1ffe Author: Richard Sandiford <richard.sandif...@arm.com> Date: Wed Mar 12 09:40:10 2025 +0000 vect: Fix ncopies when costing SLP reductions [PR116901] pr110625_[24].c started failing after r15-1329-gd66b820f392aa9a7, which switched to single def-use cycles for single-lane SLP. The problem is that we only costed one vector accumulator operation for an N-vector cycle. The problem seems to have been latent, and meant that we also only costed one FADDA for reduc_strict_4.c and reduc_strict_5.c, even though they need 4 and 6 FADDAs respectively. I'm not sure why: if ((double_reduc || reduction_type != TREE_CODE_REDUCTION) && ncopies > 1) was previously only necessary for non-SLP, but the patch preserves that for safety. gcc/ PR tree-optimization/116901 * tree-vect-loop.cc (vectorizable_reduction): Set ncopies to SLP_TREE_NUMBER_OF_VEC_STMTS for SLP. gcc/testsuite/ PR tree-optimization/116901 * gcc.target/aarch64/sve/reduc_strict_4.c: Turn off costing. * gcc.target/aarch64/sve/reduc_strict_5.c: Likewise.