https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116901

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:855b61b61e63b17cc9770cbe1c5387e4f59c1ffe

commit r15-7985-g855b61b61e63b17cc9770cbe1c5387e4f59c1ffe
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Wed Mar 12 09:40:10 2025 +0000

    vect: Fix ncopies when costing SLP reductions [PR116901]

    pr110625_[24].c started failing after r15-1329-gd66b820f392aa9a7,
    which switched to single def-use cycles for single-lane SLP.
    The problem is that we only costed one vector accumulator
    operation for an N-vector cycle.

    The problem seems to have been latent, and meant that we also
    only costed one FADDA for reduc_strict_4.c and reduc_strict_5.c,
    even though they need 4 and 6 FADDAs respectively.

    I'm not sure why:

       if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
           && ncopies > 1)

    was previously only necessary for non-SLP, but the patch preserves
    that for safety.

    gcc/
            PR tree-optimization/116901
            * tree-vect-loop.cc (vectorizable_reduction): Set ncopies to
            SLP_TREE_NUMBER_OF_VEC_STMTS for SLP.

    gcc/testsuite/
            PR tree-optimization/116901
            * gcc.target/aarch64/sve/reduc_strict_4.c: Turn off costing.
            * gcc.target/aarch64/sve/reduc_strict_5.c: Likewise.
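
The under-costing described in the commit message can be illustrated with a toy model. This is only a sketch and not GCC's costing code: the function names and the per-operation cost used below are hypothetical, and the only figures taken from the commit message are the FADDA counts (4 and 6). The model assumes the cost of a reduction cycle scales linearly with the number of vector statements (ncopies) in it.

```c
/* Toy cost model, NOT GCC's implementation.  Both helpers and the
   cost unit are hypothetical and exist only for illustration.  */

/* Cost of a reduction cycle that needs NCOPIES accumulator
   operations, each costing COST_PER_OP units.  */
static unsigned
toy_reduction_cost (unsigned ncopies, unsigned cost_per_op)
{
  return ncopies * cost_per_op;
}

/* How many units are missed when a cycle that really needs
   ACTUAL_COPIES operations is costed as if it needed only one.  */
static unsigned
toy_undercount (unsigned actual_copies, unsigned cost_per_op)
{
  return toy_reduction_cost (actual_copies, cost_per_op)
         - toy_reduction_cost (1, cost_per_op);
}
```

With an assumed cost of 10 units per FADDA, toy_undercount (4, 10) is 30 and toy_undercount (6, 10) is 50. Under this model, the more vector statements a cycle needs, the more its cost is underestimated when only one is counted.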
