[Bug tree-optimization/116760] [15 Regression] 6-11% slowdown of 416.gamess on AMD Zen3 and Zen4 since r15-3509-gd34cda72098867

cvs-commit at gcc dot gnu.org via Gcc-bugs Mon, 25 Nov 2024 06:54:25 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116760


--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:cd8db107b9bef73fd822ffb420f96ed2bc622a19

commit r15-5651-gcd8db107b9bef73fd822ffb420f96ed2bc622a19
Author: Richard Biener <rguent...@suse.de>
Date:   Mon Nov 25 13:32:15 2024 +0100

    target/116760 - 416.gamess slowdown with SLP

    For the TWOTFF loop vectorization the backend scales constructor
    and vector extract cost to make higher VFs less profitable.  This
    heuristic currently fails to consider VMAT_STRIDED_SLP which we
    now get with single-lane SLP, causing a huge regression in SPEC 2k6
    416.gamess for the respective loop nest.

    The following fixes this, matching behavior to that of GCC 14 by
    treating single-lane VMAT_STRIDED_SLP the same as VMAT_ELEMENTWISE.

            PR target/116760
            * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
            Scale vec_construct for single-lane VMAT_STRIDED_SLP the
            same as VMAT_ELEMENTWISE.
            * tree-vect-stmts.cc (vectorizable_store): Pass SLP node
            down to costing for vec_to_scalar for VMAT_STRIDED_SLP.
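The commit message describes a cost heuristic. Its general shape can be shown with a small standalone C++ model. This is a hypothetical sketch, not the code in config/i386/i386.cc. The enum and struct names are invented for illustration, and the (nunits + 1) scaling factor is an assumption. The model shows two things: vec_construct and vec_to_scalar costs are scaled up for elementwise accesses, and single-lane strided SLP accesses now take the same path, as in GCC 14.

```cpp
#include <cassert>

// Hypothetical, simplified model of the x86 cost tweak described above.
// NOT the real i386.cc code: MemAccess, CostKind, AccessInfo and the
// (nunits + 1) factor are illustrative assumptions.

// Stand-ins for the vectorizer's VMAT_* memory access kinds.
enum class MemAccess { Contiguous, Elementwise, StridedSlp };

// Stand-ins for the cost kinds the heuristic cares about.
enum class CostKind { VecConstruct, VecToScalar, Other };

struct AccessInfo {
  MemAccess access;
  unsigned slp_lanes;  // number of lanes in the SLP node
};

// True elementwise accesses and single-lane strided SLP accesses are both
// costed as elementwise (the latter is the case the commit adds).
constexpr bool costs_as_elementwise(const AccessInfo &a) {
  return a.access == MemAccess::Elementwise ||
         (a.access == MemAccess::StridedSlp && a.slp_lanes == 1);
}

// Scale construct/extract cost with the number of vector elements, so the
// cost grows with vector width instead of staying flat. This removes most
// of the apparent advantage of higher VFs for such accesses.
constexpr unsigned scaled_cost(CostKind kind, const AccessInfo &a,
                               unsigned base_cost, unsigned nunits) {
  if ((kind == CostKind::VecConstruct || kind == CostKind::VecToScalar) &&
      costs_as_elementwise(a))
    return base_cost * (nunits + 1);
  return base_cost;
}

// Compile-time checks of the model's behavior.
static_assert(scaled_cost(CostKind::VecConstruct,
                          AccessInfo{MemAccess::StridedSlp, 1}, 2, 4) == 10,
              "single-lane strided SLP is scaled like elementwise");
static_assert(scaled_cost(CostKind::VecConstruct,
                          AccessInfo{MemAccess::StridedSlp, 2}, 2, 4) == 2,
              "multi-lane strided SLP keeps its unscaled cost");
```

In this model, a single-lane strided SLP access is costed like an elementwise one. A multi-lane strided SLP group and unrelated cost kinds keep their base cost.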
