https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116760
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:cd8db107b9bef73fd822ffb420f96ed2bc622a19

commit r15-5651-gcd8db107b9bef73fd822ffb420f96ed2bc622a19
Author: Richard Biener <rguent...@suse.de>
Date:   Mon Nov 25 13:32:15 2024 +0100

    target/116760 - 416.gamess slowdown with SLP

    For the TWOTFF loop vectorization the backend scales constructor
    and vector extract cost to make higher VFs less profitable.  This
    heuristic currently fails to consider VMAT_STRIDED_SLP which we
    now get with single-lane SLP, causing a huge regression in SPEC 2k6
    416.gamess for the respective loop nest.

    The following fixes this, matching behavior to that of GCC 14 by
    treating single-lane VMAT_STRIDED_SLP the same as VMAT_ELEMENTWISE.

            PR target/116760
            * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Scale
            vec_construct for single-lane VMAT_STRIDED_SLP the same
            as VMAT_ELEMENTWISE.
            * tree-vect-stmts.cc (vectorizable_store): Pass SLP node down
            to costing for vec_to_scalar for VMAT_STRIDED_SLP.
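To show why the missed case matters, here is a toy cost model of the heuristic the commit describes. It is a hypothetical sketch, not GCC code. The names MemAccess, scales_construct_cost and vec_construct_cost are invented for illustration. The per-element scaling factor is an assumption. The real check in ix86_vector_costs::add_stmt_cost has further conditions that are not modeled here, and vec_to_scalar costing is also left out. The sketch covers only the idea: if elementwise-like accesses do not have their vec_construct cost scaled, then a wider vector (higher VF) looks cheaper per scalar element. Before the fix, single-lane VMAT_STRIDED_SLP fell into that unscaled path.

```python
from enum import Enum, auto


class MemAccess(Enum):
    """Toy stand-ins for the vectorizer's memory-access kinds (hypothetical model)."""
    ELEMENTWISE = auto()   # plays the role of VMAT_ELEMENTWISE
    STRIDED_SLP = auto()   # plays the role of VMAT_STRIDED_SLP
    CONTIGUOUS = auto()    # any other access kind, left unscaled here


def scales_construct_cost(kind: MemAccess, slp_lanes: int, with_fix: bool) -> bool:
    """Return True if the toy heuristic scales vec_construct for this access."""
    if kind is MemAccess.ELEMENTWISE:
        return True
    # The fix: a single-lane strided SLP access is treated like an elementwise one.
    return with_fix and kind is MemAccess.STRIDED_SLP and slp_lanes == 1


def vec_construct_cost(kind: MemAccess, slp_lanes: int, nunits: int,
                       base_cost: int = 1, with_fix: bool = True) -> int:
    """Cost of building one vector of `nunits` elements in the toy model."""
    if scales_construct_cost(kind, slp_lanes, with_fix):
        # Assumed scaling: multiply by the element count, so a wider vector
        # (higher VF) no longer looks cheaper per scalar element.
        return base_cost * nunits
    return base_cost


if __name__ == "__main__":
    for nunits in (2, 4, 8):
        before = vec_construct_cost(MemAccess.STRIDED_SLP, 1, nunits, with_fix=False)
        after = vec_construct_cost(MemAccess.STRIDED_SLP, 1, nunits)
        print(f"nunits={nunits}: per-element cost "
              f"before={before / nunits:.3f}, after={after / nunits:.3f}")
```

Run without the fix, the per-element construct cost for a single-lane strided access falls as nunits grows (0.500, 0.250, 0.125). That bias favors the highest VF. With the fix it stays at 1.000, the same as an elementwise access. Multi-lane strided SLP remains unscaled in both cases.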