https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120751

            Bug ID: 120751
           Summary: [16 Regression] 10-15% slowdown of 454.calculix on Zen4
                    and Zen5 since r16-1001-g0291f53f8d2343
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-pc-linux-gnu

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1101.170.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1240.170.0

there was a 10-15% exec time slowdown of 454.calculix SPEC 2006 benchmark when
run with -O2 -march=x86-64-v3 (or -march=native) -flto on a Zen4/Zen5 machine.

I bisected it to r16-1001-g0291f53f8d2343.

0291f53f8d2343ca0d39589ebffc31d9c328d6ab is the first bad commit
commit 0291f53f8d2343ca0d39589ebffc31d9c328d6ab
Author: Richard Biener <rguent...@suse.de>
Date:   Fri May 30 08:54:10 2025 +0200

    tree-optimization/120457 - avoid lowering of some single-element interleave

    The following makes sure we are not lowering single-element interleaving
    schemes in a way that defeats load vectorizing later but allows
    the VMAT_ELEMENTWISE fallback to be used.

            PR tree-optimization/120457
            * tree-vect-slp.cc (vect_lower_load_permutations): Implement
            the same heuristics as load vectorization for single-element
            interleaving that spans multiple vectors.

 gcc/tree-vect-slp.cc | 9 +++++++++
 1 file changed, 9 insertions(+)

This is a regression against GCC 15.  See the comparison here:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1104.170.0&plot.1=1144.170.0&plot.2=1101.170.0&


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)