https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
Bug ID: 120941
Summary: [16 Regression] 20-40% slowdown of 519.lbm_r on Zen2 since r16-1644-gaba3b9d3a48a07
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pheeck at gcc dot gnu.org
CC: hjl at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.477.0

there was a 40% exec time slowdown (on another machine I measured only 24%) of 527.cam4_r SPEC 2017 benchmark when run with -Ofast -march=native -flto on an AMD Zen2 machine.

I bisected it to r16-1644-gaba3b9d3a48a07.

aba3b9d3a48a0703fd565f7c5f0caf604f59970b is the first bad commit
commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Fri May 9 07:17:07 2025 +0800

    x86: Extend the remove_redundant_vector pass

    Extend the remove_redundant_vector pass to handle vector broadcasts
    from constant and variable scalars.  When broadcasting from constants
    and function arguments, we can place a single widest vector broadcast
    at entry of the nearest common dominator for basic blocks with all
    uses since constants and function arguments aren't changed.  For
    broadcast from variables with a single definition, the single
    definition is replaced with the widest broadcast.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
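
For readers unfamiliar with the transformation described in the commit message quoted above, here is a minimal hand-written sketch of the idea, not the pass's implementation and not code from the benchmark. The function names scale_before and scale_after are hypothetical, and the vector types use the GCC/Clang vector_size extension. The first function broadcasts the same function argument at two widths; the second does by hand what the commit message describes, keeping a single widest broadcast and letting the narrower use read its low half. What GCC actually emits for either function depends on the target and flags and is not claimed here.

```c
#include <string.h>

/* GCC/Clang generic vector types: 4 and 8 packed floats. */
typedef float v4sf __attribute__((vector_size(16)));
typedef float v8sf __attribute__((vector_size(32)));

/* Before: the function argument x is broadcast twice, once per width. */
void scale_before(float a[4], float b[8], float x)
{
    v4sf x4 = { x, x, x, x };
    v8sf x8 = { x, x, x, x, x, x, x, x };
    v4sf va;
    v8sf vb;

    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    va *= x4;
    vb *= x8;
    memcpy(a, &va, sizeof va);
    memcpy(b, &vb, sizeof vb);
}

/* After (rewritten by hand): a single widest broadcast of x; the
   narrower use takes the low half of that wide vector. */
void scale_after(float a[4], float b[8], float x)
{
    v8sf x8 = { x, x, x, x, x, x, x, x };
    v4sf x4;
    v4sf va;
    v8sf vb;

    memcpy(&x4, &x8, sizeof x4);   /* low 128 bits of the wide broadcast */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    va *= x4;
    vb *= x8;
    memcpy(a, &va, sizeof va);
    memcpy(b, &vb, sizeof vb);
}
```

Both functions compute the same results; the difference is only in how many broadcasts exist and where the wide one is live, which is the kind of change that can affect register pressure in a hot loop.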