https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957
Bug ID: 120957
Summary: [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pheeck at gcc dot gnu.org
CC: liuhongt at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.427.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.427.0

there was a 6% execution-time slowdown of the 503.bwaves SPEC 2017 benchmark when run with -Ofast -march=native on an AMD Zen2 machine, and a 9% slowdown on an AMD Zen3 machine.

I bisected this to r16-1647-gc06979ff957485 (2025-06-24).

c06979ff95748559da0c2d3aa4eda9d5999eaaf6 is the first bad commit
commit c06979ff95748559da0c2d3aa4eda9d5999eaaf6
Author: hongtao.liu <hongtao....@intel.com>
Date:   Wed Mar 5 12:25:32 2025 +0100

    Don't duplicate setup code cost when do group-candidate cost calucalution.

    -  /* Uses in a group can share setup code, so only add setup cost once. */
    -  cost -= cost.scratch;

    It looks like the original code took into account avoiding double
    counting, but unfortunately cost is reset inside the follow loop which
    invalidates the upper code, and makes same setup code cost duplicated
    in each use of the group.

    The patch fix the issue.

    It can also improve 548.exchange_r by 6% with -march=x86-64-v3 -O2 due
    to better ivopt on EMR.

    No big performance impact for SPEC2017 on graviton4/SPR with
    -mcpu=native -Ofast -fomit-framepointer -flto=auto.

    gcc/ChangeLog:

            PR target/115842
            * tree-ssa-loop-ivopts.cc (determine_group_iv_cost_address):
            Don't recalculate inv_expr when group-candidate cost
            calucalution.

 gcc/tree-ssa-loop-ivopts.cc | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
bisect found first bad commit

This is a regression against GCC 15.
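To make the double counting described in the commit message concrete, here is a minimal C++ sketch of group cost accumulation. It is a hypothetical, simplified model and not GCC's actual code: the `UseCost` struct and both functions are invented for illustration, and `scratch` merely stands in for the setup portion that, per the commit message, uses in a group can share. The first function mirrors the behaviour before r16-1647 (setup counted for every use, because the per-use cost is recomputed inside the loop); the second mirrors the intended accounting (setup counted once per group).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified per-use cost (NOT GCC's comp_cost).
struct UseCost {
  int cost;     // full cost of this use, including its setup code
  int scratch;  // setup portion that uses in the same group can share
};

// Pre-fix behaviour: each use's cost is recomputed and added in full,
// so the shared setup cost is counted once per use.
int group_cost_duplicated(const std::vector<UseCost>& uses) {
  int total = 0;
  for (const UseCost& u : uses)
    total += u.cost;
  return total;
}

// Intended behaviour: only the first use pays for the setup code;
// later uses in the group reuse it, so their scratch part is dropped.
int group_cost_shared(const std::vector<UseCost>& uses) {
  int total = 0;
  bool first = true;
  for (const UseCost& u : uses) {
    assert(u.scratch <= u.cost);  // setup is part of the full cost
    total += first ? u.cost : u.cost - u.scratch;
    first = false;
  }
  return total;
}
```

For a group of three uses that each cost 5, with a shared setup part of 3, `group_cost_duplicated` returns 15 and `group_cost_shared` returns 9 (5 + 2 + 2). Because ivopts compares such group costs across candidates, correcting the accounting can change which induction variables are selected, which is one way a cost-model fix can improve some benchmarks and slow down others.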
See the comparison (Zen2) here:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1070.427.0&plot.1=1219.427.0&plot.2=295.427.0&

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)