https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219
Bug ID: 120219
Summary: [16 Regression] ~11% slowdown of 548.exchange2_r on AMD Zen
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pheeck at gcc dot gnu.org
CC: pinskia at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

As seen for example here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.407.0

there was an 11% exec time slowdown of the 548.exchange2_r SPEC 2017 benchmark when run with -O2 -march=generic -flto on an AMD Zen2 machine.

I bisected it to r16-448-g8335fd561fa823.

Author: Andrew Pinski <quic_apin...@quicinc.com>
Date:   Sun May 4 19:24:09 2025 +0000

    Loop-IM: Hoist (non-expensive) stmts to executed all loop when running before PRE

    While fixing up how rewrite_to_defined_overflow works, gcc.dg/Wrestrict-22.c started
    to fail. This is because `d p+ 2` would moved by LIM and then be rewritten not using
    pointer plus. The rewriting part is correct behavior. It only recently started to be
    moved out; due to r16-190-g6901d56fea2132.
    Which has the following comment:
    ```
    When we run before PRE and PRE is active hoist all expressions
    since PRE would do so anyway and we can preserve range info
    but PRE cannot.
    ```
    This is not true if hoisting past the always executed point; so, instead of hoisting
    these statements all the way out of the max loops, take into account the always
    executed loop too.

    Bootstrapped and tested on x86_64-linux-gnu.

    gcc/ChangeLog:

            * tree-ssa-loop-im.cc (compute_invariantness): Hoist to the always executed point
            if ignorning the cost.

    Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>

 gcc/tree-ssa-loop-im.cc | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

This slowdown also happened on Zen3 and Zen4:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=957.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1103.407.0

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
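
For readers unfamiliar with the distinction the quoted commit message draws between hoisting "all the way out of the max loops" and hoisting only to the "always executed" point, here is a minimal hand-written C sketch. It is not taken from GCC or from 548.exchange2_r, and it does not claim to reproduce the hot path behind this regression. All names (`invariant_expr`, `in_place`, `hoist_to_outermost`, `hoist_to_always_executed`, `evals`) are hypothetical and exist only for illustration. The three functions return the same result and differ only in where the loop-invariant expression is evaluated.

```c
/* Counts how often the loop-invariant expression is evaluated. */
long evals;

/* Hypothetical stand-in for a cheap loop-invariant computation. */
int invariant_expr(int a, int b)
{
    evals++;
    return a * b + 7;
}

/* No hoisting: the expression is evaluated on every inner iteration. */
long in_place(const int *flag, int n, int m, int a, int b)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (flag[i]) {
            int j = 0;
            do {
                sum += invariant_expr(a, b);
            } while (++j < m);
        }
    }
    return sum;
}

/* Hoisted out of the outermost loop in which the expression is invariant.
   It is now evaluated once, even if flag[] is all zero or n == 0, that is,
   on paths where the original code never evaluated it. */
long hoist_to_outermost(const int *flag, int n, int m, int a, int b)
{
    long sum = 0;
    int t = invariant_expr(a, b);
    for (int i = 0; i < n; i++) {
        if (flag[i]) {
            int j = 0;
            do {
                sum += t;
            } while (++j < m);
        }
    }
    return sum;
}

/* Hoisted only in front of the inner loop, whose body always runs once the
   loop is entered. The expression is never evaluated on a path that did not
   evaluate it before, but it is re-evaluated on every entry to the inner loop. */
long hoist_to_always_executed(const int *flag, int n, int m, int a, int b)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (flag[i]) {
            int t = invariant_expr(a, b);
            int j = 0;
            do {
                sum += t;
            } while (++j < m);
        }
    }
    return sum;
}
```

With `flag = {0, 1, 0, 1}`, `n = 4`, `m = 3`, the in-place version evaluates the expression 6 times, the outermost hoist once, and the always-executed hoist twice. With an all-zero `flag`, only the outermost hoist evaluates it at all. The sketch shows both sides of the trade-off: the more conservative placement avoids speculative evaluation but can leave more evaluations inside the outer loop.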