https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219
Bug ID: 120219
Summary: [16 Regression] ~11% slowdown of 548.exchange2_r on AMD Zen
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pheeck at gcc dot gnu.org
CC: pinskia at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
As seen, for example, here:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.407.0
there was an 11% execution-time slowdown of the 548.exchange2_r SPEC 2017
benchmark when run with -O2 -march=generic -flto on an AMD Zen2 machine.
I bisected it to r16-448-g8335fd561fa823.
Author: Andrew Pinski <[email protected]>
Date: Sun May 4 19:24:09 2025 +0000
Loop-IM: Hoist (non-expensive) stmts to executed all loop when running before PRE
While fixing up how rewrite_to_defined_overflow works, gcc.dg/Wrestrict-22.c
started to fail. This is because `d p+ 2` would moved by LIM and then be
rewritten not using pointer plus. The rewriting part is correct behavior. It
only recently started to be moved out; due to r16-190-g6901d56fea2132.
Which has the following comment:
```
When we run before PRE and PRE is active hoist all expressions
since PRE would do so anyway and we can preserve range info
but PRE cannot.
```
This is not true if hoisting past the always executed point; so, instead of
hoisting these statements all the way out of the max loops, take into account
the always executed loop too.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-loop-im.cc (compute_invariantness): Hoist to the always executed
point if ignorning the cost.
Signed-off-by: Andrew Pinski <[email protected]>
gcc/tree-ssa-loop-im.cc | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
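To illustrate the distinction the commit message draws, here is a minimal hand-written sketch in C. It is not taken from exchange2 or from the GCC testsuite; the function names and the kernel itself are hypothetical, and whether GCC performs exactly these transformations on this code has not been checked. The statement computing `t` is invariant in both loops, so the outermost loop it could be moved out of is the `i` loop, but it only runs when `flag` is set. Hoisting it just out of the `j` loop keeps it behind the condition, while hoisting it out of the `i` loop moves it past the point where it is always executed.

```c
#include <stddef.h>

/* Original form: t is recomputed on every inner iteration. */
long sum_original(const long *a, size_t n, size_t m, long k, int flag)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (flag) {
            for (size_t j = 0; j < m; j++) {
                long t = k * 3 + 1;   /* invariant in both loops */
                sum += a[j] + t;
            }
        }
    }
    return sum;
}

/* t moved out of the inner loop only: it is still evaluated only
   when flag is nonzero, i.e. it stays behind the condition. */
long sum_hoist_inner(const long *a, size_t n, size_t m, long k, int flag)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (flag) {
            long t = k * 3 + 1;
            for (size_t j = 0; j < m; j++)
                sum += a[j] + t;
        }
    }
    return sum;
}

/* t moved out of the outer loop as well: it is now evaluated even
   when flag is zero and the original code never computed it. */
long sum_hoist_outer(const long *a, size_t n, size_t m, long k, int flag)
{
    long sum = 0;
    long t = k * 3 + 1;
    for (size_t i = 0; i < n; i++) {
        if (flag) {
            for (size_t j = 0; j < m; j++)
                sum += a[j] + t;
        }
    }
    return sum;
}
```

All three variants return the same value for the same inputs; they differ only in where, and how often, `t` is evaluated.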
This slowdown also happened on Zen3 and Zen4:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=957.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1103.407.0
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)