[Bug tree-optimization/120219] New: [16 Regression] ~11% slowdown of 548.exchange2_r on AMD Zen

pheeck at gcc dot gnu.org via Gcc-bugs Sun, 11 May 2025 04:00:11 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120219


            Bug ID: 120219
           Summary: [16 Regression] ~11% slowdown of 548.exchange2_r on
                    AMD Zen
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: pinskia at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-pc-linux-gnu

As seen for example here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.407.0

there was an 11% execution-time slowdown of the 548.exchange2_r SPEC 2017
benchmark when run with -O2 -march=generic -flto on an AMD Zen2 machine.
I bisected it to r16-448-g8335fd561fa823.

Author: Andrew Pinski <quic_apin...@quicinc.com>
Date:   Sun May 4 19:24:09 2025 +0000

    Loop-IM: Hoist (non-expensive) stmts to executed all loop when running
    before PRE

    While fixing up how rewrite_to_defined_overflow works,
    gcc.dg/Wrestrict-22.c started to fail. This is because `d p+ 2` would
    moved by LIM and then be rewritten not using pointer plus. The rewriting
    part is correct behavior. It only recently started to be moved out; due
    to r16-190-g6901d56fea2132.
    Which has the following comment:
    ```
    When we run before PRE and PRE is active hoist all expressions
    since PRE would do so anyway and we can preserve range info
    but PRE cannot.
    ```
    This is not true if hoisting past the always executed point; so, instead
    of hoisting these statements all the way out of the max loops, take into
    account the always executed loop too.

    Bootstrapped and tested on x86_64-linux-gnu.

    gcc/ChangeLog:

            * tree-ssa-loop-im.cc (compute_invariantness): Hoist to the
            always executed point if ignorning the cost.

    Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>

 gcc/tree-ssa-loop-im.cc | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
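
To make the two hoisting targets in the commit message concrete, here is a
hand-written C sketch. It is only an analogy: it is not GCC output and not
code from 548.exchange2_r, and every function and variable name in it is
hypothetical. The expression `scale * 2u + 1u` is invariant in both loops,
but the inner loop runs only when flags[i] is nonzero, so the expression is
always executed in the inner loop and not in the outer one.

```c
#include <stddef.h>

/* As written: the invariant expression sits inside the inner loop body.
   (All names here are hypothetical, for illustration only.) */
unsigned sum_unhoisted(const unsigned *flags, size_t n, size_t m,
                       unsigned scale)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (flags[i]) {
            for (size_t j = 0; j < m; j++) {
                unsigned k = scale * 2u + 1u;   /* invariant in i and j */
                sum += k * (unsigned)j;
            }
        }
    }
    return sum;
}

/* Hoisted all the way out of the outermost loop in which the expression
   is invariant: it is computed even if no flags[i] is ever set. */
unsigned sum_hoisted_outermost(const unsigned *flags, size_t n, size_t m,
                               unsigned scale)
{
    unsigned sum = 0;
    unsigned k = scale * 2u + 1u;
    for (size_t i = 0; i < n; i++) {
        if (flags[i]) {
            for (size_t j = 0; j < m; j++)
                sum += k * (unsigned)j;
        }
    }
    return sum;
}

/* Hoisted only out of the inner loop, i.e. not past the guard: it is
   computed once per outer iteration that actually enters the inner loop. */
unsigned sum_hoisted_guarded(const unsigned *flags, size_t n, size_t m,
                             unsigned scale)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (flags[i]) {
            unsigned k = scale * 2u + 1u;
            for (size_t j = 0; j < m; j++)
                sum += k * (unsigned)j;
        }
    }
    return sum;
}
```

All three functions return the same value for the same inputs; they differ
only in where the invariant computation is placed relative to the guard.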


This slowdown also happened on Zen3 and Zen4:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=957.407.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1103.407.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
