https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

            Bug ID: 120941
           Summary: [16 Regression] 20-40% slowdown of 519.lbm_r on Zen2
                    since r16-1644-gaba3b9d3a48a07
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: hjl at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.477.0

there was a 40% execution-time slowdown (on another machine I measured only
24%) of the 527.cam4_r SPEC 2017 benchmark when run with -Ofast -march=native
-flto on an AMD Zen2 machine.
I bisected it to r16-1644-gaba3b9d3a48a07.

aba3b9d3a48a0703fd565f7c5f0caf604f59970b is the first bad commit
commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Fri May 9 07:17:07 2025 +0800

    x86: Extend the remove_redundant_vector pass

    Extend the remove_redundant_vector pass to handle vector broadcasts from
    constant and variable scalars.  When broadcasting from constants and
    function arguments, we can place a single widest vector broadcast at
    entry of the nearest common dominator for basic blocks with all uses
    since constants and function arguments aren't changed.  For broadcast
    from variables with a single definition, the single definition is
    replaced with the widest broadcast.
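
The commit places a single broadcast at the entry of the nearest common
dominator of all basic blocks that use it. The C sketch below shows how that
block can be found. It is not GCC's own code. It uses the "intersect" walk from
Cooper, Harvey and Kennedy's dominance algorithm and assumes a hypothetical CFG
representation: idom[b] is the immediate dominator of block b, and blocks are
numbered in reverse postorder, so a dominator always has a smaller number than
the blocks it dominates.

```c
#include <assert.h>

/* Hypothetical example CFG (not from the bug report):
     0 (entry) -> 1;  1 -> 2, 1 -> 3;  2 -> 4, 3 -> 4;  4 -> 5
   Blocks are numbered in reverse postorder, and example_idom[b] is the
   immediate dominator of block b (the entry is its own idom).  */
static const int example_idom[] = {0, 0, 1, 1, 1, 4};

/* Nearest common dominator of blocks A and B.  With reverse-postorder
   numbering, the block with the larger number cannot dominate the other,
   so it is moved up the dominator tree until both walks meet.  */
static int
nearest_common_dominator (const int *idom, int a, int b)
{
  while (a != b)
    {
      while (a > b)
        a = idom[a];
      while (b > a)
        b = idom[b];
    }
  return a;
}

/* Fold the pairwise query over every block containing a use.  The
   result is the block at whose entry one shared broadcast would dominate
   all N uses.  */
static int
ncd_of_uses (const int *idom, const int *use_blocks, int n)
{
  assert (n > 0);
  int d = use_blocks[0];
  for (int i = 1; i < n; i++)
    d = nearest_common_dominator (idom, d, use_blocks[i]);
  return d;
}
```

In this example CFG, uses in blocks 2 and 3 (the two arms of the diamond)
meet at block 1, so one broadcast at the entry of block 1 covers both. Uses in
blocks 4 and 5 meet at block 4.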


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
