https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119495

            Bug ID: 119495
           Summary: 8% slowdown of 436.cactusADM on AMD Zen2 since
                    r15-7895-gb191e8bdecf881
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, ra
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: rsandifo at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

As seen here (the first spike of March):

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=290.100.0

there was an 8% slowdown of the 436.cactusADM SPEC 2006 benchmark on AMD Zen2
when run with options -O2 -march=native -flto.

I've bisected this to
r15-7895-gb191e8bdecf881

Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Thu Mar 6 11:06:25 2025 +0000

    ira: Add new hooks for callee-save vs spills [PR117477]

    Following on from the discussion in: 

      https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html

    this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and 
    replaces it with two hooks: one that controls the cost of using an
    extra callee-saved register and one that controls the cost of allocating
    a frame for the first spill.

    (The patch does not attempt to address the shrink-wrapping part of
    the thread above.)

    On AArch64, this is enough to fix PR117477, as verified by the new tests.
    The patch does not change the SPEC2017 scores significantly.  (I saw a
    slight improvement in fotonik3d and roms, but I'm not convinced that
    the improvements are real.)

    The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
    which is a scan-dump correctness test that relies on not using
    caller saves.  The decision to use caller saves looks appropriate,
    and saves an instruction, so I've just added -fno-caller-saves
    to the test options.

    The x86 parts were written by Honza.  ix86_callee_save_cost is updated
    by H.J. to replace gcc_checking_assert with returning 1 if mem_cost <= 2.

However:

1. This isn't a regression against GCC 14.  Comparison here:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.8=1033.100.0&plot.9=290.100.0&;

2. cactusADM performance tends to change a lot with minor changes to how
registers are allocated (I take this from Richard Biener's comment in
PR119044 :)).

So my understanding is that this slowdown isn't really that important. 
However, it seemed reasonable to at least notify Richard Sandiford about this
in case he wants to investigate it.  Otherwise, I would be fine with closing
this as WONTFIX or something like that.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
