[Bug target/119723] New: 30% slowdown of 436.cactusADM on AMD Zen2 since r15-9204-g0520ef274762f1

pheeck at gcc dot gnu.org via Gcc-bugs Fri, 11 Apr 2025 04:23:07 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119723


            Bug ID: 119723
           Summary: 30% slowdown of 436.cactusADM on AMD Zen2 since
                    r15-9204-g0520ef274762f1
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=301.100.0

there was a 30% execution-time slowdown of the 436.cactusADM SPEC 2006
benchmark when run with -Ofast -march=native on an AMD Zen2 machine.
I bisected it to r15-9204-g0520ef274762f1.

0520ef274762f100c7297efc4f230fcfc6486987 is the first bad commit
commit 0520ef274762f100c7297efc4f230fcfc6486987
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Fri Apr 4 20:07:37 2025 +0200

    rtlanal, i386: Adjust pattern_cost and x86 constant cost [PR115910]

    Below is an attempt to fix up RTX costing P1 caused by r15-775
    https://gcc.gnu.org/pipermail/gcc-patches/2024-May/thread.html#652446
    @@ -21562,7 +21562,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
           if (x86_64_immediate_operand (x, VOIDmode))
            *total = 0;
          else
    -       *total = 1;
    +       /* movabsq is slightly more expensive than a simple instruction. */
    +       *total = COSTS_N_INSNS (1) + 1;
           return true;

         case CONST_DOUBLE:
    change.  In my understanding this was partially trying to workaround
    weird code in pattern_cost, which uses
      return cost > 0 ? cost : COSTS_N_INSNS (1);
    That doesn't make sense to me.  All costs smaller than COSTS_N_INSNS (1)
    mean we need to have at least one instruction there which has the
    COSTS_N_INSNS (1) minimal cost.  So special casing just cost 0 for the
    really cheap immediates which can be used pretty much everywhere but not
    ones which have just tiny bit larger cost than that (1, 2 or 3) is just
    weird.

    So, the following patch changes that to MAX (COSTS_N_INSNS (1), cost)
    which doesn't have this weird behavior where set_src_cost 0 is considered
    more expensive than set_src_cost 1.

    Note, pattern_cost isn't the only spot where costs are computed and normally
    we often sum the subcosts of different parts of a pattern or just query
    rtx costs of different parts of subexpressions, so the jump from
    1 to 5 is quite significant.

    Additionally, x86_64 doesn't have just 2 kinds of constants with different
    costs, it has 3, signed 32-bit ones are the ones which can appear in
    almost all instructions and so using cost of 0 for those looks best,
    then unsigned 32-bit ones which can be done with still cheap movl
    instruction (and I think some others too) and finally full 64-bit ones
    which can be done only with a single movabsq instruction and are quite
    costly both in instruction size and even more expensive to execute.

    The following patch attempts to restore the behavior of GCC 14 with the
    pattern_cost hunk fixed for the unsigned 32-bit ones and only keeps the
    bigger cost for the 64-bit ones.

    2025-04-04  Jakub Jelinek  <ja...@redhat.com>

            PR target/115910
            * rtlanal.cc (pattern_cost): Return at least COSTS_N_INSNS (1)
            rather than just COSTS_N_INSNS (1) for cost <= 0.
            * config/i386/i386.cc (ix86_rtx_costs): Set *total to 1 for
            TARGET_64BIT x86_64_zext_immediate_operand constants.

            * gcc.target/i386/pr115910.c: New test.

 gcc/config/i386/i386.cc                  |  6 +++++-
 gcc/rtlanal.cc                           |  2 +-
 gcc/testsuite/gcc.target/i386/pr115910.c | 20 ++++++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115910.c
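
For readers following the costing argument in the commit message above, here
is a minimal standalone sketch (not the GCC sources) of the two clamps and of
the three constant classes it describes. COSTS_N_INSNS (N) is N * 4 in GCC's
rtl.h. The names old_clamp, new_clamp, const_cost and check_costs are
hypothetical, and the plain range checks only approximate the real
x86_64_immediate_operand and x86_64_zext_immediate_operand predicates, which
also accept some symbolic constants.

```c
#include <assert.h>
#include <stdint.h>

/* Same scaling as GCC's rtl.h: one instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Clamp in pattern_cost before the commit: only cost <= 0 is bumped
   up, so a cost of 1 stays 1 while a cost of 0 becomes 4.  */
int
old_clamp (int cost)
{
  return cost > 0 ? cost : COSTS_N_INSNS (1);
}

/* Clamp after the commit: every pattern costs at least one insn.  */
int
new_clamp (int cost)
{
  return MAX (COSTS_N_INSNS (1), cost);
}

/* Hypothetical stand-in for the constant costs described in the
   commit message, assuming a 64-bit target and plain integers.  */
int
const_cost (int64_t v)
{
  if (v >= INT32_MIN && v <= INT32_MAX)
    return 0;                     /* signed 32-bit immediate */
  if (v >= 0 && v <= (int64_t) UINT32_MAX)
    return 1;                     /* unsigned 32-bit, cheap movl */
  return COSTS_N_INSNS (1) + 1;   /* full 64-bit, movabsq */
}

/* Self-check mirroring the numbers quoted in the commit message.  */
void
check_costs (void)
{
  /* Old clamp: cost 0 ends up more expensive than cost 1.  */
  assert (old_clamp (0) == 4 && old_clamp (1) == 1);
  /* New clamp: both are at least COSTS_N_INSNS (1).  */
  assert (new_clamp (0) == 4 && new_clamp (1) == 4);
  /* The movabsq cost is the "jump from 1 to 5".  */
  assert (const_cost (INT64_C (0x123456789)) == 5);
}
```

With these definitions the old clamp ranks a source cost of 0 above a source
cost of 1, which is the oddity the commit removes.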


This is not a regression against GCC 14. See the comparison
here:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.8=1046.100.0&plot.9=301.100.0&;


There was also this 10% regression for -O2 -march=native:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=291.100.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
