https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119495
            Bug ID: 119495
           Summary: 8% slowdown of 436.cactusADM on AMD Zen2 since r15-7895-gb191e8bdecf881
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, ra
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: rsandifo at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: --
              Host: x86_64-linux
            Target: x86_64-linux

As seen here (the first spike of March):

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=290.100.0

there was an 8% slowdown of the 436.cactusADM SPEC 2006 benchmark on AMD Zen2 when run with the options -O2 -march=native -flto.

I've bisected this to r15-7895-gb191e8bdecf881:

    Author: Richard Sandiford <richard.sandif...@arm.com>
    Date:   Thu Mar 6 11:06:25 2025 +0000

        ira: Add new hooks for callee-save vs spills [PR117477]

        Following on from the discussion in:
        https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html
        this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
        replaces it with two hooks: one that controls the cost of using an
        extra callee-saved register and one that controls the cost of
        allocating a frame for the first spill.

        (The patch does not attempt to address the shrink-wrapping part of
        the thread above.)

        On AArch64, this is enough to fix PR117477, as verified by the new
        tests.  The patch does not change the SPEC2017 scores significantly.
        (I saw a slight improvement in fotonik3d and roms, but I'm not
        convinced that the improvements are real.)

        The patch makes IRA use caller saves for
        gcc.target/aarch64/pr103350-1.c, which is a scan-dump correctness
        test that relies on not using caller saves.  The decision to use
        caller saves looks appropriate, and saves an instruction, so I've
        just added -fno-caller-saves to the test options.

        The x86 parts were written by Honza.  ix86_callee_save_cost is
        updated by H.J. to replace gcc_checking_assert with returning 1 if
        mem_cost <= 2.

However:

1. This isn't a regression against GCC 14.
   Comparison here:
   https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.8=1033.100.0&plot.9=290.100.0&

2. cactusADM performance is prone to change a lot with minor changes in how registers are allocated (according to Richard Biener's comment in PR119044 :)).

So my understanding is that this slowdown isn't really that important. However, it seemed reasonable to at least notify Richard Sandiford about it in case he wants to investigate. Otherwise, I would be fine with closing this as WONTFIX or similar.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)