https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120752

            Bug ID: 120752
           Summary: 5% slowdown of 525.x264_r since r16-1346-gb0d50cbb42ab2c
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pheeck at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu, aarch64-gnu-linux
            Target: x86_64-pc-linux-gnu, aarch64-gnu-linux

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=477.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=958.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=286.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=589.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=792.377.0

there was a 5% exec time slowdown of the 525.x264_r SPEC 2017 benchmark when
run with -Ofast -march=native -flto -fprofile-use.  I've seen it on x86_64 AMD,
x86_64 Intel and Aarch64 machines.

I bisected it to r16-1346-gb0d50cbb42ab2c.

b0d50cbb42ab2ce5fab8a832cb82fc54b371c914 is the first bad commit
commit b0d50cbb42ab2ce5fab8a832cb82fc54b371c914
Author: Jan Hubicka <hubi...@ucw.cz>
Date:   Fri Jun 6 17:57:00 2025 +0200

    Fix profile updating in ipa-cp

    Bootstrapping with autoprofiledbootstrap, LTO and checking enables ICEs
    in WPA because we end up mixing local and IPA count in
    ipa-cp.cc:update_specialized_profile.  This is because of missing call
    to profile_count::adjust_for_ipa_scaling.

    While looking into that I however noticed that the function forgets to
    update indirect call edges.  This made me to commonize same logic which
    currently exists in clone_inlined_nodes, update_specialized_profile,
    update_profiling_info and update_counts_for_self_gen_clones.

    While testing it I noticed that we also ICE when linking with
    -fdump-ipa-all-details-blocks since IPA and local counts are temporarily
    mixed during IPA transformation stage, so I also added check to
    profile_count::dump to not crash and added verifier to
    gimple_verify_flow_info.

    Other problem I also noticed is that while profile updates done by
    inliner (via cgraph_node::clone) are correctly using global0 profiles
    instead of erasing profile completely when IPA counts drops to 0, the
    scaling in ipa-cp is not doing that, so we lose info and possibly some
    code quality.  I will fix that incrementally.  Similarly ipa-split, when
    offlining region with 0 entry count may re-do frequency propagation to
    get something useful.

This is not a regression against GCC 15.  See the comparison here:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1016.377.0&plot.1=1179.377.0&plot.2=958.377.0&

Btw, if you look at the graphs, at some point we got better execution times
than GCC 15 and this slowdown is just a rebound (sometimes not even entirely).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
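
For readers unfamiliar with the "global0" remark in the quoted commit message, here is a rough toy model of the difference between erasing a profile and keeping its local shape when the IPA count drops to 0. This is only a sketch under stated assumptions: Quality, Count, scale_body and comparable are hypothetical names invented for illustration, not GCC's profile_count API, and the real logic in ipa-cp and cgraph_node::clone is considerably more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: hypothetical names, not GCC's profile_count implementation.
enum class Quality {
  Ipa,      // count is comparable across functions
  Global0   // function is globally never executed, but the stored values
            // still describe relative block frequencies inside the body
};

struct Count {
  std::uint64_t value;
  Quality quality;
};

// Scale every block count of a function body by num/den, as a profile update
// might do when a clone takes over part of the original's executions.
// When the scale is zero, either erase the profile (all counts become 0) or
// keep the original values tagged Global0 so the local shape survives.
std::vector<Count> scale_body(const std::vector<Count>& body,
                              std::uint64_t num, std::uint64_t den,
                              bool keep_local_shape) {
  std::vector<Count> out;
  out.reserve(body.size());
  for (const Count& c : body) {
    if (num == 0) {
      if (keep_local_shape)
        out.push_back({c.value, Quality::Global0});
      else
        out.push_back({0, Quality::Ipa});
    } else {
      assert(den != 0);
      // No overflow handling: fine for a toy with small counts.
      out.push_back({c.value * num / den, Quality::Ipa});
    }
  }
  return out;
}

// In this toy, two counts may only be compared when both are IPA counts;
// mixing kinds is the sort of situation a verifier would reject.
bool comparable(const Count& a, const Count& b) {
  return a.quality == Quality::Ipa && b.quality == Quality::Ipa;
}
```

With a body of counts {100, 90, 10}, scaling by 0 with keep_local_shape=false leaves {0, 0, 0}, so later passes can no longer tell the hot block from the cold one. With keep_local_shape=true the values stay {100, 90, 10} but are tagged Global0, which is the information the commit message says the ipa-cp scaling currently loses.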