https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90151
Bug ID: 90151
Summary: 554.roms_r regression on x86_64 at -O2 and generic march/mtune
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux

My apologies for discovering and reporting this so late, but my
measurements show that 554.roms_r at -O2 and with generic x86_64
march/mtune regressed in performance when compiled with GCC 8 or GCC 9
compared to GCC 7. On an Intel Sandy Bridge I see 4% slowdown, on an
AMD Zen CPU I see 6.8%, LNT reports almost 4% on Kaby Lake and 5.5% on
Zen:

https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch

I have bisected this down to r254885 which is Honza's update of
profile info in IPA-CP:

Author: hubicka
Date: Fri Nov 17 17:41:10 2017
New Revision: 254885

URL: https://gcc.gnu.org/viewcvs?rev=254885&root=gcc&view=rev
Log:
    * ipa-cp.c (update_profiling_info): Handle conversion to local profile.
    * tree-cfg.c (execute_fixup_cfg): Do fixup same way as inliner does.
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-cp.c
    trunk/gcc/tree-cfg.c

I compared the number of samples in the hottest functions and arrived
at the following table:

| Object       | Function                           | r254884 | r254885 |  diff |      % |
|--------------+------------------------------------+---------+---------+-------+--------|
| roms_r_base  | __step2d_mod_MOD_step2d_tile       |  407180 |  409257 |  2077 |  +0.51 |
| roms_r_base  | __pre_step3d_mod_MOD_pre_step3d    |  118119 |  134705 | 16586 | +14.04 |
| roms_r_base  | __t3dmix_mod_MOD_t3dmix2           |   85499 |  101650 | 16151 | +18.89 |
| roms_r_base  | __step3d_t_mod_MOD_step3d_t        |   85003 |  104599 | 19596 | +23.05 |
| roms_r_base  | __rho_eos_mod_MOD_rho_eos_tile     |   74216 |   74746 |   530 |  +0.71 |
| roms_r_base  | __step3d_uv_mod_MOD_step3d_uv_tile |   66393 |   67117 |   724 |  +1.09 |
| roms_r_base  | __rhs3d_mod_MOD_rhs3d              |   62354 |   73321 | 10967 | +17.59 |
| roms_r_base  | __lmd_skpp_mod_MOD_lmd_skpp        |   59767 |   70798 | 11031 | +18.46 |
| libm-2.29.so | __ieee754_exp_fma                  |   54324 |   56546 |  2222 |  +4.09 |
| roms_r_base  | __prsgrd_mod_MOD_prsgrd            |   48439 |   56413 |  7974 | +16.46 |
| roms_r_base  | __uv3dmix_mod_MOD_uv3dmix2         |   45255 |   52950 |  7695 | +17.00 |
| roms_r_base  | __lmd_vmix_mod_MOD_lmd_vmix        |   45069 |   46098 |  1029 |  +2.28 |
| libm-2.29.so | __ieee754_pow_fma                  |   39869 |   40731 |   862 |  +2.16 |

When I looked at what happens to the compilation unit with
__step3d_t_mod_MOD_step3d_t, I discovered that no IPA-CP is taking
place.
In fact all IPA dumps from both revisions are exactly the same, but
profile counts of BB in tree dumps that immediately follow are vastly
different:

$ diff -u0 1/step3d_t.fppized.f90.092t.ccp2 2/step3d_t.fppized.f90.092t.ccp2 | head -38
--- 1/step3d_t.fppized.f90.092t.ccp2    2019-04-18 18:34:39.725703893 +0200
+++ 2/step3d_t.fppized.f90.092t.ccp2    2019-04-18 18:53:23.999336873 +0200
@@ -1064 +1064 @@
-  <bb 2> [local count: 8]:
+  <bb 2> [local count: 10000]:
@@ -1227 +1227 @@
-  <bb 3> [local count: 4]:
+  <bb 3> [local count: 5000]:
@@ -1229 +1229 @@
-  <bb 4> [local count: 8]:
+  <bb 4> [local count: 10000]:
@@ -1264 +1264 @@
-  <bb 5> [local count: 4]:
+  <bb 5> [local count: 5000]:
@@ -1266 +1266 @@
-  <bb 6> [local count: 8]:
+  <bb 6> [local count: 10000]:
@@ -1314 +1314 @@
-  <bb 7> [local count: 4]:
+  <bb 7> [local count: 5000]:
@@ -1316 +1316 @@
-  <bb 8> [local count: 8]:
+  <bb 8> [local count: 10000]:
@@ -1347 +1347 @@
-  <bb 9> [local count: 4]:
+  <bb 9> [local count: 5000]:
@@ -1349 +1349 @@
-  <bb 10> [local count: 8]:
+  <bb 10> [local count: 10000]:
@@ -1380 +1380 @@
-  <bb 11> [local count: 4]:
+  <bb 11> [local count: 5000]:
@@ -1382 +1382 @@
-  <bb 12> [local count: 8]:
+  <bb 12> [local count: 10000]:
@@ -1407 +1407 @@
-  <bb 13> [local count: 4]:
+  <bb 13> [local count: 5000]:

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
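For anyone wanting to repeat the comparison of the two ccp2 dumps, here is a rough sketch of how the per-block counts could be pulled out and compared. It is not part of the original analysis. The helper names local_counts and scale_factors are made up for illustration, and the sketch assumes the text passed in covers a single function, since basic-block indices restart in each function. For the blocks shown in the excerpt above, the counts go from 8 to 10000 and from 4 to 5000, which is the same factor of 1250 in both cases.

```python
import re

# Matches basic-block headers such as "<bb 2> [local count: 8]:" in a tree dump.
BB_COUNT = re.compile(r"<bb (\d+)> \[local count: (\d+)\]")

def local_counts(dump_text):
    """Map basic-block index to local count (hypothetical helper).

    Assumes dump_text holds one function only; with several functions,
    repeated block indices would overwrite each other.
    """
    return {int(bb): int(count) for bb, count in BB_COUNT.findall(dump_text)}

def scale_factors(before, after):
    """Ratio after/before for blocks present in both maps with a nonzero 'before' count."""
    return {bb: after[bb] / before[bb]
            for bb in before
            if bb in after and before[bb]}

# Two header lines from each side of the diff above, used as sample input.
old = local_counts("<bb 2> [local count: 8]:\n<bb 3> [local count: 4]:\n")
new = local_counts("<bb 2> [local count: 10000]:\n<bb 3> [local count: 5000]:\n")
print(scale_factors(old, new))  # {2: 1250.0, 3: 1250.0}
```

If the factor is the same for every block of a function, the relative frequencies inside that function are unchanged and only the absolute scale of the local profile differs between the two revisions.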