https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90151

            Bug ID: 90151
           Summary: 554.roms_r regression on x86_64 at -O2 and generic
                    march/mtune
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

My apologies for discovering and reporting this so late, but my
measurements show that 554.roms_r at -O2 and with generic x86_64
march/mtune regressed in performance when compiled with GCC 8 or GCC 9
compared to GCC 7.  On an Intel Sandy Bridge I see a 4% slowdown and
on an AMD Zen CPU 6.8%; LNT reports almost 4% on Kaby Lake and 5.5% on
Zen: https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch

I have bisected this down to r254885, which is Honza's update of
profile info in IPA-CP:

Author: hubicka
Date: Fri Nov 17 17:41:10 2017
New Revision: 254885

URL: https://gcc.gnu.org/viewcvs?rev=254885&root=gcc&view=rev
Log:

        * ipa-cp.c (update_profiling_info): Handle conversion to local
        profile.
        * tree-cfg.c (execute_fixup_cfg): Do fixup same way as inliner does.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-cp.c
    trunk/gcc/tree-cfg.c

I compared the number of samples in the hottest functions and arrived
at the following table:

| Object       | Function                           | r254884 | r254885 |  diff |      % |
|--------------+------------------------------------+---------+---------+-------+--------|
| roms_r_base  | __step2d_mod_MOD_step2d_tile       |  407180 |  409257 |  2077 |  +0.51 |
| roms_r_base  | __pre_step3d_mod_MOD_pre_step3d    |  118119 |  134705 | 16586 | +14.04 |
| roms_r_base  | __t3dmix_mod_MOD_t3dmix2           |   85499 |  101650 | 16151 | +18.89 |
| roms_r_base  | __step3d_t_mod_MOD_step3d_t        |   85003 |  104599 | 19596 | +23.05 |
| roms_r_base  | __rho_eos_mod_MOD_rho_eos_tile     |   74216 |   74746 |   530 |  +0.71 |
| roms_r_base  | __step3d_uv_mod_MOD_step3d_uv_tile |   66393 |   67117 |   724 |  +1.09 |
| roms_r_base  | __rhs3d_mod_MOD_rhs3d              |   62354 |   73321 | 10967 | +17.59 |
| roms_r_base  | __lmd_skpp_mod_MOD_lmd_skpp        |   59767 |   70798 | 11031 | +18.46 |
| libm-2.29.so | __ieee754_exp_fma                  |   54324 |   56546 |  2222 |  +4.09 |
| roms_r_base  | __prsgrd_mod_MOD_prsgrd            |   48439 |   56413 |  7974 | +16.46 |
| roms_r_base  | __uv3dmix_mod_MOD_uv3dmix2         |   45255 |   52950 |  7695 | +17.00 |
| roms_r_base  | __lmd_vmix_mod_MOD_lmd_vmix        |   45069 |   46098 |  1029 |  +2.28 |
| libm-2.29.so | __ieee754_pow_fma                  |   39869 |   40731 |   862 |  +2.16 |
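
For reference, the diff and % columns follow directly from the two
sample counts.  The snippet below is only a sketch that recomputes
them for a few rows; the helper name sample_delta is made up for
illustration and is not part of any profiling tool.

```python
# (function, samples at r254884, samples at r254885), copied from the table.
SAMPLES = [
    ("__step2d_mod_MOD_step2d_tile", 407180, 409257),
    ("__pre_step3d_mod_MOD_pre_step3d", 118119, 134705),
    ("__t3dmix_mod_MOD_t3dmix2", 85499, 101650),
    ("__step3d_t_mod_MOD_step3d_t", 85003, 104599),
]


def sample_delta(before, after):
    """Return (absolute difference, percent change relative to `before`)."""
    diff = after - before
    return diff, round(100.0 * diff / before, 2)


for name, before, after in SAMPLES:
    diff, pct = sample_delta(before, after)
    print(f"{name:34s} {diff:6d} {pct:+7.2f}")
```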


When I looked at what happens to the compilation unit with
__step3d_t_mod_MOD_step3d_t, I discovered that no IPA-CP is taking
place.  In fact, all IPA dumps from both revisions are exactly the
same, but the profile counts of basic blocks in the tree dumps that
immediately follow are vastly different:

$ diff -u0 1/step3d_t.fppized.f90.092t.ccp2 2/step3d_t.fppized.f90.092t.ccp2 | head -38
--- 1/step3d_t.fppized.f90.092t.ccp2    2019-04-18 18:34:39.725703893 +0200
+++ 2/step3d_t.fppized.f90.092t.ccp2    2019-04-18 18:53:23.999336873 +0200
@@ -1064 +1064 @@
-  <bb 2> [local count: 8]:
+  <bb 2> [local count: 10000]:
@@ -1227 +1227 @@
-  <bb 3> [local count: 4]:
+  <bb 3> [local count: 5000]:
@@ -1229 +1229 @@
-  <bb 4> [local count: 8]:
+  <bb 4> [local count: 10000]:
@@ -1264 +1264 @@
-  <bb 5> [local count: 4]:
+  <bb 5> [local count: 5000]:
@@ -1266 +1266 @@
-  <bb 6> [local count: 8]:
+  <bb 6> [local count: 10000]:
@@ -1314 +1314 @@
-  <bb 7> [local count: 4]:
+  <bb 7> [local count: 5000]:
@@ -1316 +1316 @@
-  <bb 8> [local count: 8]:
+  <bb 8> [local count: 10000]:
@@ -1347 +1347 @@
-  <bb 9> [local count: 4]:
+  <bb 9> [local count: 5000]:
@@ -1349 +1349 @@
-  <bb 10> [local count: 8]:
+  <bb 10> [local count: 10000]:
@@ -1380 +1380 @@
-  <bb 11> [local count: 4]:
+  <bb 11> [local count: 5000]:
@@ -1382 +1382 @@
-  <bb 12> [local count: 8]:
+  <bb 12> [local count: 10000]:
@@ -1407 +1407 @@
-  <bb 13> [local count: 4]:
+  <bb 13> [local count: 5000]:
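
One way to read the hunks above is to compare the old and new count of
each basic block.  The sketch below does that for the first two hunks
(pasted in as a string; the regular expression and the helper name
count_ratios are assumptions made for this illustration).  In every
quoted hunk the pairs are 8 -> 10000 or 4 -> 5000, so the ratio is the
same, which would suggest that the counts are scaled uniformly rather
than redistributed between blocks.  That is only an observation about
the excerpt, not a diagnosis.

```python
import re

# First two hunks of the `diff -u0` output quoted above.
DIFF_EXCERPT = """\
@@ -1064 +1064 @@
-  <bb 2> [local count: 8]:
+  <bb 2> [local count: 10000]:
@@ -1227 +1227 @@
-  <bb 3> [local count: 4]:
+  <bb 3> [local count: 5000]:
"""

# Matches removed/added basic-block header lines; file headers and
# hunk markers do not match and are skipped.
LINE_RE = re.compile(r"^([-+])\s+<bb (\d+)> \[local count: (\d+)\]:$")


def count_ratios(diff_text):
    """Map basic-block index to new_count / old_count."""
    old, new = {}, {}
    for line in diff_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        sign, bb, count = m.group(1), int(m.group(2)), int(m.group(3))
        (old if sign == "-" else new)[bb] = count
    return {bb: new[bb] / old[bb] for bb in old if bb in new}


ratios = count_ratios(DIFF_EXCERPT)
for bb, ratio in sorted(ratios.items()):
    print(f"bb {bb}: x{ratio:g}")
```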


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
