https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94427

            Bug ID: 94427
           Summary: 456.hmmer is 8-17% slower when compiled at -Ofast than
                    with GCC 9
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

SPECINT 2006 benchmark 456.hmmer runs 18% slower on AMD Zen2 CPUs, 15%
on AMD Zen1 CPUs and 8% on Intel Cascade Lake server CPUs when built
with trunk (revision 26b3e568a60) and just -Ofast (so with generic
march/mtune) than when compiled with GCC 9.

Bisecting the regression leads to commit:

  commit 14ec49a7537004633b7fff859178cbebd288ca1d
  Author: Richard Biener <rguent...@suse.de>
  Date:   Tue Jul 2 07:35:23 2019 +0000

    re PR tree-optimization/58483 (missing optimization opportunity for const std::vector compared to std::array)

    2019-07-02  Richard Biener  <rguent...@suse.de>

            PR tree-optimization/58483
            * tree-ssa-scopedtables.c (avail_expr_hash): Use OEP_ADDRESS_OF
            for MEM_REF base hashing.
            (equal_mem_array_ref_p): Likewise for base comparison.

            * gcc.dg/tree-ssa/ssa-dom-cse-8.c: New testcase.

    From-SVN: r272922

The collected profiles are weird, almost the opposite of what I would
expect, because the *slow* version spends less time in the cold
section, but both spend IMHO too much time there.  The following data
were collected on AMD Zen2, but those from Intel are similar in this
regard.  What is different is that on Intel, perf stat reports a
doubling of branch misses and, because that machine has an older perf,
it does not report front-end/back-end stalls.

Before the aforementioned revision:

 Performance counter stats for 'numactl -C 0 -l specinvoke':

         163360.87 msec task-clock:u              #    0.992 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
              7639      page-faults:u             #    0.047 K/sec
      525635661818      cycles:u                  #
         809847511      stalled-cycles-frontend:u #    0.15% frontend cycles idle     (83.35%)
      299331255326      stalled-cycles-backend:u  #   56.95% backend cycles idle      (83.30%)
     1757801907547      instructions:u            #    3.34  insn per cycle
                                                  #    0.17  stalled cycles per insn  (83.34%)
      133496985084      branches:u                #  817.191 M/sec                    (83.35%)
         682351923      branch-misses:u           #    0.51% of all branches          (83.31%)

     164.659685804 seconds time elapsed

     163.325420000 seconds user
       0.022183000 seconds sys

# Samples: 637K of event 'cycles:u'
# Event count (approx.): 527143782584
#
# Overhead       Samples  Shared Object            Symbol
# ........  ............  .......................  ....................
#
    58.43%        372284  hmmer_peak.mine-std-gen  [.] P7Viterbi
    35.12%        223887  hmmer_peak.mine-std-gen  [.] P7Viterbi.cold
     2.59%         16418  hmmer_peak.mine-std-gen  [.] FChoose
     2.51%         15906  hmmer_peak.mine-std-gen  [.] sre_random

At the aforementioned revision:

 Performance counter stats for 'numactl -C 0 -l specinvoke':

         191483.84 msec task-clock:u              #    0.994 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
              7639      page-faults:u             #    0.040 K/sec
      622159384711      cycles:u                  #
         817604010      stalled-cycles-frontend:u #    0.13% frontend cycles idle     (83.31%)
      439972264588      stalled-cycles-backend:u  #   70.72% backend cycles idle      (83.34%)
     1707838992202      instructions:u            #    2.75  insn per cycle
                                                  #    0.26  stalled cycles per insn  (83.35%)
       91309384910      branches:u                #  476.852 M/sec                    (83.32%)
         655463713      branch-misses:u           #    0.72% of all branches          (83.33%)

     192.564513355 seconds time elapsed

     191.443774000 seconds user
       0.023978000 seconds sys

# Samples: 752K of event 'cycles:u'
# Event count (approx.): 622947549968
#
# Overhead       Samples  Shared Object             Symbol
# ........  ............  ........................  ....................
#
    83.68%        629645  hmmer_peak.small-std-gen  [.] P7Viterbi
    10.84%         81591  hmmer_peak.small-std-gen  [.] P7Viterbi.cold
     2.21%         16546  hmmer_peak.small-std-gen  [.] FChoose
     2.11%         15793  hmmer_peak.small-std-gen  [.] sre_random


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
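
For anyone comparing the two Zen2 runs quoted in this report, the relative changes can be derived from the raw perf stat counters. The following is only a minimal sketch: the counter values are copied from the report, while the dictionary layout and the helper names (pct_change, ipc, miss_rate) are illustrative assumptions and not part of perf or of the SPEC harness.

```python
# Counter values copied from the two `perf stat` runs quoted in this
# report (AMD Zen2).  The names below are hypothetical helpers.
before = {
    "task_clock_ms": 163360.87,
    "cycles": 525635661818,
    "instructions": 1757801907547,
    "branches": 133496985084,
    "branch_misses": 682351923,
}
after = {
    "task_clock_ms": 191483.84,
    "cycles": 622159384711,
    "instructions": 1707838992202,
    "branches": 91309384910,
    "branch_misses": 655463713,
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def ipc(run):
    """Instructions retired per cycle for one run."""
    return run["instructions"] / run["cycles"]


def miss_rate(run):
    """Branch misses as a percentage of all branches."""
    return run["branch_misses"] / run["branches"] * 100.0


if __name__ == "__main__":
    # Per-counter change between the run before and at the revision.
    for key in before:
        print(f"{key:15s} {pct_change(before[key], after[key]):+7.2f}%")
    print(f"IPC             {ipc(before):.2f} -> {ipc(after):.2f}")
    print(f"branch miss %   {miss_rate(before):.2f} -> {miss_rate(after):.2f}")
```

Computed this way, task-clock time grows by roughly 17% while the instruction count falls by about 3% and the branch count by about 32%, so the slowdown on Zen2 shows up as lower IPC rather than as extra work. The multiplexing percentages printed by perf (around 83%) mean the counters are scaled estimates, so these ratios are approximate.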