https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90283
Bug ID: 90283
Summary: 519.lbm_r is 7%-10% slower with -Ofast -march=native and both LTO and PGO than with GCC 8
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: rsandifo at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux

When I build 519.lbm_r with GCC 9 (specifically, r270364) using -Ofast -march=native -mtune=native and both LTO and PGO, the resulting binary is about 7%-10% slower than when built with GCC 8 and the same options. I can see this on both an AMD Zen machine (10%) and an Intel Skylake server (7%).

I have bisected the regression on the Zen machine, where the slowdown happened in two steps. The first one is r260348, which causes a 7% regression on both the Zen and the Intel server CPUs. Because it affects both in a similar way, I hope it is not another manifestation of PR 84200.

As far as profile data are concerned, in all cases 99% of the run time is spent in function main.
Perf stat output is the following:

Fast (r260347) on Zen:

 Performance counter stats for 'numactl -C 0 -l specinvoke':

     157862.072201      task-clock:u (msec)        #    0.999 CPUs utilized
                 0      context-switches:u         #    0.000 K/sec
                 0      cpu-migrations:u           #    0.000 K/sec
              4354      page-faults:u              #    0.028 K/sec
      490921430199      cycles:u                   #
        5942617830      stalled-cycles-frontend:u  #    1.21% frontend cycles idle     (83.36%)
       11565687163      stalled-cycles-backend:u   #    2.36% backend cycles idle      (83.32%)
     1121945505076      instructions:u             #    2.29 insn per cycle
                                                   #    0.01 stalled cycles per insn   (83.32%)
       11591019938      branches:u                 #   73.425 M/sec                    (83.36%)
          50878910      branch-misses:u            #    0.44% of all branches          (83.33%)

     158.013578100 seconds time elapsed

Slower (r260348) on Zen:

 Performance counter stats for 'numactl -C 0 -l specinvoke':

     166747.570030      task-clock:u (msec)        #    0.999 CPUs utilized
                 0      context-switches:u         #    0.000 K/sec
                 0      cpu-migrations:u           #    0.000 K/sec
              4354      page-faults:u              #    0.026 K/sec
      520147919104      cycles:u                   #
        4619521659      stalled-cycles-frontend:u  #    0.89% frontend cycles idle     (83.32%)
       11565577319      stalled-cycles-backend:u   #    2.22% backend cycles idle      (83.32%)
     1133497632829      instructions:u             #    2.18 insn per cycle
                                                   #    0.01 stalled cycles per insn   (83.36%)
       11583199072      branches:u                 #   69.465 M/sec                    (83.33%)
          50821264      branch-misses:u            #    0.44% of all branches          (83.32%)

     166.898923990 seconds time elapsed

The second performance drop on Zen happened at r265795, albeit only by 3%, and that revision does not seem to have any effect on the Intel CPU (so, given how weirdly the benchmark can sometimes behave, it may not be that interesting).
Just before the second drop (r265794):

 Performance counter stats for 'numactl -C 0 -l specinvoke':

     165315.997872      task-clock:u (msec)        #    0.999 CPUs utilized
                 0      context-switches:u         #    0.000 K/sec
                 0      cpu-migrations:u           #    0.000 K/sec
              4354      page-faults:u              #    0.026 K/sec
      520201473687      cycles:u                   #
        4890796962      stalled-cycles-frontend:u  #    0.94% frontend cycles idle     (83.37%)
       11565134531      stalled-cycles-backend:u   #    2.22% backend cycles idle      (83.32%)
     1132849187518      instructions:u             #    2.18 insn per cycle
                                                   #    0.01 stalled cycles per insn   (83.31%)
       11591493304      branches:u                 #   70.117 M/sec                    (83.37%)
          50879513      branch-misses:u            #    0.44% of all branches          (83.32%)

     165.498590592 seconds time elapsed

Second drop (r265795):

 Performance counter stats for 'numactl -C 0 -l specinvoke':

     170908.963939      task-clock:u (msec)        #    0.999 CPUs utilized
                 0      context-switches:u         #    0.000 K/sec
                 0      cpu-migrations:u           #    0.000 K/sec
              4430      page-faults:u              #    0.026 K/sec
      539336426342      cycles:u                   #
        3889378937      stalled-cycles-frontend:u  #    0.72% frontend cycles idle     (83.36%)
       11564727183      stalled-cycles-backend:u   #    2.14% backend cycles idle      (83.32%)
     1146203876321      instructions:u             #    2.13 insn per cycle
                                                   #    0.01 stalled cycles per insn   (83.31%)
       11589809180      branches:u                 #   67.813 M/sec                    (83.37%)
          50679537      branch-misses:u            #    0.44% of all branches          (83.32%)

     171.089470855 seconds time elapsed

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
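The relative changes between the bisected revisions can be recomputed directly from the perf stat counters quoted in this report. The Python sketch below does only that arithmetic; the `RUNS` table and the helper names `ipc`, `pct_change` and `compare` are hypothetical, written for illustration, and are not part of perf or of the SPEC harness. These are single runs, so the percentages it prints need not coincide exactly with the regression figures stated earlier in the report.

```python
# Counter values copied from the single `perf stat` runs quoted in this
# report (AMD Zen machine). The table layout itself is hypothetical.
RUNS = {
    "r260347": {"cycles": 490_921_430_199, "instructions": 1_121_945_505_076,
                "seconds": 158.013578100},
    "r260348": {"cycles": 520_147_919_104, "instructions": 1_133_497_632_829,
                "seconds": 166.898923990},
    "r265794": {"cycles": 520_201_473_687, "instructions": 1_132_849_187_518,
                "seconds": 165.498590592},
    "r265795": {"cycles": 539_336_426_342, "instructions": 1_146_203_876_321,
                "seconds": 171.089470855},
}


def ipc(run):
    """Instructions per cycle: the ratio perf prints as 'insn per cycle'."""
    return run["instructions"] / run["cycles"]


def pct_change(before, after, key):
    """Relative change of one counter from `before` to `after`, in percent."""
    return (after[key] / before[key] - 1.0) * 100.0


def compare(old, new):
    """Print the relative change of each counter between two revisions."""
    a, b = RUNS[old], RUNS[new]
    print(f"{old} -> {new}:")
    for key in ("cycles", "instructions", "seconds"):
        print(f"  {key:<12} {pct_change(a, b, key):+.2f}%")
    print(f"  IPC          {ipc(a):.2f} -> {ipc(b):.2f}")


compare("r260347", "r260348")  # first step of the regression
compare("r265794", "r265795")  # second step of the regression
```

For the first step this prints roughly +5.95% cycles against +1.03% instructions (elapsed time +5.62%), and for the second step +3.68% cycles against +1.18% instructions. In both steps most of the extra time appears as additional cycles rather than additional instructions, which matches the IPC drop perf itself reports.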