https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120120
Bug ID: 120120
Summary: gcc-16: performance regression with -O3 compared to gcc-15
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: manuel.lauss at googlemail dot com
Target Milestone: ---
Created attachment 61325
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61325&action=edit
example code taking the perf hit at O3
On some code I use, I noticed a large performance regression in gcc-16,
starting around April 21, 2025. I've attached sample C code which, according to
perf, accounts for almost all of the processing time.
This happens with "-O3 -march=znver5 -mtune=znver5 -pipe"; at -O2, gcc-15 and
gcc-16 are equally slow.
Perf stats:
gcc-15:
Performance counter stats for './sanplay -2 RAM.SAN':
              6,33 msec task-clock:u                #    0,949 CPUs utilized
               808      page-faults:u               #  127,589 K/sec
        85.738.923      instructions:u              #    3,10  insn per cycle
                                                    #    0,06  stalled cycles per insn
        27.659.116      cycles:u                    #    4,368 GHz
         4.788.925      stalled-cycles-frontend:u   #   17,31% frontend cycles idle
         8.000.727      branches:u                  #    1,263 G/sec
           275.954      branch-misses:u             #    3,45% of all branches
gcc-16:
Performance counter stats for './sanplay -2 /home/mano/games/Outlaws/RAM.SAN':
             13,02 msec task-clock:u                #    0,974 CPUs utilized
       314.392.362      instructions:u              #    4,97  insn per cycle
                                                    #    0,02  stalled cycles per insn
        63.277.723      cycles:u                    #    4,861 GHz
         5.510.316      stalled-cycles-frontend:u   #    8,71% frontend cycles idle
        53.730.810      branches:u                  #    4,127 G/sec
           305.375      branch-misses:u             #    0,57% of all branches
The number of instructions executed is about 3.7x higher (314.392.362 vs.
85.738.923); on a larger example file it is up to 4.5x. This is not Zen 5
specific; it happens on a Haswell as well. At -O2, gcc-15 and gcc-16 have
identical performance.
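As a rough cross-check of the counters above, here is a small Python sketch that computes the gcc-16 / gcc-15 ratio for each one. It is not part of the sandec sources or the attached test case; the dictionary names are made up for illustration, and the values are copied by hand from the two perf runs.

```python
# Counter values copied from the two perf stat runs above (user-mode counts).
# The names gcc15/gcc16/ratios are illustrative only.
gcc15 = {"instructions": 85_738_923, "cycles": 27_659_116, "branches": 8_000_727}
gcc16 = {"instructions": 314_392_362, "cycles": 63_277_723, "branches": 53_730_810}

# Ratio gcc-16 / gcc-15 for each counter.
ratios = {name: gcc16[name] / gcc15[name] for name in gcc15}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

For this sample file the branch count grows by a larger factor than the instruction count, while the cycle count roughly doubles.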
Full source is at https://github.com/mlauss2/sandec
Demo file can be grabbed from
https://samples.mplayerhq.hu/game-formats/la-san/outlaws/ram.san
I'll do a bisection next.
Thanks!
Manuel