Hi all,

Recently, we have encountered several random commit-to-commit performance regressions in benchmarks. They are caused by tight loops crossing cache line boundaries.
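To illustrate the effect, here is a minimal sketch in C. It is not code from the patches: the helper names `lines_touched` and `align_up` are hypothetical, and the 64-byte cache line size is an assumption. It counts how many cache lines a loop body touches at a given start address, which can change whenever an unrelated commit shifts the code layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64u

/* Number of cache lines touched by a loop body occupying
   [start, start + size).  Hypothetical helper, not GCC code.  */
static unsigned lines_touched(uintptr_t start, size_t size)
{
    assert(size > 0);
    uintptr_t first = start / CACHE_LINE;
    uintptr_t last = (start + size - 1) / CACHE_LINE;
    return (unsigned)(last - first + 1);
}

/* Round addr up to the next multiple of align (a power of two),
   which is what padding a loop head to an alignment boundary does.  */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For example, a 16-byte loop starting at 0x1038 spans two cache lines, while the same loop at `align_up(0x1038, 16)`, i.e. 0x1040, fits in one. A loop of at most 16 bytes that starts on a 16-byte boundary can never straddle a 64-byte line.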
We are trying to solve the issue with two patches. One adjusts the loop
alignment for generic tuning; the other aligns tight and hot loops more
aggressively.

For SPECINT, we get a 0.85% overall improvement in rates with
-O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids (EMR).

BenchMarks		EMR Rates
500.perlbench_r		-1.21%
502.gcc_r		0.78%
505.mcf_r		0.00%
520.omnetpp_r		0.41%
523.xalancbmk_r		1.33%
525.x264_r		2.83%
531.deepsjeng_r		1.11%
541.leela_r		0.00%
548.exchange2_r		2.36%
557.xz_r		0.98%
Geomean-int		0.85%

The side effect is a 1.40% increase in code size.

BenchMarks		EMR Codesize
500.perlbench_r		0.70%
502.gcc_r		0.67%
505.mcf_r		3.26%
520.omnetpp_r		0.31%
523.xalancbmk_r		1.15%
525.x264_r		1.11%
531.deepsjeng_r		1.40%
541.leela_r		1.31%
548.exchange2_r		3.06%
557.xz_r		1.04%
Geomean-int		1.40%

Bootstrapped and regtested on x86_64-pc-linux-gnu.

If nothing unexpected happens after the patches have been in trunk for a
month, we plan to backport them to GCC 14.2.

Thx,
Haochen

Haochen Jiang (1):
  Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

liuhongt (1):
  Align tight&hot loop without considering max skipping bytes.

 gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
 gcc/config/i386/i386.md          |  10 ++-
 gcc/config/i386/x86-tune-costs.h |   2 +-
 3 files changed, 154 insertions(+), 6 deletions(-)

-- 
2.31.1