Also cc Honza and Richard since we touched generic tune.

Thx,
Haochen

> -----Original Message-----
> From: Haochen Jiang <haochen.ji...@intel.com>
> Sent: Wednesday, May 15, 2024 11:04 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao <hongtao....@intel.com>; ubiz...@gmail.com
> Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
>
> Hi all,
>
> Recently, we have encountered several random performance regressions in
> benchmarks from commit to commit. They are caused by tight loops crossing
> cacheline boundaries.
>
> We are trying to solve the issue with two patches. One adjusts the loop
> alignment for generic tune; the other aligns tight and hot loops more
> aggressively.
>
> For SPECINT, we get a 0.85% improvement overall in rates, under options
> -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
>
> BenchMarks          EMR Rates
> 500.perlbench_r     -1.21%
> 502.gcc_r            0.78%
> 505.mcf_r            0.00%
> 520.omnetpp_r        0.41%
> 523.xalancbmk_r      1.33%
> 525.x264_r           2.83%
> 531.deepsjeng_r      1.11%
> 541.leela_r          0.00%
> 548.exchange2_r      2.36%
> 557.xz_r             0.98%
> Geomean-int          0.85%
>
> The side effect is a 1.40% increase in codesize.
>
> BenchMarks          EMR Codesize
> 500.perlbench_r      0.70%
> 502.gcc_r            0.67%
> 505.mcf_r            3.26%
> 520.omnetpp_r        0.31%
> 523.xalancbmk_r      1.15%
> 525.x264_r           1.11%
> 531.deepsjeng_r      1.40%
> 541.leela_r          1.31%
> 548.exchange2_r      3.06%
> 557.xz_r             1.04%
> Geomean-int          1.40%
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
>
> If nothing unexpected happens in the month after this is committed to
> trunk, we plan to backport it to GCC 14.2.
>
> Thx,
> Haochen
>
> Haochen Jiang (1):
>   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
>
> liuhongt (1):
>   Align tight&hot loop without considering max skipping bytes.
>
>  gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
>  gcc/config/i386/i386.md          |  10 ++-
>  gcc/config/i386/x86-tune-costs.h |   2 +-
>  3 files changed, 154 insertions(+), 6 deletions(-)
>
> --
> 2.31.1
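
For readers new to the thread, here is a minimal sketch of the cross-cacheline condition the cover letter refers to. It assumes 64-byte cache lines, and the helper names (align_up, crosses_cache_line) are hypothetical. It only illustrates why a small loop can land on either side of a line boundary from one commit to the next; it is not the heuristic the patches implement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on typical current x86 parts. */
#define CACHE_LINE 64u

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* True if a loop body of `size` bytes starting at `start` spans more than
   one cache line. */
static bool crosses_cache_line(uint64_t start, uint64_t size)
{
    if (size == 0)
        return false;
    return (start / CACHE_LINE) != ((start + size - 1) / CACHE_LINE);
}
```

With these helpers, a 12-byte loop placed at 0x103c straddles the boundary at 0x1040, while the same loop moved to align_up(0x103c, 16) does not. Under the 64-byte assumption, any loop of at most 16 bytes that starts on a 16-byte boundary stays within one line.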