RE: [PATCH 0/2] Align tight loops to solve cross cacheline issue

Jiang, Haochen Tue, 14 May 2024 20:30:24 -0700

Also cc'ing Honza and Richard, since we touched the generic tuning.

Thx,
Haochen


> -----Original Message-----
> From: Haochen Jiang <haochen.ji...@intel.com>
> Sent: Wednesday, May 15, 2024 11:04 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao <hongtao....@intel.com>; ubiz...@gmail.com
> Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
> 
> Hi all,
> 
> Recently, we have seen several seemingly random performance regressions in
> benchmarks from one commit to the next. They are caused by tight loops
> crossing cache line boundaries.
> 
> We are trying to solve the issue with two patches. One adjusts the loop
> alignment for the generic tune; the other aligns tight and hot loops more
> aggressively.
> 
> For SPECINT, we get a 0.85% overall improvement in rates, with options
> -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
> 
> Benchmarks      EMR Rates
> 500.perlbench_r -1.21%
> 502.gcc_r       0.78%
> 505.mcf_r       0.00%
> 520.omnetpp_r   0.41%
> 523.xalancbmk_r 1.33%
> 525.x264_r      2.83%
> 531.deepsjeng_r 1.11%
> 541.leela_r     0.00%
> 548.exchange2_r 2.36%
> 557.xz_r        0.98%
> Geomean-int     0.85%
> 
> The side effect is a 1.40% increase in code size.
> 
> Benchmarks      EMR Codesize
> 500.perlbench_r 0.70%
> 502.gcc_r       0.67%
> 505.mcf_r       3.26%
> 520.omnetpp_r   0.31%
> 523.xalancbmk_r 1.15%
> 525.x264_r      1.11%
> 531.deepsjeng_r 1.40%
> 541.leela_r     1.31%
> 548.exchange2_r 3.06%
> 557.xz_r        1.04%
> Geomean-int     1.40%
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> 
> If nothing unexpected happens within a month of committing this to trunk,
> we plan to backport it to GCC 14.2.
> 
> Thx,
> Haochen
> 
> Haochen Jiang (1):
>   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
> 
> liuhongt (1):
>   Align tight&hot loop without considering max skipping bytes.
> 
>  gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
>  gcc/config/i386/i386.md          |  10 ++-
>  gcc/config/i386/x86-tune-costs.h |   2 +-
>  3 files changed, 154 insertions(+), 6 deletions(-)
> 
> --
> 2.31.1
