Re: [PATCH 0/2] Align tight loops to solve cross cacheline issue

Hongtao Liu Sun, 19 May 2024 20:16:13 -0700

On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen <[email protected]> wrote:
>
> Also cc Honza and Richard since we touched generic tune.
>
> Thx,
> Haochen
>
> > -----Original Message-----
> > From: Haochen Jiang <[email protected]>
> > Sent: Wednesday, May 15, 2024 11:04 AM
> > To: [email protected]
> > Cc: Liu, Hongtao <[email protected]>; [email protected]
> > Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
> >
> > Hi all,
> >
> > Recently, we have encountered several random commit-to-commit performance
> > regressions in benchmarks. They are caused by tight loops crossing
> > cache-line boundaries.
> >
> > We are trying to solve the issue by two patches. One is adjusting the loop
> > alignment for generic tune, the other is aligning tight and hot loops more
> > aggressively.
> >
> > For SPECINT, we get a 0.85% improvement overall in rates, under option
> > -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
> >
> > BenchMarks      EMR Rates
> > 500.perlbench_r -1.21%
> > 502.gcc_r       0.78%
> > 505.mcf_r       0.00%
> > 520.omnetpp_r   0.41%
> > 523.xalancbmk_r 1.33%
> > 525.x264_r      2.83%
> > 531.deepsjeng_r 1.11%
> > 541.leela_r     0.00%
> > 548.exchange2_r 2.36%
> > 557.xz_r        0.98%
> > Geomean-int     0.85%
> >
> > Side effect is that we get a 1.40% increase in codesize.
> >
> > BenchMarks      EMR Codesize
> > 500.perlbench_r 0.70%
> > 502.gcc_r       0.67%
> > 505.mcf_r       3.26%
> > 520.omnetpp_r   0.31%
> > 523.xalancbmk_r 1.15%
> > 525.x264_r      1.11%
> > 531.deepsjeng_r 1.40%
> > 541.leela_r     1.31%
> > 548.exchange2_r 3.06%
> > 557.xz_r        1.04%
> > Geomean-int     1.40%
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> >
> > If nothing unexpected happens within a month after this is committed to
> > trunk, we plan to backport it to GCC 14.2.
> >
> > Thx,
> > Haochen
> >
> > Haochen Jiang (1):
> >   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
For this one, the current znver{1,2,3,4,5}_cost already set the loop
alignment to 16, so I think it should be fine to set it in generic_cost too.
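To illustrate why a 16:11:8 setting can leave a small loop straddling a
cache line while a plain 16 does not, here is a minimal sketch in C. It
is not GCC code: place_loop and crosses_line are hypothetical helpers,
the 64-byte line size is an assumption, and the n:m:n2 model follows my
reading of the -falign-loops documentation (align to n if that skips
fewer than m bytes, otherwise fall back to n2).

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache-line size in bytes */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Simplified model of an n:m:n2 alignment request: use the n-byte
   boundary if it can be reached by skipping fewer than m bytes,
   otherwise fall back to the n2-byte boundary (n2 == 0 means no
   secondary alignment).  Hypothetical helper, not taken from GCC. */
static uint64_t place_loop(uint64_t addr, uint64_t n, uint64_t m, uint64_t n2)
{
    uint64_t primary = align_up(addr, n);
    if (primary - addr < m)
        return primary;
    return n2 ? align_up(addr, n2) : addr;
}

/* Return 1 if a loop body of `size` bytes starting at `start`
   spans more than one cache line, 0 otherwise. */
static int crosses_line(uint64_t start, uint64_t size)
{
    return size > 0 && start / CACHE_LINE != (start + size - 1) / CACHE_LINE;
}
```

For a 16-byte loop whose unpadded start falls at offset 49, the 16:11:8
model would need 15 bytes of padding to reach offset 64, which exceeds the
limit, so it falls back to offset 56 and the loop spans two cache lines.
With a plain 16 (modelled here as place_loop(49, 16, 16, 0)) the loop
starts at offset 64 and fits in one line.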
> >
> > liuhongt (1):
> >   Align tight&hot loop without considering max skipping bytes.
For this one, although we have seen similar growth on AMD's
processors, it would still be good to have someone from AMD look at
this to see if it's what they need.
> >
> >  gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
> >  gcc/config/i386/i386.md          |  10 ++-
> >  gcc/config/i386/x86-tune-costs.h |   2 +-
> >  3 files changed, 154 insertions(+), 6 deletions(-)
> >
> > --
> > 2.31.1
>



-- 
BR,
Hongtao
