[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

vmakarov at gcc dot gnu.org via Gcc-bugs Thu, 10 Feb 2022 07:17:34 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178


--- Comment #30 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #29)
> (In reply to Vladimir Makarov from comment #28)
> > Could somebody benchmark the following patch on zen2 470.lbm.
> 
> Code generation changes quite a bit, with the patch the offending function
> is 16 bytes larger.  I see no large immediate moves to GPRs anymore but
> there is still a lot of spilling of XMMs to GPRs.  Performance is
> unchanged by the patch:
> 
> 470.lbm         13740        128        107 S   13740        128        107 S
> 470.lbm         13740        128        107 *   13740        128        107 S
> 470.lbm         13740        128        107 S   13740        128        107 *
> 
> 

Thank you very much for testing the patch, Richard.  To me, the results mean
the patch is a no-go.

> Without knowing much of the code I wonder if we can check whether the move
> will be to a reg in GENERAL_REGS?  That is, do we know whether there are
> (besides some special constants like zero), immediate moves to the
> destination register class?
>

There is no such info from the target code.  Ideally we need to have the cost
of loading a *particular* immediate value into a register class on the same
cost basis as load/store.  Still, to use this info efficiently, choosing
alternatives should be based on costs, not on the hints and some
machine-independent general heuristics (as it is now).
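
To make the idea concrete, here is a rough sketch of what cost-based selection could look like if immediate-load costs were expressed in the same unit as load/store costs.  It is not GCC code: the function names, the table layout and all of the cost numbers are made up for illustration, and only the register class names are borrowed from the x86 port.

```python
# Hypothetical illustration only: this is not GCC code, and every cost below
# is an invented number, not a measured or target-provided value.
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    name: str
    cost: int  # same (made-up) unit for immediates, loads and moves


# Assumed cost of materializing an immediate directly in a register class.
# A class missing from this table cannot take the immediate directly.
IMM_COST = {"GENERAL_REGS": 2}
# Assumed cost of loading the constant from memory into a register class.
LOAD_COST = {"GENERAL_REGS": 4, "SSE_REGS": 6}
# Assumed cost of a register-to-register move between two classes.
MOVE_COST = {("GENERAL_REGS", "SSE_REGS"): 6}


def immediate_alternatives(imm_cost, load_cost, move_cost, dest_class):
    """List the ways to get a constant into dest_class with their total cost."""
    alts = [Alternative(f"load from memory into {dest_class}",
                        load_cost[dest_class])]
    if dest_class in imm_cost:
        alts.append(Alternative(f"immediate move into {dest_class}",
                                imm_cost[dest_class]))
    for src_class, cost in imm_cost.items():
        if src_class != dest_class and (src_class, dest_class) in move_cost:
            total = cost + move_cost[(src_class, dest_class)]
            alts.append(Alternative(
                f"immediate into {src_class}, then move to {dest_class}",
                total))
    return alts


def cheapest(alts):
    """Pick the alternative with the lowest total cost."""
    return min(alts, key=lambda alt: alt.cost)


for cls in ("GENERAL_REGS", "SSE_REGS"):
    best = cheapest(immediate_alternatives(IMM_COST, LOAD_COST, MOVE_COST, cls))
    print(cls, "->", best.name, best.cost)
```

With these invented numbers the direct immediate move wins for GENERAL_REGS, while for SSE_REGS the memory load is cheaper than going through a GPR.  The point is only that the choice falls out of comparable costs rather than out of hints.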


> That said, given the result on LBM I'd not change this at this point.
> 
> Honza wanted to look at the move pattern to try to mitigate the
> GPR spilling of XMMs.
> 
> I do think that we need to take costs into account at some point and get
> rid of the reload style hand-waving with !?* in the move patterns.

In general I agree with the direction, but it will be quite hard to do.  I
know it well from my experience of changing the register class cost calculation
algorithm in IRA (the experimental code can be found on the branch ira-select).
I expect a huge number of test failures and some benchmark performance
degradation on practically any target, and a big involvement of target
maintainers to fix them.  Although it is possible to try to do this for one
target at a time.
