https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782
--- Comment #18 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #17)
> > On “CALL_FREQ grows much quicker than BB_FREQ”: for r104, the
> > ALLOCNO_FREQ ought in principle to be fixed for a given loop iteration
> > count.  It shouldn't grow or shrink based on the value of SPILLED.
> > That's because every execution of the loop body involves exactly one
> > reference to r104.  SPILLED specifies the probability that that single
> > reference is the “call” use rather than the “non-call” use, but it
> > doesn't change the total number of references per iteration.
> > 
> > So I think the only reason we see the different ALLOCNO_FREQs in:
> > 
> >   ALLOCNO_FREQ 989, …
> > 
> > vs:
> > 
> >   ALLOCNO_FREQ 990, …
> > 
> > is round-off error.  If the values had more precision, I think we'd
> > have a fixed ALLOCNO_FREQ and a varying ALLOCNO_CALL_FREQ.
> 
> Yeah, that's plausible.  As far as I can tell, the FREQs are always scaled
> by REG_FREQ_FROM_EDGE_FREQ into [0, BB_FREQ_MAX], and that indeed does an
> integer division.  The general problem is that the IPA frequencies don't
> really seem to have any bounded range, so it always needs to scale.
> 
> So I think you're always going to have this error one way or another,
> which may or may not work to your advantage on any given program.
> 
> Maybe we need a way to be a bit more tolerant of this rounding error
> instead?

I guess the thing is: with the costs that the target is giving IRA,
there isn't much of a difference between the cost of the “0.5 way” and
the cost of the “0.51 way” for values of SPILLED around the 0.5 mark.
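To make the round-off point above concrete, here is a toy Python sketch of how scaling each block's frequency with an integer truncation can make the summed ALLOCNO_FREQ wobble between 989 and 990, even though every loop iteration contains exactly one reference.  The function names and the simple `int()` truncation are mine for illustration; they stand in for the REG_FREQ_FROM_EDGE_FREQ scaling, not GCC's actual code:

```python
# Illustrative sketch (not GCC code): per-block integer scaling can change
# the total reference frequency depending on how the probability splits.

def scaled_block_freq(loop_freq, prob):
    # Stand-in for the REG_FREQ_FROM_EDGE_FREQ-style scaling, which ends
    # in an integer division and so truncates.
    return int(loop_freq * prob)

def allocno_freq(loop_freq, spilled):
    # One reference in the "call" block, one in the "non-call" block,
    # so the true total is always loop_freq, independent of SPILLED.
    call_block = scaled_block_freq(loop_freq, spilled)
    non_call_block = scaled_block_freq(loop_freq, 1 - spilled)
    return call_block + non_call_block

print(allocno_freq(990, 0.50))  # 990: the split divides evenly
print(allocno_freq(990, 0.51))  # 989: truncation loses one unit
```

With more precision both calls would return the same total and only the call/non-call split (the ALLOCNO_CALL_FREQ analogue) would vary, matching the observation above.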
For spilling r104 we have:

  Factor            Cost type  What
  ======            =========  ====
  1                 store      store outside the loop when R is set
  1024×SPILLED      load       load inside the loop when R is used (call)
  1024×(1-SPILLED)  load       load inside the loop when R is used (non-call)

For allocating a call-clobbered register to r104 we have:

  Factor            Cost type  What
  ======            =========  ====
  1024×SPILLED      store      store before the call
  1024×SPILLED      load       load after the call

When loads and stores have equal weight, that weight cancels out, giving:

  1025 vs. 2048×SPILLED

So the costs are equal for SPILLED=1025/2048.  Above that, spilling is
cheaper; below that, allocating a call-clobbered register is cheaper.
IRA has done well to find the exact tipping point (based on the
assumptions it has been told to use), but for values of SPILLED around
the tipping point, the two costs are very close.  So with the current
cost model, we don't fall off a cliff cost-wise when we hit the tipping
point, or when rounding error nudges us across it.  The costs coming in
give the impression that there isn't much to choose between the two
approaches when SPILLED is roughly half-and-half.

Now if SPILLED is completely off the mark (e.g. we say it has a
probability of 0.51 but the actual runtime probability is more like 0.1),
then clearly we're in trouble.  I'm not sure what we can do about that,
though.  Perhaps if the edge frequencies (and derived block frequencies)
are pure guesses, we should weight based on the approximate cost of the
block instead.  I.e. the call block gets a low frequency because calls
are expensive, while the non-call block gets a high frequency because it
does simple arithmetic.  Thus any new instructions added to the call
block are likely to have a proportionately lower effect than any new
instructions added to the non-call block.  I hope we don't have to do
that though :-)

> > > > which is cheaper than both the current approaches.
> > > > We don't do that optimisation yet though, so the current costing
> > > > seems to reflect what we currently generate.
> > > 
> > > In many (if not most) arches, stores are significantly cheaper than
> > > the loads though.  So the store before the call doesn't end up making
> > > that much of a difference, but yes, it adds up if you have many of
> > > them.
> > 
> > Yeah.  Could we fix the problem that way instead?  The only reason IRA
> > is treating loads and stores as equal cost is because aarch64 asked it
> > to :-)
> 
> I tried a quick check and it does fix the testcase but not the benchmark,
> which is not entirely unexpected thinking about it, because x86 does
> correctly model the store costs.

Yeah, I was afraid that would be the case.

> I can try fixing the costs correctly and try reducing again.  It looks
> like it still thinks spilling to memory is cheaper than caller-save
> reloads.

Thanks, I think that would help.  The current aarch64 costs are likely to
distort things if we try to reduce using them.  IMO the initial reduction
has been useful because it gives a nice crystallised example of why the
load/store cost ratio matters.
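For what it's worth, the cost model in the tables above is easy to play with.  The following Python sketch (function names are mine; the 1024 loop weight and both cost tables come straight from the comment above) computes the two costs with explicit load/store weights, reproduces the SPILLED=1025/2048 tipping point for equal weights, and shows that making stores cheaper than loads raises the tipping point, i.e. widens the range of SPILLED for which caller-save reloads beat spilling:

```python
# Sketch of the cost model from the tables above, with the load/store
# weights made explicit.  LOOP_FREQ and both cost formulas are from the
# thread; the function names are illustrative.

LOOP_FREQ = 1024

def spill_cost(w_load, w_store, spilled):
    # One store outside the loop, plus one load per iteration
    # (on either the call path or the non-call path).
    return (w_store
            + LOOP_FREQ * spilled * w_load
            + LOOP_FREQ * (1 - spilled) * w_load)

def caller_save_cost(w_load, w_store, spilled):
    # Save before and restore after the call, only on the call path.
    return LOOP_FREQ * spilled * (w_store + w_load)

def tipping_point(w_load, w_store):
    # SPILLED value at which the two costs are equal.
    return (w_store + LOOP_FREQ * w_load) / (LOOP_FREQ * (w_store + w_load))

print(tipping_point(1, 1))    # 1025/2048 ≈ 0.50049, as derived above
print(tipping_point(1, 0.5))  # ≈ 0.667: cheaper stores favour caller-save
```

With stores at half the cost of loads, caller-save reloads stay cheaper up to SPILLED ≈ 0.667 instead of ≈ 0.5005, which illustrates why the equal load/store costs that aarch64 reports can tip IRA toward spilling.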