Pat Haugen <pthau...@linux.vnet.ibm.com> writes:
> This patch attempts to fix problems with the first scheduling pass
> creating too much register pressure. It does this by enabling the target
> hook to compute the pressure classes for rs6000 target since the first
> thing I observed while investigating the testcase in the subject PR is
> that IRA was picking NON_SPECIAL_REGS as a pressure class which led to
> the sched-pressure code computing too high of a value for number of regs
> available for pseudos preferring GENERAL_REGS. It also enables
> -fsched-pressure by default, using the 'model' algorithm.
>
> I ran various runs of cpu2006 to determine the set of pressure classes
> and which sched-pressure algorithm to use. Net result is that with these
> patches I see 6 benchmarks improve in the 2.4-6% range but there are
> also a couple 2% degradations which will need follow up in GCC 8. There
> was also one benchmark that showed a much bigger improvement with the
> 'weighted' sched-pressure algorithm that also needs follow up
> ('weighted' was not chosen as default since it showed more
> degradations).

I remember trying this on either Power 8 or Power 7 (not sure which) and
seeing something similar.  I think the problem was that both algorithms
use pressure classes to detect pressure points.  For -mvsx we used
VSX_REGS as a pressure class, so all scalar floating-point operations
would be counted against that class.  Since only FLOAT_REGS can be used
for scalar operations, we'd be overoptimistic about the number of scalar
floating-point values that could be live at once.

I think both algorithms were affected by this.  The difference was that
the "weighted" approach was in general more pessimistic (particularly in
its handling of register deaths) and so was less sensitive to the accuracy
of the pressure calculation.  The "model" approach was more optimistic and
so needed a fairly accurate pressure calculation.  In this situation it
would overcommit and try to use twice as many floating-point registers as
were available.

I think the fix would be to keep tallies for both FLOAT_REGS and VSX_REGS.
The pressure on FLOAT_REGS would then be the maximum of the FLOAT_REGS tally
and (the VSX_REGS tally - the number of registers outside FLOAT_REGS).
The pressure on VSX_REGS would be the sum of the VSX_REGS and FLOAT_REGS
tallies.  In the end I never had time to try that though.
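
For concreteness, the two-tally calculation might look roughly like the
sketch below.  This is untested and is not code from sched-pressure or
IRA: the struct, the function names and the register counts (64 VSX
registers, 32 of which are the FLOAT_REGS) are assumptions made purely
for illustration.

```c
#include <assert.h>

/* Assumed register counts, for illustration only: 64 registers in
   VSX_REGS, of which 32 are also in FLOAT_REGS.  */
enum { N_VSX_REGS = 64, N_FLOAT_REGS = 32 };

/* Hypothetical pair of tallies: values that must live in FLOAT_REGS
   and values that can live anywhere in VSX_REGS.  */
struct fp_tallies
{
  int float_regs;
  int vsx_regs;
};

/* Pressure on FLOAT_REGS: the maximum of the FLOAT_REGS tally and
   the VSX_REGS tally minus the registers outside FLOAT_REGS.  */
int
float_regs_pressure (const struct fp_tallies *t)
{
  assert (t->float_regs >= 0 && t->vsx_regs >= 0);
  int outside = N_VSX_REGS - N_FLOAT_REGS;
  int spill_over = t->vsx_regs - outside;
  return t->float_regs > spill_over ? t->float_regs : spill_over;
}

/* Pressure on VSX_REGS: the sum of both tallies, since every
   FLOAT_REGS value also occupies a VSX register.  */
int
vsx_regs_pressure (const struct fp_tallies *t)
{
  assert (t->float_regs >= 0 && t->vsx_regs >= 0);
  return t->vsx_regs + t->float_regs;
}
```

With those assumed counts, a FLOAT_REGS tally of 4 and a VSX_REGS tally
of 50 would give a FLOAT_REGS pressure of 18 and a VSX_REGS pressure of
54, whereas a single VSX_REGS tally would hide the FLOAT_REGS figure
entirely.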

Thanks,
Richard
