Hi Jeff,

On 10/11/18 00:04, Jeff Law wrote:
On 11/8/18 5:10 AM, Kyrill Tkachov wrote:
<Sending this on behalf of Richard Sandiford>

This patch fixes a flaw in the relationship between the way that
SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses
them in model_order_p.  A comment in model_order_p says:

   /* Combine the length of the longest path of satisfied true dependencies
      that leads to each instruction (depth) with the length of the longest
      path of any dependencies that leads from the instruction (alap).
      Prefer instructions with the greatest combined length.  If the
combined
      lengths are equal, prefer instructions with the greatest depth.

      The idea is that, if we have a set S of "equal" instructions that each
      have ALAP value X, and we pick one such instruction I, any
true-dependent
      successors of I that have ALAP value X - 1 should be preferred over S.
      This encourages the schedule to be "narrow" rather than "wide".
      However, if I is a low-priority instruction that we decided to
      schedule because of its model_classify_pressure, and if there
      is a set of higher-priority instructions T, the aforementioned
      successors of I should not have the edge over T.  */

The expectation was that scheduling an instruction X would give a
greater priority to the highest-priority successor instructions Y than
X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1,
giving an equal combined height, but with the greater depth winning as
a tie-breaker. But this doesn't work if the alap value was ultimately
determined by an anti-dependence.

This is particularly bad when --param max-pending-list-length kicks in,
since we then start adding fake anti-dependencies in order to keep the
list length down.  These fake dependencies tend to be on the critical
path.

The attached patch avoids that by making the alap calculation only
look at true dependencies.  This shouldn't be too bad, since we use
INSN_PRIORITY as the final tie-breaker than that does take
anti-dependencies into account.

This reduces the number of spills in the hot function from 436.cactusADM
by 14% on aarch64 at -O3 (and the number of instructions in general).
SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
Thanks to Wilco for the benchmarking.

Bootstrapped and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks,
Kyrill

2018-11-08  Richard Sandiford  <richard.sandif...@arm.com>

gcc/
     * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's
     ALAP if the dependence is a true dependence.
So at the least the documentation of the ALAP field would need to be
updated as well as the comment you referenced (the "any dependencies").

Ah, I can easily update the patch for that.

But more importantly, it seems like blindly ignoring anti dependencies
is just a hack that happens to work.  I wonder if we could somehow mark
the fake dependencies we add, and avoid bumping the ALAP when we
encounter those fake dependencies.

I did experiment with this. I added a new property to dep_t
to mark it as "artificial", that is created in the parts of sched-deps.c
that add dependencies when MAX_PENDING_LIST_LENGTH is exceeded.

Then ALAP is bumped only when the dependency is not artificial in this way.
This didn't help much on the testcase we were looking at (the hot function in 
cactus from SPEC2006).

The code size increase and number of spills decreased by only 6 (out of ~800) 
whereas with Richards'
patch it improved much more (~140 decrease, with a corresponding improvement in 
stack usage and code size).

Richard did suggest that anti-dependencies are already taken into account in 
the INSN_PRIORITY tie-breaker,
so perhaps that is a better scheme indeed?


It probably wouldn't be a bad idea to look at the default for
MAX_PENDING_LIST_LENGTH.  Based on the current default value and the
comments in the code that value could well have been tuned 25 or more
years ago!

Probably. I see that s390 and spu increase that param in their backends to much 
larger values than the default
I played around with increasing it on aarch64. It improved things somewhat, but 
Richard's patch still gave superior results.

Thanks,
Kyrill

How often are we falling over that during a bootstrap and
during spec builds?


Jeff

Reply via email to