Hi Jeff,
On 16/11/18 17:08, Jeff Law wrote:
On 11/16/18 5:35 AM, Kyrill Tkachov wrote:
But more importantly, it seems like blindly ignoring anti dependencies
is just a hack that happens to work. I wonder if we could somehow mark
the fake dependencies we add, and avoid bumping the ALAP when we
encounter those fake dependencies.
I did experiment with this. I added a new property to dep_t
to mark it as "artificial", that is created in the parts of sched-deps.c
that add dependencies when MAX_PENDING_LIST_LENGTH is exceeded.
Then ALAP is bumped only when the dependency is not artificial in this way.
This didn't help much on the testcase we were looking at (the hot
function in cactus from SPEC2006).
The code size increase and number of spills decreased by only 6 (out of
~800) whereas with Richards'
patch it improved much more (~140 decrease, with a corresponding
improvement in stack usage and code size).
Richard did suggest that anti-dependencies are already taken into
account in the INSN_PRIORITY tie-breaker,
so perhaps that is a better scheme indeed?
ISTM that your experiment indicates that it's not the artificial
dependencies that are the problem. It's the real anti-dependencies
that are mucking things up. That's fine, it just means I don't think we
want/need to do anything special for the artificial dependencies.
I must apologise. Since I sent this out earlier I found a bug in my
implementation
of the above experiment which meant I wasn't marking the dependencies properly
in all cases :(
With that fixed, the approach removes ~100 spills which is much better than
what I reported initially
however still not as good as Richards' patch (removed ~140 spills).
I've kicked off a SPEC2006 benchmarking run to see if it has any any effect.
We certainly use INSN_PRIORITY as one of the (many) tie breakers in the
rank_for_schedule routine. BUt I don't know if that's a better place or
not.
I trust Richard, so if he thinks the patch is the best approach, then
let's go with it after that trivial comment fix I mentioned in my
previous message.
I'll respin Richard's patch with the comment updates and resend that,
unless the benchmark run above shows something interesting.
It probably wouldn't be a bad idea to look at the default for
MAX_PENDING_LIST_LENGTH. Based on the current default value and the
comments in the code that value could well have been tuned 25 or more
years ago!
Probably. I see that s390 and spu increase that param in their backends
to much larger values than the default
I played around with increasing it on aarch64. It improved things
somewhat, but Richard's patch still gave superior results.
ACK. Thanks for testing. If you want to adjust, that would seem fine
as a follow-up.
I'd need to benchmark such a change, but thanks.
MAX_PENDING_LIST_LENGTH only exists to limit compile-time, right?
Thanks, and sorry for the confusion.
Kyrill
jeff