On 16/12/15 12:18, Bernd Schmidt wrote:
On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:
Then for the shift pattern in the MD file we'd have to dynamically
select the scheduling type depending on whether or not the shift
amount is 1 and the costs line up?
Yes. This isn't unusual, take a look at i386.md where you have a lot of
switches on attr type to decide which string to print.
I'm just worried that if we take this idea to its logical conclusion, we have
to add a new canonicalisation rule:
"all (plus x x) expressions shall be expressed as (ashift x 1)".
Such a rule seems too specific to me, and all targets would have to
special-case it in their MD patterns and costs if they ever wanted to treat
an add and a shift differently. In this particular case we'd have to
conditionalise the scheduling string selection on a particular CPU tuning and
the shift amount, which will make the pattern much harder to read.
To implement this properly we'd also have to
The price we pay when trying these substitutions is an iteration over
the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if that
iteration actually performed a substitution of x + x into x << 1. Is
that too high a price to pay? (I'm not familiar with the performance
characteristics of the FOR_EACH_SUBRTX machinery.)
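
To make the cost being asked about concrete, here is a minimal sketch of the walk-and-substitute step. It is a toy model and not GCC code: `Expr`, `reg`, `cst`, `op`, `same`, `plus_to_shift` and `show` are hypothetical names invented for illustration, and a plain recursive walk stands in for FOR_EACH_SUBRTX_PTR. The point it shows is that the walk returns the number of substitutions, so the caller can skip recog entirely when that number is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an rtx: an operator code plus operands, or a leaf.
struct Expr {
    std::string code;                        // "reg", "const", "plus", "ashift", ...
    std::string leaf;                        // register name or constant text
    std::vector<std::shared_ptr<Expr>> ops;  // operands (empty for leaves)
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr reg(const std::string &name) { return std::make_shared<Expr>(Expr{"reg", name, {}}); }
ExprPtr cst(const std::string &value) { return std::make_shared<Expr>(Expr{"const", value, {}}); }
ExprPtr op(const std::string &code, ExprPtr a, ExprPtr b) {
    assert(a && b);  // operands must be non-null
    return std::make_shared<Expr>(Expr{code, "", {a, b}});
}

// Structural equality, playing the role of an rtx equality check.
bool same(const ExprPtr &a, const ExprPtr &b) {
    if (a->code != b->code || a->leaf != b->leaf || a->ops.size() != b->ops.size())
        return false;
    for (std::size_t i = 0; i < a->ops.size(); ++i)
        if (!same(a->ops[i], b->ops[i]))
            return false;
    return true;
}

// Visit every sub-expression in place, operands first, and rewrite
// (plus x x) as (ashift x 1). Returns how many rewrites were made.
int plus_to_shift(ExprPtr &e) {
    int rewrites = 0;
    for (auto &operand : e->ops)
        rewrites += plus_to_shift(operand);
    if (e->code == "plus" && e->ops.size() == 2 && same(e->ops[0], e->ops[1])) {
        e = op("ashift", e->ops[0], cst("1"));
        ++rewrites;
    }
    return rewrites;
}

// Print in an RTL-like prefix form for inspection.
std::string show(const ExprPtr &e) {
    if (e->ops.empty())
        return e->leaf;
    std::string s = "(" + e->code;
    for (const auto &operand : e->ops)
        s += " " + show(operand);
    return s + ")";
}
```

In this model a caller would only attempt recognition again when `plus_to_shift` returns a non-zero count, so the fixed overhead per failed match is one walk over the expression.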
It depends on how many of these transforms we are going to try; it also feels
very hackish, trying to work around the core design of the combiner. IMO it
would be better for machine descriptions to work with the pass rather than
against it.
Perhaps I'm lacking the historical context, but what is the core design of the
combiner? Why should the backend have to jump through these hoops if it already
communicates to the midend (through correct rtx costs) that a shift is more
expensive than a plus? I'd be more inclined to agree that this is perhaps a
limitation in recog rather than combine, but still not a backend problem.
If you can somehow arrange for the (plus x x) to be turned into a shift while
substituting, that might be yet another approach to try.
I did investigate where else we could make this transformation.
For the zero_extend+shift case (the ubfiz instruction from the testcase in my
original submission) we could fix this by modifying make_extraction to convert
its argument from (plus x x) to a shift, as in that context shifts are
undoubtedly more likely to simplify with the various extraction operations that
it's trying to perform.
That leaves the other case (orr + shift), where converting to a shift isn't a
simplification in any way, but the backend happens to have an instruction that
matches the combined orr+shift form. There we want to perform the
transformation purely to aid recognition, not out of any simplification
considerations. That's what I'm trying to figure out how to do now.
However, we want to avoid doing it unconditionally, because if we have just a
simple set of y = x + x we want to leave it as a plus rather than a shift, as
the plus is cheaper on that target.
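
The behaviour I'm after can be sketched as a recognition fallback. This is only a toy model under stated assumptions, not a proposed patch: `Node`, `leaf`, `node`, `toy_recog` and `recog_with_fallback` are hypothetical names, every non-leaf node is assumed to be binary, and `toy_recog` describes an imaginary target that has an add, a shift by constant, and an orr with a shifted operand. The rewrite is tried only after the original form fails to match, so a lone y = x + x stays a plus.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<const Node>;

// Toy expression node; all non-leaf nodes are assumed to be binary.
struct Node {
    std::string code;          // "reg", "const", "plus", "ashift", "ior"
    std::string text;          // leaf text (register name or constant)
    std::vector<NodePtr> ops;  // two operands, or empty for a leaf
};

NodePtr leaf(std::string code, std::string text) {
    return std::make_shared<Node>(Node{std::move(code), std::move(text), {}});
}
NodePtr node(std::string code, NodePtr a, NodePtr b) {
    assert(a && b);  // operands must be non-null
    return std::make_shared<Node>(Node{std::move(code), "", {a, b}});
}

bool same(const NodePtr &a, const NodePtr &b) {
    if (a->code != b->code || a->text != b->text || a->ops.size() != b->ops.size())
        return false;
    for (std::size_t i = 0; i < a->ops.size(); ++i)
        if (!same(a->ops[i], b->ops[i]))
            return false;
    return true;
}

// Build a new tree with every (plus x x) replaced by (ashift x 1).
// Sets `changed` if at least one replacement happened.
NodePtr plus_to_shift(const NodePtr &e, bool &changed) {
    if (e->ops.empty())
        return e;
    NodePtr a = plus_to_shift(e->ops[0], changed);
    NodePtr b = plus_to_shift(e->ops[1], changed);
    if (e->code == "plus" && same(a, b)) {
        changed = true;
        return node("ashift", a, leaf("const", "1"));
    }
    return node(e->code, a, b);
}

bool is(const NodePtr &e, const char *code) { return e->code == code; }

// Imaginary target: add of two registers, shift by constant,
// and orr whose first operand is a recognised shift.
bool toy_recog(const NodePtr &e) {
    if (is(e, "plus"))
        return is(e->ops[0], "reg") && is(e->ops[1], "reg");
    if (is(e, "ashift"))
        return is(e->ops[0], "reg") && is(e->ops[1], "const");
    if (is(e, "ior"))
        return is(e->ops[0], "ashift") && toy_recog(e->ops[0]) && is(e->ops[1], "reg");
    return false;
}

// Try the pattern as given; only if that fails, retry with the shift form.
bool recog_with_fallback(NodePtr &e) {
    if (toy_recog(e))
        return true;  // already matches: the plus form is kept untouched
    bool changed = false;
    NodePtr alt = plus_to_shift(e, changed);
    if (changed && toy_recog(alt)) {
        e = alt;
        return true;
    }
    return false;
}

std::string show(const NodePtr &e) {
    if (e->ops.empty())
        return e->text;
    return "(" + e->code + " " + show(e->ops[0]) + " " + show(e->ops[1]) + ")";
}
```

With this ordering the cost comparison between plus and shift never has to be consulted for the simple set, because the rewrite is reached only when recognition of the original form has already failed.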
Thanks,
Kyrill
Bernd