On 2016-11-10 13:30, Bin.Cheng wrote:
Hi,
I see the cost problem with your test now. When computing an address
type iv_use with a candidate, the computation consists of two parts:
the part that can be represented by an addressing mode is done inside
the memory reference; the part that cannot be represented by an
addressing mode is done outside of the memory reference. The final
cost is the sum of the two parts.
For address iv_use:
MEM[base + biv << scale + offset]
when it is computed with the candidate below on a target that only
supports the [base + biv << scale] addressing mode:
biv
The computations would be like:
base' = base + offset
MEM[base' + biv << scale]
Each computation has its own cost: the first one is a normal RTX cost,
the second one is an addressing mode cost. The final cost is the sum of
both parts.
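
A minimal sketch of the two-part cost described above, in C++. All
names here (AddrUse, Target, Cost, address_use_cost) and the cost
units are hypothetical and chosen for illustration only; they are not
the data structures or functions used in tree-ssa-loop-ivopts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address use: MEM[base + biv << scale + offset].
struct AddrUse {
    int64_t offset;
    int scale;
};

// Hypothetical target description; costs are in made-up units.
struct Target {
    bool supports_offset;       // is [base + biv << scale + offset] one address?
    int rtx_add_cost;           // cost of base' = base + offset outside the MEM
    int addr_cost;              // cost of the [base + biv << scale] mode
    int addr_cost_with_offset;  // cost of the mode that also folds the offset
};

// The two parts of the cost are kept separate so a caller can treat
// the RTX part differently from the addressing mode part.
struct Cost {
    int rtx;
    int addr;
    int total() const { return rtx + addr; }
};

Cost address_use_cost(const AddrUse &use, const Target &t) {
    assert(t.rtx_add_cost >= 0 && t.addr_cost >= 0
           && t.addr_cost_with_offset >= 0);
    if (use.offset == 0)
        return {0, t.addr_cost};              // nothing to compute outside
    if (t.supports_offset)
        return {0, t.addr_cost_with_offset};  // offset folded into the address
    // Offset does not fit the addressing mode: pay for base' = base + offset
    // as ordinary RTX, then for the plain addressing mode.
    return {t.rtx_add_cost, t.addr_cost};
}
```

On a target without offset support, a use with a nonzero offset pays
both parts, which matches the base' = base + offset example above.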
Normally, all these costs should be added up in the cost model, but
your test exposes one exception: if the iv_uses of a group have
exactly the same iv ({base, step}), the first part of the computation
(RTX) can be shared among all iv_uses, so its cost should only be
counted once. That is, we should be able to model such CSE
opportunities. We can't CSE the second part of the computation, since
there are no CSE opportunities within an address expression anyway.
Hi Bin,
Yes, that is exactly what happens. This computation might also be
cheaper than the initialization and increment of a new iv, in which
case it would be preferable.
That said, this patch should distinguish between the cost of the RTX
computation and the cost of the address expression, and only add up
the RTX cost once if it can be CSEed. Well, it might not be trivial to
check for CSE opportunities in the RTX computation, for example when
some iv_uses of the group are the same and others are not.
Thanks,
bin
Since uses in a given group have the same base and step, they can only
differ by offsets. Among those, equivalent offsets can be CSE'd. Then,
perhaps it's possible to use a hash set of unique offsets in this group
cost estimation loop, and count RTX computation cost only when adding a
new entry to the set. What do you think about this approach?
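
A rough sketch of that idea in C++, under stated assumptions: UseCost
and group_cost are hypothetical names, the per-use costs are assumed
to be already split into an RTX part and an addressing mode part, and
none of this reflects the actual ivopts data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical per-use cost split; not GCC's actual representation.
struct UseCost {
    int64_t offset;  // constant offset of this use from the shared base
    int rtx_cost;    // cost of base' = base + offset outside the MEM
    int addr_cost;   // cost of the addressing mode inside the MEM
};

// Sum the cost of a group whose uses all share the same {base, step}.
// The RTX part is counted once per distinct offset, on the assumption
// that equal offsets yield the same base' and can be CSE'd. The
// addressing mode part is counted for every use.
int group_cost(const std::vector<UseCost> &uses) {
    std::unordered_set<int64_t> seen_offsets;
    int total = 0;
    for (const UseCost &u : uses) {
        assert(u.rtx_cost >= 0 && u.addr_cost >= 0);
        // insert().second is true only if the offset was not yet in the set.
        if (seen_offsets.insert(u.offset).second)
            total += u.rtx_cost;
        total += u.addr_cost;
    }
    return total;
}
```

With three uses at offsets 8, 8 and 16, the RTX cost is added twice
(once per distinct offset) and the addressing mode cost three times.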
While working on this issue, I've found another problem: costs may
become negative. That looks unintended, so I have filed a new bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78332
Thanks,
Evgeny.