https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #6 from Michael_S <already5chosen at yahoo dot com> ---
Why do you see it as the addition of a peephole pattern? I see it as a removal. Like, "do what's written in the source and don't try to be tricky". Probably I am too removed from how compilers work :(

Or maybe handle it at the level of instruction costs? I don't know how gcc works internally, but it seems that currently the cost of a register move [on Haswell and Skylake] is underestimated. Although it is true that a register move has no cost in terms of execution ports and latency, it still has the same cost as, say, an integer ALU instruction in terms of the front end and the renamer.

Also, as pointed out above by Alexander, the cost of FMA3 with (base+index) or (index*scale) memory operands could also be underestimated. Unlike Alexander, I am not sure that the difference between (base+index) and (base) is really what matters. IMO, the cost of FMA3 with *any* memory operand is underestimated, but I am not going to insist on that.

In an ideal world the compiler would reason as I do myself when coding in asm: estimate which resource is critical in the given loop and then try to reduce pressure on that particular resource. In this particular loop the critical resource appears to be the renamer, so the cost of an instruction should be seen as its cost at the renamer. In other situations the critical resource could be the throughput of a particular issue port. In yet another situation it could be the latency of the instructions that form a dependency chain across multiple iterations of the loop. The latter is especially common in "reduction" algorithms, of which dot product is the most common example.

A single instruction-cost value simply can't cover all these different cases in a satisfactory manner. Maybe gcc already has something like that; I don't know.
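To illustrate the reduction case mentioned above, here is a minimal sketch (my own, not from the bug report): in the naive dot product each iteration's multiply-add depends on the previous value of the accumulator, so loop throughput is bounded by the latency of that chain (the FMA latency, once gcc contracts the multiply and add), not by issue-port throughput. Splitting the sum across independent accumulators breaks the chain. The function names and the unroll factor of 4 are arbitrary choices for the example.

```c
#include <stddef.h>

/* Naive reduction: every iteration's multiply-add reads the `sum`
 * produced by the previous iteration, so the loop-carried dependency
 * chain (FMA latency, ~4-5 cycles on Haswell/Skylake) limits speed. */
double dot_naive(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Four independent accumulators: four multiply-adds can be in flight
 * at once, so the bottleneck shifts from FMA latency to FMA issue
 * throughput. Note the summation order differs from dot_naive, so
 * results can differ slightly in floating point. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)  /* handle the remaining 0-3 elements */
        sum += a[i] * b[i];
    return sum;
}
```

This is exactly the situation where a single per-instruction cost value fails: the same FMA is "cheap" in the unrolled loop and "expensive" in the naive one, because what matters is its position on the critical dependency chain.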