On 01/27/2017 07:19 AM, Segher Boessenkool wrote:
On Fri, Jan 27, 2017 at 02:30:49PM +0100, Richard Biener wrote:
Ok, maybe with -fno-trapping-math we don't consider that case but even
then generating a NaN is usually dreadfully slow so avoiding speculation
of such insns looks good in any case (w/o considering its cost).

And -ffast-math includes -ffinite-math-only.  No, the testcase never
takes the square root of a number smaller than zero, it isn't *that* slow ;-)

Well, the testcase as written doesn't but if you speculate the sqrt it might?

Yeah true.  Except we have -ffast-math so we told the compiler that is
just fine to do.

Things slow down so much because there is a loop immediately followed
by a square root insn, and sched-rgn decides it is a good idea to move
it to inside the loop.  Which is a bad idea no matter what the frequency
of the loop is because 1) we do not get such profiles very correct, and
2) sqrt is really expensive.

I understood that but then moving sth inside a loop is almost never a win.

It defaults to moving something if it has space for it in the schedule
and it is executed at least 40% of the time (I think).

Can't "not modeled" insns not be marked somehow in the pipeline description?

Well, the only thing from the pipeline description that is used here is
the insn latency, which isn't all that much higher than "normal" FP insns.
And simply "not described properly" won't do much good -- if we could
(without blowing up the automata) we would, and sched-rgn would then
still speculate this.

And I think this is the core of the issue. We have multiple ports that don't necessarily fully describe the latency, issue rates, etc. of certain insns like div/sqrt/rsqrt. There are good reasons for doing that.

Because of the partial description, the scheduler may think those insns fit into a pipeline bubble within the loop, when in reality they do not.

The scheduler currently has no way of knowing what insns have this property. While there are cases where we'd like to speculate a div or sqrt to give it more time to complete without stalls -- there's no good way to do that without fully describing them to the scheduler.

My preference would be to either somehow mark those insns as not fully modeled and avoid speculating them, or invent a target hook to allow the scheduler to query the backend.
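
To make the hook idea a bit more concrete, here is a rough sketch in plain C rather than real GCC internals. Every name in it (target_insn_not_fully_modeled_p, may_speculate_p, struct insn, the insn kinds) is made up for illustration and does not exist in GCC. The probability threshold is a parameter, since I am not asserting what sched-rgn actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical insn classes; not GCC's real insn representation.  */
enum insn_kind { INSN_FADD, INSN_FMUL, INSN_FDIV, INSN_FSQRT };

struct insn
{
  enum insn_kind kind;
  int latency;   /* Latency as reported by the pipeline description.  */
};

/* Hypothetical backend hook: return true if the pipeline description
   does not fully model INSN (e.g. a non-pipelined div/sqrt).  */
static bool
target_insn_not_fully_modeled_p (const struct insn *insn)
{
  assert (insn != NULL);
  return insn->kind == INSN_FDIV || insn->kind == INSN_FSQRT;
}

/* Toy speculation test.  FREE_SLOTS is the room the scheduler thinks
   it has, EXEC_PROB_PCT the estimated execution probability of the
   source block, and MIN_PROB_PCT the cutoff (an assumed parameter).  */
static bool
may_speculate_p (const struct insn *insn, int free_slots,
                 int exec_prob_pct, int min_prob_pct)
{
  /* Ask the backend first: the latency below cannot be trusted for
     insns that are only partially described.  */
  if (target_insn_not_fully_modeled_p (insn))
    return false;

  return insn->latency <= free_slots && exec_prob_pct >= min_prob_pct;
}
```

With something along these lines, a sqrt would be rejected for speculation regardless of how much room the schedule appears to have, while an ordinary FP add would still be judged on latency and probability as before.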

Note that these could be used elsewhere -- for example delay slot scheduling and predication. Delay slot scheduling does speculation and there are ports that simply refuse to allow certain instructions (div/sqrt on one port, I think all FP stuff on another) to avoid these kinds of problems.

Similarly, nullification/predication often works by wiping out the final posting of results into the register file. So imagine a non-pipelined div/sqrt. Predicating a div/sqrt instruction will actually keep the pipeline busy computing results that will be thrown away and prevent other useful work from occurring. And, yes, this really does happen. The PA suffered from these problems.

jeff
