[Bug target/56309] -O3 optimizer generates conditional moves instead of compare and branch resulting in almost 2x slower code

ubizjak at gmail dot com Thu, 14 Feb 2013 00:22:58 -0800


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309




--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2013-02-14 08:22:33 UTC ---

For reference, some quotes from Honza:



-- PR54073 --

The decision on whether to use cmov or jmp was always tricky on x86
architectures. Cmov increases dependency chains and register pressure (both
values need to be loaded in) and has a long opcode. So a jump sequence, if well
predicted, flows better through the out-of-order core. If badly predicted it
is, of course, a disaster. I think more modern CPUs solved the problems with
the long latency of cmov, but the dependency chains are still there.
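
(Illustration, not part of the quoted comment.) The "both values need to be
loaded in" point can be sketched in C as below. The function names are
hypothetical, and the source form does not force either code shape: whether GCC
emits a jump or a cmov for these depends on the target, the options, and the
cost model.

```c
/* Branch shape: only the value on the taken path is loaded. */
int pick_branch(int cond, const int *p, const int *q)
{
    if (cond)
        return *p;
    return *q;
}

/* Select shape: both values are loaded into registers first, then one
   is chosen.  This is the data flow a conditional move implements, and
   the result depends on both loads and on the condition. */
int pick_select(int cond, const int *p, const int *q)
{
    int a = *p;
    int b = *q;
    return cond ? a : b;
}
```

Both functions return the same result; they differ only in which loads must
complete before the result is available.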



[...]



We should do something about rnflow. I will look into that.
The usual wisdom is that, lacking profile feedback, one should handle non-loop
branches as unpredictable and loop branches as predictable, so all branches
handled by ifconvert fall into the first category. With profile feedback one
can see the branch probability and, if it is close to 0 or REG_BR_PROB_BASE,
treat the branch as predictable. We handle this with the predictable_edge_p
parameter passed to BRANCH_COST (that by itself is gross, but for years we
were not able to come up with something saner).

-- /PR54073 --
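
As a rough illustration of the profile-based rule quoted above ("close to 0 or
REG_BR_PROB_BASE"), here is a standalone sketch. It is not the GCC
implementation of predictable_edge_p: the helper name, the PROB_BASE value of
10000, and the 2% threshold are all assumptions made for the example.

```c
#include <stdbool.h>

/* Assumed scale: probabilities run from 0 to PROB_BASE, standing in
   for REG_BR_PROB_BASE.  The value is an assumption of this sketch. */
#define PROB_BASE 10000

/* Hypothetical threshold in percent: how close to "never" or "always"
   a branch must be to count as predictable. */
#define PREDICTABLE_PCT 2

/* Hypothetical helper.  Without profile feedback the branch is treated
   as unpredictable; with it, the branch counts as predictable when its
   probability is within the threshold of either extreme. */
bool edge_looks_predictable(bool have_profile, int probability)
{
    int limit = PREDICTABLE_PCT * PROB_BASE / 100;

    if (!have_profile)
        return false;
    return probability <= limit || PROB_BASE - probability <= limit;
}
```

With these assumed constants, a branch taken 1% or 99% of the time is reported
as predictable, and a 50% branch is not.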



-- PR53046 --

Well, as I wrote in the other PR, the main problem of cmov is the extension of
the dependency chain. For a well-predicted sequence with a conditional jump
there is no update of rbx, so the loop executes faster, because the
loads/stores/comparisons execute "in parallel". The load in the next iteration
can then happen speculatively before the condition from the previous iteration
is resolved. With cmov in it, there is a dependence on rbx for all the other
computations in the loop.



I guess there is no locally available information suggesting that the
particular branch is well predictable, at least without profile feedback (where
we won't disable the conversion anyway).



[...]

-- /PR53046 --
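
The loop-carried dependence described in the PR53046 quote can be sketched with
an argmax loop. This is a hypothetical example, not the test case from either
PR; here `best` plays the role that rbx plays in the quote. The compiler is
free to compile either function either way, so the comments describe the
intended code shape, not guaranteed output.

```c
#include <stddef.h>

/* Branch shape: best is written only when a new maximum appears.  When
   the branch is well predicted as not taken, the v[best] load and the
   compare in the next iteration can start speculatively. */
size_t argmax_branch(const int *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (v[i] > v[best])
            best = i;
    }
    return best;
}

/* Select shape: best is rewritten on every iteration, so the v[best]
   load in the next iteration has to wait for the select's result. */
size_t argmax_select(const int *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        best = (v[i] > v[best]) ? i : best;
    return best;
}
```

Both functions return the index of the first maximum element. Which one runs
faster depends on how predictable the comparison is for the input data.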
