https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93165

--- Comment #7 from ncm at cantrip dot org ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Pinski from comment #4)
> > (In reply to Alexander Monakov from comment #3)
> > > So perhaps an unpopular opinion, but I'd say a
> > > __builtin_branchless_select(c, a, b) (guaranteed to live throughout
> > > optimization pipeline as a non-branchy COND_EXPR) is badly missing.
> > 
> > I am going to say otherwise.  Many of the time conditional move is faster
> > than using a branch; even if the branch is predictable (there are a few
> > exceptions) on most non-Intel/AMD targets.  This is because the conditional
> > move is just one cycle and only a "predictable" branch is one cycle too.
> 
> The issue with a conditional move is that it adds a data dependence while
> branches are usually speculated and thus have zero overhead in the execution
> stage.  The extra dependence can easily slow things down dependent on the
> (three!) instructions feeding the conditional move (condition, first and
> second source).  This is why well-predicted branches are often so much
> faster.
> 
> > It is even worse when doing things like:
> > if (a && b)
> > where on aarch64, this can be done using only one cmp followed by one ccmp.
> > NOTE on PowerPC, you could use in theory crand/cror (though this is not done
> > currently and I don't know if they are profitable in any recent design).
> > 
> > Plus aarch64 has conditional add and a few other things which improve the
> > idea of a conditional move.
> 
> I can see conditional moves are almost always a win on less
> pipelined/speculative implementations.

Nobody wants a change that makes code slower on our pipelined/
speculative targets, but this is a concrete case where code is
already being made slower. If the code before optimization has no
branch, as in the case of "a = (m & b)|(~m & c)", the result
already depends on m, b, and c, so we can be certain that
replacing it with a cmov does not introduce any new data
dependence.
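
To make the branch-free case concrete, here is a minimal sketch in C.
The function names are made up for illustration, and the sketch makes
no claim about which instructions any particular GCC version emits for
either form. It only shows that the mask form already reads all three
inputs, and that it computes the same value as ?: when m is all-ones
or all-zeros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a truth value into an all-ones or
   all-zeros mask (nonzero -> ~0, zero -> 0). */
static uint64_t mask_from_bool(int cond)
{
    return (uint64_t)0 - (uint64_t)(cond != 0);
}

/* The branch-free form from the comment: the result depends on
   m, b, and c in the source itself, before any optimization. */
static uint64_t select_mask(uint64_t m, uint64_t b, uint64_t c)
{
    return (m & b) | (~m & c);
}

/* The ?: form, which a compiler may lower to either a branch or a
   conditional move. */
static uint64_t select_ternary(int cond, uint64_t b, uint64_t c)
{
    return cond ? b : c;
}
```

For any cond, select_mask(mask_from_bool(cond), b, c) and
select_ternary(cond, b, c) return the same value.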

Anyway, for the case of ?:, where cmov would replace a branch,
GCC is already happy to substitute a cmov instruction. It just
refuses to emit a second cmov right after it, for no apparent reason.
