https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93165
--- Comment #7 from ncm at cantrip dot org ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Pinski from comment #4)
> > (In reply to Alexander Monakov from comment #3)
> > > So perhaps an unpopular opinion, but I'd say a
> > > __builtin_branchless_select(c, a, b) (guaranteed to live throughout
> > > optimization pipeline as a non-branchy COND_EXPR) is badly missing.
> >
> > I am going to say otherwise.  Many of the time conditional move is faster
> > than using a branch; even if the branch is predictable (there are a few
> > exceptions) on most non-Intel/AMD targets.  This is because the conditional
> > move is just one cycle and only a "predictable" branch is one cycle too.
>
> The issue with a conditional move is that it adds a data dependence while
> branches are usually speculated and thus have zero overhead in the execution
> stage.  The extra dependence can easily slow things down dependent on the
> (three!) instructions feeding the conditional move (condition, first and
> second source).  This is why well-predicted branches are often so much
> faster.
>
> > It is even worse when doing things like:
> > if (a && b)
> > where on aarch64, this can be done using only one cmp followed by one ccmp.
> > NOTE on PowerPC, you could use in theory crand/cror (though this is not done
> > currently and I don't know if they are profitable in any recent design).
> >
> > Plus aarch64 has conditional add and a few other things which improve the
> > idea of a conditional move.
>
> I can see conditional moves are almost always a win on less
> pipelined/speculative implementations.

Nobody wants a change that makes code slower on our pipelined/speculative
targets, but this is a concrete case where code is already made slower.
If the code before optimization has no branch, as in the case of
"a = (m & b)|(~m & c)", we can be certain that replacing it with a cmov
does not introduce any new data dependence.

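To make the branch-free case concrete, here is a minimal sketch of the mask-select idiom next to its ?: equivalent. The helper names (mask_from_bool, mask_select, ternary_select, check_select_equivalence) are made up for illustration and are not from GCC or from this report. The sketch assumes m is either all ones or all zeros. It only shows that the two forms compute the same value and that the masked form already reads m, b and c unconditionally; it makes no claim about which instructions any particular GCC version emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a truth value into an all-ones or all-zeros mask. */
static inline uint32_t mask_from_bool(int cond)
{
    return (uint32_t)0 - (uint32_t)(cond != 0);
}

/* The expression from the comment: no branch in the source, and the result
   depends on m, b and c regardless of which value is selected. */
static inline uint32_t mask_select(uint32_t m, uint32_t b, uint32_t c)
{
    return (m & b) | (~m & c);
}

/* The same selection written with ?: for comparison. */
static inline uint32_t ternary_select(int cond, uint32_t b, uint32_t c)
{
    return cond ? b : c;
}

/* Hypothetical self-check: both forms agree for either condition value. */
static inline void check_select_equivalence(uint32_t b, uint32_t c)
{
    assert(mask_select(mask_from_bool(1), b, c) == ternary_select(1, b, c));
    assert(mask_select(mask_from_bool(0), b, c) == ternary_select(0, b, c));
}
```

Since the masked form has no control flow to speculate past, a select instruction in its place consumes the same three inputs the original expression did.
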
Anyway, for the case of ?:, where cmov would replace a branch, GCC is already happy to substitute a cmov instruction. It just refuses to emit a second cmov right after the first, for no apparent reason.
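
For illustration, the shape of code being described is two back-to-back ?: selects on the same condition, as in a conditional swap. This is a sketch only: swap_if and swap_if_demo are illustrative names, and whether the compiler turns either select into a cmov or a branch depends on the GCC version, target and flags, so check the generated assembly (for example with -O2 -S) rather than assuming.

```c
#include <assert.h>

/* Illustrative conditional swap: two ?: selects driven by one condition. */
static inline void swap_if(int cond, int *x, int *y)
{
    int a = *x, b = *y;
    *x = cond ? b : a;   /* first select */
    *y = cond ? a : b;   /* second select, same condition */
}

/* Illustrative usage: order a pair so that the smaller value comes first. */
static inline void swap_if_demo(void)
{
    int lo = 5, hi = 3;
    swap_if(hi < lo, &lo, &hi);
    assert(lo == 3 && hi == 5);
}
```
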