https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713
--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
I agree that in the general case, a conditional jump (that depends only on the condition flags) potentially has a shorter dependence chain than a cmov (which depends on the condition flags and two registers). But in this case, the condition codes can't be determined any faster than the register operands.

I believe the branch prediction argument is a red herring. Yes, with clever hardware and luck, the CPU can predict which instruction will be executed next after a conditional jump, but it can always know which instruction is executed next after a cmov. A cmov (and multiple cmovs) can be scheduled and executed out-of-order without speculation. Hence branch prediction is only a factor when dependency chain lengths are an issue/unequal (or the cmov is slower than a correctly predicted branch). An understanding of the data distribution is also irrelevant if the best/fastest (correctly predicted) branch implementation is no better/faster than the cmov.

But I can also imagine microarchitectures where predicted conditional jumps are free (requiring zero cycles) and where the condition code "test" is eliminated, having been set/forwarded from an earlier instruction, in which case a zero-latency abs is about as good as you can get.

Are we assuming a target with "predicted_branch_cost < conditional_move_cost"? I wouldn't be surprised if GCC internally assumes these are both always COSTS_N_INSNS(1).

If conditional_move_cost <= predicted_branch_cost <= mispredicted_branch_cost, then the cmov should always be preferred (independent of branch probabilities or __builtin_expect hints). If predicted_branch_cost <= mispredicted_branch_cost <= conditional_move_cost, the branch should always be preferred, and the cmov shouldn't be part of the ISA. The interesting domain of trade-offs is where/when predicted_branch_cost < conditional_move_cost <= mispredicted_branch_cost (which I'm not yet convinced is the case here).
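To make the three cost orderings above concrete, here is a rough sketch in C. The names (classify_costs, branch_wins, the enum) are hypothetical and are not GCC internals, and the linear expected-cost formula in branch_wins is my own simplifying assumption; real hardware costs are not this tidy.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not part of GCC. */
enum cost_regime {
    PREFER_CMOV,            /* cmov <= predicted <= mispredicted */
    PREFER_BRANCH,          /* predicted <= mispredicted <= cmov */
    DEPENDS_ON_PREDICTION   /* predicted < cmov < mispredicted   */
};

/* Classify the three costs into one of the regimes described above.
   Ties are resolved toward the unconditional answer: when the cmov costs
   the same as a mispredicted branch, the branch is never worse. */
static enum cost_regime
classify_costs(int predicted, int mispredicted, int cmov)
{
    assert(predicted <= mispredicted);  /* a misprediction is never cheaper */
    if (cmov <= predicted)
        return PREFER_CMOV;
    if (mispredicted <= cmov)
        return PREFER_BRANCH;
    return DEPENDS_ON_PREDICTION;
}

/* ASSUMPTION: a simple linear model where the expected branch cost is
   p_correct * predicted + (1 - p_correct) * mispredicted.
   Returns nonzero when the branch is expected to beat the cmov. */
static int
branch_wins(double predicted, double mispredicted, double cmov,
            double p_correct)
{
    double expected = p_correct * predicted
                      + (1.0 - p_correct) * mispredicted;
    return expected < cmov;
}
```

Only in the third regime would branch probabilities (or __builtin_expect) matter at all; in the first two the answer is fixed regardless of the data.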
Do we have any numbers that show the branch is better (for this case) on real hardware, that can't be explained by other factors? For example, on ABS where the inputs are always positive.
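For anyone wanting to collect such numbers, the two source-level forms of ABS under discussion might be sketched as below. The function names are hypothetical, and whether GCC emits a conditional jump or a cmov for either form depends on the target, the options, and the GCC version, so the generated assembly needs to be checked before timing anything.

```c
#include <assert.h>
#include <stdint.h>

/* Written with an explicit branch; the compiler may or may not keep it. */
static int32_t abs_branch(int32_t x)
{
    assert(x != INT32_MIN);   /* -INT32_MIN overflows (undefined behaviour) */
    if (x < 0)
        x = -x;
    return x;
}

/* Written as a select between x and -x; a natural candidate for cmov,
   though the compiler is free to emit a branch here as well. */
static int32_t abs_select(int32_t x)
{
    assert(x != INT32_MIN);   /* same precondition as above */
    int32_t neg = -x;
    return x < 0 ? neg : x;
}
```

Feeding both with always-positive inputs gives the best case for the branch predictor, which is the case where the branch version would have to win to justify preferring it.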