https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007
--- Comment #5 from Richard Yao <richard.yao at alumni dot stonybrook.edu> ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Richard Yao from comment #3)
> > (In reply to Andrew Pinski from comment #2)
> > > (In reply to Richard Yao from comment #0)
> > > > Having the ability to specify __builtin_unpredictable() as a hint to
> > > > encourage the compiler to use cmov would be useful for implementing
> > > > algorithms like binary search that have unpredictable branches.
> > > > __builtin_expect_with_probability() looks like a possible substitute,
> > > > but
> > > > Clang does not support it and it does not always work as described in
> > > > #110001.
> > >
> > > PR 110001 has nothing to do with __builtin_expect_with_probability and
> >
> > I mentioned it briefly in PR 110001, but I guess I should make it explicit.
> > See line 58 here:
> >
> > https://gcc.godbolt.org/z/ef3Yfchzv
> >
> > That is made into a jump, when all that is necessary is cmov, which is what
> > Clang generates. In the example I had posted in PR 110001, I had not used
> > __builtin_expect_with_probability() because it made no difference, but here
> > I am using it to show that cmov is not used.
> >
>
> If we look at the -fdump-tree-optimized-lineno dump we see why it was not
> turned into a cmov:
>   [/app/example.c:58:12 discrim 2] if (a_111 <= key_128(D))
>     goto <bb 28>; [50.00%]
>   else
>     goto <bb 23>; [50.00%]
>
>   <bb 23> [local count: 447392427]:
>   [/app/example.c:75:40] _268 = (long unsigned int) i_82;
>   [/app/example.c:75:40] _271 = _268 * 4;
>   [/app/example.c:75:34] _274 = array_89(D) + _271;
>   [/app/example.c:5:6] pretmp_277 = *_274;
>   goto <bb 28>; [100.00%]
>
>
> There is a load moved inside the conditional.

Is executing an unpredictable branch cheaper than executing a redundant load?

This patch against the assembly output from GCC 12.3 modifies it to use cmov:

diff --git a/out.s b/out.s
index d796087..f0f009c 100644
--- a/out.s
+++ b/out.s
@@ -317,15 +317,11 @@ custom_binary_search_fast:
        cmovle  %ecx, %edx
 .L43:
        leal    1(%rdx), %ecx
-       movq    %rcx, %rax
-       movl    (%rdi,%rcx,4), %ecx
-       cmpl    %esi, %ecx
-       jle     .L15
+       cmpl    %esi, (%rdi,%rcx,4)
+       cmovle  %ecx, %edx
 .L25:
        movl    %edx, %eax
        movl    (%rdi,%rax,4), %ecx
-       movl    %edx, %eax
-.L15:
        cmpl    %ecx, %esi
        setl    %cl
        setg    %dl

Micro-benchmarking the two suggests that the answer is yes on Zen 3, although
I do not understand why.
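
For reference, here is a minimal C sketch of the kind of loop under discussion. It is not the code behind the Godbolt link; the names UNPREDICTABLE and count_le are hypothetical and chosen only for illustration. The load is performed unconditionally and only the selection of the new lower bound depends on the comparison, which is the shape that can be lowered to cmov. Whether a given compiler version actually emits cmov here is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical wrapper macro (an assumption, not an existing GCC/Clang API).
 * Clang provides __builtin_unpredictable(); GCC 9 and later provide
 * __builtin_expect_with_probability(), used here with probability 0.5. */
#if defined(__clang__)
#define UNPREDICTABLE(x) __builtin_unpredictable(!!(x))
#elif defined(__GNUC__) && __GNUC__ >= 9
#define UNPREDICTABLE(x) __builtin_expect_with_probability(!!(x), 1, 0.5)
#else
#define UNPREDICTABLE(x) (x)
#endif

/* Returns the number of elements in the sorted array a[0..n) that are
 * less than or equal to key. */
static size_t count_le(const int *a, size_t n, int key)
{
    size_t lo = 0;   /* every element before lo is <= key */
    size_t len = n;  /* the answer lies in [lo, lo + len] */

    while (len > 1) {
        size_t half = len / 2;
        /* Unconditional load; only the select depends on the compare. */
        lo = UNPREDICTABLE(a[lo + half - 1] <= key) ? lo + half : lo;
        len -= half;
    }
    /* At most one candidate element remains to be checked. */
    return lo + (len == 1 && a[lo] <= key);
}
```

With the array {1, 3, 5, 7}, count_le(a, 4, 4) returns 2 and count_le(a, 4, 7) returns 4. Inspecting the output of -S or -fdump-tree-optimized shows whether the select was kept as a conditional move or turned back into a jump.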