https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96633
--- Comment #2 from Alexander Monakov <amonakov at gcc dot gnu.org> --- Martin added me to CC so I assume he wants me to chime in. First of all, I find Nathan's behavior in that gcc@ thread distasteful at best (but if you ask me, such responses are simply more harm than good; link: https://lwn.net/ml/gcc/1a363f89-6f98-f583-e22a-a7fc02efb...@acm.org/ ). Next, statements like "I've determined the following is abot 12% faster" don't carry weight without details such as the CPU family, structure of the benchmark and the workload. Obviously, on input that lacks whitespace GCC's original code is faster as the initial branch is 100% predictable. Likewise, if the input was taken from /dev/random, the 12% figure is irrelevant to real-world uses of such code. What the benchmark is doing with the return value of the function also matters a lot. With that out of the way: striving to get efficient branchless code on this code is not very valuable in practice, because the caller is likely to perform a conditional branch on the result anyway. So making isWhitespace branchless simply moves the misprediction cost to the caller, making the overall code slower. (but of course such considerations are too complex for the compiler's limited brain) In general such "bitmask tests" will benefit from the BT instruction on x86 (not an extension, was in the ISA since before I was born), plus CMOV to get the right mask if it doesn't fit in a register. For 100% branchless code we want to generate code similar to: char is_ws(char c) { unsigned long long mask = 1ll<<' ' | 1<<'\t' | 1<<'\r' | 1<<'\n'; unsigned long long v = c; if (v > 32) #if 1 mask = 0; #else return 0; #endif char r; asm("bt %1, %2; setc %0" : "=r"(r) : "r"(v), "r"(mask)); return r; } movsbq %dil, %rax movl $0, %edx movabsq $4294977024, %rdi cmpq $33, %rax cmovnb %rdx, %rdi bt %rax, %rdi; setc %al ret (note we get %edx zeroing suboptimal, should have used xor %edx, %edx) This is generalizable to any input type, not just char. We even already get the "test against a mask" part of the idea right ;) Branchy testing is even cheaper with BT: void is_ws_cb(unsigned char c, void f(void)) { unsigned long long mask = 1ll<<' ' | 1<<'\t' | 1<<'\r' | 1<<'\n'; if (c <= 32 && (mask & (1ll<<c))) f(); } cmpb $32, %dil ja .L5 movabsq $4294977024, %rax btq %rdi, %rax jnc .L5 jmp *%rsi .L5: ret