http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46235
--- Comment #4 from Tony Poppleton <tony.poppleton at gmail dot com> 2011-01-28 18:08:15 UTC ---
As a quick test, I commented out the block with the following comment in
fold-const.c:

      /* If this is an EQ or NE comparison with zero and ARG0 is
         (1 << foo) & bar, convert it to (bar >> foo) & 1.  Both require
         two operations, but the latter can be done in one less insn
         on machines that have only two-operand insns or on which a
         constant cannot be the first operand.  */

This produces the following asm code:

        movl    $1, %edx
        movl    %edi, %eax
        movl    %esi, %ecx
        movl    %edx, %edi
        sall    %cl, %edi
        testl   %eax, %edi
        cmove   %edx, %eax
        ret

(using modified GCC 4.6.0 20110122)

So whilst I was hoping for an easy quick-fix, it appears that the required
optimization to convert it into a "btl" test isn't there later on in the
compile.

Incidentally, from looking at http://gmplib.org/~tege/x86-timing.pdf, it
appears that "bt" is slow on P4 architecture (8 cycles if I am reading it
correctly? which sounds slow), so the llvm code in the bug description isn't
necessarily an optimization on this arch.  Newer chips would probably still
benefit though.
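
For anyone following along, here is a minimal sketch of the two source-level forms that the fold-const.c comment talks about. The function names (bit_set_shl, bit_set_shr, forms_agree) are made up for illustration and are not taken from GCC or from the test case in the bug description; the operands are assumed to be 32-bit unsigned with a shift count below 32. It only shows that the two forms give the same answer, which is what makes the fold legal in the first place. It says nothing about which one a given backend turns into "bt".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the form as typically written in source,
   (1 << foo) & bar compared against zero.  Assumes foo < 32. */
int bit_set_shl(uint32_t bar, unsigned foo)
{
    return ((UINT32_C(1) << foo) & bar) != 0;
}

/* Hypothetical helper: the form the fold produces,
   (bar >> foo) & 1 compared against zero.  Assumes foo < 32. */
int bit_set_shr(uint32_t bar, unsigned foo)
{
    return ((bar >> foo) & 1u) != 0;
}

/* Returns 1 if both forms agree for every bit position of bar, else 0. */
int forms_agree(uint32_t bar)
{
    for (unsigned foo = 0; foo < 32; ++foo) {
        if (bit_set_shl(bar, foo) != bit_set_shr(bar, foo))
            return 0;
    }
    return 1;
}
```

Either form is a single-bit test, which is the operation "bt" performs directly (it copies the selected bit into the carry flag), so in principle both could be matched to it.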