When counting the number of trailing zeros using ffs, it is not uncommon to want a 0-based result and a special value for the zero case. Example:
extern int ffs(int); int my_ctz(int x) { return x == 0 ? 24 : ffs(x) - 1; } The generated code (-O3 on x86-64) looks like: .LFB2: testl %edi, %edi movl $24, %eax jne .L6 rep ; ret .p2align 4,,7 .L6: bsfl %edi, %eax movl $-1, %edx <- cmove %edx, %eax <- addl $1, %eax <- subl $1, %eax <- ret Notice that the last four lines are useless. The optimal assembly code would be: bsfl %edi, %eax movl $24, %edx cmove %edx, %eax ret Tested on GCC 4.2.1 and 4.3.0 (20070613) for x86-32 and x86-64 target. When using __builtin_ctz instead in my_ctz, the code is (almost) optimal. But unfortunately there is no "ctz" posix function, while there is an "ffs" one. That's why ffs is used in "portable" code. Anyway, as a consequence of the code of __builtin_ctz being optimal, we can force the compiler to generate optimal code by defining an ffs function by hand: int my_ffs(int x) { return x == 0 ? 0 : __builtin_ctz(x) + 1; } int my_ctz(int x) { return x == 0 ? 24 : my_ffs(x) - 1; } The compiler generates for my_ctz: bsfl %edi, %edx movl $24, %eax testl %edi, %edi cmovne %edx, %eax ret So the definition of __builtin_ffs should really be modified on x86, so that it expands to the code described in my_ffs. Compiler optimizations will then apply and they will remove the useless tests and arithmetic operations. -- Summary: Code for __builtin_ffs does not benefit from compiler optimizations Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: guillaume dot melquiond at ens-lyon dot fr http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32433