When counting the number of trailing zeros using ffs, it is not uncommon to
want a 0-based result and a special value for the zero case. Example:
extern int ffs(int);
int my_ctz(int x) { return x == 0 ? 24 : ffs(x) - 1; }
The generated code (-O3 on x86-64) looks like:
.LFB2:
testl %edi, %edi
movl $24, %eax
jne .L6
rep ; ret
.p2align 4,,7
.L6:
bsfl %edi, %eax
movl $-1, %edx <-
cmove %edx, %eax <-
addl $1, %eax <-
subl $1, %eax <-
ret
Notice that the last four lines are useless.
The optimal assembly code would be:
bsfl %edi, %eax
movl $24, %edx
cmove %edx, %eax
ret
Tested on GCC 4.2.1 and 4.3.0 (20070613) for x86-32 and x86-64 target.
When using __builtin_ctz instead in my_ctz, the code is (almost) optimal. But
unfortunately there is no "ctz" posix function, while there is an "ffs" one.
That's why ffs is used in "portable" code.
Anyway, as a consequence of the code of __builtin_ctz being optimal, we can
force the compiler to generate optimal code by defining an ffs function by
hand:
int my_ffs(int x) { return x == 0 ? 0 : __builtin_ctz(x) + 1; }
int my_ctz(int x) { return x == 0 ? 24 : my_ffs(x) - 1; }
The compiler generates for my_ctz:
bsfl %edi, %edx
movl $24, %eax
testl %edi, %edi
cmovne %edx, %eax
ret
So the definition of __builtin_ffs should really be modified on x86, so that it
expands to the code described in my_ffs. Compiler optimizations will then apply
and they will remove the useless tests and arithmetic operations.
--
Summary: Code for __builtin_ffs does not benefit from compiler
optimizations
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: guillaume dot melquiond at ens-lyon dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32433