Hi Eric,

Your ffs, ffsl, ffsll modules make use of __builtin_ffs for GCC >= 3.4.
But the same compiler versions also have __builtin_ffsl and __builtin_ffsll.
Their use simplifies (and most certainly speeds up) the code generated by GCC.

For example, on MacOS X in 64-bit mode, currently the code is

.globl _ffsll
_ffsll:
LFB2:
	pushq	%rbp
LCFI0:
	xorl	%eax, %eax
	testq	%rdi, %rdi
	movq	%rsp, %rbp
LCFI1:
	je	L6
	xorl	%edx, %edx
	testl	%edi, %edi
	movl	%edi, %eax
	jne	L5
	.align 4,0x90
L8:
	shrq	$32, %rdi
	addl	$32, %edx
	testl	%edi, %edi
	movl	%edi, %eax
	je	L8
L5:
	bsfl	%eax, %eax
	movl	$-1, %ecx
	cmove	%ecx, %eax
	incl	%eax
	addl	%edx, %eax
L6:
	leave
	ret
LFE2:

whereas with the attached patch it becomes:

.globl _ffsll
_ffsll:
LFB2:
	pushq	%rbp
LCFI0:
	movq	$-1, %rax
	bsfq	%rdi, %rdi
	movq	%rsp, %rbp
LCFI1:
	leave
	cmove	%rax, %rdi
	incq	%rdi
	movl	%edi, %eax
	ret
LFE2:

Of course, I've verified that the test suite still passes.


2011-10-13  Bruno Haible  <br...@clisp.org>

	ffsl, ffsll: Optimize for GCC.
	* lib/ffsl.h (FUNC): Use GCC_BUILTIN if defined.
	* lib/ffsl.c (GCC_BUILTIN): New macro.
	* lib/ffsll.c (GCC_BUILTIN): Likewise.

--- lib/ffsl.h.orig	Fri Oct 14 00:47:36 2011
+++ lib/ffsl.h	Fri Oct 14 00:40:34 2011
@@ -31,6 +31,9 @@
 int
 FUNC (TYPE i)
 {
+#if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) && defined GCC_BUILTIN
+  return GCC_BUILTIN (i);
+#else
   int result = 0;
   unsigned TYPE j = i;
 
@@ -44,4 +47,5 @@
       j >>= CHAR_BIT * sizeof (unsigned int);
       result += CHAR_BIT * sizeof (unsigned int);
     }
+#endif
 }
--- lib/ffsl.c.orig	Fri Oct 14 00:47:36 2011
+++ lib/ffsl.c	Fri Oct 14 00:40:35 2011
@@ -1,3 +1,4 @@
 #define FUNC ffsl
 #define TYPE long int
+#define GCC_BUILTIN __builtin_ffsl
 #include "ffsl.h"
--- lib/ffsll.c.orig	Fri Oct 14 00:47:36 2011
+++ lib/ffsll.c	Fri Oct 14 00:40:33 2011
@@ -1,3 +1,4 @@
 #define FUNC ffsll
 #define TYPE long long int
+#define GCC_BUILTIN __builtin_ffsll
 #include "ffsl.h"

-- 
In memoriam Bekir Çoban-zade <http://en.wikipedia.org/wiki/Bekir_Çoban-zade>
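
P.S. For anyone who wants to compare the two code paths outside the gnulib test suite, here is a minimal sketch, not part of the patch. It assumes a GCC-compatible compiler that provides __builtin_ffsll. The names portable_ffsll, builtin_ffsll and check_ffsll_paths are hypothetical, and portable_ffsll is only a stand-in written in the spirit of the chunked fallback loop, not a copy of lib/ffsl.h.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the portable fallback: skip the value one
   unsigned-int-sized chunk at a time, then find the bit in that chunk.  */
static int
portable_ffsll (long long int i)
{
  unsigned long long int j = i;
  int result = 0;

  if (j == 0)
    return 0;
  /* Skip low chunks that are entirely zero.  */
  while ((unsigned int) j == 0)
    {
      j >>= CHAR_BIT * sizeof (unsigned int);
      result += CHAR_BIT * sizeof (unsigned int);
    }
  /* Count the trailing zero bits of the lowest nonzero chunk.  */
  {
    unsigned int chunk = (unsigned int) j;
    while ((chunk & 1u) == 0)
      {
        chunk >>= 1;
        result++;
      }
  }
  return result + 1;  /* ffs results are 1-based */
}

/* The builtin path that the patch selects through GCC_BUILTIN.  */
static int
builtin_ffsll (long long int i)
{
  return __builtin_ffsll (i);
}

/* Assert that both paths agree on 0, on every single-bit value, and on
   every value whose bits from position k upward are all set.  */
static void
check_ffsll_paths (void)
{
  const int bits = CHAR_BIT * sizeof (long long int);
  int k;

  assert (portable_ffsll (0) == 0);
  assert (builtin_ffsll (0) == 0);
  for (k = 0; k < bits; k++)
    {
      /* Converting the top-bit patterns to a signed type is
         implementation-defined; GCC wraps modulo 2^N.  */
      long long int single = (long long int) (1ULL << k);
      long long int upper = (long long int) (~0ULL << k);

      assert (portable_ffsll (single) == k + 1);
      assert (builtin_ffsll (single) == k + 1);
      assert (portable_ffsll (upper) == k + 1);
      assert (builtin_ffsll (upper) == k + 1);
    }
}
```

Calling check_ffsll_paths () from a small main is enough to exercise both paths; it aborts on the first disagreement and returns silently otherwise.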