https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #32)
> get_order is a wrapper around fls64.  This can be implemented w/o asm
> statement as follows:
> int
> my_fls64 (__u64 x)
> {
>   if (!x)
>     return 0;
>   return 64 - __builtin_clzl (x);
> }
> 
> This results in longer assembly than the kernel asm implementation. If
> that matters, I would replace the __builtin_constant_p part of get_order
> with this implementation, which is more transparent to the code size
> estimation, so things will get inlined.

Better to use __builtin_clzll, so that it also works on 32-bit arches, where
long is only 32 bits wide.
Anyway, if the kernel's fls64 results in better code than my_fls64, we should
look at GCC's code generation for that case.
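
A minimal sketch of the quoted function with that change applied. The name
sketch_fls64 and the use of uint64_t in place of the kernel's __u64 are
assumptions made only so the example stands alone; this is not the kernel's
fls64.

```c
#include <stdint.h>

/* Hypothetical name, not the kernel's fls64: returns the 1-based index
   of the most significant set bit of x, or 0 when x is 0.  */
static inline int
sketch_fls64 (uint64_t x)
{
  /* __builtin_clzll (0) is undefined, so handle zero explicitly.  */
  if (!x)
    return 0;
  /* unsigned long long is 64 bits wide on both 32-bit and 64-bit
     targets, unlike unsigned long.  */
  return 64 - __builtin_clzll (x);
}
```

For example, sketch_fls64 (1) is 1 and sketch_fls64 (1ULL << 63) is 64.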

And perhaps the kernel's const_ilog2 should be reimplemented using
__builtin_clz*?
Or, maybe even better, keep const_ilog2 as is, because as declared it should
be usable even in pedantic C constant expressions, and just change ilog2 to:
#define ilog2(n) \
( \
        __builtin_constant_p(n) ?       \
        ((n) < 2 ? 0 : 63 - __builtin_clzll (n)) : \
        (sizeof(n) <= 4) ?              \
        __ilog2_u32(n) :                \
        __ilog2_u64(n)                  \
 )
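
To try the macro outside the kernel tree, something like the following
should do. It is only a sketch: sketch_ilog2, sketch_ilog2_u32 and
sketch_ilog2_u64 are hypothetical stand-ins for ilog2, __ilog2_u32 and
__ilog2_u64, written on the assumption that the kernel helpers compute
fls(n) - 1, and it assumes unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's __ilog2_u32/__ilog2_u64.
   Assumption: they behave like fls(n) - 1, i.e. -1 for n == 0.  */
static inline int
sketch_ilog2_u32 (uint32_t n)
{
  /* Assumes unsigned int is 32 bits wide.  */
  return n ? 31 - __builtin_clz (n) : -1;
}

static inline int
sketch_ilog2_u64 (uint64_t n)
{
  return n ? 63 - __builtin_clzll (n) : -1;
}

/* Same shape as the macro above, with the stand-in helpers.  The
   constant branch maps everything below 2 to 0 and never passes 0 to
   __builtin_clzll, whose result for 0 is undefined.  */
#define sketch_ilog2(n) \
( \
        __builtin_constant_p(n) ?       \
        ((n) < 2 ? 0 : 63 - __builtin_clzll (n)) : \
        (sizeof(n) <= 4) ?              \
        sketch_ilog2_u32(n) :           \
        sketch_ilog2_u64(n)             \
 )
```

With a literal argument such as sketch_ilog2 (1024) the first branch is
taken and the whole expression folds to 10; with a volatile variable the
argument is not a compile-time constant, so one of the helpers is called
instead.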
