https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #32)
> get_order is a wrapper around fls64. This can be implemented without an asm
> statement as follows:
> int
> my_fls64 (__u64 x)
> {
>   if (!x)
>     return 0;
>   return 64 - __builtin_clzl (x);
> }
>
> This results in longer assembly than the kernel asm implementation. If
> that matters, I would replace the __builtin_constant_p part of get_order with
> this implementation, which is more transparent to the code size estimation,
> so things will get inlined.
Better to use __builtin_clzll so that it also works on 32-bit arches.
Anyway, if the kernel's fls64 results in better code than my_fls64, we should
look at GCC's code generation for that case.
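
For reference, a minimal sketch of the __builtin_clzll variant. The name
my_fls64_ll is hypothetical and uint64_t stands in for the kernel's __u64;
the sketch assumes unsigned long long is 64 bits wide:

```c
#include <stdint.h>

/* Hypothetical name; same logic as my_fls64 from comment #32, but using
   __builtin_clzll, whose operand type is unsigned long long.  Unlike
   unsigned long, that type is 64 bits wide on 32-bit targets too.  */
static inline int
my_fls64_ll (uint64_t x)
{
  /* The result of __builtin_clzll is undefined for 0, so handle it here.  */
  if (!x)
    return 0;
  /* Position of the most significant set bit, counting from 1.  */
  return 64 - __builtin_clzll (x);
}
```

With __builtin_clzl on a 32-bit target the operand would be truncated to
32 bits, so the upper half of x would be ignored.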
And, perhaps kernel's const_ilog2 should be reimplemented using __builtin_clz*?
Or, maybe even better, keep const_ilog2 as is because as it is declared it
should be usable even in pedantic C constant expressions, and just change ilog2
to:
#define ilog2(n)                                \
(                                               \
  __builtin_constant_p(n) ?                     \
  ((n) < 2 ? 0 : 63 - __builtin_clzll (n)) :    \
  (sizeof(n) <= 4) ?                            \
  __ilog2_u32(n) :                              \
  __ilog2_u64(n)                                \
)
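
A small sketch for sanity-checking the constant branch outside the kernel.
ILOG2_CLZ and ilog2_ref are hypothetical names introduced only for this
illustration; the reference computes floor(log2(n)) by shifting and, like the
macro above, returns 0 for n < 2:

```c
/* Hypothetical stand-in for the __builtin_constant_p branch of the
   ilog2 macro proposed above.  */
#define ILOG2_CLZ(n) ((n) < 2 ? 0 : 63 - __builtin_clzll (n))

/* Reference implementation: floor(log2(n)) by repeated right shifts.
   Returns 0 for n == 0 and n == 1.  */
static inline int
ilog2_ref (unsigned long long n)
{
  int r = 0;
  while (n >>= 1)
    r++;
  return r;
}
```

The (n) < 2 test keeps __builtin_clzll from ever seeing 0, for which its
result is undefined.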