https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #32)
> get_order is a wrapper around ffs64.  This can be implemented w/o asm
> statement as follows:
> int
> my_fls64 (__u64 x)
> {
>   if (!x)
>     return 0;
>   return 64 - __builtin_clzl (x);
> }
>
> This results in longer assembly than the kernel asm implementation.  If
> that matters I would replace builtin_constant_p part of get_order by this
> implementation that is more transparent to the code size estimation and
> things will get inlined.

Better to use __builtin_clzll, so that it also works on 32-bit arches.
Anyway, if the kernel's fls64 results in better code than my_fls64, we should
look at GCC's code generation for that case.

And perhaps the kernel's const_ilog2 should be reimplemented using
__builtin_clz*?
Or, maybe even better, keep const_ilog2 as is, because as it is declared it
should be usable even in pedantic C constant expressions, and just change
ilog2 to:
#define ilog2(n) \
( \
	__builtin_constant_p(n) ?	\
	((n) < 2 ? 0 : 63 - __builtin_clzll (n)) : \
	(sizeof(n) <= 4) ?		\
	__ilog2_u32(n) :		\
	__ilog2_u64(n)			\
)
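
For illustration only, the two suggestions above can be tried outside the kernel roughly as follows. This is a sketch under stated assumptions, not the kernel's code: the sketch_* names are hypothetical, uint64_t stands in for __u64, and the two helper functions are simple stand-ins for the kernel's __ilog2_u32/__ilog2_u64, assumed here to behave like fls minus one.

```c
#include <stdint.h>

/* Hypothetical name. Variant of my_fls64 using __builtin_clzll, whose
   argument is unsigned long long and therefore 64 bits wide even on
   32-bit targets. */
static inline int sketch_fls64(uint64_t x)
{
    /* __builtin_clzll (0) is undefined, so zero is handled explicitly. */
    if (!x)
        return 0;
    return 64 - __builtin_clzll(x);
}

/* Hypothetical stand-ins for the kernel's __ilog2_u32/__ilog2_u64
   (assumption: they compute fls(n) - 1). */
static inline int sketch_ilog2_u32(uint32_t n) { return sketch_fls64(n) - 1; }
static inline int sketch_ilog2_u64(uint64_t n) { return sketch_fls64(n) - 1; }

/* Same shape as the ilog2 proposed above: constants go through the
   builtin, everything else through the size-dispatched helpers. */
#define sketch_ilog2(n) \
( \
    __builtin_constant_p(n) ? \
    ((n) < 2 ? 0 : 63 - __builtin_clzll(n)) : \
    (sizeof(n) <= 4) ? \
    sketch_ilog2_u32(n) : \
    sketch_ilog2_u64(n) \
)
```

With __builtin_clzl the argument is unsigned long, which is only 32 bits wide on typical 32-bit targets, so a 64-bit input would be truncated before counting; that is the reason for preferring __builtin_clzll here.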