There's no need to subtract "bit" from the width passed to __scanbit():
the other bits are zero anyway after the shift (in __find_next_bit()), or
can be made so (in __find_next_zero_bit()) by swapping the order of
negation and shift. (We actually leverage the same facts in
find_next{,_zero}_bit() as well.) This way the TZCNT alternative in
__scanbit() can be engaged, as TZCNT yields the operand width (i.e.
BITS_PER_LONG) for zero input, matching the new fallback value.

Signed-off-by: Jan Beulich <[email protected]>
---
Register allocation (and hence the code size effects of the change here)
is pretty "interesting". The compiler doesn't appear to realize that while
for 64-bit insns it doesn't matter which GPR is used (a REX prefix is
needed anyway), 32-bit insns can be helped by preferring the low 8 GPRs.
(Granted the inline assembly in __scanbit() may also be a little difficult
to deal with.)

--- a/xen/arch/x86/bitops.c
+++ b/xen/arch/x86/bitops.c
@@ -35,8 +35,8 @@ unsigned int __find_next_bit(
     if ( bit != 0 )
     {
         /* Look for a bit in the first word. */
-        set = __scanbit(*p >> bit, BITS_PER_LONG - bit);
-        if ( set < (BITS_PER_LONG - bit) )
+        set = __scanbit(*p >> bit, BITS_PER_LONG);
+        if ( set < BITS_PER_LONG )
             return (offset + set);
         offset += BITS_PER_LONG - bit;
         p++;
@@ -85,8 +85,8 @@ unsigned int __find_next_zero_bit(
     if ( bit != 0 )
     {
         /* Look for zero in the first word. */
-        set = __scanbit(~(*p >> bit), BITS_PER_LONG - bit);
-        if ( set < (BITS_PER_LONG - bit) )
+        set = __scanbit(~*p >> bit, BITS_PER_LONG);
+        if ( set < BITS_PER_LONG )
             return (offset + set);
         offset += BITS_PER_LONG - bit;
         p++;
