There's no need to subtract "bit" in what is passed to __scanbit(), as the
other bits are zero anyway after the shift (in __find_next_bit()) or can
be made so (in __find_next_zero_bit()) by flipping negation and shift. (We
actually leverage the same facts in find_next{,_zero}_bit() as well.) This
way in __scanbit() the TZCNT alternative can be engaged.

Signed-off-by: Jan Beulich <[email protected]>
---
Register allocation (and hence the effect on code size of the change
here) is pretty "interesting". The compiler doesn't appear to realize
that while for 64-bit insns it doesn't matter which GPR is used (a REX
prefix is needed anyway), 32-bit insns can be helped by preferring the
low 8 GPRs. (Granted the inline assembly in __scanbit() may also be a
little difficult to deal with.)

--- a/xen/arch/x86/bitops.c
+++ b/xen/arch/x86/bitops.c
@@ -35,8 +35,8 @@ unsigned int __find_next_bit(
     if ( bit != 0 )
     {
         /* Look for a bit in the first word. */
-        set = __scanbit(*p >> bit, BITS_PER_LONG - bit);
-        if ( set < (BITS_PER_LONG - bit) )
+        set = __scanbit(*p >> bit, BITS_PER_LONG);
+        if ( set < BITS_PER_LONG )
             return (offset + set);
         offset += BITS_PER_LONG - bit;
         p++;
@@ -85,8 +85,8 @@ unsigned int __find_next_zero_bit(
     if ( bit != 0 )
     {
         /* Look for zero in the first word. */
-        set = __scanbit(~(*p >> bit), BITS_PER_LONG - bit);
-        if ( set < (BITS_PER_LONG - bit) )
+        set = __scanbit(~*p >> bit, BITS_PER_LONG);
+        if ( set < BITS_PER_LONG )
             return (offset + set);
         offset += BITS_PER_LONG - bit;
         p++;
