On Thu, Jun 19, 2014 at 10:36:18AM +0200, Paolo Bonzini wrote: > Il 22/12/2013 12:32, Aurelien Jarno ha scritto: > >find_first_bit has started to be used heavily in TCG code. The current > >implementation based on find_next_bit is not optimal and can't be > >optimized be the compiler if the bit array has a fixed size, which is > >the case most of the time. > > If you mean by fully unrolling the loop, that's right. > > However... > > >- return find_next_bit(addr, size, 0); > >+ unsigned long result, tmp; > >+ > >+ for (result = 0; result < size; result += BITS_PER_LONG) { > >+ tmp = *addr++; > >+ if (tmp) { > >+ result += ctzl(tmp); > >+ return result < size ? result : size; > >+ } > >+ } > >+ /* Not found */ > >+ return size; > > } > > > > /** > > > > ... you probably want to limit this to bitmaps that are of constant > size, and small enough that the compiler will unroll them. > > So it probably would be a good idea to add an > > if (!__builtin_constant_p(size) || size > 8 * BITS_PER_LONG) > return find_next_bit(addr, size, 0); > > Not urgent since TCG is the only user of find_first_bit right now, > but worth considering.
In practice on x86_64, this function takes 27 instructions in the general case, and 18 instructions in the fixed case, even for big sizes. I therefore think that checking if the size is constant is a good idea, but we should not make any test on the size itself and trust the compiler to correctly decide if the loop should be unrolled or not. I will send a patch about that in the next days. -- Aurelien Jarno GPG: 4096R/1DDD8C9B aurel...@aurel32.net http://www.aurel32.net