On Fri, Jun 20, 2014 at 10:58:31AM +0200, Paolo Bonzini wrote:
> On 20/06/2014 10:48, Aurelien Jarno wrote:
> >In practice on x86_64, this function takes 27 instructions in the
> >general case, and 18 instructions in the fixed case, even for big
> >sizes. I therefore think that checking if the size is constant is a good
> >idea, but we should not make any test on the size itself and trust the
> >compiler to correctly decide if the loop should be unrolled or not.
>
> But if the size is large enough that the compiler will (likely) not
> unroll the function, then it should pay off to use the more
> optimized code in find_next_bit.

The point is that, since find_next_bit is a generalized version of
find_first_bit, it is actually slower. I originally noticed this while
running profiling tools and seeing that this function appeared
relatively high in the profile for what it is supposed to do.

> This of course is unless you expect find_first_bit to return a small
> value and not be used in a loop; and dually expect find_next_bit's
> usage to be more like walking sparser bitmaps in a loop.

I think that's the point. In the TCG case, this is used to map the temp
allocation to answer the question "give me a free temp". That said,
people might invent new usages.

> This actually makes sense, and then there's no need to change anything.
>
> Paolo
>

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
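
For illustration only, here is a minimal sketch in C of why the generalized
function carries extra work. It is not the actual QEMU code: the names
sketch_find_first_bit, sketch_find_next_bit and SKETCH_BITS_PER_LONG are
hypothetical, and __builtin_ctzl is assumed to be available (GCC or Clang).
Both functions return size when no bit is found. The "next" variant has to
split the offset into a word index and a bit position and mask the first
word before it can start scanning, which the "first" variant skips entirely.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of bits in one bitmap word. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Simple case: scan whole words from the start of the bitmap.
 * Returns the index of the first set bit, or size if there is none. */
static unsigned long sketch_find_first_bit(const unsigned long *addr,
                                           unsigned long size)
{
    unsigned long result;

    for (result = 0; result < size; result += SKETCH_BITS_PER_LONG) {
        unsigned long word = *addr++;

        if (word) {
            result += (unsigned long)__builtin_ctzl(word);
            /* Ignore stray bits past the end of the bitmap. */
            return result < size ? result : size;
        }
    }
    return size;
}

/* Generalized case: same contract, but the search starts at offset.
 * The offset handling below is the extra cost compared to the simple case. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long offset)
{
    unsigned long idx, base, word, result;

    if (offset >= size) {
        return size;
    }

    idx = offset / SKETCH_BITS_PER_LONG;
    base = idx * SKETCH_BITS_PER_LONG;
    /* Mask off the bits below offset in the first word examined. */
    word = addr[idx] & (~0UL << (offset % SKETCH_BITS_PER_LONG));

    while (word == 0) {
        base += SKETCH_BITS_PER_LONG;
        if (base >= size) {
            return size;
        }
        word = addr[++idx];
    }

    result = base + (unsigned long)__builtin_ctzl(word);
    return result < size ? result : size;
}
```

Calling sketch_find_next_bit(map, size, 0) gives the same answer as
sketch_find_first_bit(map, size), just with the additional setup on the way in.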