Re: vcpumask_to_pcpumask() case study

Andrew Cooper Fri, 07 Jun 2024 05:36:26 -0700

On 03/06/2024 10:19 pm, Jan Beulich wrote:
> On 01.06.2024 20:50, Andrew Cooper wrote:
>> One of the followon items I had from the bitops clean-up is this:
>>
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 648d6dd475ba..9c3a017606ed 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -3425,7 +3425,7 @@ static int vcpumask_to_pcpumask(
>>              unsigned int cpu;
>>  
>>              vcpu_id = ffsl(vmask) - 1;
>> -            vmask &= ~(1UL << vcpu_id);
>> +            vmask &= vmask - 1;
>>              vcpu_id += vcpu_bias;
>>              if ( (vcpu_id >= d->max_vcpus) )
>>                  return 0;
>>
>> which yields the following improvement:
>>
>>   add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-34 (-34)
>>   Function                                     old     new   delta
>>   vcpumask_to_pcpumask                         519     485     -34
> Nice. At the risk of getting flamed again for raising dumb questions:
> Considering that elsewhere "trickery" like the &= mask - 1 here were
> deemed not nice to have (at least wanting to be hidden behind a
> suitably named macro; see e.g. ISOLATE_LSB()), wouldn't __clear_bit()
> be suitable here too, and less at risk of being considered "trickery"?


__clear_bit() is even worse, because it forces the bitmap to be spilled
to memory.  It hopefully wont when I've given the test/set helpers the
same treatment as ffs/fls.

I'm not really a fan of the bithack.  Like the others, it's completely
unintuitive unless you know it.

However, the improvements speak for themselves and in this one case, the
best chance of people recognising it is in full; hiding any part behind
a macro (of any name) obscures things further.

> But yes, that would eliminate the benefit of making the bit clearing
> independent of the ffsl() result. And personally I'm fine anyway with
> the form as suggested.
>
>> While I (the programmer) can reason the two expressions are equivalent,
>> the compiler can't,
> Why is it you think it can't? There's no further knowledge that you
> as a human need to rely on for this, afaics. If ffsl() uses the
> built-in (as it now does), the compiler has full insight into what's
> going on. It's just that compiler engineers may not deem it worth the
> effort to carry out such a special-case optimization.

On x86, it's a block of asm, not the builtin.

But even with the builtin, just because there is a builtin transforming
a common expression into efficient assembly, doesn't mean there's
semantic reasoning about the result.

https://godbolt.org/z/eah1374a1

I can't find any way to get any compiler to reason about bit in order to
avoid the shift (or rotate, for Clang).

~Andrew

Re: vcpumask_to_pcpumask() case study

Reply via email to