Re: vcpumask_to_pcpumask() case study

Andrew Cooper Mon, 10 Jun 2024 03:12:45 -0700

On 10/06/2024 8:15 am, Jan Beulich wrote:
> On 07.06.2024 14:35, Andrew Cooper wrote:
>> On 03/06/2024 10:19 pm, Jan Beulich wrote:
>>> On 01.06.2024 20:50, Andrew Cooper wrote:
>>>> One of the followon items I had from the bitops clean-up is this:
>>>>
>>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>>>> index 648d6dd475ba..9c3a017606ed 100644
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -3425,7 +3425,7 @@ static int vcpumask_to_pcpumask(
>>>>              unsigned int cpu;
>>>>  
>>>>              vcpu_id = ffsl(vmask) - 1;
>>>> -            vmask &= ~(1UL << vcpu_id);
>>>> +            vmask &= vmask - 1;
>>>>              vcpu_id += vcpu_bias;
>>>>              if ( (vcpu_id >= d->max_vcpus) )
>>>>                  return 0;
>>>>
>>>> which yields the following improvement:
>>>>
>>>>   add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-34 (-34)
>>>>   Function                                     old     new   delta
>>>>   vcpumask_to_pcpumask                         519     485     -34
>>> Nice. At the risk of getting flamed again for raising dumb questions:
>>> Considering that elsewhere "trickery" like the &= mask - 1 here were
>>> deemed not nice to have (at least wanting to be hidden behind a
>>> suitably named macro; see e.g. ISOLATE_LSB()), wouldn't __clear_bit()
>>> be suitable here too, and less at risk of being considered "trickery"?
>> __clear_bit() is even worse, because it forces the bitmap to be spilled
>> to memory.  It hopefully wont when I've given the test/set helpers the
>> same treatment as ffs/fls.
> Sorry, not directly related here: When you're saying "when I've given"
> does that mean you'd like Oleksii's "xen: introduce generic non-atomic
> test_*bit()" to not go in once at least an Arm ack has arrived?


If we weren't deep in a code freeze, I'd be insisting on some changes in
that patch.

For now, I'll settle for not introducing regressions, so it needs at
least one more spin (there's a MISRA and UB regression I spotted, but I
haven't reviewed it in detail yet).

But yes - they're going to end up rather different when I've applied all
the compile time optimisations which are available.

~Andre

Re: vcpumask_to_pcpumask() case study

Reply via email to