On 10/06/2024 8:15 am, Jan Beulich wrote: > On 07.06.2024 14:35, Andrew Cooper wrote: >> On 03/06/2024 10:19 pm, Jan Beulich wrote: >>> On 01.06.2024 20:50, Andrew Cooper wrote: >>>> One of the followon items I had from the bitops clean-up is this: >>>> >>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c >>>> index 648d6dd475ba..9c3a017606ed 100644 >>>> --- a/xen/arch/x86/mm.c >>>> +++ b/xen/arch/x86/mm.c >>>> @@ -3425,7 +3425,7 @@ static int vcpumask_to_pcpumask( >>>> unsigned int cpu; >>>> >>>> vcpu_id = ffsl(vmask) - 1; >>>> - vmask &= ~(1UL << vcpu_id); >>>> + vmask &= vmask - 1; >>>> vcpu_id += vcpu_bias; >>>> if ( (vcpu_id >= d->max_vcpus) ) >>>> return 0; >>>> >>>> which yields the following improvement: >>>> >>>> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-34 (-34) >>>> Function old new delta >>>> vcpumask_to_pcpumask 519 485 -34 >>> Nice. At the risk of getting flamed again for raising dumb questions: >>> Considering that elsewhere "trickery" like the &= mask - 1 here were >>> deemed not nice to have (at least wanting to be hidden behind a >>> suitably named macro; see e.g. ISOLATE_LSB()), wouldn't __clear_bit() >>> be suitable here too, and less at risk of being considered "trickery"? >> __clear_bit() is even worse, because it forces the bitmap to be spilled >> to memory. It hopefully wont when I've given the test/set helpers the >> same treatment as ffs/fls. > Sorry, not directly related here: When you're saying "when I've given" > does that mean you'd like Oleksii's "xen: introduce generic non-atomic > test_*bit()" to not go in once at least an Arm ack has arrived?
If we weren't deep in a code freeze, I'd be insisting on some changes in that patch. For now, I'll settle for not introducing regressions, so it needs at least one more spin (there's a MISRA and UB regression I spotted, but I haven't reviewed it in detail yet). But yes - they're going to end up rather different when I've applied all the compile time optimisations which are available. ~Andre
