On 26.03.2025 10:34, Andrew Cooper wrote:
> On 26/03/2025 9:20 am, Jan Beulich wrote:
>> On 25.03.2025 19:52, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -488,10 +488,16 @@ static always_inline unsigned int arch_hweightl(unsigned long x)
>>>       *
>>>       * This limits the POPCNT instruction to using the same ABI as a function
>>>       * call (input in %rdi, output in %eax) but that's fine.
>>> +     *
>>> +     * On Intel CPUs prior to Cannon Lake, the POPCNT instruction has a false
>>> +     * input dependency on its destination register (errata HSD146, SKL029
>>> +     * amongst others), impacting loops such as bitmap_weight().  Insert an
>>> +     * XOR to manually break the dependency.
>>>       */
>> With this being an Intel-only issue, wouldn't we better ...
>>
>>>      alternative_io("call arch_generic_hweightl",
>>> +                   "xor %k[res], %k[res]\n\t"
>> ... put this line in #ifdef CONFIG_INTEL then? The extra overhead is small, but
>> I see no reason not to avoid it if we can. (I realize that's not quite as
>> straightforward as it reads, for alternative_io() being a macro.)
> 
> For an XOR, no; not worth the complexity.
> 
> Besides, this gets used a whole 5 locations in the hypervisor, after I
> cleaned up the paths which shouldn't have been using hweight() in the
> first place.

Well, okay with me then:
Acked-by: Jan Beulich <[email protected]>

Jan
