On 26.03.2025 10:34, Andrew Cooper wrote:
> On 26/03/2025 9:20 am, Jan Beulich wrote:
>> On 25.03.2025 19:52, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -488,10 +488,16 @@ static always_inline unsigned int
>>> arch_hweightl(unsigned long x)
>>> *
>>> * This limits the POPCNT instruction to using the same ABI as a
>>> function
>>> * call (input in %rdi, output in %eax) but that's fine.
>>> + *
>>> + * On Intel CPUs prior to Cannon Lake, the POPCNT instruction has a
>>> false
>>> + * input dependency on it's destination register (errata HSD146, SKL029
>>> + * amongst others), impacting loops such as bitmap_weight(). Insert an
>>> + * XOR to manually break the dependency.
>>> */
>> With this being an Intel-only issue, wouldn't we better ...
>>
>>> alternative_io("call arch_generic_hweightl",
>>> + "xor %k[res], %k[res]\n\t"
>> ... put this line in #ifdef CONFIG_INTEL then? The extra overhead is small,
>> but
>> I see no reason not to avoid it if we can. (I realize that's not quite as
>> straightforward as it reads, for alternative_io() being a macro.)
>
> For an XOR, no; not worth the complexity.
>
> Besides, this gets used a whole 5 locations in the hypervisor, after I
> cleaned up the paths which shouldn't have been using hweight() in the
> first place.
Well, okay with me then:
Acked-by: Jan Beulich <[email protected]>
Jan