On 25/09/2025 11:46 am, Jan Beulich wrote:
> Along with Zen2 (which doesn't expose ERMS), both families reportedly
> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
> can actually be carried out the accelerated way. Therefore we want to
> avoid its use in the common case (memset(), copy_page_hot()).
>
> Reported-by: Andrew Cooper <[email protected]>
> Signed-off-by: Jan Beulich <[email protected]>
> ---
> Question is whether merely avoiding REP MOVSB (but not REP MOVSQ) is going
> to be good enough.

In the problem case, MOVSQ is 8 times less bad than MOVSB (8 bytes moved
per iteration rather than 1), but both are slower than alternative
algorithms.
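
To put a number on the "8 times": a rough sketch, assuming the
non-accelerated path costs about one iteration per element moved. That
is an assumption for illustration, not a measured model of the
hardware, and rep_iterations() is a hypothetical helper, not something
in Xen.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of iterations a REP MOVS would need if it
 * is not accelerated and moves one element of elem_size bytes per
 * iteration (1 for MOVSB, 8 for MOVSQ).
 */
static size_t rep_iterations(size_t bytes, size_t elem_size)
{
    return bytes / elem_size;
}
```

For a 4k page this gives 4096 iterations for MOVSB versus 512 for MOVSQ.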

>
> --- a/xen/arch/x86/copy_page.S
> +++ b/xen/arch/x86/copy_page.S
> @@ -57,6 +57,6 @@ END(copy_page_cold)
>          .endm
>  
>  FUNC(copy_page_hot)
> -        ALTERNATIVE copy_page_movsq, copy_page_movsb, X86_FEATURE_ERMS
> +        ALTERNATIVE copy_page_movsq, copy_page_movsb, X86_FEATURE_XEN_REP_MOVSB
>          RET
>  END(copy_page_hot)

Hmm.

Overall I think this patch is an improvement.

But for any copy_page variant, we know both pointers are 4k aligned, so
they will not tickle the problem case.
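
For reference, the alignment property being relied on can be sketched as
below. This is a minimal standalone illustration: PAGE_SIZE_SKETCH and
is_page_aligned() are hypothetical names chosen so the snippet compiles
on its own, and it only shows what "4k aligned" means, not which
offsets trigger the slow path.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u /* assumed 4k page size */

/* True if the low 12 bits of the address are all zero. */
static bool is_page_aligned(const void *p)
{
    return ((uintptr_t)p & (PAGE_SIZE_SKETCH - 1)) == 0;
}
```

With both source and destination satisfying this, the two pointers always
have the same offset within their pages.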

This does mess with the naming of the synthetic feature.

> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -1386,6 +1386,10 @@ static void cf_check init_amd(struct cpu
>  
>       check_syscfg_dram_mod_en();
>  
> +     if (c == &boot_cpu_data && cpu_has(c, X86_FEATURE_ERMS)
> +         && c->family != 0x19 /* Zen3/4 */)

Even if this is Linux style, && on the previous line please.
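
I.e. something along these lines. This is a standalone sketch only:
struct cpu_sketch and want_rep_movsb() are hypothetical stand-ins for
the real CPU info structure and the open-coded cpu_has() condition, so
that the snippet compiles by itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in; only the fields the condition looks at. */
struct cpu_sketch {
    unsigned int family;
    bool erms;        /* stands in for cpu_has(c, X86_FEATURE_ERMS) */
    bool is_boot_cpu; /* stands in for c == &boot_cpu_data */
};

static bool want_rep_movsb(const struct cpu_sketch *c)
{
    /* The operator ends the line it continues from. */
    return c->is_boot_cpu && c->erms &&
           c->family != 0x19; /* Zen3/4 */
}
```

The logic is unchanged from the hunk above; only the placement of the
operator differs.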

~Andrew
