On 25.09.2025 14:18, Teddy Astie wrote:
> On 25/09/2025 at 12:48, Jan Beulich wrote:
>> Along with Zen2 (which doesn't expose ERMS), both families reportedly
>> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
>> can actually be carried out the accelerated way. Therefore we want to
>> avoid its use in the common case (memset(), copy_page_hot()).
>
> s/memset/memcpy (memset probably uses rep stosb which is not affected IIUC)

Oops, yes.

>> Reported-by: Andrew Cooper <[email protected]>
>> Signed-off-by: Jan Beulich <[email protected]>
>> ---
>> Question is whether merely avoiding REP MOVSB (but not REP MOVSQ) is going
>> to be good enough.
>
> This probably wants to be checked with benchmarks of rep movsb vs rep
> movsq+b (current non-ERMS algorithm). If the issue also occurs with rep
> movsq, it may be preferable to keep rep movsb even considering this issue.

Why? Then REP MOVSB is 8 times slower than REP MOVSQ.

>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -1386,6 +1386,10 @@ static void cf_check init_amd(struct cpu
>>
>>  	check_syscfg_dram_mod_en();
>>
>> +	if (c == &boot_cpu_data && cpu_has(c, X86_FEATURE_ERMS)
>> +	    && c->family != 0x19 /* Zen3/4 */)
>> +		setup_force_cpu_cap(X86_FEATURE_XEN_REP_MOVSB);
>> +
>
> May it be fixed through a (future ?) microcode update, especially since
> rep movs is microcoded on these archs ?

I don't know, but I also don't expect that to happen.

Jan
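
For reference, a minimal user-space sketch of the two copy strategies
compared above: REP MOVSB alone versus REP MOVSQ followed by a REP MOVSB
tail. This is not Xen's code. The helper names copy_movsb(), copy_movsq_b()
and time_copy() are made up for illustration, the clock()-based timing is
only a coarse assumption about how one might measure, and the memcpy()
fallback exists solely so the sketch builds on non-x86-64 targets.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy n bytes with a single REP MOVSB. */
static void copy_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* RDI = destination, RSI = source, RCX = byte count. */
    __asm__ __volatile__ ("rep movsb"
                          : "+D" (dst), "+S" (src), "+c" (n)
                          :: "memory");
#else
    memcpy(dst, src, n); /* portable fallback, not the instruction under test */
#endif
}

/* Hypothetical helper: copy n bytes as 8-byte words plus a byte tail. */
static void copy_movsq_b(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    size_t quads = n >> 3; /* number of whole 8-byte words */
    size_t tail = n & 7;   /* remaining 0..7 bytes */

    /* RDI/RSI are advanced by the first instruction and reused by the second. */
    __asm__ __volatile__ ("rep movsq"
                          : "+D" (dst), "+S" (src), "+c" (quads)
                          :: "memory");
    __asm__ __volatile__ ("rep movsb"
                          : "+D" (dst), "+S" (src), "+c" (tail)
                          :: "memory");
#else
    memcpy(dst, src, n); /* portable fallback, not the instruction under test */
#endif
}

typedef void (*copy_fn)(void *, const void *, size_t);

/* Coarse timing: CPU seconds spent on iters copies of n bytes. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, unsigned int iters)
{
    clock_t start = clock();

    for (unsigned int i = 0; i < iters; i++)
        fn(dst, src, n);

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy() with each helper over a range of sizes and
source/destination offsets would give a first impression of whether the
REP MOVSQ path is affected in the same way. Any numbers obtained this way
are specific to the machine they were taken on.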
