On 15/11/2022 13:26, Roger Pau Monne wrote:
> Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched
> on vm{entry,exit} there's no need to use a synthetic feature bit for
> it anymore.
>
> Remove the bit and instead use a global variable.
>
> No functional change intended.
>
> Signed-off-by: Roger Pau Monné <[email protected]>
> Reviewed-by: Jan Beulich <[email protected]>
> Release-acked-by: Henry Wang <[email protected]>
This is definitely not appropriate for 4.17. More importantly, it is a
performance regression in general, hence my firm and repeated objection
to this style of patch.
General synthetic bits have existed for several decades longer than
alternatives. It has never ever been a rule, or even a recommendation,
to aggressively purge the non-alternative bits, because it's a provably
bad thing to do.
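For anyone without the tree to hand, the pattern being removed is
roughly the following (the X86_SYNTH() index is illustrative rather
than the real one, and use_virt_spec_ctrl() is a stand-in, not a real
function):

    /* cpufeatures.h: one bit in the existing capability bitmap. */
    #define X86_FEATURE_VIRT_SC_MSR_HVM X86_SYNTH(27)

    /* amd.c: set once during boot. */
    setup_force_cpu_cap(X86_FEATURE_VIRT_SC_MSR_HVM);

    /* Any consumer: a single TEST against a hot bitmap word. */
    if ( boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) )
        use_virt_spec_ctrl();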
You are attempting a micro-optimisation that won't produce any
improvement at all for centuries, while...
> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> index a332087604..9e3b9094d3 100644
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe);
> /* Signal whether the ACPI C1E quirk is required. */
> bool __read_mostly amd_acpi_c1e_quirk;
> bool __ro_after_init amd_legacy_ssbd;
> +bool __ro_after_init amd_virt_spec_ctrl;
... actually expanding .rodata with something packed 8 times less
efficiently (a full byte for the bool vs a single bit in an existing
bitmap word; see the sketch further down), and ...
>
> static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
> unsigned int *hi)
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 822f9ace10..acc2f606ce 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -3,6 +3,7 @@
> #include <xen/param.h>
> #include <xen/sched.h>
> #include <xen/nospec.h>
> +#include <asm/amd.h>
... (specific to this instance) making life harder for the people trying
to make CONFIG_AMD work, and ...
> diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
> index 4e53056624..0b94af6b86 100644
> --- a/xen/arch/x86/spec_ctrl.c
> +++ b/xen/arch/x86/spec_ctrl.c
> @@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk thunk,
> uint64_t caps)
> (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
> boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
> boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
> - boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) ||
> + amd_virt_spec_ctrl ||
... breaking apart a single TEST instruction, which not only adds an
extra conditional merge, but also now hits a cold-ish cache line
everywhere it's used.
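To make those two costs concrete, here is a rough before/after for the
fragment quoted above (print_hvm_details() is a stand-in, and this
assumes all four feature bits land in the same capability word, which
is what makes the single-TEST codegen possible):

    /* Before: four bits in one word.  The compiler can fold the whole
     * disjunction into one load and one TEST of the OR'd masks. */
    if ( boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
         boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
         boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
         boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) )
        print_hvm_details();

    /* After: the fourth predicate is a byte load from an unrelated,
     * rarely-touched .rodata cache line, merged in with extra logic. */
    if ( boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
         boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
         boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
         amd_virt_spec_ctrl )
        print_hvm_details();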
Count how many synthetic feature bits it would actually take to change
the per-cpu data size, and realise that, when it will take more than 200
years at the current rate of accumulation, any belief that there is an
improvement to be had disappears.
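Spelling that arithmetic out (the accumulation rate below is my
assumption, not a measured figure):

    /*
     * Back of the envelope.  The structures embedding the capability
     * bitmap are cache-line aligned, so bits reclaimed one at a time
     * only shrink the per-cpu footprint once a whole line's worth of
     * bits has been freed.
     */
    #define LINE_BITS     (64 * 8)  /* 512 bits per 64-byte line */
    #define BITS_PER_YEAR 2         /* assumed accumulation rate */
    #define YEARS_TO_WIN  (LINE_BITS / BITS_PER_YEAR)  /* ~256 years */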
Yes, it is only a micro regression, but you need a far better
justification than "there is a gain of 64 bytes per CPU which will be
non-theoretical in more than 200 years" when traded off against the
actual 512 bytes, plus extra code bloat, plus reduced locality of data,
that this "improvement" genuinely costs today.
~Andrew