On Wed, 25 Aug 2021 11:39:40 -0300 Daniel Henrique Barboza <[email protected]> wrote:
> The pSeries machine will support a new NUMA affinity form, FORM2. > This new FORM will be negotiated via ibm,architecture-vec5 during > CAS. All artifacts and assumptions that are currently on use for > FORM1 affinity will be deprecated in a guest that chooses to use > FORM2. This means that we're going to wait until CAS to determine > whether we're going to use FORM1 and FORM2. > > This patch does that by moving all NUMA data init functions to post-CAS > time. spapr_numa_associativity_init() is moved from spapr_machine_init() > to do_client_architecture_support(). Straightforward change since the > initialization of spapr->numa_assoc_array is transparent to the guest. > > spapr_numa_write_rtas_dt() is more complex. The function is called from > spapr_dt_rtas(), which in turned is called by spapr_build_fdt(). It seems some other functions like spapr_numa_write_associativity_dt() or spapr_numa_get_vcpu_assoc() could also be affected by the delayed call to spapr_numa_associativity_init(). > spapr_build_fdt() is called in two places: spapr_machine_reset() and > do_client_architecture_support(). This means that we're writing RTAS > nodes with NUMA artifacts without going through CAS first, and then > writing it again post CAS. This is not an issue because, at this moment, > we always write the same FORM1 NUMA affinity properties in the DT. > With the upcoming FORM2 support, we're now reliant on guest choice to > decide what to write. > > The proposed solution is to change spapr_numa_write_rtas_dt() to not > write the DT until we're post-CAS. This is a benign guest visible change > (a well behaved guest wouldn't bother to read NUMA properties before CAs), > but to be on the safe side, let's wrap it with a machine class flag to skip > this logic unless we're running with the latest machine type. This also > means that FORM2 support will not be available for older machine types. > > Signed-off-by: Daniel Henrique Barboza <[email protected]> > --- > hw/ppc/spapr.c | 6 +++--- > hw/ppc/spapr_hcall.c | 4 ++++ > hw/ppc/spapr_numa.c | 11 +++++++++++ > include/hw/ppc/spapr.h | 1 + > 4 files changed, 19 insertions(+), 3 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 5459f9a7e9..d8363bda88 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2809,9 +2809,6 @@ static void spapr_machine_init(MachineState *machine) > > spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine); > > - /* Init numa_assoc_array */ > - spapr_numa_associativity_init(spapr, machine); > - > if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) && > ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0, > spapr->max_compat_pvr)) { > @@ -4709,8 +4706,11 @@ DEFINE_SPAPR_MACHINE(6_1, "6.1", true); > */ > static void spapr_machine_6_0_class_options(MachineClass *mc) > { > + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc); > + > spapr_machine_6_1_class_options(mc); > compat_props_add(mc->compat_props, hw_compat_6_0, hw_compat_6_0_len); > + smc->pre_6_1_numa_affinity = true; Versions should be bumped now that 6.1 is released. > } > > DEFINE_SPAPR_MACHINE(6_0, "6.0", false); > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c > index 0e9a5b2e40..1cc88716c1 100644 > --- a/hw/ppc/spapr_hcall.c > +++ b/hw/ppc/spapr_hcall.c > @@ -11,6 +11,7 @@ > #include "helper_regs.h" > #include "hw/ppc/spapr.h" > #include "hw/ppc/spapr_cpu_core.h" > +#include "hw/ppc/spapr_numa.h" > #include "mmu-hash64.h" > #include "cpu-models.h" > #include "trace.h" > @@ -1197,6 +1198,9 @@ target_ulong do_client_architecture_support(PowerPCCPU > *cpu, > spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00); > spapr_ovec_cleanup(ov1_guest); > > + /* Init numa_assoc_array */ > + spapr_numa_associativity_init(spapr, MACHINE(spapr)); > + > /* > * Ensure the guest asks for an interrupt mode we support; > * otherwise terminate the boot. > diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c > index 04a86f9b5b..b0bd056546 100644 > --- a/hw/ppc/spapr_numa.c > +++ b/hw/ppc/spapr_numa.c > @@ -379,6 +379,17 @@ static void > spapr_numa_FORM1_write_rtas_dt(SpaprMachineState *spapr, > */ > void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas) > { > + SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); > + > + /* > + * pre-6.1 machine types were writing RTAS information before pre-6.2 > + * CAS. Check if that's case before changing their existing > + * behavior. > + */ > + if (spapr_ovec_empty(spapr->ov5_cas) && !smc->pre_6_1_numa_affinity) { Checking emptiness of spapr->ov5_cas is a hack since the guest could have cleared all the bits... I'm not a big fan of checks for pre/post CAS actually even if I had to add one for memory hot unplug support some time back. I'm not sure about the motivation for this patch. Is it *just* to avoid the supposedly useless generation of FORM1 properties in the initial DT ? If yes, this isn't a hot path so I don't think it's worth the pain. We can start with FORM1 and switch to FORM2 if the guest requests so during CAS, no ? > + return; > + } > + > spapr_numa_FORM1_write_rtas_dt(spapr, fdt, rtas); > } > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index 637652ad16..068a29535a 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -145,6 +145,7 @@ struct SpaprMachineClass { > hwaddr rma_limit; /* clamp the RMA to this size */ > bool pre_5_1_assoc_refpoints; > bool pre_5_2_numa_associativity; > + bool pre_6_1_numa_affinity; > > bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index, > uint64_t *buid, hwaddr *pio,
