Excerpts from Cédric Le Goater's message of February 16, 2022 8:52 pm:
> On 2/16/22 11:25, Nicholas Piggin wrote:
>> This implements the Nested KVM HV hcall API for spapr under TCG.
>>
>> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
>> L1 is switched back in returned from the hcall when a HV exception
>> is sent to the vhyp. Register state is copied in and out according to
>> the nested KVM HV hcall API specification.
>>
>> The hdecr timer is started when the L2 is switched in, and it provides
>> the HDEC / 0x980 return to L1.
>>
>> The MMU re-uses the bare metal radix 2-level page table walker by
>> using the get_pate method to point the MMU to the nested partition
>> table entry. MMU faults due to partition scope errors raise HV
>> exceptions and accordingly are routed back to the L1.
>>
>> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
>> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
>> and exit).>
>> Reviewed-by: Fabiano Rosas <[email protected]>
>> Signed-off-by: Nicholas Piggin <[email protected]>
>
> Reviewed-by: Cédric Le Goater <[email protected]>
>
> Some last comments below,
[...]
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index edbf3eeed0..852fe61b36 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -199,6 +199,9 @@ struct SpaprMachineState {
>> bool has_graphics;
>> uint32_t vsmt; /* Virtual SMT mode (KVM's "core stride") */
>>
>> + /* Nested HV support (TCG only) */
>> + uint64_t nested_ptcr;
>> +
>
> this is new state to migrate.
>
[...]
>> +/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
>> +struct kvmppc_pt_regs {
>> + uint64_t gpr[32];
>> + uint64_t nip;
>> + uint64_t msr;
>> + uint64_t orig_gpr3; /* Used for restarting system calls */
>> + uint64_t ctr;
>> + uint64_t link;
>> + uint64_t xer;
>> + uint64_t ccr;
>> + uint64_t softe; /* Soft enabled/disabled */
>> + uint64_t trap; /* Reason for being here */
>> + uint64_t dar; /* Fault registers */
>> + uint64_t dsisr; /* on 4xx/Book-E used for ESR */
>> + uint64_t result; /* Result of a system call */
>> +};
>
> I think we need to start moving all the spapr hcall definitions under
> spapr_hcall.h. It can come later.
Sure.
[...]
>> diff --git a/include/hw/ppc/spapr_cpu_core.h
>> b/include/hw/ppc/spapr_cpu_core.h
>> index dab3dfc76c..b560514560 100644
>> --- a/include/hw/ppc/spapr_cpu_core.h
>> +++ b/include/hw/ppc/spapr_cpu_core.h
>> @@ -48,6 +48,11 @@ typedef struct SpaprCpuState {
>> bool prod; /* not migrated, only used to improve dispatch latencies */
>> struct ICPState *icp;
>> struct XiveTCTX *tctx;
>> +
>> + /* Fields for nested-HV support */
>> + bool in_nested; /* true while the L2 is executing */
>> + CPUPPCState *nested_host_state; /* holds the L1 state while L2 executes
>> */
>> + int64_t nested_tb_offset; /* L1->L2 TB offset */
>
> This needs a new vmstate.
How about instead of the vmstate (we would need all the L1 state in
nested_host_state as well), we just add a migration blocker in the
L2 entry path. We could limit the max hdecr to say 1 second to
ensure it unblocks before long.
I know migration blockers are not preferred but in this case it gives
us some iterations to debug and optimise first, which might change
the data to migrate.
Thanks,
Nick