On Thu, 2021-01-07 at 04:38 +0200, Maxim Levitsky wrote:
> On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> > On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > > If migration happens while an L2 entry with an event injected into L2 is
> > > pending, we weren't including the event in the migration state, so it
> > > would be lost, leading to an L2 hang.
> > 
> > But the injected event should still be in vmcs12 and KVM_STATE_NESTED_RUN_PENDING
> > should be set in the migration state, i.e. it should naturally be copied to
> > vmcs02 and thus (re)injected by vmx_set_nested_state().  Is nested_run_pending
> > not set?  Is the info in vmcs12 somehow lost?  Or am I off in left field...
> 
> You are completely right. 
> The injected event can be copied like that since the vmc(b|s)12 is migrated.
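
For completeness, here is a minimal free-standing sketch of that flow. The types and names below (fake_vcpu, fake_vmcs12, the flag value) are illustrative stand-ins, not the real KVM structures; they only show how the event rides along in the migrated vmcs12 and gets re-copied on the next nested entry.

/*
 * Minimal free-standing sketch (illustrative stand-in types, not the real
 * KVM structures): the injected event lives in the migrated vmcs12, and the
 * pending-run flag travels in the migration flags, so the next nested entry
 * simply re-copies the event into vmcs02.
 */
#include <stdint.h>
#include <stdio.h>

#define NESTED_RUN_PENDING_FLAG  (1u << 1)   /* stand-in for KVM_STATE_NESTED_RUN_PENDING */
#define INTR_INFO_VALID_MASK     (1u << 31)  /* "event injection valid" bit */

struct fake_vmcs12 {
	uint32_t vm_entry_intr_info_field;    /* injected event, part of the migrated blob */
};

struct fake_vcpu {
	struct fake_vmcs12 vmcs12;            /* restored verbatim on the destination */
	int nested_run_pending;               /* restored from the migration flags */
	uint32_t vmcs02_entry_intr_info;      /* what will actually be injected into L2 */
};

/* Restore side: both the vmcs12 contents and the pending-run flag come from
 * the migration stream. */
static void set_nested_state(struct fake_vcpu *vcpu,
			     const struct fake_vmcs12 *saved, uint32_t flags)
{
	vcpu->vmcs12 = *saved;
	vcpu->nested_run_pending = !!(flags & NESTED_RUN_PENDING_FLAG);
}

/* Next nested entry: if the original entry never completed, the injected
 * event is copied from vmcs12 again, so nothing is lost. */
static void prepare_vmcs02_early(struct fake_vcpu *vcpu)
{
	if (vcpu->nested_run_pending)
		vcpu->vmcs02_entry_intr_info = vcpu->vmcs12.vm_entry_intr_info_field;
}

int main(void)
{
	struct fake_vmcs12 migrated = {
		.vm_entry_intr_info_field = INTR_INFO_VALID_MASK | 0x36,
	};
	struct fake_vcpu vcpu = { .vmcs02_entry_intr_info = 0 };

	set_nested_state(&vcpu, &migrated, NESTED_RUN_PENDING_FLAG);
	prepare_vmcs02_early(&vcpu);
	printf("re-injected intr_info: 0x%08x\n",
	       (unsigned int)vcpu.vmcs02_entry_intr_info);
	return 0;
}

So both the vmcs12 contents and the pending-run flag travel in the migration stream, and the injection is recreated on the next entry without any extra bookkeeping.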
> 
> We can safely disregard these two patches and the parallel two patches for SVM.
> I am almost sure that the real root cause of this bug was that we weren't
> restoring the nested run pending flag, and I even happened to fix this in
> this patch series.
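
To illustrate why restoring that flag matters, here is a stand-alone sketch (illustrative names only, not the actual KVM code): if nested_run_pending is not carried over, nothing defers the interrupt-induced nested vmexit, and the not-yet-consumed event injection is thrown away, which is what the trace below shows.

/*
 * Stand-alone sketch of why the restored flag matters (illustrative names,
 * not the actual KVM code): while a nested entry is pending, an interrupt
 * for L1 must not be turned into a nested vmexit, or the not-yet-consumed
 * event injection is thrown away.
 */
#include <stdbool.h>
#include <stdio.h>

struct fake_nested_state {
	bool nested_run_pending;   /* nested entry seen, but not completed yet */
	unsigned int event_inj;    /* event L1 asked to inject into L2 */
};

/* Gate that decides whether an interrupt may force a nested vmexit now. */
static bool can_take_interrupt_vmexit(const struct fake_nested_state *n)
{
	if (n->nested_run_pending)
		return false;      /* defer: finish the pending entry first */
	return true;
}

int main(void)
{
	/* If migration does not restore the flag, nested_run_pending ends up
	 * false and the gate above lets the bogus nested vmexit through. */
	struct fake_nested_state restored = {
		.nested_run_pending = true,   /* from KVM_STATE_NESTED_RUN_PENDING */
		.event_inj = 0x80000036,      /* matches event_inj in the trace */
	};

	printf("pending injection 0x%08x, interrupt vmexit allowed: %s\n",
	       restored.event_inj,
	       can_take_interrupt_vmexit(&restored) ? "yes" : "no (deferred)");
	return 0;
}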
> 
> This is the trace of the bug (I removed the timestamps to make it easier to 
> read)
> 
> 
> kvm_exit:             vcpu 0 reason vmrun rip 0xffffffffa0688ffa info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
> kvm_nested_vmrun:     rip: 0xffffffffa0688ffa vmcb: 0x0000000103594000 nrip: 0xffffffff814b3b01 int_ctl: 0x01000001 event_inj: 0x80000036 npt: on
>                       ^^^ this is the injection (the event_inj field above)
> kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercepts: bc4c8027 00006e7f 00000000
> kvm_fpu:              unload
> kvm_userspace_exit:   reason KVM_EXIT_INTR (10)
> 
> ============================================================================
> migration happens here
> ============================================================================
> 
> ...
> kvm_async_pf_ready:   token 0xffffffff gva 0
> kvm_apic_accept_irq:  apicid 0 vec 243 (Fixed|edge)
> 
> kvm_nested_intr_vmexit: rip: 0x000000000000fff0
> 
> ^^^^^ this is the nested vmexit that shouldn't have happened, since a nested
> run is pending, and which erased the eventinj field that was migrated
> correctly, just as you say.
> 
> kvm_nested_vmexit_inject: reason: interrupt ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
> ...
> 
> 
> We did notice that this vmexit had a weird RIP, and I later explained it to
> myself: this is the default RIP that we put into the vmcb, and it hadn't been
> updated yet, since it is only updated just prior to VM entry.
> 
> My test has already survived about 170 iterations (it usually crashes after
> 20-40 iterations).
> I am leaving the stress test running all night; let's see if it survives.

And after leaving it overnight, the test survived about 1000 iterations.

Thanks again!

Best regards,
        Maxim Levitsky


> 
> V2 of the patches is on the way.
> 
> Thanks again for the help!
> 
> Best regards,
>       Maxim Levitsky
> 
> >  
> > > Fix this by queueing the injected event in a similar manner to how we
> > > queue interrupted injections.
> > > 
> > > This can be reproduced by running an I/O-intensive task in L2 and
> > > repeatedly migrating L1.
> > > 
> > > Suggested-by: Paolo Bonzini <[email protected]>
> > > Signed-off-by: Maxim Levitsky <[email protected]>
> > > ---
> > >  arch/x86/kvm/vmx/nested.c | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index e2f26564a12de..2ea0bb14f385f 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -2355,12 +2355,12 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> > >    * Interrupt/Exception Fields
> > >    */
> > >   if (vmx->nested.nested_run_pending) {
> > > -         vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> > > -                      vmcs12->vm_entry_intr_info_field);
> > > -         vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> > > -                      vmcs12->vm_entry_exception_error_code);
> > > -         vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> > > -                      vmcs12->vm_entry_instruction_len);
> > > +         if ((vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK))
> > > +                 vmx_process_injected_event(&vmx->vcpu,
> > > +                                            vmcs12->vm_entry_intr_info_field,
> > > +                                            vmcs12->vm_entry_instruction_len,
> > > +                                            vmcs12->vm_entry_exception_error_code);
> > > +
> > >           vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
> > >                        vmcs12->guest_interruptibility_info);
> > >           vmx->loaded_vmcs->nmi_known_unmasked =
> > > -- 
> > > 2.26.2
> > > 

