On 21/03/2022 21:12, Tamas K Lengyel wrote:
> During VM forking and resetting a failed vmentry has been observed due
> to the guest non-register state going out-of-sync with the guest register
> state. For example, a VM fork reset right after a STI instruction can trigger
> the failed entry. This is due to the guest non-register state not being saved
> from the parent VM, thus the reset operation only copies the register state.
>
> Fix this by including the guest non-register state in hvm_hw_cpu so that when
> it's copied from the parent VM the vCPU state remains in sync.
>
> SVM is not currently wired in as VM forking is VMX only and saving
> non-register state during normal save/restore/migration operation hasn't
> been needed. If deemed necessary in the future it can be wired in by adding
> a svm-substructure to hvm_hw_cpu.
I disagree here.  This bug isn't really to do with fuzzing; it's to do with
our APIs being able to get/set vCPU state correctly, and fuzzing is the
example which demonstrated the breakage.  This will also cause very subtle
bugs for the guest-transparent migration work, and the live update work,
both of which want to shift vCPU state behind a VM which hasn't actively
entered Xen via hypercall.

(Quick tangent.  Seriously, can the SDM be fixed so it actually enumerates
the Activity States?)

Xen doesn't currently support any situation where Intel's idea of Activity
State is anything other than 0.  This in turn is buggy, because we don't
encode VPF_blocked anywhere.  An activity state of HLT might not be the
best place to encode this, because MWAIT is notably absent from Intel's
activity states.  We'll also terminate things like SCHEDOP_poll early.

Next, AMD does have various interruptibility fields.  If you want me to
write the patch then fine, but they should not be excluded from a patch
like this.  AMD don't have separate bits for STI and MovSS, due to
microarchitectural differences, but the single INTERRUPT_SHADOW bit does
need managing, as does the blocked-by-NMI bit which is emulated on SVM and
older Intel CPUs.  Minor tangent: blocked-by-SMI needs cross-linking with
the VM-entry controls, so I'm not sure it is something we ought to expose.

Next, it turns out that MSR_DEBUGCTL isn't included anywhere (not even in
the MSR list).  It is important, because it forms part of the VMEntry
cross-check with PENDING_DBG and TF.

Next, we also don't preserve the PDPTRs.  This is another area where Intel
and AMD CPUs behave differently, but under Intel's architecture, the guest
kernel can clobber the 32-bit PAE block of pointers and everything will
function correctly until the next mov to %cr3.  There are definitely bugs
in Xen's emulated pagewalk in this area.

So there are a lot of errors with hvm_hw_cpu, and I bet I haven't found
them all.  As discussed at multiple previous XenSummits, the current
load/save mess needs moving out of Xen, and that would be the correct time
to fix the other errors, but it is probably too much for this case.

At a minimum, there shouldn't be a VMX-specific union, because we are
talking about architecture-agnostic properties of the vCPU.  It's fine for
the format to follow Intel's bit layout for the subset of bits we tolerate
saving/restoring, but this needs discussing in the header, and unsupported
values need rejecting on set.  An extra check/reject is 0% overhead for
forking, so I don't find that a credible argument against doing it.

I'm not sure whether we want to include the activity state at the moment
without a better idea of how to encode VPF_blocked, but I think DEBUGCTL
does need including.  This in turn involves transmitting the LBR MSRs too.

~Andrew
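
P.S. To illustrate the "reject on set" point above, a rough sketch only:
these are not the actual Xen definitions, the struct/field/constant names
are hypothetical, and only the bit positions follow Intel's guest
interruptibility layout from the SDM.

/* Illustrative only: hypothetical names, not the real hvm_hw_cpu layout. */
#include <stdint.h>
#include <errno.h>

/* Bit positions follow Intel's guest interruptibility state (SDM). */
#define HVM_INTR_SHADOW_STI     (1u << 0)  /* Blocking by STI. */
#define HVM_INTR_SHADOW_MOV_SS  (1u << 1)  /* Blocking by MOV SS. */
#define HVM_INTR_SHADOW_SMI     (1u << 2)  /* Blocking by SMI. */
#define HVM_INTR_SHADOW_NMI     (1u << 3)  /* Blocking by NMI. */

/* Subset we'd tolerate restoring; blocked-by-SMI deliberately excluded. */
#define HVM_INTR_SHADOW_VALID \
    (HVM_INTR_SHADOW_STI | HVM_INTR_SHADOW_MOV_SS | HVM_INTR_SHADOW_NMI)

/* Hypothetical architecture-agnostic additions to the saved CPU record. */
struct hvm_hw_cpu_nonreg {
    uint32_t interruptibility;  /* Intel bit layout, see above. */
    uint64_t pending_dbg;       /* Pending debug exceptions. */
    uint64_t msr_debugctl;      /* Cross-checked with PENDING_DBG/TF on entry. */
};

/* Reject anything outside the tolerated subset on the set/restore path. */
static int nonreg_check(const struct hvm_hw_cpu_nonreg *s)
{
    if ( s->interruptibility & ~HVM_INTR_SHADOW_VALID )
        return -EINVAL;
    return 0;
}

An SVM implementation of the set path would presumably fold the STI/MovSS
bits into its single INTERRUPT_SHADOW bit and route blocked-by-NMI through
the existing emulation.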
