On 02/12/2025 8:33 am, Jan Beulich wrote:
> On 01.12.2025 17:36, Andrew Cooper wrote:
>> On 01/12/2025 9:02 am, Jan Beulich wrote:
>>> On 28.11.2025 18:47, Andrew Cooper wrote:
>>>> While we do this for unknown user mode exits, crashing for supervisor mode
>>>> exits is unhelpful.  Intel in particular expect the unknown case to be #UD
>>>> because they do introduce new instructions with new VMEXIT_* codes without
>>>> other enablement controls.  e.g. MSRLIST, USER_MSR, MSR_IMM, but AMD have
>>>> RDPRU and SKINIT as examples too.
>>> USER-MSR has MSR_USER_MSR_CTL, so doesn't fully fit here? (It's still not us
>>> to directly control exposure of the insns, but an OS would need to use the
>>> MSR to do so, and hence we need to properly handle writes to that MSR for
>>> the respective exits to become possible.)
>> Oh yes, and the #GP from failing the MSR_USER_MSR_CTL check take
>> priority over the intercept.
>>
>>> MSRLIST has a dedicated exec control, so whether the exits can occur is
>>> under our control.
>> Ah ok.
>>
>>
>>> RDPRU and SKINIT have dedicated intercepts, without use of which the
>>> respective exit can't occur aiui.
>> Correct, but note how we intercept them unconditionally?
>>
>> MONITOR, MWAIT and SKINIT are for Xen's safety, because otherwise the
>> instructions would execute normally in guest context.
>>
>> RDPRU is to block access to the perf counters, because a guest has no
>> legitimate use for them.
>>
>> There are no enablement controls for these instructions.  They're always
>> guest-available (on capable hardware) if not intercepted.
> For our purposes, the intercept is the enable 

No - very critically not.

If we were to not intercept SKINIT, the guest could INIT the logical
processor (into a weird protected mode, rather than real mode), and
reset the DRTM PCRs in the TPM.

This is an example of a capability which *should* have had an enable
(FamFh Rev F and G and Fam10h Rev B had SVM but not SKINIT) but didn't.

It is always available in guest mode on capable hardware, and Xen *must*
intercept SKINIT to keep control of the system.

The absence of an enable should also have had a CVE but I can't locate one.

It turns out to be the same for MONITOR, which appeared in Fam10h Rev B
and has no enable.

So actually, AMD is full of examples where we ought to have a CPU
support policy...


> (i.e. we disable their use
> by injecting #UD if the intercept triggers). IOW I think those are
> slightly different in any event, in not really being "unknown". I don't
> mind their mentioning, but I think the distinction wants to at least be
> expressed somehow.
>
>>>> @@ -3083,8 +3067,13 @@ void asmlinkage svm_vmexit_handler(void)
>>>>          gprintk(XENLOG_ERR, "Unexpected vmexit: reason %#"PRIx64", "
>>>>                  "exitinfo1 %#"PRIx64", exitinfo2 %#"PRIx64"\n",
>>>>                  exit_reason, vmcb->exitinfo1, vmcb->exitinfo2);
>>>> -    crash_or_fault:
>>>> -        svm_crash_or_fault(v);
>>>> +        fallthrough;
>>>> +    case VMEXIT_MONITOR:
>>>> +    case VMEXIT_MWAIT:
>>>> +    case VMEXIT_SKINIT:
>>>> +    case VMEXIT_RDPRU:
>>>> +    inject_ud:
>>>> +        hvm_inject_hw_exception(X86_EXC_UD, X86_EVENT_NO_EC);
>>>>          break;
>>>>      }
>>>>  
>>> Should this be brought more in line with respective VMX code (kept) below,
>>> in never bypassing the gprintk() by any of the case labels? Basically
>>> meaning that the case labels you move could simply be dropped for the time
>>> being (or else, like the INVCPID one visible in context below, would want
>>> re-inserting a few lines earlier).
>> As said, they're all reachable by guests on capable hardware.
>>
>> I could add a /* Not implemented for guests */ if that would make it
>> clearer?
> Yes, ideally with "yet" also added - recall I've been sitting on an RDPRU
> emulator patch, awaiting you to fulfill your promise of sorting the CPUID
> side of things there.
>
>> But, we don't want the printk().  We know the labels are reachable, and
>> #UD is the right action.
> Hmm, yes, with what you have said further up I think I agree. Yet then my
> question goes the other way around: Do we want the log message for the
> (at least) two known exits in VMX, which are grouped with the default:
> label?

The preemption timer and INVPCID?

Those have explicit enables and are either not configured (preemption
timer, and INVPCID with Shadow), or enabled and not intercepted (INVPCID
on capable hardware and with HAP).

Getting the VMExit is either a configuration error in Xen, or a
hardware/VMM violation of the VT-x spec.

I suppose we probably want to make those into explicit domain crashes.

> IOW I'm still puzzled by the asymmetry between SVM and VMX in this regard.

VT-x and SVM are different in almost every regard.  I find it surprising
how well they mange to approximate a working system given the differences.

~Andrew

Reply via email to