PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

Roger Pau Monné Tue, 19 Dec 2023 03:48:34 -0800

On Tue, Dec 19, 2023 at 10:56:16AM +0100, Jan Beulich wrote:
> On 18.12.2023 18:24, Roger Pau Monné wrote:
> > On Tue, Feb 14, 2023 at 05:12:08PM +0100, Jan Beulich wrote:
> >> Since both kernel and user mode run in ring 3, they run in the same
> >> "predictor mode".
> > 
> > That only true when IBRS is enabled, otherwise all CPU modes share the
> > same predictor mode?
> 
> But here we only care about ring 3 anyway?


Hm, yes, what I wanted to say is that this is not exclusive to 64bit
PV, as with IBRS disabled ring 3 and ring 0 share the same predictor
mode.  Anyway, not relevant.

> 
> >> @@ -753,7 +755,9 @@ static inline void pv_inject_sw_interrup
> >>   * but we can't make such requests fail all of the sudden.
> >>   */
> >>  #define PV64_VM_ASSIST_MASK (PV32_VM_ASSIST_MASK                      | \
> >> -                             (1UL << VMASST_TYPE_m2p_strict))
> >> +                             (1UL << VMASST_TYPE_m2p_strict)          | \
> >> +                             ((opt_ibpb_mode_switch + 0UL) <<           \
> >> +                              VMASST_TYPE_mode_switch_no_ibpb))
> > 
> > I'm wondering that it's kind of weird to offer the option to PV domUs
> > if opt_ibpb_entry_pv is set, as then the guest mode switch will always
> > (implicitly) do a IBPB as requiring an hypercall and thus take an
> > entry point into Xen.
> > 
> > I guess it's worth having it just as a way to signal to Xen that the
> > hypervisor does perform an IBPB, even if the guest cannot disable it.
> 
> I'm afraid I'm confused by your reply. Not only, but also because the
> latter sentence looks partly backwards / non-logical to me.

Sorry, I think I didn't word that very well.  The remark is tied to
the one below about the vmassist 'possibly' allowing the guest to
disable IBPB on guest user -> kernel context switches, but Xen might
unconditionally do additional IBPBs that the guest cannot disable.

> >> --- a/xen/arch/x86/pv/domain.c
> >> +++ b/xen/arch/x86/pv/domain.c
> >> @@ -455,6 +455,7 @@ static void _toggle_guest_pt(struct vcpu
> >>  void toggle_guest_mode(struct vcpu *v)
> >>  {
> >>      const struct domain *d = v->domain;
> >> +    struct cpu_info *cpu_info = get_cpu_info();
> >>      unsigned long gs_base;
> >>  
> >>      ASSERT(!is_pv_32bit_vcpu(v));
> >> @@ -467,15 +468,21 @@ void toggle_guest_mode(struct vcpu *v)
> >>      if ( v->arch.flags & TF_kernel_mode )
> >>          v->arch.pv.gs_base_kernel = gs_base;
> >>      else
> >> +    {
> >>          v->arch.pv.gs_base_user = gs_base;
> >> +
> >> +        if ( opt_ibpb_mode_switch &&
> >> +             !(d->arch.spec_ctrl_flags & SCF_entry_ibpb) &&
> >> +             !VM_ASSIST(d, mode_switch_no_ibpb) )
> >> +            cpu_info->spec_ctrl_flags |= SCF_new_pred_ctxt;
> > 
> > Likewise similar to the remarks I've made before, if doing an IBPB on
> > entry is enough to cover for the case here, it must also be fine to
> > issue the IBPB right here, instead of deferring to return to guest
> > context?
> > 
> > The only concern would be (as you mentioned before) to avoid clearing
> > valid Xen predictions, but I would rather see some figures about what
> > effect the delaying to return to guest has vs issuing it right here.
> 
> Part of the reason (aiui) to do things on the exit path was to
> consolidate the context switch induced one and the user->kernel switch
> one into the same place and mechanism.

Isn't it kind of a very specific case that we end up doing a
user->kernel switch as part of a context switch?  IOW: would require
the vCPU to be scheduled out at that very specific point.

> >> + *
> >> + * By default (on affected and capable hardware) as a safety measure Xen,
> >> + * to cover for the fact that guest-kernel and guest-user modes are both
> >> + * running in ring 3 (and hence share prediction context), would issue a
> >> + * barrier for user->kernel mode switches of PV guests.
> >> + */
> >> +#define VMASST_TYPE_mode_switch_no_ibpb  33
> > 
> > Would it be possible to define the assist as
> > VMASST_TYPE_mode_switch_ibpb and have it on when enabled?  So that the
> > guest would disable it if unneeded?  IMO negated options are in
> > general harder to understand.
> 
> Negative options aren't nice, yes, but VM assists start out as all
> clear.

Are you sure?  I see VMASST_TYPE_pae_extended_cr3 getting set in
dom0_construct_pv() and that makes me wonder whether other bits
couldn't start set also.

Maybe there's some restriction I'm missing, but I don't see any
wording in the description of the interface that states that all
assists are supposed to start disabled.

Thanks, Roger.

Re: [PATCH v4 4/4] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

Reply via email to