PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

Roger Pau Monné Tue, 19 Dec 2023 07:11:27 -0800

On Tue, Dec 19, 2023 at 03:06:50PM +0100, Jan Beulich wrote:
> On 19.12.2023 12:48, Roger Pau Monné wrote:
> > On Tue, Dec 19, 2023 at 10:56:16AM +0100, Jan Beulich wrote:
> >> On 18.12.2023 18:24, Roger Pau Monné wrote:
> >>> On Tue, Feb 14, 2023 at 05:12:08PM +0100, Jan Beulich wrote:
> >>>> --- a/xen/arch/x86/pv/domain.c
> >>>> +++ b/xen/arch/x86/pv/domain.c
> >>>> @@ -455,6 +455,7 @@ static void _toggle_guest_pt(struct vcpu
> >>>>  void toggle_guest_mode(struct vcpu *v)
> >>>>  {
> >>>>      const struct domain *d = v->domain;
> >>>> +    struct cpu_info *cpu_info = get_cpu_info();
> >>>>      unsigned long gs_base;
> >>>>  
> >>>>      ASSERT(!is_pv_32bit_vcpu(v));
> >>>> @@ -467,15 +468,21 @@ void toggle_guest_mode(struct vcpu *v)
> >>>>      if ( v->arch.flags & TF_kernel_mode )
> >>>>          v->arch.pv.gs_base_kernel = gs_base;
> >>>>      else
> >>>> +    {
> >>>>          v->arch.pv.gs_base_user = gs_base;
> >>>> +
> >>>> +        if ( opt_ibpb_mode_switch &&
> >>>> +             !(d->arch.spec_ctrl_flags & SCF_entry_ibpb) &&
> >>>> +             !VM_ASSIST(d, mode_switch_no_ibpb) )
> >>>> +            cpu_info->spec_ctrl_flags |= SCF_new_pred_ctxt;
> >>>
> >>> Likewise similar to the remarks I've made before, if doing an IBPB on
> >>> entry is enough to cover for the case here, it must also be fine to
> >>> issue the IBPB right here, instead of deferring to return to guest
> >>> context?
> >>>
> >>> The only concern would be (as you mentioned before) to avoid clearing
> >>> valid Xen predictions, but I would rather see some figures about what
> >>> effect the delaying to return to guest has vs issuing it right here.
> >>
> >> Part of the reason (aiui) to do things on the exit path was to
> >> consolidate the context switch induced one and the user->kernel switch
> >> one into the same place and mechanism.
> > 
> > Isn't it kind of a very specific case that we end up doing a
> > user->kernel switch as part of a context switch?  IOW: would require
> > the vCPU to be scheduled out at that very specific point.
> 
> No, there's no user->kernel switch at the same time as context switch.
> What I was trying to explain is that with the actual IBPB being issued
> on exit to guest, both the context switch path and the user->kernel
> mode switch path set the same indicator, for the exit path to consume.


Deferring to exit to guest path could be OK, but unless strictly
needed, which I don't think it's the case, I would request for IBPB to
be executed in C context rather than assembly one.

> >>>> + *
> >>>> + * By default (on affected and capable hardware) as a safety measure 
> >>>> Xen,
> >>>> + * to cover for the fact that guest-kernel and guest-user modes are both
> >>>> + * running in ring 3 (and hence share prediction context), would issue a
> >>>> + * barrier for user->kernel mode switches of PV guests.
> >>>> + */
> >>>> +#define VMASST_TYPE_mode_switch_no_ibpb  33
> >>>
> >>> Would it be possible to define the assist as
> >>> VMASST_TYPE_mode_switch_ibpb and have it on when enabled?  So that the
> >>> guest would disable it if unneeded?  IMO negated options are in
> >>> general harder to understand.
> >>
> >> Negative options aren't nice, yes, but VM assists start out as all
> >> clear.
> > 
> > Are you sure?  I see VMASST_TYPE_pae_extended_cr3 getting set in
> > dom0_construct_pv() and that makes me wonder whether other bits
> > couldn't start set also.
> > 
> > Maybe there's some restriction I'm missing, but I don't see any
> > wording in the description of the interface that states that all
> > assists are supposed to start disabled.
> 
> Well, that setting of pae_extended_cr3 is in response to the kernel's
> notes section having a respective indicator. So we still only set the
> bit in response to what the kernel's asking us to do, just that here
> we carry out the request ahead of launching the kernel.
> 
> Also consider what would happen during migration if there was a
> default-on assist: At the destination we can't know whether the
> source simply didn't know of the bit, or whether the guest elected to
> clear it.

Hm, I see, so I was indeed missing that aspect.  VM assist is passed
as a plain bitmap, and there's no signal on which assists the VM had
available on the source side if not enabled.

Thanks, Roger.

Re: [PATCH v4 4/4] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

Reply via email to