On 21.10.2022 23:54, Andrew Cooper wrote:
> On 20/10/2022 12:01, Roger Pau Monné wrote:
>> Hello,
>>
>> As part of some follow up improvements to my VIRT_SPEC_CTRL series we
>> have been discussing what the usage of SSBD should be for the
>> hypervisor itself.  There's currently a `spec-ctrl=ssbd` option [0],
>> that has an out-of-date description, as SSBD is now always offered to
>> guests on AMD hardware, either using SPEC_CTRL or VIRT_SPEC_CTRL.
>>
>> It has been pointed out by Andrew that toggling SSBD on AMD using
>> VIRT_SPEC_CTRL or the non-architectural way (MSR_AMD64_LS_CFG) can
>> have a high impact on performance, and hence switching it on every
>> guest <-> hypervisor context switch is likely to carry a very high
>> performance penalty.
>>
>> It's been suggested that it could be more appropriate to run Xen with
>> the guest's SSBD selection on those systems, however that clashes with
>> the current intent of the `spec-ctrl=ssbd` option.
>>
>> I hope I have captured the expressed opinions correctly in the text
>> above.
>>
>> I see two ways to solve this:
>>
>>  * Keep the current logic for switching SSBD on guest <-> hypervisor
>>    context switch, but only use it if `spec-ctrl=ssbd` is set on the
>>    command line.
>>
>>  * Remove the logic for switching SSBD on guest <-> hypervisor context
>>    switch, ignore setting of `spec-ctrl=ssbd` on those systems and run
>>    hypervisor code with the guest selection of SSBD.
>>
>> This raises the question of whether there's a use case
>> for always running hypervisor code with SSBD enabled, or whether that's
>> no longer relevant if we always offer guests a way for them to toggle
>> the setting when required.
>>
>> I would like to settle on a way forward, so we can get this fixed
>> before 4.17.
>>
>> Thanks, Roger.
>>
>> [0] 
>> https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#spec-ctrl-x86
> 
> There are many issues at play here.  Not least that virt spec ctrl is
> technically a leftover task that ought to force a re-issue of XSA-263.
> 
> Accessing MSRs (even reading) is very expensive, typically >1k cycles. 
> The core CFG registers are more expensive than most, because they're
> intended to be configured once after reset and then left alone.
> 
> Throughout the speculation work, we've seen crippling performance hits
> from accessing MSRs in fastpaths.  The fact we're forced to use MSRs in
> fastpaths even on new CPUs with built in (rather than retrofitted)
> speculation support is an area of concern still being worked on with
> the CPU vendors.
> 
> Case in point.  We found for XSA-398 that toggling AMD's
> MSR_SPEC_CTRL.IBRS on the PV entrypath was so bad that setting it
> unilaterally behind the back of PV guests was the faster option. 
> (Another todo is to stop doing this on Intel eIBRS systems, and this
> will recover us a decent chunk of performance.)
> 
> 
> SSBD mitigations are (rightly or wrongly) off by default for performance
> reasons.  AMD are less affected than Intel, for microarchitectural
> reasons which are discussed in relevant whitepapers, and which are
> expected to remain true for future CPUs.
> 
> When Xen doesn't care about protecting itself against SSBD by
> default, I guarantee you that it will be faster to omit the MSR accesses
> and run with the guest kernel's choice, than to clear the SSBD
> protection.  We simply don't spend long enough in the hypervisor for the
> hit against memory accesses to dwarf the hit for MSR accesses taken on
> entry/exit.
> 
> The reason we put in spec-ctrl=ssbd was as a stopgap, because at the
> time we didn't know how bad SSB really was, and it was decided that the
> admin should have a big hammer to use if they really needed it.
> 
> When Xen does care about protecting itself, the above reasoning bites
> back hard.  Because we spend (or should be spending!) >99% of time in
> the guest, the hit to memory accesses is far more likely to
> dwarf the hit from the MSR accesses, but now, the dominating factor for
> performance is the vmexit rate.
> 
> The problem is that if you've got a completely compute-bound workload,
> there are very few exits, while if you've got an IO-bound workload,
> there are plenty of exits.  I honestly don't know if it will be more
> efficient to leave SSBD active unilaterally (whether or not we hide
> this, e.g. synthesizing SSB_NO), or to let the guest run with its kernel's
> choice.  I suspect the answer is different with different workloads.
> 
> 
> But, one other factor helps us.  Given that the default is fast (rather
> than secure), anyone opting in to spec-ctrl=ssbd is saying "I care more
> about security than performance", at which point we can simplify what we
> do because we don't need to cater to everyone.
> 
> 
> As a slight tangent, there is a cost to having too many options, which
> must not be ignored.  Xen's speculation safety is far too complicated
> already and needs to get more simple; this has a material impact on how
> easy it is to follow, and how easy it is to make changes.
> 
> It is the way it is because we've had 6 years of drip feeding one
> problem after another, and haven't had the time to take a step back and
> design something more sensible from having 6 years of
> knowledge/learnings as a basis.  There are definitely things which I
> would have done differently, if 6 years ago, I'd known what I know now,
> and part of the reason why the recent speculation security work has
> taken so much effort is because it has involved reworking what came
> before, against deadlines which never leave enough time to plan
> properly.
> 
> 
> So, first question, do we care about having an "SSBD active while in
> Xen" mode?
> 
> Probably yes, because we a) still don't have a working solution for PV
> guests on AMD and b) who knows if there's something far worse lurking in
> the future.  Sod's law says that if we decide no here, it will be
> critical for some future issue.
> 
> But as it's off by default and no one has made any noise about
> having it on, we ought to prioritise simplicity.
> 
> Given that off is the default, but we know that kernels do offer it to
> userspace, and it does get used by certain processes, we need to
> prioritise performance.  And here, this is net system performance, not
> "ensure it's off whenever it can be".  Having Xen run in the guest
> kernel's choice of value will result in much better overall performance,
> than trying to modify the setting in the VMentry/exit path.

My takeaway from this reply of yours is: By default run with the guest's
choice, while (I'm less certain here) you're undecided about the behavior
with "spec-ctrl=ssbd". Could you please confirm whether my understanding
is correct?

Jan

> Sorry that this is a very long and somewhat open ended answer, but it is
> genuinely the level of complexity I grapple with on every security issue
> in this area.
> 
> ~Andrew
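
To make the spec-ctrl=ssbd trade-off above a little more concrete, here is a rough back-of-the-envelope model in C. It is only a sketch under loudly stated assumptions: the struct and function names are hypothetical (this is not Xen code), the SSBD slowdown fraction is an invented illustrative parameter, and the only figure taken from the thread is the roughly 1k-cycle cost of an MSR access. It compares toggling SSBD on every exit/entry (two MSR writes per exit) with leaving SSBD active unilaterally, where the guest pays the slowdown on all of its own execution.

```c
#include <stdbool.h>

/*
 * Hypothetical cost model for the spec-ctrl=ssbd case: Xen wants SSBD
 * active while it runs, but the guest has chosen to leave it off.
 * All inputs are illustrative assumptions; results are in cycles/second.
 */
struct ssbd_costs {
    double exits_per_sec;        /* vmexit (or PV entry) rate */
    double guest_cycles_per_sec; /* cycles executed by the guest itself */
    double msr_write_cycles;     /* assumed cost of one SSBD MSR write */
    double ssbd_slowdown;        /* assumed fractional slowdown with SSBD on */
};

/* Set SSBD on exit, restore the guest's value on entry: two writes per exit. */
static double toggle_overhead(const struct ssbd_costs *c)
{
    return c->exits_per_sec * 2.0 * c->msr_write_cycles;
}

/* Leave SSBD on unilaterally: no MSR writes, but the guest runs slower. */
static double unilateral_overhead(const struct ssbd_costs *c)
{
    return c->guest_cycles_per_sec * c->ssbd_slowdown;
}

/* True if toggling on every exit/entry is the cheaper policy here. */
static bool toggling_is_cheaper(const struct ssbd_costs *c)
{
    return toggle_overhead(c) < unilateral_overhead(c);
}
```

With, say, 3e9 guest cycles per second, a 1000-cycle MSR write and an assumed 2% slowdown, a compute-bound guest taking 1,000 exits per second pays 2e6 cycles/s for toggling versus roughly 6e7 for the unilateral setting, whereas an IO-bound guest taking 100,000 exits per second pays 2e8 for toggling. Under these assumptions the cheaper policy flips with the exit rate, which is consistent with the suspicion above that the answer differs between workloads.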
