Re: [Xen-devel] Ongoing/future speculative mitigation work

George Dunlap Mon, 10 Dec 2018 04:20:36 -0800

On 12/10/18 12:12 PM, George Dunlap wrote:
> On 12/7/18 6:40 PM, Wei Liu wrote:
>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>> Hello,
>>>
>>> This is an accumulation and summary of various tasks which have been
>>> discussed since the revelation of the speculative security issues in
>>> January, and also an invitation to discuss alternative ideas.  They are
>>> x86 specific, but a lot of the principles are architecture-agnostic.
>>>
>>> 1) A secrets-free hypervisor.
>>>
>>> Basically every hypercall can be (ab)used by a guest, and used as an
>>> arbitrary cache-load gadget.  Logically, this is the first half of a
>>> Spectre SP1 gadget, and is usually the first stepping stone to
>>> exploiting one of the speculative sidechannels.
>>>
>>> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
>>> still experimental, and comes with a ~30% perf hit in the common case),
>>> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
>>> into the code isn't a viable solution to the problem.
>>>
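For readers less familiar with array_index_nospec(): it clamps an index with a branchless mask, so that even if the bounds check is mispredicted, the speculative load stays in bounds. Below is a minimal C sketch of that idea, in the style of the generic mask computation found in Linux's and Xen's nospec.h. index_mask_nospec() and lookup() are hypothetical names, and the sketch assumes GCC/Clang arithmetic right shifts on negative longs. The mask has to be applied at every individual bounds-checked load, which is why sprinkling it around cannot close the general problem.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns all-ones when idx < size and 0 otherwise,
 * computed without a branch the CPU could mispredict.  Assumes idx and
 * size are both <= LONG_MAX and that >> on a negative long is an
 * arithmetic shift (true for GCC and Clang; implementation-defined in ISO C).
 */
static inline unsigned long index_mask_nospec(unsigned long idx,
                                              unsigned long size)
{
    return (unsigned long)(~(long)(idx | (size - 1UL - idx)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/*
 * Hypothetical hypercall-style table lookup.  The architectural bounds
 * check stays; the extra mask means that if the check is mispredicted,
 * the speculative load is clamped to table[0] rather than to an
 * attacker-chosen address.
 */
static int lookup(const int *table, size_t nr, unsigned long idx)
{
    assert(nr <= LONG_MAX);          /* precondition of the mask trick */
    if (idx >= nr)
        return -1;
    idx &= index_mask_nospec(idx, nr);
    return table[idx];
}
```
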
>>> An alternative option is to have less data mapped into Xen's virtual
>>> address space - if a piece of memory isn't mapped, it can't be loaded
>>> into the cache.
>>>
>>> An easy first step here is to remove Xen's directmap, which will mean
>>> that guests' general RAM isn't mapped by default into Xen's address
>>> space.  This will come with some performance hit, as the
>>> map_domain_page() infrastructure will now have to actually
>>> create/destroy mappings, but removing the directmap will cause an
>>> improvement for non-speculative security as well (no possibility of
>>> ret2dir as an exploit technique).
>>>
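To make the cost difference concrete: with a directmap, "mapping" a frame is pure address arithmetic; without one, a mapping has to be created and torn down around each use. The following is a toy C model of that contrast, not Xen's map_domain_page() implementation. The virtual bases, slot count and the directmap_va()/map_frame()/unmap_frame() names are all hypothetical, and a slot array stands in for real PTE writes and TLB flushes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define MAPCACHE_SLOTS 8

/* Hypothetical virtual bases, chosen only for illustration. */
#define DIRECTMAP_VIRT_START 0xffff830000000000ULL
#define MAPCACHE_VIRT_START  0xffff820000000000ULL

/* Directmap: every frame is permanently mapped, so a lookup is arithmetic,
 * and every frame is also permanently reachable by speculation. */
static uint64_t directmap_va(uint64_t mfn)
{
    return DIRECTMAP_VIRT_START + (mfn << PAGE_SHIFT);
}

/* No directmap: a frame is reachable only while a slot maps it.
 * slot_mfn[] stands in for the real PTEs; mfn 0 marks a free slot purely
 * for simplicity.  Real code must also write the PTE and flush the TLB. */
static uint64_t slot_mfn[MAPCACHE_SLOTS];

static uint64_t map_frame(uint64_t mfn)
{
    for (unsigned int i = 0; i < MAPCACHE_SLOTS; i++) {
        if (slot_mfn[i] == 0) {
            slot_mfn[i] = mfn;                      /* create the mapping */
            return MAPCACHE_VIRT_START + ((uint64_t)i << PAGE_SHIFT);
        }
    }
    return 0;                                       /* out of slots */
}

static void unmap_frame(uint64_t va)
{
    uint64_t i = (va - MAPCACHE_VIRT_START) >> PAGE_SHIFT;

    assert(i < MAPCACHE_SLOTS && slot_mfn[i] != 0);
    slot_mfn[i] = 0;                                /* destroy the mapping */
}
```

The per-call slot search and teardown are exactly the overhead the directmap avoids, and the short lifetime of each mapping is what removes the frame from Xen's speculative reach.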
>>> Beyond the directmap, there are plenty of other interesting secrets in
>>> the Xen heap and other mappings, such as the stacks of the other pcpus. 
>>> Fixing this requires moving Xen to having a non-uniform memory layout,
>>> and this is much harder to change.  I already experimented with this as
>>> a meltdown mitigation around about a year ago, and posted the resulting
>>> series on Jan 4th,
>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
>>> some trivial bits of which have already found their way upstream.
>>>
>>> To have a non-uniform memory layout, Xen may not share L4 pagetables;
>>> i.e. Xen must never have two pcpus that reference the same pagetable in
>>> %cr3.
>>>
>>> This property already holds for 32bit PV guests, and all HVM guests, but
>>> 64bit PV guests are the sticking point.  Because Linux has a flat memory
>>> layout, when a 64bit PV guest schedules two threads from the same
>>> process on separate vcpus, those two vcpus have the same virtual %cr3,
>>> and currently, Xen programs the same real %cr3 into hardware.
>>>
>>> If we want Xen to have a non-uniform layout, our two options are:
>>> * Fix Linux to have the same non-uniform layout that Xen wants
>>> (Backwards compatibility for older 64bit PV guests can be achieved with
>>> xen-shim).
>>> * Make use of the XPTI algorithm (specifically, the pagetable sync/copy
>>> part) forevermore.
>>>
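Option 2 in miniature: each pcpu owns a private top-level table, copies the guest-controlled slots into it on context switch, and every guest L4 update is propagated to all live copies. This is a simplified C model under stated assumptions, not Xen's XPTI code. The 256/Xen slot split, table sizes and the load_guest_l4()/guest_l4e_updated() names are hypothetical, and real code would need IPIs and TLB flushes where this model simply rewrites the copies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define L4_ENTRIES 512
#define NR_PCPUS   4
#define XEN_SLOT   256  /* simplification: slots below are guest, this one is Xen's */

typedef uint64_t l4e_t;

/* One private top-level table per pcpu; only these ever go into %cr3. */
static l4e_t pcpu_l4[NR_PCPUS][L4_ENTRIES];
/* Which guest L4 each pcpu's copy was taken from (NULL if none). */
static const l4e_t *pcpu_guest[NR_PCPUS];
/* Per-pcpu Xen mapping, e.g. covering only this pcpu's own stack. */
static l4e_t pcpu_xen_l4e[NR_PCPUS];

/* Context switch: copy the guest-controlled slots into this pcpu's table
 * and add this pcpu's private Xen slot.  Two vcpus running on the same
 * guest L4 therefore still get distinct hardware tables. */
static l4e_t *load_guest_l4(unsigned int cpu, const l4e_t *guest_l4)
{
    assert(cpu < NR_PCPUS);
    memcpy(pcpu_l4[cpu], guest_l4, XEN_SLOT * sizeof(l4e_t));
    pcpu_l4[cpu][XEN_SLOT] = pcpu_xen_l4e[cpu];
    pcpu_guest[cpu] = guest_l4;
    return pcpu_l4[cpu];            /* what would be written to %cr3 */
}

/* The "sync" half: a guest L4 update must reach every copy of that L4.
 * Real code has to coordinate with remote pcpus; this model just rewrites
 * their copies directly. */
static void guest_l4e_updated(const l4e_t *guest_l4, unsigned int slot)
{
    assert(slot < XEN_SLOT);
    for (unsigned int cpu = 0; cpu < NR_PCPUS; cpu++)
        if (pcpu_guest[cpu] == guest_l4)
            pcpu_l4[cpu][slot] = guest_l4[slot];
}
```

The copy on every switch and the fan-out on every update are where Option 2's ongoing performance cost comes from.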
>>> Option 2 isn't great (especially for perf on fixed hardware), but does
>>> keep all the necessary changes in Xen.  Option 1 looks to be the better
>>> option longterm.
>>>
>>> An interesting point to note: the 32bit PV ABI prohibits sharing of
>>> L3 pagetables, because back in the 32bit hypervisor days, we used to
>>> have linear mappings in the Xen virtual range.  This check is stale
>>> (from a functionality point of view), but still present in Xen.  A
>>> consequence of this is that 32bit PV guests definitely don't share
>>> top-level pagetables across vcpus.
>>
>> Correction: the 32bit PV ABI prohibits sharing of L2 pagetables, but L3
>> pagetables can be shared. So guests will schedule the same top-level
>> pagetables across vcpus.
>>
>> But 64bit Xen creates a monitor table for a 32bit PAE guest and puts the
>> CR3 provided by the guest into the first slot, so pcpus don't share the same
>> L4 pagetables. The property we want still holds.
> 
> Ah, right -- but Xen can get away with this because in PAE mode, "L3" is
> just 4 entries that are loaded on CR3-switch and not automatically kept
> in sync by the hardware; i.e., the OS already needs to do its own
> "manual syncing" if it updates any of the L3 entries, so it's the same
> for Xen.
> 
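The PAE argument above can be modelled in a few lines: because PAE hardware only reads the four top-level entries when CR3 is written, a hypervisor can snapshot them into a per-vcpu table at that moment and give the guest exactly bare-metal semantics. This is a C sketch of that reasoning, not Xen's actual compat-guest code; struct vcpu_monitor and pae_load_cr3() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAE_L3_ENTRIES 4

typedef uint64_t pae_l3e_t;

/* Hypothetical per-vcpu state: the L3 that slot 0 of this vcpu's own
 * monitor L4 would point at. */
struct vcpu_monitor {
    pae_l3e_t l3[PAE_L3_ENTRIES];
};

/* Emulates a guest CR3 write.  PAE loads the four PDPTEs only when CR3 is
 * written, so taking a private snapshot here matches hardware behaviour:
 * later guest edits are invisible until the guest reloads CR3. */
static void pae_load_cr3(struct vcpu_monitor *v, const pae_l3e_t *guest_l3)
{
    assert(v != NULL && guest_l3 != NULL);
    memcpy(v->l3, guest_l3, sizeof(v->l3));
}
```

Two vcpus loading the same guest CR3 thus end up with distinct tables, with no extra synchronisation beyond what the guest already has to do.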
>>> Juergen/Boris: Do you have any idea if/how easy this infrastructure
>>> would be to implement for 64bit PV guests as well?  If a PV guest can
>>> advertise via Elfnote that it won't share top-level pagetables, then we
>>> can audit this trivially in Xen.
>>>
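What "audit this trivially" could look like: for a guest that advertised the property, Xen only has to check, before loading a guest L4, that no other pcpu currently runs on the same table. A minimal C sketch under assumptions: the pcpu_cr3_mfn[] bookkeeping and function names are hypothetical, the ELF note check is assumed to have happened at domain build time, and the locking a real SMP implementation would need is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PCPUS 4

/* Hypothetical record of the guest top-level table (by frame number) each
 * pcpu currently has in %cr3; 0 means "none". */
static uint64_t pcpu_cr3_mfn[NR_PCPUS];

/* The audit: would loading @mfn on @cpu give two pcpus the same %cr3? */
static bool l4_would_be_shared(unsigned int cpu, uint64_t mfn)
{
    for (unsigned int i = 0; i < NR_PCPUS; i++)
        if (i != cpu && pcpu_cr3_mfn[i] == mfn)
            return true;
    return false;
}

/* Load path for a guest that promised not to share top-level tables.
 * Returns false if the promise is broken; a real hypervisor might crash
 * the domain at this point. */
static bool load_guest_l4(unsigned int cpu, uint64_t mfn)
{
    assert(cpu < NR_PCPUS && mfn != 0);
    if (l4_would_be_shared(cpu, mfn))
        return false;
    pcpu_cr3_mfn[cpu] = mfn;
    return true;
}
```
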
>>
>> After reading the Linux kernel code, I think it is not going to be trivial,
>> as threads in Linux currently share one pagetable (as they should).
>>
>> In order to make each thread have its own pagetable while still maintaining
>> the illusion of one address space, there needs to be synchronisation
>> under the hood.
>>
>> There is code in Linux to synchronise vmalloc, but that's only for the
>> kernel portion. The infrastructure to synchronise userspace portion is
>> missing.
>>
>> One idea is to follow the same model as vmalloc -- maintain a reference
>> pagetable in struct mm and a list of pagetables for threads, then
>> synchronise the pagetables in the page fault handler. But this is
>> probably a bit hard to sell to Linux maintainers because it will touch a
>> lot of the non-Xen code, increase complexity and decrease performance.
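The reference-table idea can be sketched as a toy model: updates land in a per-mm reference table, each thread runs on its own copy, and a fault on a stale copy is fixed by copying the entry across and retrying. This is a hypothetical C model (struct mm_model and the mm_* functions are invented for illustration, not Linux code). It also shows why the cost is real: adding entries can be lazy, but removal must eagerly clear every thread's copy, and real code would also need TLB shootdowns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOP_ENTRIES 512
#define MAX_THREADS 8

typedef uint64_t pte_t;

/* Hypothetical model: the mm owns a reference top-level table, and each
 * thread runs on its own private copy of it. */
struct mm_model {
    pte_t ref[TOP_ENTRIES];
    pte_t thread_tbl[MAX_THREADS][TOP_ENTRIES];
};

/* New mappings go into the reference table only; thread copies go stale. */
static void mm_set_top_entry(struct mm_model *mm, unsigned int slot, pte_t e)
{
    assert(slot < TOP_ENTRIES);
    mm->ref[slot] = e;
}

/* Page-fault path: if the reference table can satisfy the fault, copy the
 * entry into this thread's table and retry; otherwise it is a real fault. */
static bool mm_sync_on_fault(struct mm_model *mm, unsigned int thread,
                             unsigned int slot)
{
    assert(thread < MAX_THREADS && slot < TOP_ENTRIES);
    if (mm->ref[slot] == 0)
        return false;                   /* genuinely unmapped */
    mm->thread_tbl[thread][slot] = mm->ref[slot];
    return true;                        /* synced, retry the access */
}

/* Removal cannot be lazy: every thread's copy must be cleared before the
 * mapping can be considered gone. */
static void mm_clear_top_entry(struct mm_model *mm, unsigned int slot)
{
    assert(slot < TOP_ENTRIES);
    mm->ref[slot] = 0;
    for (unsigned int t = 0; t < MAX_THREADS; t++)
        mm->thread_tbl[t][slot] = 0;
}
```
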
> 
> Sorry -- what do you mean "synchronize vmalloc"?  If every thread has a
> different view of the kernel's vmalloc area, then every thread must have
> a different L4 table, right?  And if every thread has a different L4
> table, then we've already got the main thing we need from Linux, don't we?


Just had an IRL chat with Wei:  the synchronization he was talking about
was a synchronization *of the kernel space* *between processes*.  What we
would need in Linux is a synchronization *of userspace* *between
threads*.  So the same basic idea is there, but it would require a
reasonable amount of extra work.

Since the work that would need to be done in Linux is exactly the same
work that we'd need to do in Xen, I think the Linux maintainers would be
pretty annoyed if we asked them to do it instead of doing it ourselves.

 -George

_______________________________________________
Xen-devel mailing list
xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
