On 28.09.2022 15:03, Juergen Gross wrote:
> On 28.09.22 14:06, Jan Beulich wrote:
>> On 28.09.2022 12:58, Andrew Cooper wrote:
>>> On 28/09/2022 11:38, Jan Beulich wrote:
>>>> As an alternative I'd like to propose the introduction of a bit (or
>>>> multiple ones, see below) augmenting the hypercall number, to control
>>>> the flavor of the buffers used for every individual hypercall.  This
>>>> would likely involve the introduction of a new hypercall page (or
>>>> multiple ones if more than one bit is to be used), to retain the
>>>> present abstraction where it is the hypervisor which actually fills
>>>> these pages.
>>>
>>> There are other concerns which need to be accounted for.
>>>
>>> Encrypted VMs cannot use a hypercall page; they don't trust the
>>> hypervisor in the first place, and the hypercall page is (specifically)
>>> code injection.  So the sensible new ABI cannot depend on a hypercall page.
>>
>> I don't think there's a dependency, and I think there never really has been.
>> We've been advocating for its use, but we've not enforced that anywhere, I
>> don't think.
>>
>>> Also, rewriting the hypercall page on migrate turns out not to have been
>>> the most clever idea, and only works right now because the instructions
>>> are the same length in the variations for each mode.
>>>
>>> Also continuations need to change to avoid userspace liveness problems,
>>> and existing hypercalls that we do have need splitting between things
>>> which are actually privileged operations (within the guest context) and
>>> things which are logical control operations, so the kernel can expose
>>> the latter to userspace without retaining the gaping root hole which is
>>> /dev/xen/privcmd, and a blocker to doing UEFI Secureboot.
>>>
>>> So yes, starting some new clean(er) interface from hypercall 64 is the
>>> plan, but it very much does not want to be a simple mirror of the
>>> existing 0-63 with a differing calling convention.
>>
>> All of these look like orthogonal problems to me. That's likely all
>> relevant for, as I think you've been calling it, ABI v2, but shouldn't
>> hinder our switching to a physical address based hypercall model.
>> Otherwise I'm afraid we'll never make any progress in that direction.
> 
> What about an alternative model allowing most of the current
> hypercalls to be used unmodified?
> 
> We could add a new hypercall for registering hypercall buffers via
> virtual address, physical address, and size of the buffers (kind of a
> software TLB).

Why not?

> The buffer table would want to be physically addressed
> by the hypercall, of course.

I'm not convinced of this, as it would break uniformity of the hypercall
interfaces. IOW in the hypervisor we then wouldn't be able to use
copy_from_guest() to retrieve the contents. Perhaps this simply shouldn't
be a table, but a hypercall not involving any buffers (i.e. every
discontiguous piece would need registering separately). I expect such a
software TLB wouldn't have many entries, so needing to use a couple of
hypercalls shouldn't be a major issue.

> It might be interesting to have this table per vcpu (using the same
> table for multiple vcpus should be allowed) in order to speed up
> finding translation entries for percpu buffers.

Yes. Perhaps insertion and purging could simply be two new VCPUOP_*.
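
Purely for illustration (nothing below is an existing or proposed
interface; the subop names, numbers, and descriptor layout are all made
up), such per-vCPU subops might look along these lines, with one entry
per physically contiguous piece, matching the "register every
discontiguous piece separately" idea above:

    #include <stdint.h>

    /*
     * Hypothetical sketch only.  How the descriptor itself gets passed
     * (in registers, or as a specially handled physical address) is
     * exactly the open question above.
     */
    #define VCPUOP_buf_tlb_insert 16 /* hypothetical subop numbers */
    #define VCPUOP_buf_tlb_purge  17

    struct vcpu_buf_tlb_entry {
        uint64_t va;   /* guest virtual start of one contiguous region */
        uint64_t pa;   /* guest physical address backing its start */
        uint64_t size; /* length of the region in bytes */
    };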

As a prereq I think we'd need to sort the cross-vCPU accessing of guest
data, coincidentally pointed out in a post-commit-message remark in
https://lists.xen.org/archives/html/xen-devel/2022-09/msg01761.html. The
subject vCPU isn't available in copy_to_user_hvm(), which is where I'd
expect the TLB lookup to occur (while assuming handles point at globally
mapped space _might_ be okay, using the wrong vCPU's TLB surely isn't).
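
Just to sketch where such a lookup might sit (nothing below is actual
Xen code; the structures and the helper are invented for illustration,
and the real copy_to_user_hvm() of course looks different):

    #include <stdbool.h>

    /* Hypothetical sketch only -- none of these names exist anywhere. */
    struct sw_tlb_entry {
        unsigned long va;   /* registered guest virtual start */
        unsigned long pa;   /* guest physical address backing the start */
        unsigned long size; /* length in bytes */
    };

    struct sw_tlb {
        unsigned int nr_entries;
        struct sw_tlb_entry entry[8]; /* few slots expected to suffice */
    };

    /*
     * Fast path the copy routines could try against the *subject* vCPU's
     * table before falling back to the existing translation.  Returns
     * true and fills *gpa if the address lies in a registered range.
     */
    static bool sw_tlb_lookup(const struct sw_tlb *tlb, unsigned long va,
                              unsigned long *gpa)
    {
        unsigned int i;

        for ( i = 0; tlb && i < tlb->nr_entries; i++ )
        {
            const struct sw_tlb_entry *e = &tlb->entry[i];

            if ( va >= e->va && va - e->va < e->size )
            {
                *gpa = e->pa + (va - e->va);
                return true;
            }
        }

        return false; /* not registered: translate like today */
    }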

> Any hypercall buffer addressed virtually could first be looked up via
> the SW-TLB. This wouldn't require any changes for most
> of the hypercall interfaces. Only special cases with very large buffers
> might need indirect variants (like Jan said: via GFN lists, which could
> be passed in registered buffers).
> 
> Encrypted guests would probably want to use static percpu buffers in
> order to avoid switching the encryption state of the buffers all the
> time.
> 
> An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one
> giant buffer with the domain's memory size via the physical memory
> mapping of the kernel. All kmalloc() addresses would be in that region.

That's Linux-centric. I'm not convinced all OSes maintain a directmap.
Without such, switching to this model might end up quite intrusive on
the OS side.

Thinking of Linux, we'd need a 2nd range covering the data part of the
kernel image.
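
For illustration only - a Linux guest following that model might
register two ranges along these lines (xen_register_hc_buffer() and the
function around it are made up; neither exists in Linux or Xen):

    #include <linux/init.h>
    #include <linux/mm.h>     /* high_memory */
    #include <linux/types.h>
    #include <asm/io.h>       /* virt_to_phys() */
    #include <asm/page.h>     /* PAGE_OFFSET, __pa_symbol() */
    #include <asm/sections.h> /* _sdata, _edata */

    /* Hypothetical registration call -- does not exist. */
    int xen_register_hc_buffer(unsigned long va, phys_addr_t pa,
                               unsigned long size);

    static void __init xen_register_hypercall_ranges(void)
    {
        /* Directmap: covers kmalloc()/page allocator addresses. */
        xen_register_hc_buffer(PAGE_OFFSET,
                               virt_to_phys((void *)PAGE_OFFSET),
                               (unsigned long)high_memory - PAGE_OFFSET);

        /* 2nd range: data part of the kernel image, for static buffers. */
        xen_register_hc_buffer((unsigned long)_sdata, __pa_symbol(_sdata),
                               _edata - _sdata);
    }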

Further this still wouldn't (afaics) pave a reasonable route towards
dealing with privcmd-invoked hypercalls.

Finally - to what extent are we concerned about PV guests using linear
addresses for hypercall buffers? I ask because I don't think the model
lends itself to use for the PV guest interfaces as well.

Jan

> A buffer address not found would need to be translated like today (and
> fail for an encrypted guest).
> 
> Thoughts?
> 
> 
> Juergen

