Hi Mohamed,
On 30/10/2025 18:33, Mohamed Mediouni wrote:
On 30. Oct 2025, at 14:41, [email protected] wrote:
Adding @[email protected] and replying to his questions he asked over
#XenDevel:matrix.org.
Can you add some details on why the implementation cannot be optimized in
KVM? Asking because I have never seen such an issue when running Xen on
QEMU (without nested virt enabled).
AFAIK, when Xen is run on QEMU without virtualization, the instructions are
emulated in QEMU, whereas with KVM the instructions should ideally run
directly on hardware, except in some special cases (those trapped via
FGT/CGT) such as this one: KVM maintains shadow page tables for each VM,
traps these TLB maintenance instructions, and emulates them with callbacks
such as handle_vmalls12e1is(). The way this callback is implemented, it has
to iterate over the whole address space and clean up the page tables, which
is a costly operation. Regardless of this, it should still be optimized in
Xen, as invalidating a selective range would be much better than
invalidating the whole 48-bit address space.
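For illustration, here is a minimal sketch of what a selective invalidation
could look like (not Xen's actual code; flush_stage2_range(), the 4K granule
and the barriers are assumptions on my side):

/*
 * Minimal sketch, not Xen's actual code: invalidate only the stage-2 TLB
 * entries covering [ipa, ipa + size) for the current VMID, one 4K page at
 * a time, instead of invalidating the whole address space.
 */
#include <stdint.h>

static inline void flush_stage2_range(uint64_t ipa, uint64_t size)
{
    uint64_t end = ipa + size;

    asm volatile("dsb ishst" ::: "memory");      /* publish PT updates */

    for ( ; ipa < end; ipa += 4096 )
        /* TLBI IPAS2E1IS takes the IPA shifted right by 12 in Xt. */
        asm volatile("tlbi ipas2e1is, %0" :: "r" (ipa >> 12) : "memory");

    asm volatile("dsb ish" ::: "memory");
    /* IPAS2E1IS need not remove combined stage-1+2 entries, so also
     * invalidate stage 1 for the current VMID. */
    asm volatile("tlbi vmalle1is" ::: "memory");
    asm volatile("dsb ish" ::: "memory");
    asm volatile("isb" ::: "memory");
}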
Some details about your platform and use case would be helpful. I am interested
to know whether you are using all the features for nested virt.
I am using AWS G4. My use case is to run Xen as a guest hypervisor. Yes,
most of the features are enabled, except VHE and those which are disabled
by KVM.
Hello,
You mean Graviton4 (for reference to others, from a bare metal instance)?
Interesting to see people caring about nested virt there :) - and hopefully
using it wasn’t too much of a pain for you to deal with.
; switch to current VMID
tlbi rvae1, guest_vaddr    ; first invalidate stage-1 TLB by guest VA for current VMID
tlbi ripas2e1, guest_paddr ; then invalidate stage-2 TLB by IPA range for current VMID
dsb ish
isb
; switch back the VMID
• This is where I am not quite sure, and I was hoping that someone with Arm
expertise could sign off on this so that I can work on its implementation
in Xen. This would be an optimization not only for virtualized hardware but
also in general for Xen on arm64 machines.
Note that, for TLBIP RIPAS2E1, the documentation says:

"The invalidation is not required to apply to caching structures that
combine stage 1 and stage 2 translation table entries."
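So, if that applies here, the range invalidate alone would presumably not be
enough, and an extra stage-1 invalidate for the current VMID would still be
needed to drop combined entries, e.g. (sketch only, untested):

/* Sketch only: after the stage-2 (IPA) invalidate, also remove stage-1 and
 * combined stage-1+2 entries for the current VMID. */
asm volatile("dsb ish" ::: "memory");
asm volatile("tlbi vmalle1" ::: "memory");
asm volatile("dsb ish" ::: "memory");
asm volatile("isb" ::: "memory");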
• The second place in Xen where this is problematic is when multiple vCPUs
of the same domain are juggled on a single pCPU: TLBs are invalidated every
time a different vCPU runs on the pCPU (a simplified sketch of the current
behaviour is below). I do not know how this can be optimized. Any support
on this is appreciated.
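For reference, a simplified sketch (hypothetical names, not Xen's actual
code) of the behaviour I am describing: the guest TLB is flushed whenever
the vCPU that last ran on this pCPU for the domain is not the one being
scheduled in, because both vCPUs' translations are tagged with the same
VMID:

#define NR_PCPUS 8   /* arbitrary size for the sketch */

/* Per-domain record of which vCPU last ran on each pCPU. */
struct domain_tlb_state {
    int last_vcpu_ran[NR_PCPUS];
};

static void flush_if_vcpu_changed(struct domain_tlb_state *s,
                                  int pcpu, int vcpu_id)
{
    if ( s->last_vcpu_ran[pcpu] != vcpu_id )
    {
        /* Invalidate all stage-1 and stage-2 entries for the current
         * VMID on the local CPU before the new vCPU runs. */
        asm volatile("dsb nshst" ::: "memory");
        asm volatile("tlbi vmalls12e1" ::: "memory");
        asm volatile("dsb nsh" ::: "memory");
        asm volatile("isb" ::: "memory");

        s->last_vcpu_ran[pcpu] = vcpu_id;
    }
}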
One way to handle this is to make every TLB invalidate within the VM a
broadcast TLB invalidate (HCR_EL2.FB is what you’re looking for) and then
forgo that TLB maintenance, as it’s no longer necessary. This should not
have a practical performance impact.
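Roughly along these lines (sketch only; the bit position is from the Arm
ARM, the helper name is made up and this is not Xen's actual code):

#define HCR_FB (1UL << 9)   /* HCR_EL2.FB: force broadcast of guest TLB maintenance */

/* Make every TLB invalidate issued by the guest at EL1 an inner-shareable
 * (broadcast) invalidate, so the per-vCPU-switch flush described above is
 * no longer necessary. */
static inline void enable_forced_broadcast(void)
{
    unsigned long hcr;

    asm volatile("mrs %0, hcr_el2" : "=r" (hcr));
    hcr |= HCR_FB;
    asm volatile("msr hcr_el2, %0" :: "r" (hcr));
    asm volatile("isb" ::: "memory");
}

(In practice the bit would presumably just be included in the HCR_EL2 value
the hypervisor already programs when entering the guest, rather than set
with a read-modify-write like this.)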
To confirm my understanding, you are suggesting to rely on the L2 guest to
send the TLB flush. Did I understand correctly? If so, wouldn't this open a
security hole, because a misbehaving guest may never send the TLB flush?
Cheers,
--
Julien Grall