> On 30. Oct 2025, at 07:12, [email protected] wrote:
>
> • This can be optimized either by introducing a batch version of this
> hypercall, i.e. XENMEM_remove_from_physmap_batch, and flushing TLBs only
> once for all pages being removed,
> OR
> by using a TLBI instruction that invalidates only the intended range of
> addresses instead of the whole stage-1 and stage-2 translations. I understand
> that no single TLBI instruction can perform both stage-1 and stage-2
> invalidations for a given address range, but maybe a combination of
> instructions can be used, such as:
> ; switch to the current VMID
> tlbi rvae1, guest_vaddr    ; first invalidate stage-1 TLB by guest VA range for the current VMID
> tlbi ripas2e1, guest_paddr ; then invalidate stage-2 TLB by IPA range for the current VMID
> dsb ish
> isb
> ; switch back the VMID
> • This is where I am not quite sure, and I was hoping that someone with
> Arm expertise could sign off on this so that I can work on its implementation
> in Xen. This would be an optimization not only for virtualized hardware but
> also in general for Xen on arm64 machines.
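For context on the first bullet: the batching amounts to amortizing one flush over the whole batch instead of paying it per page. A minimal sketch of the difference, with hypothetical helper names standing in for Xen's p2m and TLB maintenance code:

```c
#include <assert.h>
#include <stddef.h>

/* Counter standing in for the cost of guest TLB maintenance. */
static unsigned int tlb_flushes;

static void flush_guest_tlb(void) { tlb_flushes++; }

/* Placeholder for removing one gfn from the stage-2 p2m. */
static void p2m_remove_page(size_t gfn) { (void)gfn; }

/* Per-page variant: one flush per XENMEM_remove_from_physmap call. */
static void remove_pages_one_by_one(const size_t *gfns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p2m_remove_page(gfns[i]);
        flush_guest_tlb();          /* flushed on every hypercall */
    }
}

/* Batched variant: unmap everything, then flush once at the end. */
static void remove_pages_batched(const size_t *gfns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p2m_remove_page(gfns[i]);
    flush_guest_tlb();              /* single flush for the whole batch */
}
```

For a batch of N pages this trades N flushes for one, which is where the proposed XENMEM_remove_from_physmap_batch would win.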
There’s no visibility into what’s going on at stage-1. We don’t know which
guest VAs map to the given IPA, so doing a full stage-1 TLB flush is the only
option if FEAT_nTLBPA isn’t present (and FEAT_nTLBPA is not present on Neoverse
V2).
If FEAT_nTLBPA is present (as on Neoverse V3), then you don’t need the
stage-1 TLB invalidate in this code path.
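Whether FEAT_nTLBPA is implemented can be checked at boot from
ID_AA64MMFR1_EL1 (the nTLBPA field sits in bits [51:48] per the Arm ARM). A
sketch of the check; the helper name is hypothetical, and on real hardware
the register value would come from an `mrs` read rather than a parameter:

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64MMFR1_EL1.nTLBPA, bits [51:48]. A nonzero value means
 * FEAT_nTLBPA is implemented: intermediate caching of translations
 * includes no non-coherent PA-based entries, so a stage-2-only change
 * does not require the full stage-1 invalidate discussed above. */
#define ID_AA64MMFR1_NTLBPA_SHIFT 48

static inline int cpu_has_ntlbpa(uint64_t id_aa64mmfr1)
{
    return ((id_aa64mmfr1 >> ID_AA64MMFR1_NTLBPA_SHIFT) & 0xf) != 0;
}
```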
> So, on older architectures, a full stage-1 invalidation would be required.
> For an architecture-independent solution, creating a batch version seems to
> be the better way.
Might as well have both, although the range invalidate for stage-2 is most
likely enough to resolve performance issues in your case.
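For the stage-2 range invalidate, the operand passed to the range TLBIs
(FEAT_TLBIRANGE, e.g. TLBI RIPAS2E1IS) encodes TG in bits [47:46], SCALE in
[45:44], NUM in [43:39], TTL in [38:37], and the base page number in [36:0],
with the range covering (NUM + 1) << (5*SCALE + 1) granules; this matches the
encoding Linux uses in its tlbflush helpers. A sketch of the arithmetic,
assuming the field positions above (check against the Arm ARM before relying
on them):

```c
#include <assert.h>
#include <stdint.h>

/* Number of translation granules covered by one range TLBI with the
 * given NUM and SCALE fields: (NUM + 1) << (5*SCALE + 1). */
static inline uint64_t tlbi_range_pages(uint64_t num, uint64_t scale)
{
    return (num + 1) << (5 * scale + 1);
}

/* Build the Xt operand for a range TLBI. base_page is the address
 * shifted right by the granule size. Bits [63:48] carry the ASID for
 * stage-1 VA ops; they stay zero for stage-2 IPA ops like RIPAS2E1IS. */
static inline uint64_t tlbi_range_operand(uint64_t base_page, uint64_t tg,
                                          uint64_t num, uint64_t scale,
                                          uint64_t ttl)
{
    return (tg << 46) | (scale << 44) | (num << 39) | (ttl << 37) |
           (base_page & ((1ULL << 37) - 1));
}
```

The smallest range (NUM = 0, SCALE = 0) is 2 granules, and the fields scale up to cover large regions without falling back to a full stage-2 flush.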