On Wed, Apr 15, 2026 at 07:05:17AM +0900, Itaru Kitayama wrote:
> On Tue, Apr 14, 2026 at 11:16:47AM +0100, Wei-Lin Chang wrote:
> > On Tue, Apr 14, 2026 at 06:31:22AM +0900, Itaru Kitayama wrote:
> > > On Mon, Apr 13, 2026 at 10:18:42AM +0100, Wei-Lin Chang wrote:
> > > > Hi Itaru,
> > > >
> > > > On Mon, Apr 13, 2026 at 08:19:25AM +0900, Itaru Kitayama wrote:
> > > > > On Sun, Apr 12, 2026 at 03:22:15PM +0100, Wei-Lin Chang wrote:
> > > > > > This selftest simply starts an L1, which starts its own guest (L2). L2
> > > > > > runs without stage-1 and 2 translations, it calls an HVC to jump back
> > > > > > to L1.
> > > > >
> > > > > How do you disable both the nested guest (L2)'s MMU and stage 2
> > > > > translations?
> > > >
> > > > Guest stage-2 is disabled by not setting HCR_EL2.VM in prepare_hyp(),
> > > > and stage-1 is disabled by not writing to SCTLR_EL12 in init_vcpu(),
> > > > effectively using the default value set by L0. However since SCTLR_EL1
> > > > has many architecturally UNKNOWN bits (including SCTLR_EL1.M), it should
> > > > be better to write a value before running L2 I suppose...
> > >
> > > Thanks. What do you think of using copy_el2_to_el1() macro in at.c, so we
> > > can prepare in guest_code() to manipulate the SCTLR_EL12 System register
> > > with the sensible programmed values?
> >
> > Yes, using copy_el2_to_el1() can give us an L2 stage-1 that is identical
> > to the L1's stage-1. But what I was considering was if guest stage-2 is
> > enabled (which we plan to implement), then those stage-1 page tables
> > will have to be mapped for L2, and its base address translated to L2IPA.
> > It's doable but seems like extra complexity when stage-1 is not so
> > interesting for KVM (except for AT?), it lets the guest do whatever it
> > likes and let the hardware do the translation.
> >
> > Let me know if you have reasons to want stage-1 for L2, there could be
> > something I should consider but did not.
>
> By keeping nested guest's MMU enabled, we can exercise the shadow stage
> 2 on the host. But I am fine with you starting nested guest's IPA and I
> hope Marc and Oliver approve this seris and merge upstream.

I think you have guest stage-1 and guest stage-2 confused. Whether the
nested guest's stage-1 MMU is enabled or not does not affect what KVM
does with the shadow page tables.

The stage-1 MMU translates L2VA -> L2IPA. The shadow page tables store
the combination of two translations: L2IPA -> L1IPA (the stage-2 PTs L1
built for L2) and L1IPA -> host PA (the stage-2 PTs the host built for
L1).

Additionally, leaving stage-2 disabled for L2 does not mean the shadow
stage-2 is not exercised. There is still a distinct shadow stage-2 doing
the work for it, albeit a simple one (the stored mapping is the same as
the canonical stage-2).

All in all, if we want to make the shadow page tables more interesting,
what we should do is build a stage-2 for L2 and enable it in L1, not
just turn on L2's stage-1 MMU.

Thanks,
Wei-Lin Chang

>
> Thanks,
> Itaru.
>
> >
> > Thanks,
> > Wei-Lin Chang
> >
> > >
> > > Itaru.
> > >
> > > >
> > > > Thanks,
> > > > Wei-Lin Chang
> > > >
> > > > >
> > > > > Itaru.
> > > > >
> > > > > >
> > > > > > Signed-off-by: Wei-Lin Chang <[email protected]>
> > > > > > ---
> > > > > >  tools/testing/selftests/kvm/Makefile.kvm      |   1 +
> > > > > >  .../selftests/kvm/arm64/hello_nested.c        | 103 ++++++++++++++++++
> > > > > >  2 files changed, 104 insertions(+)
> > > > > >  create mode 100644 tools/testing/selftests/kvm/arm64/hello_nested.c
> > > > > >
> > > > > > diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> > > > > > index 3dc3e39f7025..e8c108e0c487 100644
> > > > > > --- a/tools/testing/selftests/kvm/Makefile.kvm
> > > > > > +++ b/tools/testing/selftests/kvm/Makefile.kvm
> > > > > > @@ -168,6 +168,7 @@ TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
> > > > > >  TEST_GEN_PROGS_arm64 += arm64/at
> > > > > >  TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
> > > > > >  TEST_GEN_PROGS_arm64 += arm64/hello_el2
> > > > > > +TEST_GEN_PROGS_arm64 += arm64/hello_nested
> > > > > >  TEST_GEN_PROGS_arm64 += arm64/host_sve
> > > > > >  TEST_GEN_PROGS_arm64 += arm64/hypercalls
> > > > > >  TEST_GEN_PROGS_arm64 += arm64/external_aborts
> > > > > > diff --git a/tools/testing/selftests/kvm/arm64/hello_nested.c b/tools/testing/selftests/kvm/arm64/hello_nested.c
> > > > > > new file mode 100644
> > > > > > index 000000000000..97387e4697b3
> > > > > > --- /dev/null
> > > > > > +++ b/tools/testing/selftests/kvm/arm64/hello_nested.c
> > > > > > @@ -0,0 +1,103 @@
> > > > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > > > +/*
> > > > > > + * hello_nested - Go from vEL2 to EL1 then back
> > > > > > + */
> > > > > > +
> > > > > > +#include "nested.h"
> > > > > > +#include "processor.h"
> > > > > > +#include "test_util.h"
> > > > > > +#include "ucall.h"
> > > > > > +
> > > > > > +#define XLATE2GPA	(0xABCD)
> > > > > > +#define L2STACKSZ	(0x100)
> > > > > > +
> > > > > > +/*
> > > > > > + * TPIDR_EL2 is used to store vcpu id, so save and restore it.
> > > > > > + */
> > > > > > +static vm_paddr_t ucall_translate_to_gpa(void *gva)
> > > > > > +{
> > > > > > +	vm_paddr_t gpa;
> > > > > > +	u64 vcpu_id = read_sysreg(tpidr_el2);
> > > > > > +
> > > > > > +	GUEST_SYNC2(XLATE2GPA, gva);
> > > > > > +
> > > > > > +	/* get the result from userspace */
> > > > > > +	gpa = read_sysreg(tpidr_el2);
> > > > > > +
> > > > > > +	write_sysreg(vcpu_id, tpidr_el2);
> > > > > > +
> > > > > > +	return gpa;
> > > > > > +}
> > > > > > +
> > > > > > +static void l2_guest_code(void)
> > > > > > +{
> > > > > > +	do_hvc();
> > > > > > +}
> > > > > > +
> > > > > > +static void guest_code(void)
> > > > > > +{
> > > > > > +	struct vcpu vcpu;
> > > > > > +	struct hyp_data hyp_data;
> > > > > > +	int ret;
> > > > > > +	vm_paddr_t l2_pc, l2_stack_top;
> > > > > > +	/* force 16-byte alignment for the stack pointer */
> > > > > > +	u8 l2_stack[L2STACKSZ] __attribute__((aligned(16)));
> > > > > > +
> > > > > > +	GUEST_ASSERT_EQ(get_current_el(), 2);
> > > > > > +	GUEST_PRINTF("vEL2 entry\n");
> > > > > > +
> > > > > > +	l2_pc = ucall_translate_to_gpa(l2_guest_code);
> > > > > > +	l2_stack_top = ucall_translate_to_gpa(&l2_stack[L2STACKSZ]);
> > > > > > +
> > > > > > +	init_vcpu(&vcpu, l2_pc, l2_stack_top);
> > > > > > +	prepare_hyp();
> > > > > > +
> > > > > > +	ret = run_l2(&vcpu, &hyp_data);
> > > > > > +	GUEST_ASSERT_EQ(ret, ARM_EXCEPTION_TRAP);
> > > > > > +	GUEST_DONE();
> > > > > > +}
> > > > > > +
> > > > > > +int main(void)
> > > > > > +{
> > > > > > +	struct kvm_vcpu_init init;
> > > > > > +	struct kvm_vcpu *vcpu;
> > > > > > +	struct kvm_vm *vm;
> > > > > > +	struct ucall uc;
> > > > > > +	vm_paddr_t gpa;
> > > > > > +
> > > > > > +	TEST_REQUIRE(kvm_check_cap(KVM_CAP_ARM_EL2));
> > > > > > +	vm = vm_create(1);
> > > > > > +
> > > > > > +	kvm_get_default_vcpu_target(vm, &init);
> > > > > > +	init.features[0] |= BIT(KVM_ARM_VCPU_HAS_EL2);
> > > > > > +	vcpu = aarch64_vcpu_add(vm, 0, &init, guest_code);
> > > > > > +	kvm_arch_vm_finalize_vcpus(vm);
> > > > > > +
> > > > > > +	while (true) {
> > > > > > +		vcpu_run(vcpu);
> > > > > > +
> > > > > > +		switch (get_ucall(vcpu, &uc)) {
> > > > > > +		case UCALL_SYNC:
> > > > > > +			if (uc.args[0] == XLATE2GPA) {
> > > > > > +				gpa = addr_gva2gpa(vm, (vm_vaddr_t)uc.args[1]);
> > > > > > +				vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_TPIDR_EL2), gpa);
> > > > > > +			}
> > > > > > +			break;
> > > > > > +		case UCALL_PRINTF:
> > > > > > +			pr_info("%s", uc.buffer);
> > > > > > +			break;
> > > > > > +		case UCALL_DONE:
> > > > > > +			pr_info("DONE!\n");
> > > > > > +			goto end;
> > > > > > +		case UCALL_ABORT:
> > > > > > +			REPORT_GUEST_ASSERT(uc);
> > > > > > +			fallthrough;
> > > > > > +		default:
> > > > > > +			TEST_FAIL("Unhandled ucall: %ld\n", uc.cmd);
> > > > > > +		}
> > > > > > +	}
> > > > > > +
> > > > > > +end:
> > > > > > +	kvm_vm_free(vm);
> > > > > > +	return 0;
> > > > > > +}
> > > > > > --
> > > > > > 2.43.0
> > > > > >
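
P.S. To make the point about the shadow stage-2 concrete, here is a toy
model of the composition described in the reply: L2IPA -> L1IPA through
the stage-2 that L1 built for L2, then L1IPA -> host PA through the
canonical stage-2. This is only a sketch and not how KVM implements it.
Every name in it (s2_entry, s2_table, s2_walk, shadow_s2_translate) is
made up for illustration, the "page tables" are flat arrays of 4K-page
mappings, and the "enabled" flag stands in for HCR_EL2.VM. Note that L2's
stage-1 does not appear anywhere in the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)

/* Hypothetical page-granular mapping: input frame -> output frame. */
struct s2_entry {
	uint64_t in_pfn;
	uint64_t out_pfn;
};

/* Hypothetical stage-2 "table": a flat array plus an enable bit. */
struct s2_table {
	const struct s2_entry *entries;
	size_t nr;
	bool enabled;	/* false models HCR_EL2.VM == 0 (identity) */
};

/* Translate one address through a single toy stage-2. */
static bool s2_walk(const struct s2_table *t, uint64_t in, uint64_t *out)
{
	assert(t && out);

	if (!t->enabled) {
		/* Stage-2 off: the output address equals the input. */
		*out = in;
		return true;
	}

	for (size_t i = 0; i < t->nr; i++) {
		if (t->entries[i].in_pfn == (in >> TOY_PAGE_SHIFT)) {
			*out = (t->entries[i].out_pfn << TOY_PAGE_SHIFT) |
			       (in & TOY_PAGE_MASK);
			return true;
		}
	}

	return false;	/* unmapped: would be a stage-2 fault */
}

/*
 * What a shadow stage-2 mapping encodes: L2IPA -> host PA, obtained by
 * walking the guest stage-2 (built by L1) and then the canonical
 * stage-2 (built by the host for L1).
 */
static bool shadow_s2_translate(const struct s2_table *guest_s2,
				const struct s2_table *canonical_s2,
				uint64_t l2_ipa, uint64_t *pa)
{
	uint64_t l1_ipa;

	if (!s2_walk(guest_s2, l2_ipa, &l1_ipa))
		return false;

	return s2_walk(canonical_s2, l1_ipa, pa);
}
```

With the guest stage-2 disabled, shadow_s2_translate() returns exactly
what the canonical stage-2 returns, which is the "simple" case the
current test exercises. Once L1 builds and enables a stage-2 for L2, the
two walks differ and the shadow mapping becomes interesting.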

