On 2026/2/26 04:11, Sean Christopherson wrote:
On Mon, Feb 02, 2026, Lance Yang wrote:
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 37dc8465e0f5..6a5e47ee4eb6 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -856,6 +856,12 @@ static void __init kvm_guest_init(void)
 #ifdef CONFIG_SMP
 	if (pv_tlb_flush_supported()) {
 		pv_ops.mmu.flush_tlb_multi = kvm_flush_tlb_multi;
+		/*
+		 * KVM's flush implementation calls native_flush_tlb_multi(),
+		 * which sends real IPIs when INVLPGB is not available.
Not on all (virtual) CPUs. The entire point of KVM's PV TLB flush is to elide
the IPIs. If a vCPU was scheduled out by the host, the guest sets a flag and
relies on the host to flush the TLB on behalf of the guest prior to the next
VM-Enter.
Ah, I see. Thanks for the correction!
KVM only sends IPIs to running vCPUs; preempted ones are left out of the
mask and flushed on VM-Enter. So the old comment was wrong ...
IIUC, we still set flush_tlb_multi_implies_ipi_broadcast to true because
only running vCPUs can be in a software/lockless walk, and they all get
the IPI, so the flush is enough.
Does that match what you had in mind?
Thanks,
Lance
	for_each_cpu(cpu, flushmask) {
		/*
		 * The local vCPU is never preempted, so we do not explicitly
		 * skip check for local vCPU - it will never be cleared from
		 * flushmask.
		 */
		src = &per_cpu(steal_time, cpu);
		state = READ_ONCE(src->preempted);
		if ((state & KVM_VCPU_PREEMPTED)) {
			if (try_cmpxchg(&src->preempted, &state,
					state | KVM_VCPU_FLUSH_TLB))
				__cpumask_clear_cpu(cpu, flushmask);  <=== removes CPU from the IPI set
		}
	}

	native_flush_tlb_multi(flushmask, info);
+		if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+			pv_ops.mmu.flush_tlb_multi_implies_ipi_broadcast = true;
 		pr_info("KVM setup pv remote TLB flush\n");
}