On Mon, 20 Apr 2026 13:07:15 +0100,
Akihiko Odaki <[email protected]> wrote:
>
> On 2026/04/20 18:51, Marc Zyngier wrote:
> > On Mon, 20 Apr 2026 09:36:16 +0100,
> > Akihiko Odaki <[email protected]> wrote:
> >>
> >> On 2026/04/20 2:19, Marc Zyngier wrote:
> >>> On Sat, 18 Apr 2026 09:14:25 +0100,
> >>> Akihiko Odaki <[email protected]> wrote:
> >>>>
> >>>> On a heterogeneous arm64 system, KVM's PMU emulation is based on the
> >>>> features of a single host PMU instance. When a vCPU is migrated to a
> >>>> pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
> >>>> incrementing.
> >>>>
> >>>> Although this behavior is permitted by the architecture, Windows does
> >>>> not handle it gracefully and may crash with a division-by-zero error.
> >>>>
> >>>> The current workaround requires VMMs to pin vCPUs to a set of pCPUs
> >>>> that share a compatible PMU. This is difficult to implement correctly in
> >>>> QEMU/libvirt, where pinning occurs after vCPU initialization, and it
> >>>> also restricts the guest to a subset of available pCPUs.
> >>>>
> >>>> Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to
> >>>> create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is
> >>>> compatible with all pCPUs but that does not support programmable
> >>>> event counters which may have different feature sets on different PMUs.
> >>>>
> >>>> This allows Windows guests to run reliably on heterogeneous systems
> >>>> without crashing, even without vCPU pinning, and enables VMMs to
> >>>> schedule vCPUs across all available pCPUs, making full use of the host
> >>>> hardware.
> >>>>
> >>>> Much like KVM_ARM_VCPU_PMU_V3_IRQ and other read-write attributes, this
> >>>> attribute provides a getter that facilitates kernel and userspace
> >>>> debugging/testing.
> >>>
> >>> OK, so that's the sales pitch. But how is it implemented? I would like
> >>> to be able to read a high-level description of the implementation
> >>> trade-offs.
> >>
> >> Implementation-wise it is very trivial. Essentially the following
> >> addition in kvm_arm_pmu_v3_get_attr() is the entire implementation:
> >> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
> >> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
> >> &vcpu->kvm->arch.flags))
> >> + return 0;
> >>
> >> Both its functionality and code complexity is trivial. So we can argue
> >> that:
> >> - the functionality is too trivial to be useful or
> >> - the interface/implementation complexity is so trivial that it does not
> >> incur maintenance burden
> >>
> >> In this case the selftest uses the getter so I was more inclined to
> >> have it, but adding one just for the selftest sounds too ad-hoc, so
> >> here I looked into other attributes to ensure that it was not
> >> introducing inconsistency with existing interfaces.
> >>
> >> As the result, I found there are other read-write attributes; in fact
> >> there are more read-write attributes than write-only ones.
> >
> > You're completely missing the point. I'm referring to the whole of the
> > commit message, which is more of a marketing slide than a technical
> > description.
>
> In terms of implementation, the obvious tradeoff is that it adds more
> code to implement the feature. One thing to note is that
> kvm_vcpu_load_pmu() is added and is called each time a vCPU migrates
> across pCPUs. The heavy part, making the KVM_REQ_RELOAD_PMU request,
> only happens when the feature is enabled.
Well, that's what I want to see. The repeated blurb about Windows
being broken is cover letter material, but not fir for a commit
message.
[...]
> >>> "for the first time" gives the impression that it will work if you try
> >>> again. I'd rather we say that "This feature is incompatible with the
> >>> existence of a PMU event filter".
> >>
> >> The following sequence will work:
> >> 1. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
> >> 2. Set KVM_ARM_VCPU_PMU_V3_FILTER
> >> 3. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
> >>
> >> This is to make the behavior conistent with KVM_ARM_VCPU_PMU_V3_SET_PMU.
> >
> > I don't think this is correct. Filtering is completely at odds with
> > this patch, and I don't want to have to reason about the combination.
>
> kvm_arm_pmu_v3_set_pmu() has the following condition:
>
> if (kvm_vm_has_ran_once(kvm) ||
> (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
> ret = -EBUSY;
> break;
> }
>
> kvm_arm_pmu_v3_set_pmu_fixed_counters_only() has the corresponding
> condition for consistency:
>
> if (kvm_vm_has_ran_once(kvm) ||
> (kvm->arch.pmu_filter &&
> !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
> &kvm->arch.flags)))
> return -EBUSY;
>
> We can of course kill the PMU event filter for
> FIXED_COUNTERS_ONLY. The filter is effectively no-op with
> FIXED_COUNTERS_ONLY and I don't think that consistency matters much.
We shouldn't allow weird combinations in the UAPI. Since it makes no
sense to have both fixed-function *and* filters, we should make them
mutually exclusive.
[...]
> >> I expect migration will be handled with the conventional register
> >> getters and setters, but please share if you have a concern.
> >
> > At the very least I want to see some documentation explaining that.
>
> What kind of documentation do you expect?
A description of what counters are exposed by this feature, and what
architectural features they are dependent on.
> If we change kvm_vcpu_load_pmu() to avoid for_each_set_bit(), there
> would be a good chance to forget updating it when mechanically
> updating existing for_each_set_bit() instances, so it is a candidate
> for documentation. But I don't have a good idea where to place it
> either.
The moment we introduce FEAT_PMUv3_ICNTR, the whole PMUv3 emulation
will catch fire anyway, as we already limit ourselves to 32 bits for
reset and nesting switch.
So this will be a major redesign anyway. If you are really worried,
leave a comment in there.
M.
--
Without deviation from the norm, progress is not possible.