On 2026/02/26 20:54, Oliver Upton wrote:
Hi Akihiko,

On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
@@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
                kvm_vcpu_load_vhe(vcpu);
        kvm_arch_vcpu_load_fp(vcpu);
        kvm_vcpu_pmu_restore_guest(vcpu);
+       if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, 
&vcpu->kvm->arch.flags))
+               kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);

We only need to set the request if the vCPU has migrated to a different
PMU implementation, no?

Indeed. I was too lazy to implement such a check since it won't affect performance unless the new feature is requested, but having one may be still nice.


        if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
                kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
@@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
                if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
                        kvm_vcpu_reload_pmu(vcpu);
+ if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
+                       kvm_vcpu_create_pmu(vcpu);
+

My strong preference would be to squash the migration handling into
kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
response to other things.

Can you share a reason for that?

In terms of complexity, I don't think it will help reducing complexity since the only common things between kvm_vcpu_reload_pmu() and kvm_vcpu_create_pmu() are the enumeration of enabled counters, which is simple enough.

In terms of performance, I guess it is better to keep kvm_vcpu_create_pmu() small since it is triggered for each migration.


                if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
                        kvm_vcpu_pmu_restore_guest(vcpu);
@@ -1516,7 +1521,8 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
         * When the vCPU has a PMU, but no PMU is set for the guest
         * yet, set the default one.
         */
-       if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
+       if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu &&
+           !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, 
&kvm->arch.flags))
                ret = kvm_arm_set_default_pmu(kvm);

I'd rather just initialize it to a default than have to deal with the
field being sometimes null.

I agree. I'll change this with the next version.


-static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
+static u64 kvm_pmu_enabled_counter_mask(struct kvm_vcpu *vcpu)
  {
-       struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
-       unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
+       u64 mask = 0;
- if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
-               return false;
+       if (__vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HPME)
+               mask |= kvm_pmu_hyp_counter_mask(vcpu);
- if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
-               return mdcr & MDCR_EL2_HPME;
+       if (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)
+               mask |= ~kvm_pmu_hyp_counter_mask(vcpu);
- return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
+       return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
+}
+
+static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
+{
+       struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+
+       return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
  }

You're churning a good bit of code, this needs to happen in a separate
patch (if at all).

It makes sense. The next version will have a separate patch for this.


@@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
        int eventsel;
        u64 evtreg;
+ if (!arm_pmu) {
+               arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);

kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
want.

What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
work out the right PMU for this CPU?

Unfortunately perf does not seem to have a capability to switch to the right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf tool creates events for each PMU in a hybird configuration, for example.


+               if (!arm_pmu) {
+                       vcpu_set_on_unsupported_cpu(vcpu);

At this point it seems pretty late to flag the CPU as unsupported. Maybe
instead we can compute the union cpumask for all the PMU implemetations
the VM may schedule on.

This is just a safe guard and it is a responsibility of the userspace to schedule the VCPU properly. It is conceptually same with what kvm_arch_vcpu_load() does when migrating to an unsupported CPU.


@@ -1249,6 +1299,10 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, 
struct kvm_device_attr *attr)
                irq = vcpu->arch.pmu.irq_num;
                return put_user(irq, uaddr);
        }
+       case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
+               lockdep_assert_held(&vcpu->kvm->arch.config_lock);
+               if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, 
&vcpu->kvm->arch.flags))
+                       return 0;

We don't need a getter for this, userspace should remember how it
provisioned the VM.

The getter is useful for debugging and testing. The selftest will use it to query the current state.

Regards,
Akihiko Odaki

Reply via email to