On 2026/02/27 8:05, Oliver Upton wrote:
On Thu, Feb 26, 2026 at 11:47:54PM +0900, Akihiko Odaki wrote:
On 2026/02/26 23:43, Akihiko Odaki wrote:
On 2026/02/26 20:54, Oliver Upton wrote:
Hi Akihiko,

On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
@@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
           kvm_vcpu_load_vhe(vcpu);
       kvm_arch_vcpu_load_fp(vcpu);
       kvm_vcpu_pmu_restore_guest(vcpu);
+    if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
+                 &vcpu->kvm->arch.flags))
+        kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);

We only need to set the request if the vCPU has migrated to a different
PMU implementation, no?

Indeed. I was too lazy to implement such a check since it won't affect
performance unless the new feature is requested, but having one may
still be nice.

I'd definitely like to see this.


       if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
           kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
@@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
           if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
               kvm_vcpu_reload_pmu(vcpu);
+        if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
+            kvm_vcpu_create_pmu(vcpu);
+

My strong preference would be to squash the migration handling into
kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
response to other things.

Can you share a reason for that?

In terms of complexity, I don't think it will help reduce complexity
since the only thing kvm_vcpu_reload_pmu() and kvm_vcpu_create_pmu()
have in common is the enumeration of enabled counters, which is
simple enough.

I prefer it in terms of code organization. We should have a single
helper that refreshes the backing perf events when something has
globally changed for the vPMU.

Besides this, "create" is confusing since the vPMU has already been
instantiated.

In terms of performance, I guess it is better to keep
kvm_vcpu_create_pmu() small since it is triggered for each migration.

I think the surrounding KVM code for iterating over the counters is
inconsequential compared to the overheads of calling into perf to
recreate the PMU events. Since we expect this to be slow, we should only
set the request when absolutely necessary.

I see. I'll squash it into kvm_vcpu_reload_pmu().


+static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
+{
+    struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+
+    return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
   }

You're churning a good bit of code, this needs to happen in a separate
patch (if at all).

It makes sense. The next version will have a separate patch for this.

If I have the full picture right, you may not need it with a common
request handler.

I think I'm going to use it to check whether the vCPU is covered by the currently enabled perf events before requesting KVM_REQ_RELOAD_PMU.



@@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
       int eventsel;
       u64 evtreg;
+    if (!arm_pmu) {
+        arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);

kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
want.

What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
work out the right PMU for this CPU?

Unfortunately perf does not seem to have the capability to switch to the
right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf tool
creates events for each PMU in a hybrid configuration, for example.

I think I misunderstood what you meant. Letting
perf_event_create_kernel_counter() figure out which PMU to use may be a
good idea. I'll give it a try in the next version.

Yep, this is what I was alluding to.

I tried this, but unfortunately it didn't work well. Simply using PERF_TYPE_RAW let perf_event_create_kernel_counter() choose an arbitrary PMU, potentially one not covering the current PCPU.

We can change the cpu parameter of the function to fix this, but that binds the perf event to that particular PCPU and requires recreating the perf event when migrating to another PCPU covered by the same PMU.

I think I'm going to use RCU to avoid locking a global mutex.




+        if (!arm_pmu) {
+            vcpu_set_on_unsupported_cpu(vcpu);

At this point it seems pretty late to flag the CPU as unsupported. Maybe
instead we can compute the union cpumask for all the PMU implementations
the VM may schedule on.

This is just a safeguard, and it is the responsibility of userspace to
schedule the vCPU properly. It is conceptually the same as what
kvm_arch_vcpu_load() does when migrating to an unsupported CPU.

I agree with you that we need to have some handling for this situation.

What I don't like about this is userspace doesn't discover its mistake
until the guest actually programs a PMC. I'd much rather preserve the
existing ABI where KVM proactively rejects running a vCPU on an
unsupported CPU.

Thanks for the explanation. I'll change this in the next version.


+    case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
+        lockdep_assert_held(&vcpu->kvm->arch.config_lock);
+        if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
+                     &vcpu->kvm->arch.flags))
+            return 0;

We don't need a getter for this, userspace should remember how it
provisioned the VM.

The getter is useful for debugging and testing. The selftest will use it
to query the current state.

That's fine for debugging this on your own kernel but we don't need it
upstream. There are several other vPMU attributes that are write-only,
like KVM_ARM_VCPU_PMU_V3_SET_PMU.

Not just for debugging the kernel; it will also be useful for userspace debugging.

Indeed there are other write-only attributes, but there is also an attribute with a getter: KVM_ARM_VCPU_PMU_V3_IRQ. I think there are more if you look at a scope broader than the KVM_ARM_VCPU_PMU_V3_CTRL group, and such existing getters for read/write attributes are probably also only useful for kernel or userspace debugging. I think having a getter can be justified, given these preexisting examples.

Regards,
Akihiko Odaki
