From: Sebastian Andrzej Siewior <[email protected]> Sent: Wednesday, March 18, 2026 3:02 AM
> 
> On 2026-03-17 17:25:20 [+0000], Michael Kelley wrote:
> > From: Sebastian Andrzej Siewior <[email protected]> Sent: Thursday, March 12, 2026 10:07 AM
> > >
> > Let me try to address the range of questions here and in the follow-up
> > discussion. As background, an overview of VMBus interrupt handling is in:
> >
> > Documentation/virt/hyperv/vmbus.rst
> >
> > in the section entitled "Synthetic Interrupt Controller (synic)". The
> > relevant text is:
> >
> >    The SINT is mapped to a single per-CPU architectural interrupt (i.e,
> >    an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
> >    each CPU in the guest has a synic and may receive VMBus interrupts,
> >    they are best modeled in Linux as per-CPU interrupts. This model works
> >    well on arm64 where a single per-CPU Linux IRQ is allocated for
> >    VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
> >    "Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
> >    interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
> >    across all CPUs and explicitly coded to call vmbus_isr(). In this case,
> >    there's no Linux IRQ, and the interrupts are visible in aggregate in
> >    /proc/interrupts on the "HYP" line.
> >
> > The use of a statically allocated sysvec pre-dates my involvement in this
> > code starting in 2017, but I believe it was modelled after what Xen does,
> > and for the same reason -- to effectively create a per-CPU interrupt on
> > x86/x64. ACRN is also using HYPERVISOR_CALLBACK_VECTOR, but I
> > don't know if that is also to create a per-CPU interrupt.
> 
> If you create a vector, it becomes per-CPU. There is simply no mapping
> from HYPERVISOR_CALLBACK_VECTOR to request_percpu_irq(). But if we had
> this…

Indeed, yes, that would remove the need for all the per-CPU interrupt hackery
on x86/x64. I don't have any objection to someone pursuing that path, but it's
not something I can do. Full disclosure:  You'll see my name on Hyper-V and
VMBus stuff in the Linux kernel, with Microsoft as my employer. But I retired
from Microsoft 2.5 years ago, and my current involvement in Linux kernel work
is purely as a very part-time volunteer. I also lack access to hardware and the
test machinery needed to make more significant changes, particularly if multiple
versions of Hyper-V must be tested.

> 
> …
> > > What clears this? This is wrongly placed. This should go to
> > > sysvec_hyperv_callback() instead with its matching canceling part. The
> > > add_interrupt_randomness() should also be there and not here.
> > > sysvec_hyperv_stimer0() managed to do so.
> >
> > I don't have any knowledge to bring regarding the use of
> > lockdep_hardirq_threaded().
> 
> It is used in IRQ core to mark the execution of an interrupt handler
> which becomes threaded in a forced-threaded scenario. The goal is to let
> lockdep know that this piece of code on !RT will be threaded on RT and
> therefore there is no need to report a possible locking problem that
> will not exist on RT.
> 
> > > Different question: What guarantees that there won't be another
> > > interrupt before this one is done? The handshake appears to be
> > > deprecated. The interrupt itself returns ACKing (or not) but the actual
> > > handler is delayed to this thread. Depending on the userland it could
> > > take some time and I don't know how impatient the host is.
> >
> > In more recent versions of Hyper-V, what's deprecated is Hyper-V implicitly
> > and automatically doing the EOI. So in sysvec_hyperv_callback(), apic_eoi()
> > is usually explicitly called to ack the interrupt.
> >
> > There's no guarantee, in either the existing case or the new PREEMPT_RT
> > case, that another VMBus interrupt won't come in on the same CPU
> > before the tasklets scheduled by vmbus_message_sched() or
> > vmbus_chan_sched() have run. From a functional standpoint, the Linux
> > code and interaction with Hyper-V handles another interrupt correctly.
> 
> So there is no scenario in which the host keeps triggering interrupts
> because the guest leaves the ISR without doing anything / making progress?
> 
> > From a delay standpoint, there's not a problem for the normal (i.e., not
> > PREEMPT_RT) case because the tasklets run as the interrupt exits -- they
> > don't end up in ksoftirqd. For the PREEMPT_RT case, I can see your point
> > about delays since the tasklets are scheduled from the new per-CPU thread.
> > But my understanding is that Jan's motivation for these changes is not to
> > achieve true RT behavior, since Hyper-V doesn't provide that anyway.
> > The goal is simply to make PREEMPT_RT builds functional, though Jan may
> > have further comments on the goal.
> 
> I would be worried if the host were storming the guest with interrupts
> because it makes no progress.

No, that kind of storming won't happen. The Hyper-V host<->guest
interface is based on message queues. The host interrupts the guest
if it puts a message in the queue that transitions the queue from
"empty" to "not empty". Eventually the tasklet enabled in vmbus_isr()
and its subsidiaries gets around to emptying the queue, which effectively
re-arms the interrupt. The host may add more messages to the queue,
but it doesn't interrupt again for that queue until the queue is empty.
If the guest is delayed in doing that emptying, nothing bad happens.

There could be multiple queues that interrupt the same vCPU in the
guest, so there might be another interrupt to the same vCPU due to
a different queue, but that could happen regardless of the latency in
emptying a queue. And the number of queues assigned to a vCPU
is at most a small integer.
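
To make the signalling rule concrete, here is a toy user-space model in C.
It is only a sketch of the behaviour described above, not the actual VMBus
code: struct msg_queue, host_post(), guest_drain(), and QUEUE_CAP are all
hypothetical names invented for illustration, and the fixed-size ring is an
assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8 /* hypothetical capacity, chosen only for the example */

/* Hypothetical model of one host->guest message queue. */
struct msg_queue {
	int msgs[QUEUE_CAP];
	size_t head;             /* index of the oldest queued message */
	size_t count;            /* number of messages currently queued */
	unsigned int interrupts; /* interrupts the host has raised so far */
};

/*
 * Host side: add a message. An interrupt is raised only when the queue
 * goes from "empty" to "not empty". Returns false if the queue is full.
 */
static bool host_post(struct msg_queue *q, int msg)
{
	bool was_empty;

	if (q->count == QUEUE_CAP)
		return false;

	was_empty = (q->count == 0);
	q->msgs[(q->head + q->count) % QUEUE_CAP] = msg;
	q->count++;

	if (was_empty)
		q->interrupts++;
	return true;
}

/*
 * Guest side: empty the queue, however late that happens. Emptying is
 * what effectively re-arms the interrupt. Returns the number of
 * messages that were consumed.
 */
static size_t guest_drain(struct msg_queue *q)
{
	size_t handled = 0;

	while (q->count > 0) {
		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
		handled++;
	}
	return handled;
}
```

In this model, three host_post() calls on an empty queue raise one
interrupt, not three. Only after guest_drain() has emptied the queue does
the next host_post() raise a second one, so a slow guest sees no storm.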

> 
> > > > +               __vmbus_isr();
> > > Moving on. This (trying very hard here) even schedules tasklets. Why?
> > > You need to disable BH before doing so. Otherwise it ends in ksoftirqd.
> > > You don't want that.
> >
> > Again, Jan can comment on the impact of delays due to ending up
> > in ksoftirqd.
> 
> My point is that having this with threaded interrupt support would
> eliminate the usage of tasklets.

Agreed, probably. For the non-RT case, the latency in getting to the
tasklet code *does* matter. I'm not familiar with how tasklets compare
to threaded interrupts on latency.

> 
> > > Couldn't the whole logic be integrated into the IRQ code? Then we could
> > > have mask/ unmask if supported/ provided and threaded interrupts. Then
> > > sysvec_hyperv_reenlightenment() could use a proper threaded interrupt
> > > instead of apic_eoi() + schedule_delayed_work().
> >
> > As I described above, Hyper-V needs a per-CPU interrupt. It's faked up
> > on x86/x64 with the hardcoded HYPERVISOR_CALLBACK_VECTOR sysvec
> > entry, but on arm64 a normal Linux per-CPU IRQ is used. Once the execution
> > path gets to vmbus_isr(), the two architectures share the same code. Same
> > thing is done with the Hyper-V STIMER0 interrupt as a per-CPU interrupt.
> 
> This one has the "random" collecting on the right spot.

Regarding the timer path, see my comment in the other email thread.

> 
> > If there's a better way to fake up a per-CPU interrupt on x86/x64, I'm open
> > to looking at it.
> >
> > As I recently discovered in discussion with Jan, standard Linux IRQ handling
> > will *not* thread per-CPU interrupts. So even on arm64, where a standard
> > Linux per-CPU IRQ is used for VMBus and STIMER0 interrupts, we can't
> > request threading.
> 
> It would require a statement from the x86 & IRQ maintainers on whether it
> is worthwhile on x86 to allow passing HYPERVISOR_CALLBACK_VECTOR to
> request_percpu_irq() and to have an IRQF_ flag saying that this one needs
> to be force-threaded. Otherwise we would need to stay with the workarounds.

Again, you or someone else is welcome to explore this topic.

> 
> If you say that an interrupt storm can not occur, I would prefer
> |static DEFINE_WAIT_OVERRIDE_MAP(vmbus_map, LD_WAIT_CONFIG);
> |…
> |     lock_map_acquire_try(&vmbus_map);
> |     __vmbus_isr();
> |     lock_map_release(&vmbus_map);
> 
> while it has mostly the same effect.
> 
> Either way, that add_interrupt_randomness() should be moved to
> sysvec_hyperv_callback() like it has been done for
> sysvec_hyperv_stimer0(). It should be invoked twice now if it gets there
> via vmbus_percpu_isr().
> 
> > I need to refresh my memory on sysvec_hyperv_reenlightenment(). If
> > I recall correctly, it's not a per-CPU interrupt, so it probably doesn't
> > need to have a hardcoded vector. Overall, the Hyper-V reenlightenment
> > functionality is a bit of a fossil that isn't needed on modern x86/x64
> > processors that support TSC scaling. And it doesn't exist for arm64.
> > It might be worth seeing if it could be dropped entirely ...

I've refreshed my memory on the reenlightenment functionality, and
I think it has to stay. The functionality is used by KVM when it is running
in an L1 VM on an L0 Hyper-V host, and supporting its own L2 guest VMs.
I will check with Vitaly Kuznetsov, who originally added the reenlightenment
support for KVM, but I suspect it needs to stay for a few more years.

Old Hyper-V version support has been dropped in the past [1], but the
situation with reenlightenment is more than just the Hyper-V version.

Michael

[1] https://lore.kernel.org/all/[email protected]/

> >
> > Michael
> 
> Sebastian
