On 13.02.2026 17:28, Oleksii Kurochko wrote:
> Based on Linux kernel v6.16.0.
> Note that smp_wmb() is used instead of smp_mb__before_atomic() as what
> we want to guarantee that if a bit in irqs_pending_mask is obversable
> that the correspondent bit in irqs_pending is observable too.
>
> Add lockless tracking of pending vCPU interrupts using atomic bitops.
> Two bitmaps are introduced:
> - irqs_pending — interrupts currently pending for the vCPU
> - irqs_pending_mask — bits that have changed in irqs_pending
>
> The design follows a multi-producer, single-consumer model, where the
> consumer is the vCPU itself. Producers may set bits in
> irqs_pending_mask without a lock. Clearing bits in irqs_pending_mask is
> performed only by the consumer via xchg(). The consumer must not write
> to irqs_pending and must not act on bits that are not set in the mask.
> Otherwise, extra synchronization should be provided.
>
> On RISC-V interrupts are not injected via guest registers, so pending
> interrupts must be recorded in irqs_pending (using the new
> vcpu_{un}set_interrupt() helpers) and flushed to the guest by updating
> HVIP before returning control to the guest. The consumer side is
> implemented in a follow-up patch.
>
> A barrier between updating irqs_pending and setting the corresponding
> mask bit in vcpu_set_interrupt()/vcpu_unset_interrupt() guarantees
> that if the consumer observes a mask bit set, the corresponding pending
> bit is also visible. This prevents missed interrupts during the flush.
>
> It is possible that a guest could have pending bit in the hardware
> register without being marked pending in irq_pending bitmap as:
> According to the RISC-V ISA specification:
> Bits hip.VSSIP and hie.VSSIE are the interrupt-pending and
> interrupt-enable bits for VS-level software interrupts. VSSIP in hip
> is an alias (writable) of the same bit in hvip.
> Additionally:
> When bit 2 of hideleg is zero, vsip.SSIP and vsie.SSIE are read-only
> zeros. Else, vsip.SSIP and vsie.SSIE are aliases of hip.VSSIP and
> hie.VSSIE.
> This means the guest may modify vsip.SSIP, which implicitly updates
> hip.VSSIP and the bit being written with 1 would also trigger an interrupt
> as according to the RISC-V spec:
> These conditions for an interrupt trap to occur must be evaluated in a
> bounded amount of time from when an interrupt becomes, or ceases to be,
> pending in sip, and must also be evaluated immediately following the
> execution of an SRET instruction or an explicit write to a CSR on which
> these interrupt trap conditions expressly depend (including sip, sie and
> sstatus).
> What means that IRQ_VS_SOFT must be synchronized separately, what is done
> in vcpu_sync_interrupts(). Note, also, that IRQ_PMU_OVF would want to be
> synced for the similar reason as IRQ_VS_SOFT, but isn't sync-ed now as
> PMU isn't supported now.
>
> For the remaining VS-level interrupt types (IRQ_VS_TIMER and
> IRQ_VS_EXT), the specification states they cannot be modified by the guest
> and are read-only because of:
"cannot be modified by the guest" is still meaning about anything. Aren't you
intending to merely talk about their pending bits? Then all of ...
> Bits hip.VSEIP and hie.VSEIE are the interrupt-pending and interrupt-enable
> bits for VS-level external interrupts. VSEIP is read-only in hip, and is
> the logical-OR of these interrupt sources:
> • bit VSEIP of hvip;
> • the bit of hgeip selected by hstatus.VGEIN; and
> • any other platform-specific external interrupt signal directed to
> VS-level.
> Bits hip.VSTIP and hie.VSTIE are the interrupt-pending and interrupt-enable
> bits for VS-level timer interrupts. VSTIP is read-only in hip, and is the
> logical-OR of hvip.VSTIP and any other platform-specific timer interrupt
> signal directed to VS-level.
> and
> When bit 10 of hideleg is zero, vsip.SEIP and vsie.SEIE are read-only zeros.
> Else, vsip.SEIP and vsie.SEIE are aliases of hip.VSEIP and hie.VSEIE.
>
> When bit 6 of hideleg is zero, vsip.STIP and vsie.STIE are read-only zeros.
> Else, vsip.STIP and vsie.STIE are aliases of hip.VSTIP and hie.VSTIE.
> and also,
... this is largely irrelevant, while ...
> Bits sip.SEIP and sie.SEIE are the interrupt-pending and interrupt-enable
> bits for supervisor-level external interrupts. If implemented, SEIP is
> read-only in sip, and is set and cleared by the execution environment,
> typically through a platform-specific interrupt controller.
>
> Bits sip.STIP and sie.STIE are the interrupt-pending and interrupt-enable
> bits for supervisor-level timer interrupts. If implemented, STIP is
> read-only in sip, and is set and cleared by the execution environment
... this is important.
> Thus, for these interrupt types, it is sufficient to use vcpu_set_interrupt()
> and vcpu_unset_interrupt(), and flush them during the call of
> vcpu_flush_interrupts() (which is introduced in follow up patch).
>
> vcpu_sync_interrupts(), which is called just before entering the VM,
> slightly bends the rule that the irqs_pending bit must be written
> first, followed by updating the corresponding bit in irqs_pending_mask.
> However, it still respects the core guarantee that the producer never
> clears the mask and only writes to irqs_pending if it is the one that
> flipped the corresponding mask bit from 0 to 1.
> Moreover, since the consumer won't run concurrently because
> vcpu_sync_interrupts() and the consumer path are going to be invoked
> equentially immediately before VM entry, it is safe to slightly relax
Nit: There was an 's' lost at the start of the line.
> @@ -127,3 +128,73 @@ void arch_vcpu_destroy(struct vcpu *v)
> {
> vfree((void *)&v->arch.cpu_info[1] - STACK_SIZE);
> }
> +
> +int vcpu_set_interrupt(struct vcpu *v, unsigned int irq)
> +{
> + /* We only allow VS-mode software, timer, and external interrupts */
> + if ( irq != IRQ_VS_SOFT &&
> + irq != IRQ_VS_TIMER &&
> + irq != IRQ_VS_EXT )
> + return -EINVAL;
> +
> + set_bit(irq, v->arch.irqs_pending);
> + /*
> + * The counterpart of this barrier is the one encoded implicitly in
> xchg()
> + * which is used in consumer part (vcpu_flush_interrupts()).
> + */
> + smp_wmb();
> + set_bit(irq, v->arch.irqs_pending_mask);
Wasn't this meant to go away with ...
> + if ( !test_and_set_bit(irq, v->arch.irqs_pending_mask) )
... the introduction of this?
> + vcpu_kick(v);
Nit: Bad indentation.
As to the test_and_set_bit(): In isolation here this looks correct, but
when taking ...
> + return 0;
> +}
> +
> +int vcpu_unset_interrupt(struct vcpu *v, unsigned int irq)
> +{
> + /* We only allow VS-mode software, timer, external interrupts */
> + if ( irq != IRQ_VS_SOFT &&
> + irq != IRQ_VS_TIMER &&
> + irq != IRQ_VS_EXT )
> + return -EINVAL;
> +
> + clear_bit(irq, v->arch.irqs_pending);
> + /*
> + * The counterpart of this barrier is the one encoded implicitly in
> xchg()
> + * which is used in consumer part (vcpu_flush_interrupts()).
> + */
> + smp_wmb();
> + set_bit(irq, v->arch.irqs_pending_mask);
> +
> + return 0;
> +}
... this into account - what about vcpu_unset_interrupt() followed by
vcpu_set_interrupt()? Shouldn't that also result in a kick? I.e.
shouldn't the condition above be whether either of the two bits
transitioned from 0 to 1?
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -54,6 +54,25 @@ struct arch_vcpu {
> register_t henvcfg;
> register_t hideleg;
> register_t hstateen0;
> + register_t hvip;
> +
> + register_t vsie;
> +
> + /*
> + * VCPU interrupts
> + *
> + * We have a lockless approach for tracking pending VCPU interrupts
> + * implemented using atomic bitops. The irqs_pending bitmap represent
> + * pending interrupts whereas irqs_pending_mask represent bits changed
> + * in irqs_pending. Our approach is modeled around multiple producer
> + * and single consumer problem where the consumer is the VCPU itself.
> + *
> + * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
> + * on RV32 host.
> + */
> +#define RISCV_VCPU_NR_IRQS MAX(BITS_PER_LONG, 64)
In the revlog you mention also taking care of "the case if AIA isn't used".
I can't spot anything like this here.
> --- a/xen/arch/riscv/traps.c
> +++ b/xen/arch/riscv/traps.c
> @@ -169,6 +169,13 @@ static void do_unexpected_trap(const struct
> cpu_user_regs *regs)
> die();
> }
>
> +static void check_for_pcpu_work(void)
> +{
> + struct vcpu *c = current;
As indicated, please try to get used to using conventional variable names.
Jan