On 13.02.2026 17:28, Oleksii Kurochko wrote:
> Based on Linux kernel v6.16.0.
> Note that smp_wmb() is used instead of smp_mb__before_atomic() as what
> we want to guarantee that if a bit in irqs_pending_mask is obversable
> that the correspondent bit in irqs_pending is observable too.
> 
> Add lockless tracking of pending vCPU interrupts using atomic bitops.
> Two bitmaps are introduced:
>  - irqs_pending — interrupts currently pending for the vCPU
>  - irqs_pending_mask — bits that have changed in irqs_pending
> 
> The design follows a multi-producer, single-consumer model, where the
> consumer is the vCPU itself. Producers may set bits in
> irqs_pending_mask without a lock. Clearing bits in irqs_pending_mask is
> performed only by the consumer via xchg(). The consumer must not write
> to irqs_pending and must not act on bits that are not set in the mask.
> Otherwise, extra synchronization should be provided.
> 
> On RISC-V interrupts are not injected via guest registers, so pending
> interrupts must be recorded in irqs_pending (using the new
> vcpu_{un}set_interrupt() helpers) and flushed to the guest by updating
> HVIP before returning control to the guest. The consumer side is
> implemented in a follow-up patch.
> 
> A barrier between updating irqs_pending and setting the corresponding
> mask bit in vcpu_set_interrupt()/vcpu_unset_interrupt() guarantees
> that if the consumer observes a mask bit set, the corresponding pending
> bit is also visible. This prevents missed interrupts during the flush.
> 
> It is possible that a guest could have pending bit in the hardware
> register without being marked pending in irq_pending bitmap as:
>   According to the RISC-V ISA specification:
>     Bits hip.VSSIP and hie.VSSIE are the interrupt-pending and
>     interrupt-enable  bits for VS-level software interrupts. VSSIP in hip
>     is an alias (writable) of the same bit in hvip.
>   Additionally:
>     When bit 2 of hideleg is zero, vsip.SSIP and vsie.SSIE are read-only
>     zeros. Else, vsip.SSIP and vsie.SSIE are aliases of hip.VSSIP and
>     hie.VSSIE.
> This means the guest may modify vsip.SSIP, which implicitly updates
> hip.VSSIP and the bit being written with 1 would also trigger an interrupt
> as according to the RISC-V spec:
>   These conditions for an interrupt trap to occur must be evaluated in a
>   bounded   amount of time from when an interrupt becomes, or ceases to be,
>   pending in sip,  and must also be evaluated immediately following the
>   execution of an SRET  instruction or an explicit write to a CSR on which
>   these interrupt trap conditions expressly depend (including sip, sie and
>   sstatus).
> What means that IRQ_VS_SOFT must be synchronized separately, what is done
> in vcpu_sync_interrupts(). Note, also, that IRQ_PMU_OVF would want to be
> synced for the similar reason as IRQ_VS_SOFT, but isn't sync-ed now as
> PMU isn't supported now.
> 
> For the remaining VS-level interrupt types (IRQ_VS_TIMER and
> IRQ_VS_EXT), the specification states they cannot be modified by the guest
> and are read-only because of:

"cannot be modified by the guest" is still meaning about anything. Aren't you
intending to merely talk about their pending bits? Then all of ...

>   Bits hip.VSEIP and hie.VSEIE are the interrupt-pending and interrupt-enable
>   bits for VS-level external interrupts. VSEIP is read-only in hip, and is
>   the logical-OR of these interrupt sources:
>     • bit VSEIP of hvip;
>     • the bit of hgeip selected by hstatus.VGEIN; and
>     • any other platform-specific external interrupt signal directed to
>       VS-level.
>   Bits hip.VSTIP and hie.VSTIE are the interrupt-pending and interrupt-enable
>   bits for VS-level timer interrupts. VSTIP is read-only in hip, and is the
>   logical-OR of hvip.VSTIP and any other platform-specific timer interrupt
>   signal directed to VS-level.
> and
>   When bit 10 of hideleg is zero, vsip.SEIP and vsie.SEIE are read-only zeros.
>   Else, vsip.SEIP and vsie.SEIE are aliases of hip.VSEIP and hie.VSEIE.
> 
>   When bit 6 of hideleg is zero, vsip.STIP and vsie.STIE are read-only zeros.
>   Else, vsip.STIP and vsie.STIE are aliases of hip.VSTIP and hie.VSTIE.
> and also,

... this is largely irrelevant, while ...

>   Bits sip.SEIP and sie.SEIE are the interrupt-pending and interrupt-enable
>   bits for supervisor-level external interrupts. If implemented, SEIP is
>   read-only in sip, and is set and cleared by the execution environment,
>   typically through a platform-specific interrupt controller.
> 
>   Bits sip.STIP and sie.STIE are the interrupt-pending and interrupt-enable
>   bits for supervisor-level timer interrupts. If implemented, STIP is
>   read-only in sip, and is set and cleared by the execution environment

... this is important.

> Thus, for these interrupt types, it is sufficient to use vcpu_set_interrupt()
> and vcpu_unset_interrupt(), and flush them during the call of
> vcpu_flush_interrupts() (which is introduced in follow up patch).
> 
> vcpu_sync_interrupts(), which is called just before entering the VM,
> slightly bends the rule that the irqs_pending bit must be written
> first, followed by updating the corresponding bit in irqs_pending_mask.
> However, it still respects the core guarantee that the producer never
> clears the mask and only writes to irqs_pending if it is the one that
> flipped the corresponding mask bit from 0 to 1.
> Moreover, since the consumer won't run concurrently because
> vcpu_sync_interrupts() and the consumer path are going to be invoked
> equentially immediately before VM entry, it is safe to slightly relax

Nit: There was an 's' lost at the start of the line.

> @@ -127,3 +128,73 @@ void arch_vcpu_destroy(struct vcpu *v)
>  {
>      vfree((void *)&v->arch.cpu_info[1] - STACK_SIZE);
>  }
> +
> +int vcpu_set_interrupt(struct vcpu *v, unsigned int irq)
> +{
> +    /* We only allow VS-mode software, timer, and external interrupts */
> +    if ( irq != IRQ_VS_SOFT &&
> +         irq != IRQ_VS_TIMER &&
> +         irq != IRQ_VS_EXT )
> +        return -EINVAL;
> +
> +    set_bit(irq, v->arch.irqs_pending);
> +    /*
> +     * The counterpart of this barrier is the one encoded implicitly in 
> xchg()
> +     * which is used in consumer part (vcpu_flush_interrupts()).
> +     */
> +    smp_wmb();
> +    set_bit(irq, v->arch.irqs_pending_mask);

Wasn't this meant to go away with ...

> +    if ( !test_and_set_bit(irq, v->arch.irqs_pending_mask) )

... the introduction of this?

> +      vcpu_kick(v);

Nit: Bad indentation.

As to the test_and_set_bit(): In isolation here this looks correct, but
when taking ...

> +    return 0;
> +}
> +
> +int vcpu_unset_interrupt(struct vcpu *v, unsigned int irq)
> +{
> +    /* We only allow VS-mode software, timer, external interrupts */
> +    if ( irq != IRQ_VS_SOFT &&
> +         irq != IRQ_VS_TIMER &&
> +         irq != IRQ_VS_EXT )
> +        return -EINVAL;
> +
> +    clear_bit(irq, v->arch.irqs_pending);
> +    /*
> +     * The counterpart of this barrier is the one encoded implicitly in 
> xchg()
> +     * which is used in consumer part (vcpu_flush_interrupts()).
> +     */
> +    smp_wmb();
> +    set_bit(irq, v->arch.irqs_pending_mask);
> +
> +    return 0;
> +}

... this into account - what about vcpu_unset_interrupt() followed by
vcpu_set_interrupt()? Shouldn't that also result in a kick? I.e.
shouldn't the condition above be whether either of the two bits
transitioned from 0 to 1?

> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -54,6 +54,25 @@ struct arch_vcpu {
>      register_t henvcfg;
>      register_t hideleg;
>      register_t hstateen0;
> +    register_t hvip;
> +
> +    register_t vsie;
> +
> +    /*
> +     * VCPU interrupts
> +     *
> +     * We have a lockless approach for tracking pending VCPU interrupts
> +     * implemented using atomic bitops. The irqs_pending bitmap represent
> +     * pending interrupts whereas irqs_pending_mask represent bits changed
> +     * in irqs_pending. Our approach is modeled around multiple producer
> +     * and single consumer problem where the consumer is the VCPU itself.
> +     *
> +     * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
> +     * on RV32 host.
> +     */
> +#define RISCV_VCPU_NR_IRQS MAX(BITS_PER_LONG, 64)

In the revlog you mention also taking care of "the case if AIA isn't used".
I can't spot anything like this here.

> --- a/xen/arch/riscv/traps.c
> +++ b/xen/arch/riscv/traps.c
> @@ -169,6 +169,13 @@ static void do_unexpected_trap(const struct 
> cpu_user_regs *regs)
>      die();
>  }
>  
> +static void check_for_pcpu_work(void)
> +{
> +    struct vcpu *c = current;

As indicated, please try to get used to using conventional variable names.

Jan

Reply via email to