Hello,

Damien Zammit, on Thu, 10 Jul 2025 10:14:21 +0000, wrote:
> This avoids a race condition with multiple devices raising interrupts
> simultaneously on the same IRQ and causing the mask to fail.

Are we *sure* that PCI interrupts may get raised on several CPUs
concurrently? That makes the race game *really* much more complex...

The much more probable scenario, however, is __enable_irq() getting called
concurrently (since it is simply called from the irq_acknowledge RPC), and
splhigh() is not enough to protect against that.

Rather than using an atomic counter, which is *really* difficult to get
right, I'd say just introduce one simple_lock per interrupt, which you
can take with simple_lock_irq instead of splhigh(). That should solve
everything in a way which is much simpler to convince ourselves is correct.

Along the way, it would be better to replace the unsigned int ndisabled_irq
array with an array of structures, each containing the counter and the
simple_lock, aligned on the cache line size, so that each irq gets its own
cache line and cache ping-pong is reduced.
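For instance, roughly along these lines (completely untested sketch, and I am
guessing at details: the struct/array names are made up, 64 stands in for the
actual cache line size, and I am writing simple_lock_irq() as returning the
spl to be restored by a simple_unlock_irq() counterpart, so adjust to the
real prototypes):

struct irq_state {
  unsigned int ndisabled;        /* nesting count of __disable_irq calls */
  simple_lock_data_t lock;       /* protects ndisabled and the mask state */
} __attribute__((aligned(64)));  /* one cache line per irq */

/* Static storage starts zeroed; run simple_lock_init() on each lock at
 * interrupt initialization time if the lock implementation requires it.  */
static struct irq_state irq_states[NINTR];

void
__disable_irq (irq_t irq_nr)
{
  assert (irq_nr < NINTR);

  /* Take the per-irq lock with interrupts blocked on this cpu, so the
   * counter and the mask state can only ever change together.  */
  spl_t s = simple_lock_irq (&irq_states[irq_nr].lock);
  irq_states[irq_nr].ndisabled++;
  if (irq_states[irq_nr].ndisabled == 1)
    mask_irq (irq_nr);
  simple_unlock_irq (s, &irq_states[irq_nr].lock);
}

void
__enable_irq (irq_t irq_nr)
{
  assert (irq_nr < NINTR);

  spl_t s = simple_lock_irq (&irq_states[irq_nr].lock);
  assert (irq_states[irq_nr].ndisabled > 0);
  irq_states[irq_nr].ndisabled--;
  if (irq_states[irq_nr].ndisabled == 0)
    unmask_irq (irq_nr);
  simple_unlock_irq (s, &irq_states[irq_nr].lock);
}

With the lock held around both the counter update and the mask/unmask call,
neither of the windows pointed out below can open, whichever cpu the calls
come from.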

> ---
>  i386/i386/irq.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/i386/i386/irq.c b/i386/i386/irq.c
> index 91318f67..d91b3856 100644
> --- a/i386/i386/irq.c
> +++ b/i386/i386/irq.c
> @@ -41,11 +41,19 @@ __disable_irq (irq_t irq_nr)
>  {
>    assert (irq_nr < NINTR);
>  
> +  /* The spl wrapping protects the same cpu
> +   * from being interrupted while updating this */
>    spl_t s = splhigh();
> -  ndisabled_irq[irq_nr]++;
> +
> +  /* masking the irq unconditionally prevents
> +   * other cpus being interrupted on this irq */
> +  mask_irq (irq_nr);

There is a window here where __enable_irq() can unmask the interrupt
before we increment ndisabled_irq.
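
Concretely, if another cpu happens to be in __enable_irq() for a previous
disable:

  cpu0: __disable_irq             cpu1: __enable_irq
  mask_irq (irq_nr);
                                  counter drops from 1 to 0
                                  unmask_irq (irq_nr);
  counter goes from 0 to 1

and we end up with ndisabled_irq at 1 but the irq unmasked.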

> +  /* we can atomically increment the nesting counter now */
> +  __atomic_add_fetch(&ndisabled_irq[irq_nr], 1, __ATOMIC_RELAXED);
>    assert (ndisabled_irq[irq_nr] > 0);
> -  if (ndisabled_irq[irq_nr] == 1)
> -    mask_irq (irq_nr);
> +
>    splx(s);
>  }
>  
> @@ -54,11 +62,17 @@ __enable_irq (irq_t irq_nr)
>  {
>    assert (irq_nr < NINTR);
>  
> +  /* The spl wrapping protects the same cpu
> +   * from being interrupted while updating this */
>    spl_t s = splhigh();
> -  assert (ndisabled_irq[irq_nr] > 0);
> -  ndisabled_irq[irq_nr]--;
> -  if (ndisabled_irq[irq_nr] == 0)
> +
> +  /* This could be racy if the irq was already unmasked,
> +   * and the irq is not guaranteed to be masked at this point if more
> +   * than one device has called us almost simultaneously on this irq.
> +   * Therefore it is only safe to decrement the counter here atomically
> +   * inside the check */
> +  if (__atomic_add_fetch(&ndisabled_irq[irq_nr], -1, __ATOMIC_RELAXED) == 0)

There is a window here where __disable_irq() can mask the interrupt, but
then we unmask it.

>      unmask_irq (irq_nr);
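
Concretely:

  cpu0: __enable_irq              cpu1: __disable_irq
  counter drops from 1 to 0
                                  mask_irq (irq_nr);
                                  counter goes from 0 to 1
  unmask_irq (irq_nr);

and again ndisabled_irq ends up at 1 with the irq unmasked.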

Samuel
