Hello,

Damien Zammit, on Thu, 10 Jul 2025 10:14:21 +0000, wrote:
> This avoids race condition with multiple devices raising interrupts
> simultaneously on the same IRQ and causing mask to fail.

Are we *sure* that PCI interrupts may get raised on several cpus
concurrently?  That makes the race game *really* more complex...

The very probable scenario, however, is __enable_irq() getting called
concurrently (since it is just called on the irq_acknowledge RPC), and
splhigh() is not enough to protect against that.

Rather than using an atomic counter, which is *really* difficult to
get right, I'd say just introduce one simple_lock per interrupt, which
you can take with simple_lock_irq() instead of splhigh().  That solves
everything in a way which is much simpler to be sure of.

Along the way, better replace the plain unsigned int ndisabled_irq
array with an array of structures containing ndisabled_irq and the
simple_lock, and align each entry on the cache line size, so that each
irq gets its own cache line and cache ping-pong is reduced.
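Roughly something like the below (a completely untested sketch, just
to show the shape: the irq_ndisabled structure/array name and the
64-byte alignment are only illustrative, and I am guessing at the
exact simple_lock_irq()/simple_unlock_irq() calling convention, so
adjust to what the tree actually provides; mask_irq, unmask_irq, NINTR
and spl_t are the existing definitions in irq.c):

struct irq_ndisabled {
  unsigned int ndisabled;            /* mask nesting counter */
  simple_lock_data_t lock;           /* serializes mask/unmask for this irq */
} __attribute__ ((aligned (64)));    /* one cache line per irq, assuming 64-byte lines */

static struct irq_ndisabled irq_ndisabled[NINTR];

void
__disable_irq (irq_t irq_nr)
{
  spl_t s;

  assert (irq_nr < NINTR);

  /* take the per-irq lock with interrupts blocked on this cpu */
  s = simple_lock_irq (&irq_ndisabled[irq_nr].lock);
  irq_ndisabled[irq_nr].ndisabled++;
  assert (irq_ndisabled[irq_nr].ndisabled > 0);
  if (irq_ndisabled[irq_nr].ndisabled == 1)
    mask_irq (irq_nr);
  simple_unlock_irq (s, &irq_ndisabled[irq_nr].lock);
}

void
__enable_irq (irq_t irq_nr)
{
  spl_t s;

  assert (irq_nr < NINTR);

  s = simple_lock_irq (&irq_ndisabled[irq_nr].lock);
  assert (irq_ndisabled[irq_nr].ndisabled > 0);
  irq_ndisabled[irq_nr].ndisabled--;
  if (irq_ndisabled[irq_nr].ndisabled == 0)
    unmask_irq (irq_nr);
  simple_unlock_irq (s, &irq_ndisabled[irq_nr].lock);
}

With the lock held across both the counter update and the
mask/unmask, the counter and the mask state can never get out of
sync, whatever the interleaving between cpus.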
> ---
>  i386/i386/irq.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/i386/i386/irq.c b/i386/i386/irq.c
> index 91318f67..d91b3856 100644
> --- a/i386/i386/irq.c
> +++ b/i386/i386/irq.c
> @@ -41,11 +41,19 @@ __disable_irq (irq_t irq_nr)
>  {
>    assert (irq_nr < NINTR);
>  
> +  /* The spl wrapping protects the same cpu
> +   * from being interrupted while updating this */
>    spl_t s = splhigh();
> -  ndisabled_irq[irq_nr]++;
> +
> +  /* masking the irq unconditionally prevents
> +   * other cpus being interrupted on this irq */
> +  mask_irq (irq_nr);

There is a window here where __enable_irq() can unmask the interrupt
before we increment ndisabled_irq.

> +  /* we can atomically increment the nesting counter now */
> +  __atomic_add_fetch(&ndisabled_irq[irq_nr], 1, __ATOMIC_RELAXED);
>    assert (ndisabled_irq[irq_nr] > 0);
> -  if (ndisabled_irq[irq_nr] == 1)
> -    mask_irq (irq_nr);
> +
>    splx(s);
>  }
>  
> @@ -54,11 +62,17 @@ __enable_irq (irq_t irq_nr)
>  {
>    assert (irq_nr < NINTR);
>  
> +  /* The spl wrapping protects the same cpu
> +   * from being interrupted while updating this */
>    spl_t s = splhigh();
> -  assert (ndisabled_irq[irq_nr] > 0);
> -  ndisabled_irq[irq_nr]--;
> -  if (ndisabled_irq[irq_nr] == 0)
> +
> +  /* This could be racy if the irq was already unmasked,
> +   * and the irq is not guaranteed to be masked at this point if more
> +   * than one device has called us almost simultaneously on this irq.
> +   * Therefore it is only safe to decrement the counter here atomically
> +   * inside the check */
> +  if (__atomic_add_fetch(&ndisabled_irq[irq_nr], -1, __ATOMIC_RELAXED) == 0)

There is a window here where __disable_irq() can mask the interrupt,
but then we unmask it.

>      unmask_irq (irq_nr);

Samuel