On Mon, 2015-09-21 at 22:25 +0200, Francois Romieu wrote:
> David Woodhouse <dw...@infradead.org> :
> > From: David Woodhouse <david.woodho...@intel.com>
> > 
> > The TX timeout handling has been observed to trigger RX IRQ storms. And
> > since cp_interrupt() just keeps saying that it handled the interrupt,
> > the machine then dies. Fix the return value from cp_interrupt(), and
> > the offending IRQ gets disabled and the machine survives.
> 
> I am not fond of the way it dissociates the hardware status word and the
> software "handled" variable.

Oh, I like that part very much :)

The practice of returning a 'handled' status only when you've actually
*done* something you expect to mitigate the interrupt, is a useful way
of also protecting against both hardware misbehaviour and software
bugs.

> What you are describing - RX IRQ storms - looks like a problem between
> the irq and poll handlers. That's where I expect the problem to be solved.

I already fixed that, in the next patch in the series. But the failure
mode *should* have been 'IRQ disabled' and the device continuing to
work via polling. Not a complete death of the machine. That's the
difference this patch makes.

> Sprinkling "handled" operations does not make me terribly confortable,
> especially as I'd happily trade the old-style part irq, part napi
> processing for a plain napi processing (I can get over it though :o) ).

The existing cp_rx_poll() function mostly runs without taking cp->lock.
But cp_tx() *does* need cp->lock for the tx_head/tx_tail manipulations.
I'm guessing that's why it's still being called directly from the hard
IRQ handler? I suppose we could probably find a way to move that out.

But I was mostly just trying to fix stuff that was actually broken...
:)

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to