On Mon, 2015-09-21 at 22:25 +0200, Francois Romieu wrote: > David Woodhouse <dw...@infradead.org> : > > From: David Woodhouse <david.woodho...@intel.com> > > > > The TX timeout handling has been observed to trigger RX IRQ storms. And > > since cp_interrupt() just keeps saying that it handled the interrupt, > > the machine then dies. Fix the return value from cp_interrupt(), and > > the offending IRQ gets disabled and the machine survives. > > I am not fond of the way it dissociates the hardware status word and the > software "handled" variable.
Oh, I like that part very much :) The practice of returning a 'handled' status only when you've actually *done* something you expect to mitigate the interrupt, is a useful way of also protecting against both hardware misbehaviour and software bugs. > What you are describing - RX IRQ storms - looks like a problem between > the irq and poll handlers. That's where I expect the problem to be solved. I already fixed that, in the next patch in the series. But the failure mode *should* have been 'IRQ disabled' and the device continuing to work via polling. Not a complete death of the machine. That's the difference this patch makes. > Sprinkling "handled" operations does not make me terribly confortable, > especially as I'd happily trade the old-style part irq, part napi > processing for a plain napi processing (I can get over it though :o) ). The existing cp_rx_poll() function mostly runs without taking cp->lock. But cp_tx() *does* need cp->lock for the tx_head/tx_tail manipulations. I'm guessing that's why it's still being called directly from the hard IRQ handler? I suppose we could probably find a way to move that out. But I was mostly just trying to fix stuff that was actually broken... :) -- David Woodhouse Open Source Technology Centre david.woodho...@intel.com Intel Corporation
smime.p7s
Description: S/MIME cryptographic signature