Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Grant Grundler
On Thu, Jun 08, 2006 at 10:43:04AM -0400, Jeff Garzik wrote:
...
> Perhaps cp_close() in 8139cp.c could be an example of a good ordering?
> It stops the chip, syncs irqs, frees irq, then frees [thus unmapping]
> the rings.
Here is a new patch that moves free_irq() into tulip_down(). The resultin
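The ordering Jeff describes for cp_close() (stop the chip, sync irqs, free the irq, then free the rings) can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct model_nic` and every `model_*` function are hypothetical names invented here, not code from tulip or 8139cp, and the model only tracks state flags instead of touching hardware.

```c
#include <stdbool.h>

/* Hypothetical model of NIC teardown state. These names are illustrative
 * only and are not the real tulip or 8139cp data structures. */
struct model_nic {
    bool chip_active;    /* chip may still raise interrupts and start DMA */
    bool irq_registered; /* an interrupt handler is installed */
    bool rings_mapped;   /* descriptor rings are still DMA-mapped */
};

/* Bring the model into the "interface up" state. */
void model_open(struct model_nic *nic)
{
    nic->chip_active = true;
    nic->irq_registered = true;
    nic->rings_mapped = true;
}

/* Step 1: quiesce the hardware so it stops interrupting and DMAing. */
void model_stop_chip(struct model_nic *nic)
{
    nic->chip_active = false;
}

/* Steps 2 and 3: stands in for synchronize_irq() followed by free_irq().
 * The model refuses to release the handler while the chip is active. */
int model_release_irq(struct model_nic *nic)
{
    if (nic->chip_active)
        return -1;
    nic->irq_registered = false;
    return 0;
}

/* Step 4: unmap and free the rings. The model refuses while the chip is
 * active (it could DMA into unmapped memory) or while a handler is still
 * installed (it could touch freed rings). */
int model_free_rings(struct model_nic *nic)
{
    if (nic->chip_active || nic->irq_registered)
        return -1;
    nic->rings_mapped = false;
    return 0;
}

/* Teardown in the order described for cp_close(). */
int model_close(struct model_nic *nic)
{
    model_stop_chip(nic);
    if (model_release_irq(nic) != 0)
        return -1;
    return model_free_rings(nic);
}
```

In this model, calling `model_free_rings()` on a freshly opened device fails, which mirrors the reported symptom of the device attempting DMA to memory that was already unmapped.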

Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Grant Grundler
On Thu, Jun 08, 2006 at 11:38:52AM -0400, Jeff Garzik wrote:
> >Can we call free_irq() from tulip_down()?
>
> I'm sure you can answer that yourself. If it doesn't cause problems
> elsewhere, yes. Otherwise, no. :)
Yeah, well, I was hoping you would "Just Know" (tm). :)
Research takes time.
>

Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Jeff Garzik
Grant Grundler wrote: Ok...I think I understand what you are driving at here. The case is when CPU vector is enabled and shared but one device _without_ an interrupt handler is registered is still yanking on the interrupt line. It will cause linux to disable the line since the IRQ isn't being han

Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Grant Grundler
On Thu, Jun 08, 2006 at 11:32:39AM -0400, Jeff Garzik wrote:
> >The chip IRQ gets turned off in tulip_down().
> >It won't be screaming for very long.
>
> Then you admit that you add a race.
Yes - I realized that after I hit :(
...
> >In the shared IRQ case, I expect free_irq() to unlink this in

Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Grant Grundler
On Thu, Jun 08, 2006 at 09:22:21AM -0600, Grant Grundler wrote:
> > Perhaps cp_close() in 8139cp.c could be an example of a good ordering?
> > It stops the chip, syncs irqs, frees irq, then frees [thus unmapping]
> > the rings.
>
> Sorry, I don't see how it matters if we disable chip IRQ first
>

Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Jeff Garzik
Grant Grundler wrote: On Thu, Jun 08, 2006 at 10:43:04AM -0400, Jeff Garzik wrote: (CC'ing our newly minted tulip maintainer, Val) Excellent! Has MAINTAINERS file been updated? :) It should be updated, yes. Calling free_irq() while the chip is still active is just a bad idea, because the

Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Grant Grundler
On Thu, Jun 08, 2006 at 10:43:04AM -0400, Jeff Garzik wrote:
> (CC'ing our newly minted tulip maintainer, Val)
Excellent! Has MAINTAINERS file been updated? :)
...
> NAK. This is a band-aid, and one that creates new problems even as it
> attempts to solve problems. You failed to demonstrate th

Re: PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-06-08 Thread Jeff Garzik
(CC'ing our newly minted tulip maintainer, Val) Grant Grundler wrote: Jeff, SLES10 testing exposed an MCA that was confirmed to be a DMA IO TLB miss. This means tulip device was attempting to DMA to memory that was already unmapped. The test was crashing in the "ifconfig down" step when a 4-por

PATCH 2.6.17-rc5 tulip free_irq() called too late

2006-05-31 Thread Grant Grundler
Jeff, SLES10 testing exposed an MCA that was confirmed to be a DMA IO TLB miss. This means tulip device was attempting to DMA to memory that was already unmapped. The test was crashing in the "ifconfig down" step when a 4-port tulip card was under this work load: while : do ifconfig eth24