On 31/07/2017 13:59, Måns Rullgård wrote:

> Mason writes:
>
>> On 29/07/2017 17:18, Florian Fainelli wrote:
>>
>>> On 07/29/2017 05:02 AM, Mason wrote:
>>>
>>>> I have identified a 100% reproducible flaw.
>>>> I have proposed a work-around that brings this down to 0
>>>> (tested 1000 cycles of link up / ping / link down).
>>>
>>> Can you also try to get help from your HW resources to eventually help
>>> you find out what is going on here?
>>
>> The patch I proposed /is/ based on the feedback from the HW team :-(
>> "Just reset the HW block, and everything will work as expected."
>
> Nobody is saying a reset won't recover the lockup. The problem is that
> we don't know what caused it to lock up in the first place. How do we
> know it can't happen during normal operation? If we knew the cause, it
> might also be possible to avoid the situation entirely.

How does one prove that something "can't happen during normal operation"?

The "put adapter in loop-back mode so we can send ourselves fake packets"
shenanigans seem completely insane, if you ask me.

Other things make no sense to me. For example, in nb8800_dma_stop()
there is a polling loop:

	do {
		mdelay(100);
		nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
		wmb();
		mdelay(100);
		nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);
		mdelay(5500);

		err = readl_poll_timeout_atomic(priv->base + NB8800_RXC_CR,
						rxcr, !(rxcr & RCR_EN),
						1000, 100000);
		printk("err=%d retry=%d\n", err, retry);
	} while (err && --retry);

(It was me who added the delays.)

*Whatever* delays I insert, it always goes 3 times through the loop.

[   29.654492] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   29.759320] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   35.364705] err=-110 retry=5
[   35.467609] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   35.572436] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   41.177822] err=-110 retry=4
[   41.280726] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   41.385553] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   46.890907] err=0 retry=3

How is that possible?

I've tried using spinlocks and delays to get parallel execution down
to a minimum, and have the same logs on both boards.

Regards.