On 29/07/2017 22:15, Florian Fainelli wrote:

> On 07/29/2017 05:44 AM, Mason wrote:
>
>> We tested 4 switches, and DHCP failed on 3 of them.
>> Disabling pause frames "fixed" that.
>
> OK, so it is this problem that you reported about before?

The "Ethernet flow control / pause frames" issue is separate from
the "link down wedges RX" issue.

We discussed the former back in November 2016:
https://www.mail-archive.com/netdev@vger.kernel.org/msg137094.html
https://patchwork.ozlabs.org/patch/694577/

Wait a second... I see that you and Mans had the following exchange:
https://www.mail-archive.com/netdev@vger.kernel.org/msg138175.html

Mans mentions disabling DMA to be able to change the flow control bits.
The current theory is that it is disabling DMA in ndo_stop that wedges RX.
So maybe the two issues are related after all...

I hate all these hardware quirks. Why can't HW engineers make stuff
that "just works"...

> Pause frames are tricky in that receiving pause frames means you
> should backpressure your transmitter and sending pause frames happens
> when your receiver cannot keep up. It is somewhat conceivable that
> your HW implementation is bogus and that you can get the HW in a
> state where it gets permanently backpressured for instance? And then
> only a full re-init would get you out of this stuck state presumably?
> Are there significant differences at the DMA/Ethernet controller
> level between Tango 3 (is that the one Mans worked on?) and Tango 4
> for instance that could explain a behavioral difference?

I'll have to take a look at the issue in light of the new information.

FWIW, Mans has tango3&4 boards. I work on newer boards.
The HW dev *swears* there have been no functional differences
in the eth block "forever". However, bus accesses are faster
in recent chips, which could change who wins specific races.

Regards.