On Fri, Oct 27, 2017 at 3:34 PM, Paweł Staszewski <pstaszew...@itcare.pl> wrote:
> Hi
>
> I have many problems with i40e driver
>
> memleaks , kernel panics , stack traces , tx hangs , tx timeouts and many
> many others :)
>
> But the main problem that can't be resolved in linux is resolved in freebsd
>
> problem in freebsd with this:
>
> [2501243.181829] i40e 0000:01:00.1 eno2: VSI_seid 390, Hung TX queue 17,
> tx_pending_hw: 1, NTC:0x16b, HWB: 0x16b, NTU: 0x16c, TAIL: 0x16c
> [2501243.181835] i40e 0000:01:00.1 eno2: VSI_seid 390, Issuing force_wb for
> TX queue 17, Interrupt Reg: 0x0
>
> Was solved by this:
>
> "
>
> change this piece in ixl_tso_detect_sparse() in ixl_txrx.c:
>
>         if (mss < 1) {
>                 if (num > IXL_SPARSE_CHAIN)
>                         return (true);
>                 num = (mss == 0) ? 0 : 1;
>                 mss += mp->m_pkthdr.tso_segsz;
>         }
>
> to
>
>         if (num > IXL_SPARSE_CHAIN)
>                 return (true);
>         if (mss < 1) {
>                 num = (mss == 0) ? 0 : 1;
>                 mss += mp->m_pkthdr.tso_segsz;
>         }
>
> Intel FreeBSD Team: This will definitely prevent MDDs on the buffers you
> sent me.
>
> "
>
> And I have a question - how to do the same in linux ? :)
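
The only behavioural difference between the two quoted FreeBSD snippets is when the chain-length check runs. The following is a minimal sketch of that difference and nothing more: it is not the ixl or i40e driver code, the limit value is a hypothetical placeholder (the real IXL_SPARSE_CHAIN value is not given in this thread), and the num/mss bookkeeping from the quoted snippet is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical limit, for illustration only. The real
 * IXL_SPARSE_CHAIN value is not stated in this thread. */
#define SPARSE_CHAIN_LIMIT 7

/* Ordering before the change: the chain-length check is only
 * reached when mss < 1, so a long chain goes unnoticed while
 * mss is still positive. */
bool chain_too_long_old(int num, int mss)
{
	if (mss < 1) {
		if (num > SPARSE_CHAIN_LIMIT)
			return true;
	}
	return false;
}

/* Ordering after the change: the chain-length check runs on
 * every call, regardless of mss. */
bool chain_too_long_new(int num, int mss)
{
	(void)mss; /* mss no longer gates the check */
	if (num > SPARSE_CHAIN_LIMIT)
		return true;
	return false;
}
```

With num above the limit and mss still positive, the first function reports nothing and the second reports an over-long chain. For any call where mss < 1, the two give the same answer.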

The same fix is already there. All this is checking for is to make
certain we don't span too many descriptors. We added that fix close to
a year ago. Take a look at the following; it is the last fix in the
set for the same issue:

commit 841493a3f64395b60554afbcaa17f4350f90e764
Author: Alexander Duyck <alexander.h.du...@intel.com>
Date:   Tue Sep 6 18:05:04 2016 -0700

    i40e: Limit TX descriptor count in cases where frag size is greater than 16K

As I told you, we need to look into this. A Tx hang can have many
causes. It is just a common symptom. The crashes you included are not
Tx hangs. They are something else entirely and are only reproduced on
4.12.X.

> Cause i have same problem in Linux with this i40e buggy driver:

No, this is not the same issue. This is the same symptom. This is the
equivalent of running to the doctor and demanding antibiotics because
someone has a cough and insisting it is pneumonia, when for all we
know it is the common cold or an allergy.

If you want help, my advice is to focus on one issue, document
completely how you get into that state, and don't try to throw
everything and the kitchen sink in with it. Once you have one trace
you can stop there. Emailing daily with multiple copies of the same or
similar trace, or worse yet unrelated traces, doesn't help anything
unless the person doing the debug has specifically asked for it.

> More here:
> https://bugzilla.kernel.org/show_bug.cgi?id=197325

Yes, we are aware of this bugzilla. We are still trying to sort the
contents at this point. Unfortunately that is difficult, because from
what I can tell there are about 3 or 4 different issues and you jump
between them somewhat randomly and incoherently, so it is hard to tell
which data belongs to which issue.

> Thanks
> Pawel

I appreciate that you want this fixed, but emailing multiple times a
day with a trace but no background, or background and no trace, and
then injecting random unrelated questions doesn't help to clarify anything.
Essentially it is just trolling.

From what I can tell you have 3 issues that I am aware of:

The first is that with the team driver running on top of the i40e
ports you are seeing a NETDEV WATCHDOG being triggered. We don't know
if it is a regression or not: you stated you are now seeing it on
4.11.X with the latest firmware, and that it did not occur with your
previous firmware. So we are working to determine if this is a
firmware regression and what can be done to resolve it.

The second issue occurs on the latest kernels as a result of the
first: the PF fails to come back up when the watchdog issue occurs.
This one is being handled as a part of the first issue for now.

The third issue is the Rx ring rx_bi value not being NULL when the
interface is being reset in response to the watchdog. That appears to
only happen on something like 4.12, if I recall. It appears to be
already resolved in 4.13, so there is not much need for us to
investigate it unless we need to generate a back-port for 4.12 stable.

Does that pretty much sum up everything you are seeing? Is it clear
what the status of our investigation is? If so, you don't need to send
any more traces or any more updates, as we are aware of the issues. We
will update the bugzilla if we need more information, or if we have
additional information to provide to you.

Thanks.

- Alex