On 15.03.2019 21:09, VDR User wrote:
>>>>> Thanks for the additional info and for testing 4.20.15.
>>>>> To rule out that the issue is caused by a regression in network or
>>>>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>>>>> it on top of 5.0?
>>>>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
>>>>
>>>> Sure, no problem! I'll copy the driver & recompile now actually.
>>>> Hopefully there aren't a ton of changes to r8169.c to sift through and
>>>> the cause isn't good at hiding itself!
>>>>
>>> I checked the driver changes new in 5.0 and there are very few
>>> functional changes. You could try to revert the following:
>>>
>>> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>>
>> Will do, and fwiw, while I haven't been able to do tons of testing
>> today, I haven't been able to trigger the crash after replacing
>> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
>> file and revert the change you mentioned, and report back my findings.
>
> Heiner,
>
> After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible"), I so far have not had
> any crashes after transferring roughly 30GB back & forth. I'm not
> completely confident yet the crash is resolved with that revert and
> will continue to do further testing throughout the weekend as well.
> What confidence level do you have that 5317d5c6d47e is the culprit at
> this point?
>
Good, thanks for testing. I simply see no other change since 4.20 that
could cause these symptoms. Using napi_consume_skb() at this place in
r8169.c looks safe to me. Option 1 is that I'm missing something, option 2
is that there's an issue in the NAPI subsystem. However, in the latter
case I assume at least the Mellanox and/or Intel guys would have
observed the same issue on their respective CI systems.
Let me add Alexander, maybe he can provide a hint before we go and
revert the change.
> Thanks,
> Derek
>
Heiner