Hi there. I can confirm this problem still exists in the newest kernels and with the latest Intel drivers as of today:

Jan 19 16:05:19 osd9 kernel: [511271.581413] i40e 0000:02:00.1: TX driver issue detected, PF reset issued
Jan 19 16:09:08 osd9 kernel: [511500.919380] i40e 0000:02:00.0: TX driver issue detected, PF reset issued

driver: i40e-2.4.3 (and xenial / 4.13 shipped driver: 2.1.14-k)
kernel: 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The kernel was booted with nopti noibrs noibpb (Meltdown / Spectre mitigations disabled).

We can trigger the issue under high load (benchmarking a Ceph cluster with fio: 4 clients, 8 threads, iodepth 256, 100% random write, 64K block size). We only hit the problem with a relatively large block size (64K); with 4K blocks we do not hit it. We haven't tested large random reads yet.

When using openvswitch port-channel (as we do) with jumbo frames ... this port-channel will not come back online after the reset. rmmod i40e / modprobe i40e does the trick though.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723127

Title: Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu: In Progress
Status in linux source package in Xenial: In Progress

Bug description:

This is a continuation from bug 1713553; a patch was added in that bug to attempt to fix this, and it may have helped reduce the issue but, based on more reports, appears not to have fixed it.

The issue is that the i40e driver, when TSO is enabled, sometimes sees the NIC firmware issue an "MDD event", where MDD is "Malicious Driver Detection". This is only vaguely defined in the i40e spec, and there is no way to tell what the NIC actually saw that it didn't like. So the driver can do nothing but print an error message and reset the PF (or VF). Unfortunately, this resets the interface, which interrupts network traffic flow while the PF is resetting.
See bug 1713553 for more details.
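
To check how often a host hits this, the PF reset messages quoted above can be counted per PCI device. This is a minimal sketch in POSIX shell, assuming the message text matches the log lines in this report exactly; the function name count_pf_resets is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Count i40e PF resets per PCI device from kernel log text on stdin.
# Assumption: the message wording is exactly the one quoted in this report.
# count_pf_resets is a hypothetical helper name.
count_pf_resets() {
    # Keep only matching lines and reduce each one to its PCI address.
    sed -n 's/.* i40e \([0-9a-fA-F:.]*\): TX driver issue detected, PF reset issued.*/\1/p' |
        sort |
        uniq -c |
        # uniq -c prints "count address"; swap to "address count".
        awk '{ print $2, $1 }'
}

# Example usage (assumes the kernel log is readable via dmesg):
#   dmesg | count_pf_resets
```

If the counts keep growing under load, the rmmod i40e / modprobe i40e reload mentioned above is the manual recovery step reported here.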