This bug is missing log files that will aid in diagnosing the problem.
While running an Ubuntu kernel (not a mainline or third-party kernel)
please enter the following command in a terminal window:

apport-collect 1730550

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

** Tags added: xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1730550

Title:
  e1000e in 4.4.0-97-generic breaks 82574L under heavy load.

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  This issue was first reported on the netdev email list by Lennart Sorensen:
  https://www.mail-archive.com/netdev@vger.kernel.org/msg178170.html

  Commit 16ecba59bc333d6282ee057fb02339f77a880beb causes link drops on
  the 82574L under heavy load.

  "Unfortunately this commit changed the driver to assume
  that the Other Causes interrupt can only mean link state change and
  hence sets the flag that (unfortunately) means both link is down and link
  state should be checked.  Since this now happens 3000 times per second,
  the chances of it happening while the watchdog_task is checking the link
  state becomes pretty high, and if it does happen to coincide, then the
  watchdog_task will reset the adapter, which causes a real loss of link."
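
The race described in the quote can be illustrated with a toy model. This is only a sketch of the reasoning, not the e1000e driver code: the class, the method names and the flag name are all hypothetical, and it assumes that the link check clears the flag when the link is actually up.

```python
class ToyAdapter:
    """Toy stand-in for the adapter state (hypothetical, not driver code)."""

    def __init__(self):
        # One flag with two meanings: "link state should be checked"
        # and "link is down".
        self.link_flag = False
        self.physical_link_up = True
        self.resets = 0

    def other_cause_interrupt(self):
        # Behaviour described in the quote: every Other Causes interrupt
        # is treated as a link state change and sets the flag.
        self.link_flag = True

    def check_link(self):
        # Assumption: checking the link clears the flag if the link is up.
        if self.physical_link_up:
            self.link_flag = False

    def watchdog(self, interrupt_during_check=False):
        self.check_link()
        if interrupt_during_check:
            # The interrupt lands after the check but before the flag is read.
            self.other_cause_interrupt()
        link_up = not self.link_flag
        if not link_up:
            # The watchdog believes the link is down and resets the adapter,
            # which causes a real loss of link.
            self.resets += 1
        return link_up


adapter = ToyAdapter()
adapter.other_cause_interrupt()
print(adapter.watchdog())                             # interrupt before the check
print(adapter.watchdog(interrupt_during_check=True))  # interrupt during the check
print(adapter.resets)
```

In this model an interrupt that arrives before the watchdog runs is harmless, because the check clears the flag. Only an interrupt that lands inside the watchdog's window triggers a reset, which is why a high interrupt rate makes the problem likely.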

  A fix for this issue was accepted into the net-next branch, along with
  other e1000e/igb patches:
  https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=f44dea3421b47d355a835e9cfcc59ca7318575a9

  The original reporter experienced this issue on a Supermicro
  X7SPA-HF-D525 server board. We see this issue on many servers running
  X9DBL-1F server boards. Both boards use the Intel 82574L for the network
  interfaces. We see messages like this under heavy load:

  [Nov 6 15:42] e1000e: eth0 NIC Link is Down
  [  +0.001670] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  [Nov 6 16:10] e1000e: eth0 NIC Link is Down
  [  +0.008505] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  [Nov 7 00:49] e1000e: eth0 NIC Link is Down
  [  +2.235111] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
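
To check whether other hosts are affected, messages like the ones above can be counted per interface. The following is a minimal sketch; the function name and the regular expression are my own and only assume the message format shown in the log excerpt.

```python
import re

# Matches the e1000e link messages in the format shown above.
LINK_EVENT = re.compile(r"e1000e: (?P<iface>\S+) NIC Link is (?P<state>Up|Down)")


def count_link_drops(lines):
    """Return a dict mapping interface name to the number of 'Link is Down' events."""
    drops = {}
    for line in lines:
        match = LINK_EVENT.search(line)
        if match and match.group("state") == "Down":
            iface = match.group("iface")
            drops[iface] = drops.get(iface, 0) + 1
    return drops


sample = [
    "[Nov 6 15:42] e1000e: eth0 NIC Link is Down",
    "[  +0.001670] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
    "[Nov 6 16:10] e1000e: eth0 NIC Link is Down",
    "[  +0.008505] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
    "[Nov 7 00:49] e1000e: eth0 NIC Link is Down",
    "[  +2.235111] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
]

print(count_link_drops(sample))  # {'eth0': 3}
```

The same function can be fed the lines of a saved kernel log instead of the sample list.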

  We have confirmed that the connected switch also sees the link drops,
  so these are not false alarms from the e1000e driver.

  # lsb_release -rd
  Description:  Ubuntu 16.04.2 LTS
  Release:      16.04

  I could not cleanly apply the net-next patch to 4.4.0, so I tested with just
  the following cherry-picked changes on the latest 4.4.0 kernel source package:
  https://patchwork.ozlabs.org/patch/823942/
  https://patchwork.ozlabs.org/patch/823945/
  https://patchwork.ozlabs.org/patch/823940/
  https://patchwork.ozlabs.org/patch/823941/
  https://patchwork.ozlabs.org/patch/823939/

  It is my understanding that the first two are the critical ones for
  the race condition. I have been running the patched e1000e kernel
  driver under network load for 7 days and I no longer see the network
  interface drops.

  Could we pull these changes into the Ubuntu 4.4.0 kernel?

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730550/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
