Bug#777683: e1000e driver, empty TX queue after IP drop causes dev_watchdog

S Egbert Tue, 24 May 2016 18:09:25 -0700

Like three other reporters, I have the same problem on Debian.

As a former Ethernet driver developer, I noticed that the queue was empty when the interrupt fired, and that the system appeared to hang in the Linux qdisc code in interrupt context, long enough for the watchdog timer to expire.


My relevant details are:
    Dell OptiPlex 980
    3.16.0-4-amd64
    linux/3.16.7-ckt25-2 (2016-04-08) x86_64
    Intel Gigabit Ethernet 82578DM Gigabit Network Connection (rev 05)

From what I have gathered from the potentially duplicate bug #798512 and the Intel Community Forums:


1. It isn't CPU-related.
2. This error happened in the following Linux kernel versions:
    a. 3.16.0-4-amd64
    b. 3.19.5 (source: Intel communities)
    c. 4.3+70~bpo8+1
    d. 3.16.7-ckt11-1

3. This error does NOT happen in the following Linux kernel versions (take this with a grain of salt, as we don't yet have a reliable way to reproduce the bug):

    a. 3.16.7-ckt20-1+deb8u4
4. Intel's own driver was used, but the error still occurs:
    a. 3.3.3-NAPI
5. Intel hardware that has this problem:
    a. Intel I217-V (rev 04) (onboard) (has lspci SERR-)
    b. Intel 82578DM (rev 05) (onboard) (has lspci SERR+)
    c. Intel Corporation 82579V Gigabit Network Connection (rev 05) (onboard)
6. Linux network configuration (ip link output):

    a. eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    b. eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT group default qlen 1000
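The ip link lines in item 6 carry no counters, but the outgoing packet drop count that these reports have in common comes from `ip -s link show`. A minimal sketch for pulling that number out, assuming the usual iproute2 layout where "dropped" is the fourth column of the line right after the "TX:" header; the function name `tx_dropped_from_ip` is made up for illustration:

```shell
# Hypothetical helper (not part of iproute2): reads the output of
# `ip -s link show dev IFACE` on stdin and prints the TX "dropped" count.
# Assumes the value is the 4th column of the line following the "TX:" header.
tx_dropped_from_ip() {
    awk '/^ *TX:/ { getline; print $4; exit }'
}
```

For example, `ip -s link show dev eth0 | tx_dropped_from_ip`. The same counter is also exposed as /sys/class/net/eth0/statistics/tx_dropped, and `tc -s qdisc show dev eth0` shows the drops counted by the pfifo_fast qdisc itself.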

So far, the common threads among these similar reports are the following (more reports may eliminate a few):

1. e1000e driver
2. ip link shows 'qdisc pfifo_fast'
3. Onboard Ethernet (PCI-related?)
4. Starting with Linux 3.16.0
5. A non-zero count of dropped outgoing IP packets (mostly 32 packets)
6. A similar call stack backtrace:

Bug #777683 call stack backtrace

[ 295.041406] <IRQ> [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[ 295.041417] [<ffffffff81067797>] ? warn_slowpath_common+0x77/0x90
[ 295.041420] [<ffffffff810677fc>] ? warn_slowpath_fmt+0x4c/0x50
[ 295.041425] [<ffffffff81074777>] ? mod_timer+0x127/0x1e0
[ 295.041430] [<ffffffff8143eb96>] ? dev_watchdog+0x236/0x240
[ 295.041433] [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[ 295.041436] [<ffffffff81072ae1>] ? call_timer_fn+0x31/0x100
[ 295.041439] [<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[ 295.041442] [<ffffffff81074119>] ? run_timer_softirq+0x209/0x2f0
[ 295.041445] [<ffffffff8106c641>] ? __do_softirq+0xf1/0x290
[ 295.041448] [<ffffffff8106ca15>] ? irq_exit+0x95/0xa0
[ 295.041451] [<ffffffff81514455>] ? smp_apic_timer_interrupt+0x45/0x60
[ 295.041455] [<ffffffff8151253d>] ? apic_timer_interrupt+0x6d/0x80
[ 295.041456] <EOI> [<ffffffff81074a26>] ? get_next_timer_interrupt+0x1d6/0x250
[ 295.041465] [<ffffffff813ddf9f>] ? cpuidle_enter_state+0x4f/0xc0
[ 295.041468] [<ffffffff813ddf98>] ? cpuidle_enter_state+0x48/0xc0
[ 295.041472] [<ffffffff810a7fa8>] ? cpu_startup_entry+0x2f8/0x400
[ 295.041475] [<ffffffff81903071>] ? start_kernel+0x492/0x49d
[ 295.041478] [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[ 295.041480] [<ffffffff81902120>] ? early_idt_handlers+0x120/0x120
[ 295.041483] [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
[ 295.041485] ---[ end trace aaf46f7eeccba58f ]---
[ 295.041502] e1000e 0000:00:19.0 eth-office: Reset adapter unexpectedly


Intel Community Forums (Intel 3.3.3-NAPI driver):
(source: https://communities.intel.com/message/305442#305442)
<IRQ>
[<ffffffff812e1ac9>] ? dump_stack+0x40/0x57
[<ffffffff81074451>] ? warn_slowpath_common+0x81/0xb0
[<ffffffff810744dc>] ? warn_slowpath_fmt+0x5c/0x80
[<ffffffff814b89e9>] ? dev_watchdog+0x229/0x240
[<ffffffff814b87c0>] ? dev_deactivate_queue.constprop.34+0x60/0x60
[<ffffffff810d6e90>] ? call_timer_fn+0x30/0xf0
[<ffffffff814b87c0>] ? dev_deactivate_queue.constprop.34+0x60/0x60
[<ffffffff810d861d>] ? run_timer_softirq+0x17d/0x2b0
[<ffffffff81078ca7>] ? __do_softirq+0x107/0x270
[<ffffffff81078f46>] ? irq_exit+0x86/0x90
[<ffffffff8158d90e>] ? smp_apic_timer_interrupt+0x3e/0x50
[<ffffffff8158b7a2>] ? apic_timer_interrupt+0x82/0x90
<EOI>
[<ffffffff8145ce08>] ? cpuidle_enter_state+0xe8/0x220
[<ffffffff8145cde3>] ? cpuidle_enter_state+0xc3/0x220
[<ffffffff810b3894>] ? cpu_startup_entry+0x294/0x350
[<ffffffff8104b600>] ? start_secondary+0x150/0x190

Debian Bug #798512

[<ffffffff81067797>] ? warn_slowpath_common+0x77/0x90
[<ffffffff810677fc>] ? warn_slowpath_fmt+0x4c/0x50
[<ffffffff81074777>] ? mod_timer+0x127/0x1e0
[<ffffffff8143eb96>] ? dev_watchdog+0x236/0x240
[<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[<ffffffff81072ae1>] ? call_timer_fn+0x31/0x100
[<ffffffff8143e960>] ? dev_graft_qdisc+0x70/0x70
[<ffffffff81074119>] ? run_timer_softirq+0x209/0x2f0
[<ffffffff8106c641>] ? __do_softirq+0xf1/0x290
[<ffffffff8106ca15>] ? irq_exit+0x95/0xa0

My /var/log/messages (3.6.14):
dmesg: e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k

dmesg: e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode

dmesg: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:

May 24 18:44:55 sandbay kernel: [ 840.766377] <IRQ> [<ffffffff8150e835>] ? dump_stack+0x5d/0x78
May 24 18:44:55 sandbay kernel: [ 840.766391] [<ffffffff810677f7>] ? warn_slowpath_common+0x77/0x90
May 24 18:44:55 sandbay kernel: [ 840.766396] [<ffffffff8106785c>] ? warn_slowpath_fmt+0x4c/0x50
May 24 18:44:55 sandbay kernel: [ 840.766410] [<ffffffff81440f86>] ? dev_watchdog+0x236/0x240
May 24 18:44:55 sandbay kernel: [ 840.766418] [<ffffffff81440d50>] ? dev_graft_qdisc+0x70/0x70
May 24 18:44:55 sandbay kernel: [ 840.766424] [<ffffffff81072ba1>] ? call_timer_fn+0x31/0x100
May 24 18:44:55 sandbay kernel: [ 840.766435] [<ffffffff81440d50>] ? dev_graft_qdisc+0x70/0x70
May 24 18:44:55 sandbay kernel: [ 840.766439] [<ffffffff810741d9>] ? run_timer_softirq+0x209/0x2f0
May 24 18:44:55 sandbay kernel: [ 840.766444] [<ffffffff8106c6a1>] ? __do_softirq+0xf1/0x290
May 24 18:44:55 sandbay kernel: [ 840.766452] [<ffffffff8106ca75>] ? irq_exit+0x95/0xa0
May 24 18:44:55 sandbay kernel: [ 840.766457] [<ffffffff81517822>] ? do_IRQ+0x52/0xe0
May 24 18:44:55 sandbay kernel: [ 840.766465] [<ffffffff8151566d>] ? common_interrupt+0x6d/0x6d
May 24 18:44:55 sandbay kernel: [ 840.766467] <EOI> [<ffffffff813e011f>] ? cpuidle_enter_state+0x4f/0xc0
May 24 18:44:55 sandbay kernel: [ 840.766475] [<ffffffff813e0118>] ? cpuidle_enter_state+0x48/0xc0
May 24 18:44:55 sandbay kernel: [ 840.766483] [<ffffffff810a8398>] ? cpu_startup_entry+0x2f8/0x400
May 24 18:44:55 sandbay kernel: [ 840.766488] [<ffffffff81042cbf>] ? start_secondary+0x20f/0x2d0
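Comparing these traces by eye is tedious because timestamps, addresses and syslog prefixes differ between machines. A minimal sketch that keeps only the function names (including suffixes such as .constprop.34) so two traces can be diffed; it assumes each frame looks like `name+0xOFFSET/0xSIZE`, and the function name `trace_symbols` is hypothetical:

```shell
# Hypothetical helper: reads a pasted kernel backtrace on stdin and prints
# one function name per frame, dropping timestamps, [<address>] and +off/size.
# Assumes frames are written as name+0xOFFSET/0xSIZE, as in the traces above.
trace_symbols() {
    LC_ALL=C grep -o '[A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*' |
        sed 's/+0x.*//'
}
```

For example, run `trace_symbols < mine.txt > a.txt` and `trace_symbols < other.txt > b.txt`, then `diff a.txt b.txt`; frames such as dev_watchdog, call_timer_fn and run_timer_softirq should line up across the reports above.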

For anyone who has this same problem, it would help to provide the output of the following shell commands:

- uname -a
- lspci -vv
- dmesg | grep e1000   # not 'grep e1000e': we want to know whether conflicts between Intel Ethernet drivers exist
- ip -s link show      # we want to know whether there is more than one Ethernet netdevice
- call stack backtrace (from dmesg or /var/log/messages)
- firmware version
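A minimal sketch that gathers the items above into one file to attach to the bug. Several parts are my own assumptions rather than anything the reports specify: that `ethtool -i` is a convenient way to get the firmware version (it prints a firmware-version line), that the interface is eth0 unless another name is given, and that the watchdog warning contains the text "NETDEV WATCHDOG", so the lines after it hold the backtrace. The function name `collect_e1000_report` is made up:

```shell
# Hypothetical helper: writes the requested diagnostics to the file named in $1.
# $2 is the interface name (eth0 assumed). Errors from missing tools are
# captured in the report instead of aborting it.
collect_e1000_report() {
    out="$1"
    iface="${2:-eth0}"
    {
        echo "== uname -a =="
        uname -a 2>&1
        echo "== lspci -vv =="
        lspci -vv 2>&1
        echo "== dmesg | grep e1000 =="
        # 'e1000' rather than 'e1000e', so messages from the older e1000 driver show up too
        dmesg 2>&1 | grep e1000
        echo "== backtrace after NETDEV WATCHDOG (assumed message text) =="
        dmesg 2>&1 | grep -A 40 'NETDEV WATCHDOG'
        echo "== ip -s link show =="
        ip -s link show 2>&1
        echo "== ethtool -i $iface (driver and firmware version) =="
        ethtool -i "$iface" 2>&1
    } > "$out"
}
```

For example, `collect_e1000_report /tmp/e1000-report.txt eth0`, run as root so that dmesg is readable on systems with kernel.dmesg_restrict=1.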
