mrk, to clarify, which version(s) of 4.4 specifically were you testing?

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1524259

Title:
igb: Detected Tx Unit Hang with stack trace

Status in linux package in Ubuntu:
Incomplete

Bug description:
Hello.

For some time now we have had a problem with one of our servers. It happens sporadically (once every day or two) and the cause is still unknown. We searched Launchpad and tried many possible solutions, but nothing helped.

We tried the vanilla Ubuntu 14.04.3 kernel (3.16.x) as well as 3.19.0-25-generic and linux-image-3.19.0-33-generic, with the same symptoms on all of these versions. We also tried rolling back to 3.13 (3.13.0-43-generic and 3.13.0-62-generic), but the problem still persists.

Our current configuration is Ubuntu 14.04.3 with kernel 3.13.0-43.72 and Xen 4.4.2-0ubuntu0.14.04.3 (this host is used as a Xen hypervisor with an iSCSI initiator, in case that is important).

Here is how it goes:

kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
kernel: [135522.062941] Tx Queue <5>
kernel: [135522.062941] TDH <e>
kernel: [135522.062941] TDT <21>
kernel: [135522.062941] next_to_use <21>
kernel: [135522.062941] next_to_clean <e>
kernel: [135522.062941] buffer_info[next_to_clean]
kernel: [135522.062941] time_stamp <10203c3ca>
kernel: [135522.062941] next_to_watch <ffff8800bac590f0>
kernel: [135522.062941] jiffies <10203c4e6>
kernel: [135522.062941] desc.status <1c8200>
kernel: [135526.063054] desc.status <0>

There are many messages like this.
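
For anyone triaging dumps like the one above, here is a minimal sketch of how the fields could be read. It assumes the bracketed values are hexadecimal and that the Tx ring has 256 descriptors (not confirmed in this report; `ethtool -g eth1` would show the real size). The helper names `parse_tx_hang`, `pending_descriptors` and `stalled_jiffies` are made up for this illustration and are not part of the igb driver or any tool.

```python
import re

# Tx hang dump copied from the report above.
SAMPLE = """\
kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
kernel: [135522.062941] Tx Queue <5>
kernel: [135522.062941] TDH <e>
kernel: [135522.062941] TDT <21>
kernel: [135522.062941] next_to_use <21>
kernel: [135522.062941] next_to_clean <e>
kernel: [135522.062941] buffer_info[next_to_clean]
kernel: [135522.062941] time_stamp <10203c3ca>
kernel: [135522.062941] next_to_watch <ffff8800bac590f0>
kernel: [135522.062941] jiffies <10203c4e6>
kernel: [135522.062941] desc.status <1c8200>
"""

# Fields assumed to be printed in hexadecimal.
HEX_FIELDS = ("TDH", "TDT", "next_to_use", "next_to_clean", "time_stamp", "jiffies")
FIELD_RE = re.compile(r"\b(%s)\s+<([0-9a-f]+)>" % "|".join(HEX_FIELDS))


def parse_tx_hang(text):
    """Return the hex fields of a Tx hang dump as a dict of integers."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def pending_descriptors(info, ring_size=256):
    """Descriptors between head (TDH) and tail (TDT); ring_size is an assumption."""
    return (info["TDT"] - info["TDH"]) % ring_size


def stalled_jiffies(info):
    """Jiffies elapsed between the buffer's time_stamp and the hang report."""
    return info["jiffies"] - info["time_stamp"]


if __name__ == "__main__":
    info = parse_tx_hang(SAMPLE)
    print("pending descriptors:", pending_descriptors(info))  # 19
    print("stalled for jiffies:", stalled_jiffies(info))      # 284
```

In this dump the head and tail are 19 descriptors apart, and the oldest buffer had been waiting 284 jiffies when the hang was reported. How long that is in seconds depends on the kernel's CONFIG_HZ, which is not stated in this report.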

Right after that we have reports like:

kernel: [135526.982825] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4328767466, last ping 4328768718, now 4328769972
kernel: [135526.982911] connection2:0: detected conn error (1011)

And finally:

kernel: [135527.014836] WARNING: CPU: 8 PID: 0 at /build/buildd/linux-3.13.0/net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()
kernel: [135527.014839] NETDEV WATCHDOG: eth1 (igb): transmit queue 4 timed out
kernel: [135527.014841] Modules linked in: xt_physdev xen_netback xen_blkback cls_u32 sch_sfq sch_htb xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xen_gntdev xen_evtchn xenfs xen_privcmd ip6_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi gpio_ich joydev ioatdma serio_raw mac_hid shpchp lpc_ich i7core_edac intel_powerclamp coretemp edac_core lp parport hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear iptable_raw nf_nat nf_conntrack iptable_mangle iptable_filter psmouse ip_tables igb x_tables ahci libahci i2c_algo_bit dca ptp bridge pps_core 8021q garp stp llc mrp
kernel: [135527.014903] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 3.13.0-43-generic #72-Ubuntu
kernel: [135527.014905] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c 08/03/2012
kernel: [135527.014907] 0000000000000009 ffff880268103d98 ffffffff81720bf6 ffff880268103de0
kernel: [135527.014912] ffff880268103dd0 ffffffff810677cd 0000000000000004 ffff880250b18000
kernel: [135527.014916] ffff8800030e5940 0000000000000008 0000000000000008 ffff880268103e30
kernel: [135527.014920] Call Trace:
kernel: [135527.014923] <IRQ> [<ffffffff81720bf6>] dump_stack+0x45/0x56
kernel: [135527.014934] [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
kernel: [135527.014937] [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
kernel: [135527.014943] [<ffffffff81645686>] dev_watchdog+0x276/0x280
kernel: [135527.014947] [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
kernel: [135527.014952] [<ffffffff81074386>] call_timer_fn+0x36/0x100
kernel: [135527.014955] [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
kernel: [135527.014959] [<ffffffff8107531f>] run_timer_softirq+0x1ef/0x2f0
kernel: [135527.014964] [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
kernel: [135527.014969] [<ffffffff8106d165>] irq_exit+0x105/0x110
kernel: [135527.014976] [<ffffffff814340f5>] xen_evtchn_do_upcall+0x35/0x50
kernel: [135527.014981] [<ffffffff8173313e>] xen_do_hypervisor_callback+0x1e/0x30
kernel: [135527.014982] <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
kernel: [135527.014990] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
kernel: [135527.014996] [<ffffffff81009e20>] ? xen_safe_halt+0x10/0x20
kernel: [135527.015001] [<ffffffff8101caaf>] ? default_idle+0x1f/0xc0
kernel: [135527.015005] [<ffffffff8101d376>] ? arch_cpu_idle+0x26/0x30
kernel: [135527.015010] [<ffffffff810bef35>] ? cpu_startup_entry+0xc5/0x290
kernel: [135527.015015] [<ffffffff810101b8>] ? cpu_bringup_and_idle+0x18/0x20
kernel: [135527.015018] ---[ end trace 431e88429488f9a4 ]---
kernel: [135527.015044] igb 0000:01:00.1 eth1: Reset adapter

After that, the network connection to this machine is dead; it tries to reconnect continuously, but without success.

After rolling back to the 3.13.0-43 kernel we had no problems for about a week, but now it keeps crashing with the above error.

I'm not sure how to diagnose this, so I need assistance. Thanks.

This is what we have in dmesg about the NICs:

[ 15.220822] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
[ 15.220882] igb: Copyright (c) 2007-2013 Intel Corporation.
[ 15.421684] igb 0000:01:00.0: added PHC on eth0
[ 15.421770] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 15.421827] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fc
[ 15.421885] igb 0000:01:00.0: eth0: PBA No: Unknown
[ 15.421939] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[ 15.621679] igb 0000:01:00.1: added PHC on eth1
[ 15.621747] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 15.621815] igb 0000:01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fd
[ 15.621885] igb 0000:01:00.1: eth1: PBA No: Unknown
[ 15.621949] igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[ 24.581560] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 30.941733] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[ 30.941851] igb 0000:01:00.1 eth1: Link Speed was downgraded by SmartSpeed

And here is the ethtool output:

Features for eth1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1524259/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp