This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1723482

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

** Tags added: xenial

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723482

Title:
  qlcnic firmware hang detected kvm ganeti

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  1) Ubuntu release:
  Description: Ubuntu 16.04.3 LTS
  Release: 16.04

  2) Package version:
  * linux-image-extra-4.4.0-96-generic (4.4.0-96.119)
  * Also with HWE kernel (4.10.x)

  3) What I expect:
  I have a 10G interface (HP NC523SFP 10Gb 2-port) in a HP ProLiant DL380p Gen8, BIOS P70 07/01/2015. The interface is configured using the module qlcnic and it works with the names ens2f0 and ens2f1. They also have VLANs configured. I have installed Ganeti software and bridges over those interfaces, br-dmz over ens2f0 and br-str over ens2f1. Everything should work without connectivity loss.

  4) What happened instead:
  The interface loses connectivity from time to time, although it recovers by itself, with the following error:

  Oct 12 18:23:14 mazinger kernel: [107906.678468] qlcnic 0000:07:00.1: Pause control frames disabled on all ports
  Oct 12 18:23:14 mazinger kernel: [107906.678470] qlcnic 0000:07:00.0: Pause control frames disabled on all ports
  Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected
  Oct 12 18:23:14 mazinger kernel: [107906.678482] qlcnic 0000:07:00.0: Dumping hw/fw registers
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_4_PC: 0x1e2f3
  Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected
  Oct 12 18:23:14 mazinger kernel: [107906.680385] qlcnic 0000:07:00.1: Dumping hw/fw registers
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_4_PC: 0x1e2f3
  Oct 12 18:23:14 mazinger kernel: [107906.695571] br-dmz: port 1(ens2f0.2) entered disabled state
  Oct 12 18:23:15 mazinger kernel: [107907.690629] br-str: port 1(ens2f1.10) entered disabled state
  Oct 12 18:23:16 mazinger kernel: [107908.706988] qlcnic 0000:07:00.1: Detected state change from DEV_NEED_RESET, skipping ack check
  Oct 12 18:23:17 mazinger kernel: [107909.423713] qlcnic 0000:07:00.0 ens2f0: Dump data 15044136 bytes captured, dump data address = ffffc900334c3000, template header size 36864 bytes, template address = ffffc900193da000
  Oct 12 18:23:21 mazinger kernel: [107912.800338] qlcnic 0000:07:00.0: loading firmware from flash
  Oct 12 18:23:27 mazinger kernel: [107919.137580] qlcnic 0000:07:00.0: Driver v5.3.63, firmware v4.20.1
  Oct 12 18:23:27 mazinger kernel: [107919.501555] qlcnic 0000:07:00.1: Driver v5.3.63, firmware v4.20.1
  Oct 12 18:23:28 mazinger kernel: [107920.425737] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
  Oct 12 18:23:28 mazinger kernel: [107920.435780] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
  Oct 12 18:23:28 mazinger kernel: [107920.453103] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8008] Created, state 0x2
  Oct 12 18:23:29 mazinger kernel: [107921.598651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800a] Created, state 0x2
  Oct 12 18:23:29 mazinger kernel: [107921.615752] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800c] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.196706] qlcnic 0000:07:00.1 ens2f1: Rx Context[1] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.406680] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8001] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.422646] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8009] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.439890] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800b] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.456417] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800d] Created, state 0x2
  Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up
  Oct 12 18:23:31 mazinger kernel: [107923.500360] br-dmz: port 1(ens2f0.2) entered forwarding state
  Oct 12 18:23:31 mazinger kernel: [107923.500375] br-dmz: port 1(ens2f0.2) entered forwarding state
  Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: NIC Link is up
  Oct 12 18:23:31 mazinger kernel: [107923.500971] br-str: port 1(ens2f1.10) entered forwarding state
  Oct 12 18:23:31 mazinger kernel: [107923.500985] br-str: port 1(ens2f1.10) entered forwarding state

  ---------------

  Sometimes it also produces kernel warnings and the server needs to be rebooted to recover connectivity:

  Oct 9 14:36:41 mazinger kernel: [262273.497512] ------------[ cut here ]------------
  Oct 9 14:36:41 mazinger kernel: [262273.497821] WARNING: CPU: 6 PID: 0 at /build/linux-z2ccW0/linux-4.4.0/net/sched/sch_generic.c:306 dev_watchdog+0x237/0x240()
  Oct 9 14:36:41 mazinger kernel: [262273.498083] NETDEV WATCHDOG: ens2f0 (qlcnic): transmit queue 0 timed out
  Oct 9 14:36:41 mazinger kernel: [262273.498579] Modules linked in: joydev binfmt_misc hpwdt ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal input_leds intel_powerclamp serio_raw sb_edac edac_core lpc_ich 8250_fintek hpilo ioatdma shpchp ipmi_si ipmi_msghandler mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc coretemp drbd lru_cache autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper qlcnic hid_generic tg3 igb dca hpsa vxlan cryptd usbhid ptp psmouse ip6_udp_tunnel pata_acpi hid i2c_algo_bit scsi_transport_sas pps_core udp_tunnel wmi fjes
  Oct 9 14:36:41 mazinger kernel: [262273.498651] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.4.0-96-generic #119-Ubuntu
  Oct 9 14:36:41 mazinger kernel: [262273.498652] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
  Oct 9 14:36:41 mazinger kernel: [262273.498654] 0000000000000286 fc090740aa4761f7 ffff881fbf783d98 ffffffff813fabd3
  Oct 9 14:36:41 mazinger kernel: [262273.498666] ffff881fbf783de0 ffffffff81d715f8 ffff881fbf783dd0 ffffffff810812e2
  Oct 9 14:36:41 mazinger kernel: [262273.498668] 0000000000000000 ffff881fade31b00 0000000000000006 ffff881fade30000
  Oct 9 14:36:41 mazinger kernel: [262273.498681] Call Trace:
  Oct 9 14:36:41 mazinger kernel: [262273.498683] <IRQ> [<ffffffff813fabd3>] dump_stack+0x63/0x90
  Oct 9 14:36:41 mazinger kernel: [262273.498691] [<ffffffff810812e2>] warn_slowpath_common+0x82/0xc0
  Oct 9 14:36:41 mazinger kernel: [262273.498693] [<ffffffff8108137c>] warn_slowpath_fmt+0x5c/0x80
  Oct 9 14:36:41 mazinger kernel: [262273.498697] [<ffffffff8175eca7>] dev_watchdog+0x237/0x240
  Oct 9 14:36:41 mazinger kernel: [262273.498700] [<ffffffff8175ea70>] ? qdisc_rcu_free+0x40/0x40
  Oct 9 14:36:41 mazinger kernel: [262273.498705] [<ffffffff810ed035>] call_timer_fn+0x35/0x120
  Oct 9 14:36:41 mazinger kernel: [262273.498708] [<ffffffff8175ea70>] ? qdisc_rcu_free+0x40/0x40
  Oct 9 14:36:41 mazinger kernel: [262273.498711] [<ffffffff810ed9ea>] run_timer_softirq+0x23a/0x2f0
  Oct 9 14:36:41 mazinger kernel: [262273.498714] [<ffffffff81085dc1>] __do_softirq+0x101/0x290
  Oct 9 14:36:41 mazinger kernel: [262273.498717] [<ffffffff810860c3>] irq_exit+0xa3/0xb0
  Oct 9 14:36:41 mazinger kernel: [262273.498721] [<ffffffff81845d22>] smp_apic_timer_interrupt+0x42/0x50
  Oct 9 14:36:41 mazinger kernel: [262273.498724] [<ffffffff81843fe2>] apic_timer_interrupt+0x82/0x90
  Oct 9 14:36:41 mazinger kernel: [262273.498726] <EOI> [<ffffffff816d680e>] ? cpuidle_enter_state+0x10e/0x2b0
  Oct 9 14:36:41 mazinger kernel: [262273.498731] [<ffffffff816d69e7>] cpuidle_enter+0x17/0x20
  Oct 9 14:36:41 mazinger kernel: [262273.498735] [<ffffffff810c47c2>] call_cpuidle+0x32/0x60
  Oct 9 14:36:41 mazinger kernel: [262273.498737] [<ffffffff816d69c3>] ? cpuidle_select+0x13/0x20
  Oct 9 14:36:41 mazinger kernel: [262273.498739] [<ffffffff810c4a80>] cpu_startup_entry+0x290/0x350
  Oct 9 14:36:41 mazinger kernel: [262273.498743] [<ffffffff810517b4>] start_secondary+0x154/0x190
  Oct 9 14:36:41 mazinger kernel: [262273.498749] ---[ end trace 6388d35f388918bc ]---
  Oct 9 14:36:41 mazinger kernel: [262273.498765] qlcnic 0000:07:00.0 ens2f0: rds_ring=0 crb_rcv_producer=3113 producer=3114 num_desc=4096
  Oct 9 14:36:41 mazinger kernel: [262273.498773] qlcnic 0000:07:00.0 ens2f0: rds_ring=1 crb_rcv_producer=1023 producer=0 num_desc=1024
  Oct 9 14:36:41 mazinger kernel: [262273.498781] qlcnic 0000:07:00.0 ens2f0: sds_ring=0 crb_sts_consumer=659 consumer=659 crb_intr_mask=0 num_desc=4096
  Oct 9 14:36:41 mazinger kernel: [262273.498788] qlcnic 0000:07:00.0 ens2f0: sds_ring=1 crb_sts_consumer=2894 consumer=2894 crb_intr_mask=0 num_desc=4096
  Oct 9 14:36:41 mazinger kernel: [262273.498792] qlcnic 0000:07:00.0 ens2f0: sds_ring=2 crb_sts_consumer=3092 consumer=3092 crb_intr_mask=0 num_desc=4096
  Oct 9 14:36:41 mazinger kernel: [262273.498796] qlcnic 0000:07:00.0 ens2f0: sds_ring=3 crb_sts_consumer=570 consumer=570 crb_intr_mask=0 num_desc=4096
  Oct 9 14:36:41 mazinger kernel: [262273.498798] qlcnic 0000:07:00.0 ens2f0: Tx ring=0 Context Id=0x8000
  Oct 9 14:36:41 mazinger kernel: [262273.498800] qlcnic 0000:07:00.0 ens2f0: xmit_finished=161917485, xmit_called=161920455, xmit_on=0, xmit_off=2
  Oct 9 14:36:41 mazinger kernel: [262273.498802] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct 9 14:36:41 mazinger kernel: [262273.498805] qlcnic 0000:07:00.0 ens2f0: hw_producer=481, sw_producer=481 sw_consumer=491, hw_consumer=491
  Oct 9 14:36:41 mazinger kernel: [262273.498807] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct 9 14:36:41 mazinger kernel: [262273.498809] qlcnic 0000:07:00.0 ens2f0: Tx ring=1 Context Id=0x8008
  Oct 9 14:36:41 mazinger kernel: [262273.498811] qlcnic 0000:07:00.0 ens2f0: xmit_finished=152057037, xmit_called=152059997, xmit_on=0, xmit_off=2
  Oct 9 14:36:41 mazinger kernel: [262273.498813] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct 9 14:36:41 mazinger kernel: [262273.498816] qlcnic 0000:07:00.0 ens2f0: hw_producer=81, sw_producer=81 sw_consumer=91, hw_consumer=91
  Oct 9 14:36:41 mazinger kernel: [262273.498818] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct 9 14:36:41 mazinger kernel: [262273.498819] qlcnic 0000:07:00.0 ens2f0: Tx ring=2 Context Id=0x800a
  Oct 9 14:36:41 mazinger kernel: [262273.498821] qlcnic 0000:07:00.0 ens2f0: xmit_finished=133645903, xmit_called=133648936, xmit_on=0, xmit_off=2
  Oct 9 14:36:41 mazinger kernel: [262273.498824] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct 9 14:36:41 mazinger kernel: [262273.498827] qlcnic 0000:07:00.0 ens2f0: hw_producer=572, sw_producer=572 sw_consumer=582, hw_consumer=582
  Oct 9 14:36:41 mazinger kernel: [262273.498828] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct 9 14:36:41 mazinger kernel: [262273.498830] qlcnic 0000:07:00.0 ens2f0: Tx ring=3 Context Id=0x800c
  Oct 9 14:36:41 mazinger kernel: [262273.498836] qlcnic 0000:07:00.0 ens2f0: xmit_finished=162932700, xmit_called=162935603, xmit_on=0, xmit_off=2
  Oct 9 14:36:41 mazinger kernel: [262273.498843] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct 9 14:36:41 mazinger kernel: [262273.498850] qlcnic 0000:07:00.0 ens2f0: hw_producer=568, sw_producer=568 sw_consumer=578, hw_consumer=578
  Oct 9 14:36:41 mazinger kernel: [262273.498857] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct 9 14:36:41 mazinger kernel: [262273.498863] qlcnic 0000:07:00.0 ens2f0: Tx timeout, reset adapter context.
  Oct 9 14:36:43 mazinger kernel: [262275.251864] qlcnic 0000:07:00.0: CDRP command failed: [7]
  Oct 9 14:36:43 mazinger kernel: [262275.252143] qlcnic 0000:07:00.0: Host MBX regs(2)
  Oct 9 14:36:43 mazinger kernel: [262275.252146] 00000039
  Oct 9 14:36:43 mazinger kernel: [262275.252148] 00050032 <6>[262275.252150]
  Oct 9 14:36:43 mazinger kernel: [262275.252153] qlcnic 0000:07:00.0: FW MBX regs(3)
  Oct 9 14:36:43 mazinger kernel: [262275.252155] 00000007
  Oct 9 14:36:43 mazinger kernel: [262275.252156] 00000000 00000000
  Oct 9 14:36:43 mazinger kernel: [262275.252158]
  Oct 9 14:36:43 mazinger kernel: [262275.252166] qlcnic 0000:07:00.0 ens2f0: Failed to Delete interrupts 7
  Oct 9 14:36:43 mazinger kernel: [262275.279376] br-dmz: port 1(ens2f0.2) entered disabled state
  Oct 9 14:36:43 mazinger kernel: [262275.447095] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
  Oct 9 14:36:43 mazinger kernel: [262275.493365] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
  Oct 9 14:36:43 mazinger kernel: [262275.509816] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800e] Created, state 0x2
  Oct 9 14:36:43 mazinger kernel: [262275.527651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8010] Created, state 0x2
  Oct 9 14:36:43 mazinger kernel: [262275.543852] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8012] Created, state 0x2
  Oct 9 14:36:43 mazinger kernel: [262275.545966] qlcnic 0000:07:00.0 ens2f0: qlcnic_reset_hw_context: soft reset complete

  -----------

  What I have tried to fix it:

  - I have upgraded the interface firmware to the latest version provided by HP:

  # ethtool -i ens2f0
  driver: qlcnic
  version: 5.3.63
  firmware-version: 4.20.1
  expansion-rom-version:
  bus-info: 0000:07:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: no

  - I have opened a case with HP. Following their recommendations I have upgraded the firmware of the server to the latest version. After capturing an AHS (Active Health System) log, they told me there isn't a hardware problem and it should be a software issue.

  - I have tried the HWE kernel (version 4.10.x), which comes with a newer version of the qlcnic module (5.3.65), but it didn't solve the problem.

  - After reading about some problems with TSO and virtual environments, I have disabled TSO/GSO and other offload settings on the interfaces:

  auto <iface>
  iface <iface> inet manual
  pre-up /sbin/ethtool --offload <iface> gso off tso off sg off gro off

  I have found reports of similar problems by searching online, but all of them were solved by applying one or more of the measures above. The issue seems to be related to this kind of interface and its use in virtual environments.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1723482/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
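
For the self-recovering case in the report above (a "firmware hang detected" message followed later by "NIC Link is up" for the same PCI function), the length of each outage can be estimated from the kernel timestamps. What follows is a minimal sketch in Python and is not part of the original report. The function name outage_durations, the regular expression and the SAMPLE_LOG list are illustrative assumptions; the four sample lines are copied from the Oct 12 log above. The time from hang detection to link-up is only an approximation of the real connectivity loss.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed line shape, based on the qlcnic messages quoted in the report:
#   ... kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected
#   ... kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+qlcnic\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])"
    r"(?: \S+)?: (?P<msg>.*)"
)


def outage_durations(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (pci_address, seconds) for each hang -> link-up pair."""
    hang_at: Dict[str, float] = {}  # PCI address -> timestamp of pending hang
    outages: List[Tuple[str, float]] = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a qlcnic line with a kernel timestamp
        ts = float(match.group("ts"))
        pci = match.group("pci")
        msg = match.group("msg").strip()
        if msg == "firmware hang detected":
            # Keep the first hang timestamp until the link comes back.
            hang_at.setdefault(pci, ts)
        elif msg == "NIC Link is up" and pci in hang_at:
            outages.append((pci, round(ts - hang_at.pop(pci), 3)))
    return outages


# Four lines taken from the Oct 12 excerpt in the report.
SAMPLE_LOG = [
    "Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected",
    "Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected",
    "Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up",
    "Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: NIC Link is up",
]

if __name__ == "__main__":
    for pci, seconds in outage_durations(SAMPLE_LOG):
        print(f"{pci}: about {seconds} s between hang and link up")
```

Run against a full kernel log (for example by passing the lines of a saved log file to outage_durations), this would give one entry per recovered hang and per port, which may help show how often and for how long the ports drop.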