Public bug reported:

1) Ubuntu release:

Description: Ubuntu 16.04.3 LTS
Release: 16.04

2) Package version:

* linux-image-extra-4.4.0-96-generic (4.4.0-96.119)
* Also with HWE kernel (4.10.x)

3) What I expect:

I have a 10G interface (HP NC523SFP 10Gb 2-port) in a HP ProLiant DL380p Gen8, BIOS P70 07/01/2015. The interface uses the qlcnic module and its two ports appear as ens2f0 and ens2f1. They also have VLANs configured. I have installed Ganeti and configured bridges over those interfaces: br-dmz over ens2f0 and br-str over ens2f1.

Everything should work without connectivity loss.

4) What happened instead:

The interface loses connectivity from time to time, although it recovers by itself, with the following error:

Oct 12 18:23:14 mazinger kernel: [107906.678468] qlcnic 0000:07:00.1: Pause control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678470] qlcnic 0000:07:00.0: Pause control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected
Oct 12 18:23:14 mazinger kernel: [107906.678482] qlcnic 0000:07:00.0: Dumping hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected
Oct 12 18:23:14 mazinger kernel: [107906.680385] qlcnic 0000:07:00.1: Dumping hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.695571] br-dmz: port 1(ens2f0.2) entered disabled state
Oct 12 18:23:15 mazinger kernel: [107907.690629] br-str: port 1(ens2f1.10) entered disabled state
Oct 12 18:23:16 mazinger kernel: [107908.706988] qlcnic 0000:07:00.1: Detected state change from DEV_NEED_RESET, skipping ack check
Oct 12 18:23:17 mazinger kernel: [107909.423713] qlcnic 0000:07:00.0 ens2f0: Dump data 15044136 bytes captured, dump data address = ffffc900334c3000, template header size 36864 bytes, template address = ffffc900193da000
Oct 12 18:23:21 mazinger kernel: [107912.800338] qlcnic 0000:07:00.0: loading firmware from flash
Oct 12 18:23:27 mazinger kernel: [107919.137580] qlcnic 0000:07:00.0: Driver v5.3.63, firmware v4.20.1
Oct 12 18:23:27 mazinger kernel: [107919.501555] qlcnic 0000:07:00.1: Driver v5.3.63, firmware v4.20.1
Oct 12 18:23:28 mazinger kernel: [107920.425737] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.435780] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.453103] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8008] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.598651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800a] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.615752] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800c] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.196706] qlcnic 0000:07:00.1 ens2f1: Rx Context[1] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.406680] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8001] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.422646] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8009] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.439890] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800b] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.456417] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800d] Created, state 0x2
Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500360] br-dmz: port 1(ens2f0.2) entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500375] br-dmz: port 1(ens2f0.2) entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500971] br-str: port 1(ens2f1.10) entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500985] br-str: port 1(ens2f1.10) entered forwarding state

---------------

Sometimes it also triggers kernel warnings, and the server needs to be rebooted to recover connectivity:

Oct 9 14:36:41 mazinger kernel: [262273.497512] ------------[ cut here ]------------
Oct 9 14:36:41 mazinger kernel: [262273.497821] WARNING: CPU: 6 PID: 0 at /build/linux-z2ccW0/linux-4.4.0/net/sched/sch_generic.c:306 dev_watchdog+0x237/0x240()
Oct 9 14:36:41 mazinger kernel: [262273.498083] NETDEV WATCHDOG: ens2f0 (qlcnic): transmit queue 0 timed out
Oct 9 14:36:41 mazinger kernel: [262273.498579] Modules linked in: joydev binfmt_misc hpwdt ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal input_leds intel_powerclamp serio_raw sb_edac edac_core lpc_ich 8250_fintek hpilo ioatdma shpchp ipmi_si ipmi_msghandler mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc coretemp drbd lru_cache autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper qlcnic hid_generic tg3 igb dca hpsa vxlan cryptd usbhid ptp psmouse ip6_udp_tunnel pata_acpi hid i2c_algo_bit scsi_transport_sas pps_core udp_tunnel wmi fjes
Oct 9 14:36:41 mazinger kernel: [262273.498651] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.4.0-96-generic #119-Ubuntu
Oct 9 14:36:41 mazinger kernel: [262273.498652] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
Oct 9 14:36:41 mazinger kernel: [262273.498654] 0000000000000286 fc090740aa4761f7 ffff881fbf783d98 ffffffff813fabd3
Oct 9 14:36:41 mazinger kernel: [262273.498666] ffff881fbf783de0 ffffffff81d715f8 ffff881fbf783dd0 ffffffff810812e2
Oct 9 14:36:41 mazinger kernel: [262273.498668] 0000000000000000 ffff881fade31b00 0000000000000006 ffff881fade30000
Oct 9 14:36:41 mazinger kernel: [262273.498681] Call Trace:
Oct 9 14:36:41 mazinger kernel: [262273.498683] <IRQ> [<ffffffff813fabd3>] dump_stack+0x63/0x90
Oct 9 14:36:41 mazinger kernel: [262273.498691] [<ffffffff810812e2>] warn_slowpath_common+0x82/0xc0
Oct 9 14:36:41 mazinger kernel: [262273.498693] [<ffffffff8108137c>] warn_slowpath_fmt+0x5c/0x80
Oct 9 14:36:41 mazinger kernel: [262273.498697] [<ffffffff8175eca7>] dev_watchdog+0x237/0x240
Oct 9 14:36:41 mazinger kernel: [262273.498700] [<ffffffff8175ea70>] ? qdisc_rcu_free+0x40/0x40
Oct 9 14:36:41 mazinger kernel: [262273.498705] [<ffffffff810ed035>] call_timer_fn+0x35/0x120
Oct 9 14:36:41 mazinger kernel: [262273.498708] [<ffffffff8175ea70>] ? qdisc_rcu_free+0x40/0x40
Oct 9 14:36:41 mazinger kernel: [262273.498711] [<ffffffff810ed9ea>] run_timer_softirq+0x23a/0x2f0
Oct 9 14:36:41 mazinger kernel: [262273.498714] [<ffffffff81085dc1>] __do_softirq+0x101/0x290
Oct 9 14:36:41 mazinger kernel: [262273.498717] [<ffffffff810860c3>] irq_exit+0xa3/0xb0
Oct 9 14:36:41 mazinger kernel: [262273.498721] [<ffffffff81845d22>] smp_apic_timer_interrupt+0x42/0x50
Oct 9 14:36:41 mazinger kernel: [262273.498724] [<ffffffff81843fe2>] apic_timer_interrupt+0x82/0x90
Oct 9 14:36:41 mazinger kernel: [262273.498726] <EOI> [<ffffffff816d680e>] ? cpuidle_enter_state+0x10e/0x2b0
Oct 9 14:36:41 mazinger kernel: [262273.498731] [<ffffffff816d69e7>] cpuidle_enter+0x17/0x20
Oct 9 14:36:41 mazinger kernel: [262273.498735] [<ffffffff810c47c2>] call_cpuidle+0x32/0x60
Oct 9 14:36:41 mazinger kernel: [262273.498737] [<ffffffff816d69c3>] ? cpuidle_select+0x13/0x20
Oct 9 14:36:41 mazinger kernel: [262273.498739] [<ffffffff810c4a80>] cpu_startup_entry+0x290/0x350
Oct 9 14:36:41 mazinger kernel: [262273.498743] [<ffffffff810517b4>] start_secondary+0x154/0x190
Oct 9 14:36:41 mazinger kernel: [262273.498749] ---[ end trace 6388d35f388918bc ]---
Oct 9 14:36:41 mazinger kernel: [262273.498765] qlcnic 0000:07:00.0 ens2f0: rds_ring=0 crb_rcv_producer=3113 producer=3114 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498773] qlcnic 0000:07:00.0 ens2f0: rds_ring=1 crb_rcv_producer=1023 producer=0 num_desc=1024
Oct 9 14:36:41 mazinger kernel: [262273.498781] qlcnic 0000:07:00.0 ens2f0: sds_ring=0 crb_sts_consumer=659 consumer=659 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498788] qlcnic 0000:07:00.0 ens2f0: sds_ring=1 crb_sts_consumer=2894 consumer=2894 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498792] qlcnic 0000:07:00.0 ens2f0: sds_ring=2 crb_sts_consumer=3092 consumer=3092 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498796] qlcnic 0000:07:00.0 ens2f0: sds_ring=3 crb_sts_consumer=570 consumer=570 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498798] qlcnic 0000:07:00.0 ens2f0: Tx ring=0 Context Id=0x8000
Oct 9 14:36:41 mazinger kernel: [262273.498800] qlcnic 0000:07:00.0 ens2f0: xmit_finished=161917485, xmit_called=161920455, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498802] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498805] qlcnic 0000:07:00.0 ens2f0: hw_producer=481, sw_producer=481 sw_consumer=491, hw_consumer=491
Oct 9 14:36:41 mazinger kernel: [262273.498807] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498809] qlcnic 0000:07:00.0 ens2f0: Tx ring=1 Context Id=0x8008
Oct 9 14:36:41 mazinger kernel: [262273.498811] qlcnic 0000:07:00.0 ens2f0: xmit_finished=152057037, xmit_called=152059997, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498813] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498816] qlcnic 0000:07:00.0 ens2f0: hw_producer=81, sw_producer=81 sw_consumer=91, hw_consumer=91
Oct 9 14:36:41 mazinger kernel: [262273.498818] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498819] qlcnic 0000:07:00.0 ens2f0: Tx ring=2 Context Id=0x800a
Oct 9 14:36:41 mazinger kernel: [262273.498821] qlcnic 0000:07:00.0 ens2f0: xmit_finished=133645903, xmit_called=133648936, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498824] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498827] qlcnic 0000:07:00.0 ens2f0: hw_producer=572, sw_producer=572 sw_consumer=582, hw_consumer=582
Oct 9 14:36:41 mazinger kernel: [262273.498828] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498830] qlcnic 0000:07:00.0 ens2f0: Tx ring=3 Context Id=0x800c
Oct 9 14:36:41 mazinger kernel: [262273.498836] qlcnic 0000:07:00.0 ens2f0: xmit_finished=162932700, xmit_called=162935603, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498843] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498850] qlcnic 0000:07:00.0 ens2f0: hw_producer=568, sw_producer=568 sw_consumer=578, hw_consumer=578
Oct 9 14:36:41 mazinger kernel: [262273.498857] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498863] qlcnic 0000:07:00.0 ens2f0: Tx timeout, reset adapter context.
Oct 9 14:36:43 mazinger kernel: [262275.251864] qlcnic 0000:07:00.0: CDRP command failed: [7]
Oct 9 14:36:43 mazinger kernel: [262275.252143] qlcnic 0000:07:00.0: Host MBX regs(2)
Oct 9 14:36:43 mazinger kernel: [262275.252146] 00000039
Oct 9 14:36:43 mazinger kernel: [262275.252148] 00050032 <6>[262275.252150]
Oct 9 14:36:43 mazinger kernel: [262275.252153] qlcnic 0000:07:00.0: FW MBX regs(3)
Oct 9 14:36:43 mazinger kernel: [262275.252155] 00000007
Oct 9 14:36:43 mazinger kernel: [262275.252156] 00000000 00000000
Oct 9 14:36:43 mazinger kernel: [262275.252158]
Oct 9 14:36:43 mazinger kernel: [262275.252166] qlcnic 0000:07:00.0 ens2f0: Failed to Delete interrupts 7
Oct 9 14:36:43 mazinger kernel: [262275.279376] br-dmz: port 1(ens2f0.2) entered disabled state
Oct 9 14:36:43 mazinger kernel: [262275.447095] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.493365] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.509816] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800e] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.527651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8010] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.543852] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8012] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.545966] qlcnic 0000:07:00.0 ens2f0: qlcnic_reset_hw_context: soft reset complete

-----------

What I have tried to fix it:

- I have upgraded the interface firmware to the latest version provided by HP:

# ethtool -i ens2f0
driver: qlcnic
version: 5.3.63
firmware-version: 4.20.1
expansion-rom-version:
bus-info: 0000:07:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

- I have opened a case with HP. Following their recommendations, I have upgraded the server firmware to the latest version. After capturing an AHS (Active Health System) log, they told me that there isn't a hardware problem and that it should be a software issue.

- I have tried the HWE kernel (version 4.10.x), which comes with a newer version of the qlcnic module (5.3.65), but it didn't solve the problem.

- After reading about some problems with TSO and virtual environments, I have disabled TSO/GSO and other offload settings on the interfaces:

auto <iface>
iface <iface> inet manual
    pre-up /sbin/ethtool --offload <iface> gso off tso off sg off gro off

I have found reports of similar problems by searching online, but all of them were solved by applying one or more of the measures above. The issue seems to be related to this kind of interface when used with virtual environments.

** Affects: kernel-package (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: ganeti qlcnic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1723482

Title:
  qlcnic firmware hang detected kvm ganeti

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/1723482/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
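
For anyone trying to work out how often these events occur on an affected host, the following is a minimal sketch (not part of the original report) that tallies the two failure signatures quoted in the logs above: "firmware hang detected" per PCI device and "NETDEV WATCHDOG ... transmit queue N timed out" per interface. The function name summarize_qlcnic_events and the event labels are hypothetical, and the patterns are copied only from the messages shown in this report; other driver versions may word them differently.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts in this report (assumption: other
# qlcnic driver versions may phrase these messages differently).
EVENT_PATTERNS = {
    # e.g. "qlcnic 0000:07:00.0: firmware hang detected"
    "firmware_hang": re.compile(r"qlcnic (\S+?): firmware hang detected"),
    # e.g. "NETDEV WATCHDOG: ens2f0 (qlcnic): transmit queue 0 timed out"
    "tx_timeout": re.compile(
        r"NETDEV WATCHDOG: (\S+) \(qlcnic\): transmit queue \d+ timed out"
    ),
}


def summarize_qlcnic_events(lines):
    """Count matching events, keyed by (event label, device or interface)."""
    counts = Counter()
    for line in lines:
        for label, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(label, match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the report; in practice the input would be
    # the kernel log of the affected host.
    sample = [
        "Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected",
        "Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected",
        "Oct 9 14:36:41 mazinger kernel: [262273.498083] NETDEV WATCHDOG: ens2f0 (qlcnic): transmit queue 0 timed out",
    ]
    for (label, device), count in sorted(summarize_qlcnic_events(sample).items()):
        print(f"{label:14s} {device:14s} {count}")
```

Any iterable of log lines works as input, for example an open file object for a saved kernel log, so the counts before and after a firmware, kernel, or offload change can be compared.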