Public bug reported:

1) Ubuntu release:

Description:    Ubuntu 16.04.3 LTS
Release:        16.04

2) Package version:

* linux-image-extra-4.4.0-96-generic (4.4.0-96.119)
* Also with HWE kernel (4.10.x)

3) What I expect:

I have a 10G interface (HP NC523SFP 10Gb 2-port) in an HP ProLiant DL380p
Gen8, BIOS P70 07/01/2015. The card is driven by the qlcnic module and
its ports appear as ens2f0 and ens2f1. VLANs are also configured on
them.

I have installed Ganeti and configured bridges over those interfaces:
br-dmz over ens2f0 and br-str over ens2f1.

Everything should work without connectivity loss.

4) What happened instead:

The interface loses connectivity from time to time, although it recovers
by itself, and logs the following errors:

Oct 12 18:23:14 mazinger kernel: [107906.678468] qlcnic 0000:07:00.1: Pause 
control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678470] qlcnic 0000:07:00.0: Pause 
control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware 
hang detected
Oct 12 18:23:14 mazinger kernel: [107906.678482] qlcnic 0000:07:00.0: Dumping 
hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_HALT_STATUS1: 0x40001502, 
PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_0_PC: 0x6d920, 
PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_2_PC: 0x149, 
PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware 
hang detected
Oct 12 18:23:14 mazinger kernel: [107906.680385] qlcnic 0000:07:00.1: Dumping 
hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_HALT_STATUS1: 0x40001502, 
PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_0_PC: 0x6d920, 
PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_2_PC: 0x149, 
PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.695571] br-dmz: port 1(ens2f0.2) 
entered disabled state
Oct 12 18:23:15 mazinger kernel: [107907.690629] br-str: port 1(ens2f1.10) 
entered disabled state
Oct 12 18:23:16 mazinger kernel: [107908.706988] qlcnic 0000:07:00.1: Detected 
state change from DEV_NEED_RESET, skipping ack check
Oct 12 18:23:17 mazinger kernel: [107909.423713] qlcnic 0000:07:00.0 ens2f0: 
Dump data 15044136 bytes captured, dump data address = ffffc900334c3000, 
template header size 36864 bytes, template address = ffffc900193da000
Oct 12 18:23:21 mazinger kernel: [107912.800338] qlcnic 0000:07:00.0: loading 
firmware from flash
Oct 12 18:23:27 mazinger kernel: [107919.137580] qlcnic 0000:07:00.0: Driver 
v5.3.63, firmware v4.20.1
Oct 12 18:23:27 mazinger kernel: [107919.501555] qlcnic 0000:07:00.1: Driver 
v5.3.63, firmware v4.20.1
Oct 12 18:23:28 mazinger kernel: [107920.425737] qlcnic 0000:07:00.0 ens2f0: Rx 
Context[0] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.435780] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x8000] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.453103] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x8008] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.598651] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x800a] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.615752] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x800c] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.196706] qlcnic 0000:07:00.1 ens2f1: Rx 
Context[1] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.406680] qlcnic 0000:07:00.1 ens2f1: Tx 
Context[0x8001] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.422646] qlcnic 0000:07:00.1 ens2f1: Tx 
Context[0x8009] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.439890] qlcnic 0000:07:00.1 ens2f1: Tx 
Context[0x800b] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.456417] qlcnic 0000:07:00.1 ens2f1: Tx 
Context[0x800d] Created, state 0x2
Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: 
NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500360] br-dmz: port 1(ens2f0.2) 
entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500375] br-dmz: port 1(ens2f0.2) 
entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: 
NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500971] br-str: port 1(ens2f1.10) 
entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500985] br-str: port 1(ens2f1.10) 
entered forwarding state
---------------
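
To put a number on each outage, the hang and recovery messages above can
be paired per PCI function. The following is only a minimal sketch: it
assumes unwrapped syslog lines (one message per line, unlike the wrapped
copy in this mail), and the helper name outages() is made up for
illustration.

```python
import re

# Matches unwrapped syslog lines such as:
#   Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected
#   Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up
LINE_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] qlcnic "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])(?: \S+)?: (?P<msg>.*)$"
)


def outages(lines):
    """Return (pci_address, seconds_down) for each hang -> link-up pair.

    Durations are computed from the kernel uptime stamps in brackets.
    """
    hang_at = {}  # pci address -> uptime of the first unresolved hang
    result = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, pci, msg = float(m["ts"]), m["pci"], m["msg"].strip()
        if msg == "firmware hang detected":
            hang_at.setdefault(pci, ts)
        elif msg == "NIC Link is up" and pci in hang_at:
            result.append((pci, round(ts - hang_at.pop(pci), 3)))
    return result
```

Going by the timestamps in the log above, each port was down for roughly
16.8 seconds during this event.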

Sometimes it also triggers kernel warnings, and the server needs to be
rebooted to recover connectivity:

Oct  9 14:36:41 mazinger kernel: [262273.497512] ------------[ cut here 
]------------
Oct  9 14:36:41 mazinger kernel: [262273.497821] WARNING: CPU: 6 PID: 0 at 
/build/linux-z2ccW0/linux-4.4.0/net/sched/sch_generic.c:306 
dev_watchdog+0x237/0x240()
Oct  9 14:36:41 mazinger kernel: [262273.498083] NETDEV WATCHDOG: ens2f0 
(qlcnic): transmit queue 0 timed out
Oct  9 14:36:41 mazinger kernel: [262273.498579] Modules linked in: joydev 
binfmt_misc hpwdt ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal input_leds 
intel_powerclamp serio_raw sb_edac edac_core lpc_ich 8250_fintek hpilo ioatdma 
shpchp ipmi_si ipmi_msghandler mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm 
iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi 8021q garp mrp stp llc coretemp drbd lru_cache autofs4 
btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx 
xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper qlcnic hid_generic tg3 igb dca hpsa vxlan cryptd usbhid 
ptp psmouse ip6_udp_tunnel pata_acpi hid i2c_algo_bit scsi_transport_sas 
pps_core udp_tunnel wmi fjes
Oct  9 14:36:41 mazinger kernel: [262273.498651] CPU: 6 PID: 0 Comm: swapper/6 
Not tainted 4.4.0-96-generic #119-Ubuntu
Oct  9 14:36:41 mazinger kernel: [262273.498652] Hardware name: HP ProLiant 
DL380p Gen8, BIOS P70 07/01/2015
Oct  9 14:36:41 mazinger kernel: [262273.498654]  0000000000000286 
fc090740aa4761f7 ffff881fbf783d98 ffffffff813fabd3
Oct  9 14:36:41 mazinger kernel: [262273.498666]  ffff881fbf783de0 
ffffffff81d715f8 ffff881fbf783dd0 ffffffff810812e2
Oct  9 14:36:41 mazinger kernel: [262273.498668]  0000000000000000 
ffff881fade31b00 0000000000000006 ffff881fade30000
Oct  9 14:36:41 mazinger kernel: [262273.498681] Call Trace:
Oct  9 14:36:41 mazinger kernel: [262273.498683]  <IRQ>  [<ffffffff813fabd3>] 
dump_stack+0x63/0x90
Oct  9 14:36:41 mazinger kernel: [262273.498691]  [<ffffffff810812e2>] 
warn_slowpath_common+0x82/0xc0
Oct  9 14:36:41 mazinger kernel: [262273.498693]  [<ffffffff8108137c>] 
warn_slowpath_fmt+0x5c/0x80
Oct  9 14:36:41 mazinger kernel: [262273.498697]  [<ffffffff8175eca7>] 
dev_watchdog+0x237/0x240
Oct  9 14:36:41 mazinger kernel: [262273.498700]  [<ffffffff8175ea70>] ? 
qdisc_rcu_free+0x40/0x40
Oct  9 14:36:41 mazinger kernel: [262273.498705]  [<ffffffff810ed035>] 
call_timer_fn+0x35/0x120
Oct  9 14:36:41 mazinger kernel: [262273.498708]  [<ffffffff8175ea70>] ? 
qdisc_rcu_free+0x40/0x40
Oct  9 14:36:41 mazinger kernel: [262273.498711]  [<ffffffff810ed9ea>] 
run_timer_softirq+0x23a/0x2f0
Oct  9 14:36:41 mazinger kernel: [262273.498714]  [<ffffffff81085dc1>] 
__do_softirq+0x101/0x290
Oct  9 14:36:41 mazinger kernel: [262273.498717]  [<ffffffff810860c3>] 
irq_exit+0xa3/0xb0
Oct  9 14:36:41 mazinger kernel: [262273.498721]  [<ffffffff81845d22>] 
smp_apic_timer_interrupt+0x42/0x50
Oct  9 14:36:41 mazinger kernel: [262273.498724]  [<ffffffff81843fe2>] 
apic_timer_interrupt+0x82/0x90
Oct  9 14:36:41 mazinger kernel: [262273.498726]  <EOI>  [<ffffffff816d680e>] ? 
cpuidle_enter_state+0x10e/0x2b0
Oct  9 14:36:41 mazinger kernel: [262273.498731]  [<ffffffff816d69e7>] 
cpuidle_enter+0x17/0x20
Oct  9 14:36:41 mazinger kernel: [262273.498735]  [<ffffffff810c47c2>] 
call_cpuidle+0x32/0x60
Oct  9 14:36:41 mazinger kernel: [262273.498737]  [<ffffffff816d69c3>] ? 
cpuidle_select+0x13/0x20
Oct  9 14:36:41 mazinger kernel: [262273.498739]  [<ffffffff810c4a80>] 
cpu_startup_entry+0x290/0x350
Oct  9 14:36:41 mazinger kernel: [262273.498743]  [<ffffffff810517b4>] 
start_secondary+0x154/0x190
Oct  9 14:36:41 mazinger kernel: [262273.498749] ---[ end trace 
6388d35f388918bc ]---
Oct  9 14:36:41 mazinger kernel: [262273.498765] qlcnic 0000:07:00.0 ens2f0: 
rds_ring=0 crb_rcv_producer=3113 producer=3114 num_desc=4096
Oct  9 14:36:41 mazinger kernel: [262273.498773] qlcnic 0000:07:00.0 ens2f0: 
rds_ring=1 crb_rcv_producer=1023 producer=0 num_desc=1024
Oct  9 14:36:41 mazinger kernel: [262273.498781] qlcnic 0000:07:00.0 ens2f0: 
sds_ring=0 crb_sts_consumer=659 consumer=659 crb_intr_mask=0 num_desc=4096
Oct  9 14:36:41 mazinger kernel: [262273.498788] qlcnic 0000:07:00.0 ens2f0: 
sds_ring=1 crb_sts_consumer=2894 consumer=2894 crb_intr_mask=0 num_desc=4096
Oct  9 14:36:41 mazinger kernel: [262273.498792] qlcnic 0000:07:00.0 ens2f0: 
sds_ring=2 crb_sts_consumer=3092 consumer=3092 crb_intr_mask=0 num_desc=4096
Oct  9 14:36:41 mazinger kernel: [262273.498796] qlcnic 0000:07:00.0 ens2f0: 
sds_ring=3 crb_sts_consumer=570 consumer=570 crb_intr_mask=0 num_desc=4096
Oct  9 14:36:41 mazinger kernel: [262273.498798] qlcnic 0000:07:00.0 ens2f0: Tx 
ring=0 Context Id=0x8000
Oct  9 14:36:41 mazinger kernel: [262273.498800] qlcnic 0000:07:00.0 ens2f0: 
xmit_finished=161917485, xmit_called=161920455, xmit_on=0, xmit_off=2
Oct  9 14:36:41 mazinger kernel: [262273.498802] qlcnic 0000:07:00.0 ens2f0: 
crb_intr_mask=0
Oct  9 14:36:41 mazinger kernel: [262273.498805] qlcnic 0000:07:00.0 ens2f0: 
hw_producer=481, sw_producer=481 sw_consumer=491, hw_consumer=491
Oct  9 14:36:41 mazinger kernel: [262273.498807] qlcnic 0000:07:00.0 ens2f0: 
Total desc=1024, Available desc=10
Oct  9 14:36:41 mazinger kernel: [262273.498809] qlcnic 0000:07:00.0 ens2f0: Tx 
ring=1 Context Id=0x8008
Oct  9 14:36:41 mazinger kernel: [262273.498811] qlcnic 0000:07:00.0 ens2f0: 
xmit_finished=152057037, xmit_called=152059997, xmit_on=0, xmit_off=2
Oct  9 14:36:41 mazinger kernel: [262273.498813] qlcnic 0000:07:00.0 ens2f0: 
crb_intr_mask=0
Oct  9 14:36:41 mazinger kernel: [262273.498816] qlcnic 0000:07:00.0 ens2f0: 
hw_producer=81, sw_producer=81 sw_consumer=91, hw_consumer=91
Oct  9 14:36:41 mazinger kernel: [262273.498818] qlcnic 0000:07:00.0 ens2f0: 
Total desc=1024, Available desc=10
Oct  9 14:36:41 mazinger kernel: [262273.498819] qlcnic 0000:07:00.0 ens2f0: Tx 
ring=2 Context Id=0x800a
Oct  9 14:36:41 mazinger kernel: [262273.498821] qlcnic 0000:07:00.0 ens2f0: 
xmit_finished=133645903, xmit_called=133648936, xmit_on=0, xmit_off=2
Oct  9 14:36:41 mazinger kernel: [262273.498824] qlcnic 0000:07:00.0 ens2f0: 
crb_intr_mask=0
Oct  9 14:36:41 mazinger kernel: [262273.498827] qlcnic 0000:07:00.0 ens2f0: 
hw_producer=572, sw_producer=572 sw_consumer=582, hw_consumer=582
Oct  9 14:36:41 mazinger kernel: [262273.498828] qlcnic 0000:07:00.0 ens2f0: 
Total desc=1024, Available desc=10
Oct  9 14:36:41 mazinger kernel: [262273.498830] qlcnic 0000:07:00.0 ens2f0: Tx 
ring=3 Context Id=0x800c
Oct  9 14:36:41 mazinger kernel: [262273.498836] qlcnic 0000:07:00.0 ens2f0: 
xmit_finished=162932700, xmit_called=162935603, xmit_on=0, xmit_off=2
Oct  9 14:36:41 mazinger kernel: [262273.498843] qlcnic 0000:07:00.0 ens2f0: 
crb_intr_mask=0
Oct  9 14:36:41 mazinger kernel: [262273.498850] qlcnic 0000:07:00.0 ens2f0: 
hw_producer=568, sw_producer=568 sw_consumer=578, hw_consumer=578
Oct  9 14:36:41 mazinger kernel: [262273.498857] qlcnic 0000:07:00.0 ens2f0: 
Total desc=1024, Available desc=10
Oct  9 14:36:41 mazinger kernel: [262273.498863] qlcnic 0000:07:00.0 ens2f0: Tx 
timeout, reset adapter context.
Oct  9 14:36:43 mazinger kernel: [262275.251864] qlcnic 0000:07:00.0: CDRP 
command failed: [7]
Oct  9 14:36:43 mazinger kernel: [262275.252143] qlcnic 0000:07:00.0: Host MBX 
regs(2)
Oct  9 14:36:43 mazinger kernel: [262275.252146] 00000039 
Oct  9 14:36:43 mazinger kernel: [262275.252148] 00050032 <6>[262275.252150] 
Oct  9 14:36:43 mazinger kernel: [262275.252153] qlcnic 0000:07:00.0: FW MBX 
regs(3)
Oct  9 14:36:43 mazinger kernel: [262275.252155] 00000007 
Oct  9 14:36:43 mazinger kernel: [262275.252156] 00000000 00000000 
Oct  9 14:36:43 mazinger kernel: [262275.252158] 
Oct  9 14:36:43 mazinger kernel: [262275.252166] qlcnic 0000:07:00.0 ens2f0: 
Failed to Delete interrupts 7
Oct  9 14:36:43 mazinger kernel: [262275.279376] br-dmz: port 1(ens2f0.2) 
entered disabled state
Oct  9 14:36:43 mazinger kernel: [262275.447095] qlcnic 0000:07:00.0 ens2f0: Rx 
Context[0] Created, state 0x2
Oct  9 14:36:43 mazinger kernel: [262275.493365] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x8000] Created, state 0x2
Oct  9 14:36:43 mazinger kernel: [262275.509816] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x800e] Created, state 0x2
Oct  9 14:36:43 mazinger kernel: [262275.527651] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x8010] Created, state 0x2
Oct  9 14:36:43 mazinger kernel: [262275.543852] qlcnic 0000:07:00.0 ens2f0: Tx 
Context[0x8012] Created, state 0x2
Oct  9 14:36:43 mazinger kernel: [262275.545966] qlcnic 0000:07:00.0 ens2f0: 
qlcnic_reset_hw_context: soft reset complete
-----------
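
As a reading aid for the Tx ring dump above: the "Available desc=10"
values match the usual free-slot arithmetic for a circular ring applied
to the logged producer and consumer indices. The sketch below is an
assumption about how that figure is derived, not code taken from the
driver; tx_avail() is a hypothetical helper.

```python
def tx_avail(producer, consumer, num_desc):
    """Free slots in a circular descriptor ring.

    Assumed convention: the producer may advance until it reaches the
    consumer index, wrapping around at num_desc.
    """
    if producer < consumer:
        return consumer - producer
    return consumer + num_desc - producer


# (sw_producer, sw_consumer) per Tx ring, copied from the log above.
rings = {0: (481, 491), 1: (81, 91), 2: (572, 582), 3: (568, 578)}

for ring, (producer, consumer) in rings.items():
    print(ring, tx_avail(producer, consumer, 1024))
```

Under that assumption, all four rings had only 10 of their 1024
descriptors free when the watchdog fired.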

What I have tried to fix it:

- I have upgraded the interface firmware to the latest version provided
by HP:

# ethtool -i ens2f0
driver: qlcnic
version: 5.3.63
firmware-version: 4.20.1
expansion-rom-version: 
bus-info: 0000:07:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

- I have opened a case with HP. Following their recommendations, I have
upgraded the firmware of the server to the latest version. After
capturing an AHS (Active Health System) log, they told me there isn't
a hardware problem and that it should be a software issue.

- I have tried the HWE kernel (version 4.10.x), which comes with a newer
version of the qlcnic module (5.3.65), but it did not solve the problem.

- After reading about some problems with TSO in virtual environments, I
have disabled TSO/GSO and other offload features on the interfaces:

auto <iface>
iface <iface> inet manual
    pre-up /sbin/ethtool --offload <iface> gso off tso off sg off gro off
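
To confirm that the pre-up line above actually took effect, the output
of "ethtool -k <iface>" can be checked for the four features it turns
off. This is a minimal sketch: the helper names are made up for
illustration, and the feature names are the long names that ethtool -k
normally prints for gso, tso, sg and gro.

```python
import subprocess

# Long names printed by `ethtool -k` for gso, tso, sg and gro.
WANTED_OFF = (
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "scatter-gather",
    "generic-receive-offload",
)


def parse_features(text):
    """Map feature name -> True/False from `ethtool -k` output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skips the "Features for <iface>:" header
        # The value may carry a suffix such as "[fixed]"; keep the first word.
        features[name] = value.split()[0] == "on"
    return features


def still_enabled(text):
    """Return the offloads from WANTED_OFF that are still reported as on."""
    features = parse_features(text)
    return [name for name in WANTED_OFF if features.get(name, False)]


def check_iface(iface):
    """Run `ethtool -k` on iface (needs ethtool installed) and check it."""
    out = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    ).stdout
    return still_enabled(out)
```

An empty list from check_iface("ens2f0") would mean all four offloads
are off on that interface.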


Searching online, I have found reports of similar problems, but all of
them were solved by applying one or more of the steps above. The issue
seems to be related to this kind of interface when it is used in
virtualized environments.

** Affects: kernel-package (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: ganeti qlcnic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1723482

Title:
  qlcnic firmware hang detected kvm ganeti

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/1723482/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
