Public bug reported:

Issue Description:
We encountered a network device timeout on our server, reported as a NETDEV 
WATCHDOG event. The timeout occurred on transmit queue 4 of the network 
interface eno12399np0, which uses the bnxt_en driver.

Error Log:

Time of Incident: May 31 03:53:35
Error Message:
NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out
WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:472 dev_watchdog+0x270/0x280
Kernel Version: 5.4.0-182-generic #202-Ubuntu
Hardware: Dell Inc. PowerEdge R650, BIOS 1.13.2 dated 12/19/2023
Modules Linked:
The full list of kernel modules loaded at the time is included in the kernel 
log below (see the "Modules linked in:" entry). It covers networking and system 
management modules that may be relevant to diagnosing the issue.

Steps Taken:
We checked the physical connections and rebooted the server, but neither step 
resolved the issue. The network interface still seems to fail sporadically, 
leading to these watchdog timeouts.
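
To quantify how sporadic the failures are, one option is to tally watchdog 
events per interface and queue from the kernel log. The following is a minimal 
sketch, not a definitive tool: the function name count_tx_timeouts is 
hypothetical, and it assumes each log message sits on a single line (as in 
journalctl -k or /var/log/kern.log output, not the line-wrapped copy in this 
report).

```python
import re
from collections import Counter

# Matches the message emitted by dev_watchdog(), for example:
# "NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def count_tx_timeouts(lines):
    """Count watchdog timeouts per (interface, driver, queue).

    `lines` is any iterable of unwrapped kernel log lines (hypothetical
    helper, written for illustration only).
    """
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            key = (match["iface"], match["driver"], int(match["queue"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample input built from the message in this report, joined onto one line.
    sample = [
        "May 31 03:53:35 onf-hk-comp006 kernel: [16160.756415] "
        "NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out",
    ]
    for (iface, driver, queue), n in sorted(count_tx_timeouts(sample).items()):
        print(f"{iface} ({driver}) queue {queue}: {n} timeout(s)")
```

Passing an open log file instead of the sample list works the same way, since 
the function only iterates over lines. If the counts cluster on one queue, that 
may point at a specific ring or IRQ rather than the whole device.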

Questions:

- Has anyone experienced similar issues with the bnxt_en driver or similar 
  hardware configurations?
- Are there known issues with this driver on Ubuntu 20.04 LTS (kernel 
  5.4.0-182-generic) that could lead to transmit queue timeouts?
- Are there any recommended driver updates, kernel patches, or configuration 
  changes that could help mitigate this problem?

Additional Context:

- The server is critical to our operations and handles high network traffic 
  loads.
- This is the first occurrence after a recent system update.

Request for Assistance:

- Insights on debugging further at the kernel level, or specific logs that 
  would be useful to examine.
- Suggestions for temporary workarounds or permanent fixes from community 
  members with experience in network management and kernel troubleshooting.


May 31 03:53:35 onf-hk-comp006 kernel: [16160.756411] ------------[ cut here 
]------------
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756415] NETDEV WATCHDOG: 
eno12399np0 (bnxt_en): transmit queue 4 timed out
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756450] WARNING: CPU: 2 PID: 0 at 
net/sched/sch_generic.c:472 dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756452] Modules linked in: 
nf_conntrack_netlink vhost_net vhost tap xsk_diag udp_diag raw_diag unix_diag 
af_packet_diag netlink_diag tcp_diag inet_diag ip6table_raw xt_CT xt_mac xt_set 
xt_multiport xt_tcpudp xt_state xt_conntrack xt_comment xt_physdev 
ip_set_hash_net ip_set iptable_raw veth sch_ingress vxlan ebtable_filter 
ip6_udp_tunnel udp_tunnel ebtables ip6table_filter nfnetlink_cttimeout 
nfnetlink iptable_filter bpfilter aufs rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm 
ib_core overlay 8021q garp mrp bonding nls_iso8859_1 dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua ipmi_ssif binfmt_misc intel_rapl_msr intel_rapl_common 
joydev nfit x86_pkg_temp_thermal intel_powerclamp dell_smbios input_leds dcdbas 
dell_wmi_descriptor wmi_bmof coretemp kvm_intel kvm mei_me isst_if_mbox_pci 
isst_if_mmio isst_if_common mei ipmi_si ipmi_devintf ipmi_msghandler 
acpi_power_meter mac_hid sch_fq_codel openvswitch nsh nf_conncount nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables msr
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756505]  br_netfilter bridge 
ramoops efi_pstore reed_solomon stp llc ip_tables x_tables autofs4 btrfs 
zstd_compress raid10 raid1 raid0 multipath linear dm_thin_pool 
dm_persistent_data dm_bio_prison dm_bufio raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor mgag200 drm_vram_helper 
i2c_algo_bit ttm hid_generic drm_kms_helper syscopyarea raid6_pq sysfillrect 
sysimgblt libcrc32c usbhid hid crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel fb_sys_fops aesni_intel crypto_simd cryptd nvme glue_helper 
ahci drm nvme_core bnxt_en tg3 i2c_i801 libahci wmi
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756543] CPU: 2 PID: 0 Comm: 
swapper/2 Not tainted 5.4.0-182-generic #202-Ubuntu
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756546] Hardware name: Dell Inc. 
PowerEdge R650/0FGCWW, BIOS 1.13.2 12/19/2023
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756551] RIP: 
0010:dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756556] Code: eb 9d 48 8b 5d d0 
c6 05 ba 7c 2a 01 01 48 89 df e8 25 ae fa ff 44 89 e1 48 89 de 48 c7 c7 80 a6 
20 b4 48 89 c2 e8 be 46 14 00 <0f> 0b e9 77 ff ff ff 66 0f 1f 84 00 00 00 00 00 
0f 1f 44 00 00 55
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756559] RSP: 
0018:ffffae574017ce38 EFLAGS: 00010282
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756562] RAX: 0000000000000000 
RBX: ffff9ead25d40000 RCX: 0000000000000006
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756564] RDX: 0000000000000007 
RSI: 0000000000000086 RDI: ffff9ead3f65c8c0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756566] RBP: ffffae574017ce70 
R08: 000000000000094a R09: 0000000000000004
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756567] R10: 0000000000000000 
R11: 0000000000000001 R12: 0000000000000004
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756569] R13: ffff9ead25d4dbc0 
R14: 000000000000004a R15: ffff9ead25d40480
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756572] FS:  
0000000000000000(0000) GS:ffff9ead3f640000(0000) knlGS:0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756574] CS:  0010 DS: 0000 ES: 
0000 CR0: 0000000080050033
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756576] CR2: 00007f311800b3c0 
CR3: 0000003f1c522004 CR4: 0000000000762ee0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756578] DR0: 0000000000000000 
DR1: 0000000000000000 DR2: 0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756580] DR3: 0000000000000000 
DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756581] PKRU: 55555554
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756583] Call Trace:
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756586]  <IRQ>
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756596]  ? 
show_regs.cold+0x1a/0x1f
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756603]  ? __warn+0x98/0xe0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756607]  ? 
dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756613]  ? report_bug+0xd1/0x100
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756621]  ? do_error_trap+0x9b/0xc0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756624]  ? do_invalid_op+0x3c/0x50
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756628]  ? 
dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756634]  ? invalid_op+0x1e/0x30
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756638]  ? 
dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756641]  ? 
dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756645]  ? 
pfifo_fast_enqueue+0x150/0x150
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756652]  call_timer_fn+0x32/0x130
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756658]  
__run_timers.part.0+0x180/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756663]  ? 
timerqueue_add+0x9b/0xb0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756668]  ? 
enqueue_hrtimer+0x43/0xa0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756671]  ? ktime_get+0x3e/0xa0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756676]  
run_timer_softirq+0x2a/0x50
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756682]  __do_softirq+0xd1/0x2c1
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756687]  irq_exit+0xae/0xb0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756692]  
smp_apic_timer_interrupt+0x7b/0x140
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756697]  
apic_timer_interrupt+0xf/0x20
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756699]  </IRQ>
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756706] RIP: 
0010:cpuidle_enter_state+0xc5/0x450
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756710] Code: ff e8 cf 06 83 ff 
80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 f2 
1e 89 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 
4c 2b 7d c8 48 8d
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756712] RSP: 
0018:ffffae5740397e38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756715] RAX: ffff9ead3f66ff00 
RBX: ffffffffb4969be0 RCX: 000000000000001f
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756717] RDX: 0000000000000000 
RSI: 000000002dd27b80 RDI: 0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756718] RBP: ffffae5740397e78 
R08: 00000eb2b824f134 R09: 000000007fffffff
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756720] R10: ffff9ead3f66ebc0 
R11: ffff9ead3f66eba0 R12: ffff9ead33291800
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756722] R13: 0000000000000002 
R14: 0000000000000002 R15: ffff9ead33291800
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756728]  ? 
cpuidle_enter_state+0xa1/0x450
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756733]  cpuidle_enter+0x2e/0x40
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756739]  call_cpuidle+0x23/0x40
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756742]  do_idle+0x1dd/0x270
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756747]  
cpu_startup_entry+0x20/0x30
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756754]  
start_secondary+0x178/0x1d0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756760]  
secondary_startup_64+0xa4/0xb0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756764] ---[ end trace 
73ce74318a7baae1 ]---
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756771] bnxt_en 0000:31:00.0 
eno12399np0: TX timeout detected, starting reset task!

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2067712

Title:
  NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2067712/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
