Public bug reported:

Issue Description:
We encountered a network device timeout error on our server, as indicated by a NETDEV WATCHDOG timeout event. The error occurred on transmit queue 4 of the network interface eno12399np0, which uses the bnxt_en driver.

Error Log:
Time of Incident: May 31 03:53:35
Error Message:
  NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out
  WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:472 dev_watchdog+0x270/0x280
Kernel Version: 5.4.0-182-generic #202-Ubuntu
Hardware: Dell Inc. PowerEdge R650, BIOS 1.13.2 dated 12/19/2023
Modules Linked: The full list of kernel modules loaded at the time is included in the kernel log below. It includes networking and system management modules, which may be relevant to diagnosing the issue.

Steps Taken:
We have checked physical connections and rebooted the server without resolving the issue. The network interface seems to fail sporadically, leading to these watchdog timeouts.

Questions:
Has anyone experienced similar issues with the bnxt_en driver or similar hardware configurations?
Are there known issues with this driver version on Ubuntu 20.04 LTS that could lead to transmit queue timeouts?
Are there any recommended driver updates, kernel patches, or configuration changes that could help mitigate this problem?

Additional Context:
The server is critical to our operations and handles high network traffic loads. This is the first occurrence after a recent system update.

Request for Assistance:
Insights on debugging further at the kernel level, or pointers to specific logs that would be useful to examine.
Suggestions for temporary workarounds or permanent fixes from community members with experience in network management and kernel troubleshooting.

Kernel log:
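To check whether the sporadic failures always hit the same interface and queue, the watchdog events in a saved kernel log could be tallied with a short script. The following is only a minimal sketch: the function name count_watchdog_timeouts is hypothetical, and the regular expression assumes the exact message format shown in this report.

```python
import re
from collections import Counter

# Assumed message format, taken from the log in this report:
# "NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def count_watchdog_timeouts(lines):
    """Tally watchdog timeouts keyed by (interface, driver, queue)."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            key = (match["iface"], match["driver"], int(match["queue"]))
            counts[key] += 1
    return counts


# Two lines copied from the log in this report; only the first is a
# watchdog event, the second is the driver's follow-up reset message.
sample = [
    "May 31 03:53:35 onf-hk-comp006 kernel: [16160.756415] NETDEV WATCHDOG: "
    "eno12399np0 (bnxt_en): transmit queue 4 timed out",
    "May 31 03:53:35 onf-hk-comp006 kernel: [16160.756771] bnxt_en 0000:31:00.0 "
    "eno12399np0: TX timeout detected, starting reset task!",
]

counts = count_watchdog_timeouts(sample)
for (iface, driver, queue), n in sorted(counts.items()):
    print(f"{iface} ({driver}) queue {queue}: {n} timeout(s)")
```

Passing the lines of a complete kernel log file instead of the two-line sample would show whether queue 4 is the only queue affected or whether the timeouts are spread across queues.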
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756411] ------------[ cut here ]------------
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756415] NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756450] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:472 dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756452] Modules linked in: nf_conntrack_netlink vhost_net vhost tap xsk_diag udp_diag raw_diag unix_diag af_packet_diag netlink_diag tcp_diag inet_diag ip6table_raw xt_CT xt_mac xt_set xt_multiport xt_tcpudp xt_state xt_conntrack xt_comment xt_physdev ip_set_hash_net ip_set iptable_raw veth sch_ingress vxlan ebtable_filter ip6_udp_tunnel udp_tunnel ebtables ip6table_filter nfnetlink_cttimeout nfnetlink iptable_filter bpfilter aufs rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm ib_core overlay 8021q garp mrp bonding nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif binfmt_misc intel_rapl_msr intel_rapl_common joydev nfit x86_pkg_temp_thermal intel_powerclamp dell_smbios input_leds dcdbas dell_wmi_descriptor wmi_bmof coretemp kvm_intel kvm mei_me isst_if_mbox_pci isst_if_mmio isst_if_common mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables msr
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756505] br_netfilter bridge ramoops efi_pstore reed_solomon stp llc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid1 raid0 multipath linear dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor mgag200 drm_vram_helper i2c_algo_bit ttm hid_generic drm_kms_helper syscopyarea raid6_pq sysfillrect sysimgblt libcrc32c usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel fb_sys_fops aesni_intel crypto_simd cryptd nvme glue_helper ahci drm nvme_core bnxt_en tg3 i2c_i801 libahci wmi
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756543] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.0-182-generic #202-Ubuntu
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756546] Hardware name: Dell Inc. PowerEdge R650/0FGCWW, BIOS 1.13.2 12/19/2023
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756551] RIP: 0010:dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756556] Code: eb 9d 48 8b 5d d0 c6 05 ba 7c 2a 01 01 48 89 df e8 25 ae fa ff 44 89 e1 48 89 de 48 c7 c7 80 a6 20 b4 48 89 c2 e8 be 46 14 00 <0f> 0b e9 77 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756559] RSP: 0018:ffffae574017ce38 EFLAGS: 00010282
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756562] RAX: 0000000000000000 RBX: ffff9ead25d40000 RCX: 0000000000000006
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756564] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff9ead3f65c8c0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756566] RBP: ffffae574017ce70 R08: 000000000000094a R09: 0000000000000004
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756567] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000004
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756569] R13: ffff9ead25d4dbc0 R14: 000000000000004a R15: ffff9ead25d40480
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756572] FS: 0000000000000000(0000) GS:ffff9ead3f640000(0000) knlGS:0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756574] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756576] CR2: 00007f311800b3c0 CR3: 0000003f1c522004 CR4: 0000000000762ee0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756578] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756580] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756581] PKRU: 55555554
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756583] Call Trace:
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756586] <IRQ>
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756596] ? show_regs.cold+0x1a/0x1f
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756603] ? __warn+0x98/0xe0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756607] ? dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756613] ? report_bug+0xd1/0x100
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756621] ? do_error_trap+0x9b/0xc0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756624] ? do_invalid_op+0x3c/0x50
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756628] ? dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756634] ? invalid_op+0x1e/0x30
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756638] ? dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756641] ? dev_watchdog+0x270/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756645] ? pfifo_fast_enqueue+0x150/0x150
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756652] call_timer_fn+0x32/0x130
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756658] __run_timers.part.0+0x180/0x280
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756663] ? timerqueue_add+0x9b/0xb0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756668] ? enqueue_hrtimer+0x43/0xa0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756671] ? ktime_get+0x3e/0xa0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756676] run_timer_softirq+0x2a/0x50
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756682] __do_softirq+0xd1/0x2c1
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756687] irq_exit+0xae/0xb0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756692] smp_apic_timer_interrupt+0x7b/0x140
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756697] apic_timer_interrupt+0xf/0x20
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756699] </IRQ>
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756706] RIP: 0010:cpuidle_enter_state+0xc5/0x450
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756710] Code: ff e8 cf 06 83 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 f2 1e 89 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756712] RSP: 0018:ffffae5740397e38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756715] RAX: ffff9ead3f66ff00 RBX: ffffffffb4969be0 RCX: 000000000000001f
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756717] RDX: 0000000000000000 RSI: 000000002dd27b80 RDI: 0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756718] RBP: ffffae5740397e78 R08: 00000eb2b824f134 R09: 000000007fffffff
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756720] R10: ffff9ead3f66ebc0 R11: ffff9ead3f66eba0 R12: ffff9ead33291800
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756722] R13: 0000000000000002 R14: 0000000000000002 R15: ffff9ead33291800
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756728] ? cpuidle_enter_state+0xa1/0x450
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756733] cpuidle_enter+0x2e/0x40
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756739] call_cpuidle+0x23/0x40
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756742] do_idle+0x1dd/0x270
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756747] cpu_startup_entry+0x20/0x30
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756754] start_secondary+0x178/0x1d0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756760] secondary_startup_64+0xa4/0xb0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756764] ---[ end trace 73ce74318a7baae1 ]---
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756771] bnxt_en 0000:31:00.0 eno12399np0: TX timeout detected, starting reset task!

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2067712

Title:
  NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2067712/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs