This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1887723

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-oem-5.6 in Ubuntu.
https://bugs.launchpad.net/bugs/1887723

Title:
  mlx5_core: Error cqe on cqn

Status in linux package in Ubuntu:
  Incomplete
Status in linux-oem-5.6 package in Ubuntu:
  New

Bug description:
  I have encountered the following repeating error with kernel 5.6.0-1018-oem. The network was disrupted and the error kept repeating for one hour until the system hung.

  [316294.820469] mlx5_core 0000:44:00.1 enp68s0f1: Error cqe on cqn 0x816, ci 0xc5, sqn 0x1908, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
  [316294.833103] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [316294.833106] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [316294.833110] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [316294.833116] 00000030: 00 00 00 00 04 00 51 04 0e 00 19 08 53 64 dc d2
  [316294.833118] WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x364, len: 128
  [316294.833120] 00000000: 00 53 64 0e 00 19 08 07 00 00 00 08 00 00 00 00
  [316294.833121] 00000010: 00 00 00 00 c0 00 05 a0 00 00 00 00 00 42 00 a3
  [316294.833123] 00000020: 8e bf 47 d7 86 14 ad f8 ef 46 08 00 45 00 12 34
  [316294.833124] 00000030: 76 d8 40 00 40 06 77 97 c3 a8 4a 4a 5f 67 cc fa
  [316294.833126] 00000040: 01 bb d8 2a 5c 7e 3d a0 b0 c5 3e 74 80 18 00 0b
  [316294.833127] 00000050: 4c 7b 00 00 01 01 08 0a 63 59 a1 46 00 41 05 b4
  [316294.833129] 00000060: 00 00 12 00 00 08 01 01 00 00 00 00 c2 c6 0b 74
  [316294.833130] 00000070: 00 00 00 44 00 08 01 01 00 00 00 00 c3 09 6c fc
  [316294.833144] mlx5_core 0000:44:00.1 enp68s0f1: ERR CQE on SQ: 0x1908
  [316294.996328] enp68s0f1: hw csum failure
  [316295.000262] skb len=1500 headroom=78 headlen=1500 tailroom=22
  [316295.000262] mac=(64,14) net=(78,40) trans=118
  [316295.000262] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
  [316295.000262] csum(0x81a5 ip_summed=2 complete_sw=0 valid=0 level=0)
  [316295.000262] hash(0x322a7dd7 sw=0 l4=1) proto=0x86dd pkttype=0 iif=0
  [316295.029909] dev name=enp68s0f1 feat=0x0x0010a1821fd14ba9
  ...
  [316295.943994] Hardware name: ASUSTeK COMPUTER INC. RS500A-E10-RS12U/KRPA-U16 Series, BIOS 0703 03/06/2020
  [316295.943995] Call Trace:
  [316295.943997] <IRQ>
  [316295.944002] dump_stack+0x6d/0x9a
  [316295.944006] netdev_rx_csum_fault.part.0+0x41/0x45
  [316295.944007] __skb_gro_checksum_complete.cold+0xb/0x10
  [316295.944009] tcp6_gro_receive+0xdc/0x1c0
  [316295.944010] ipv6_gro_receive+0x1dc/0x460
  [316295.944012] ? kmem_cache_alloc+0x16d/0x230
  [316295.944017] dev_gro_receive+0x2fb/0x690
  [316295.996284] ? mlx5e_build_rx_skb+0x38c/0xb60 [mlx5_core]
  [316296.010778] napi_gro_receive+0x39/0x140
  [316296.010793] mlx5e_handle_rx_cqe+0xa5/0x150 [mlx5_core]
  [316296.010808] mlx5e_poll_rx_cq+0x7fe/0x910 [mlx5_core]
  [316296.010825] mlx5e_napi_poll+0xda/0x610 [mlx5_core]
  [316296.010843] ? mlx5_eq_comp_int+0x149/0x1b0 [mlx5_core]
  [316296.010850] net_rx_action+0x13a/0x370
  [316296.010859] __do_softirq+0xe1/0x2d6
  [316296.010862] irq_exit+0xae/0xb0
  [316296.010863] do_IRQ+0x5a/0xf0
  [316296.010865] common_interrupt+0xf/0xf
  [316296.010866] </IRQ>
  [316296.010868] RIP: 0010:cpuidle_enter_state+0xca/0x3e0
  [316296.010869] Code: ff e8 aa 7d 7e ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 2d 01 85 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 3f 02 00 00 49 63 d4 4c 8b 7d d0 4c 2b 7d c8 48 8d
  [316296.010870] RSP: 0018:ffff9d84002cfe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
  [316296.010872] RAX: ffff91110b62ce00 RBX: ffff9110ac1d1c00 RCX: 000000000000001f
  [316296.010872] RDX: 0000000000000000 RSI: 00000000334bfb91 RDI: 0000000000000000
  [316296.010873] RBP: ffff9d84002cfe78 R08: 00011fab2ae67109 R09: 00011faebfd6b300
  [316296.010873] R10: ffff91110b62bac4 R11: ffff91110b62baa4 R12: 0000000000000002
  [316296.010874] R13: ffffffff8f978700 R14: 0000000000000002 R15: ffff9110ac1d1c00
  [316296.010876] ? cpuidle_enter_state+0xa6/0x3e0
  [316296.010878] cpuidle_enter+0x2e/0x40
  [316296.010880] call_cpuidle+0x23/0x40
  [316296.010881] do_idle+0x1e7/0x280
  [316296.010882] cpu_startup_entry+0x20/0x30
  [316296.010885] start_secondary+0x167/0x1c0
  [316296.010886] secondary_startup_64+0xa4/0xb0

  # lspci -v -s 0000:44:00.1
  44:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
          Subsystem: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
          Flags: bus master, fast devsel, latency 0, IRQ 254, NUMA node 0
          Memory at b0000000 (64-bit, prefetchable) [size=32M]
          Expansion ROM at b5300000 [disabled] [size=1M]
          Capabilities: [60] Express Endpoint, MSI 00
          Capabilities: [48] Vital Product Data
          Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
          Capabilities: [c0] Vendor Specific Information: Len=18 <?>
          Capabilities: [40] Power Management version 3
          Capabilities: [100] Advanced Error Reporting
          Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
          Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
          Capabilities: [230] Access Control Services
          Kernel driver in use: mlx5_core
          Kernel modules: mlx5_core

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1887723/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp