** Description changed: - Hi, - we have the following issue which affects a lot of our customers this issue fixes upstream and need to add the fixes to ubuntu 18.04. + BugLink: https://bugs.launchpad.net/bugs/1854842 - Mlx5 driver: Tail padding HW Checksum crash in Ubuntu 18.04 kernel - Ubuntu-4.15.0-72 + [Impact] - Crach log: + On machines equipped with Mellanox NIC's, in this particular case, + Mellanox 5 series NICs using the mlx5_core driver, there is a kernel + splat when sending large IP packets which have padding at the end. - [ 785.337368] Call Trace: - [ 785.337372] <IRQ> - [ 785.337388] dump_stack+0x63/0x8e - [ 785.337397] netdev_rx_csum_fault+0x38/0x40 - [ 785.337403] __skb_checksum_complete+0xbc/0xd0 - [ 785.337408] nf_ip_checksum+0xc3/0xf0 - [ 785.337417] icmp_error+0x27d/0x310 [nf_conntrack_ipv4] - [ 785.337431] nf_conntrack_in+0x15a/0x510 [nf_conntrack] - [ 785.337437] ? __skb_checksum+0x68/0x330 - [ 785.337441] ipv4_conntrack_in+0x1c/0x20 [nf_conntrack_ipv4] - [ 785.337449] nf_hook_slow+0x48/0xc0 - [ 785.337452] ? skb_send_sock+0x50/0x50 - [ 785.337460] ip_rcv+0x301/0x360 - [ 785.337463] ? inet_del_offload+0x40/0x40 - [ 785.337468] __netif_receive_skb_core+0x432/0xb80 - [ 785.337473] __netif_receive_skb+0x18/0x60 - [ 785.337477] ? __netif_receive_skb+0x18/0x60 - [ 785.337481] netif_receive_skb_internal+0x45/0xe0 - [ 785.337483] napi_gro_receive+0xc5/0xf0 - [ 785.337517] mlx5e_handle_rx_cqe+0x48d/0x5e0 [mlx5_core] - [ 785.337524] ? 
enqueue_task_rt+0x1b4/0x2e0 - [ 785.337546] mlx5e_poll_rx_cq+0xd1/0x8c0 [mlx5_core] - [ 785.337566] mlx5e_napi_poll+0x9d/0x290 [mlx5_core] - [ 785.337569] net_rx_action+0x140/0x3a0 - [ 785.337574] __do_softirq+0xe4/0x2d4 - [ 785.337580] irq_exit+0xc5/0xd0 - [ 785.337583] do_IRQ+0x86/0xe0 - [ 785.337588] common_interrupt+0x8c/0x8c - [ 785.337590] </IRQ> - [ 785.337598] RIP: 0010:cpuidle_enter_state+0xa4/0x2f0 - [ 785.337600] RSP: 0018:ffffad8d8329fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9 - [ 785.337604] RAX: ffff8a6c7f7e1840 RBX: 000000b6d9bf6a06 RCX: 000000000000001f - [ 785.337605] RDX: 000000b6d9bf6a06 RSI: ffd4a4b4c86359ce RDI: 0000000000000000 - [ 785.337607] RBP: ffffad8d8329fea8 R08: 0000000000000004 R09: 0000000000021080 - [ 785.337609] R10: ffffad8d8329fe38 R11: 0056b80166a42400 R12: ffff8a6c7f7ece18 - [ 785.337610] R13: 0000000000000005 R14: ffffffffaff73438 R15: 0000000000000000 + enp6s0f0: hw csum failure + CPU: 19 PID: 0 Comm: swapper/19 Not tainted 4.15.0-72-generic + Call Trace: + <IRQ> + dump_stack+0x63/0x8e + netdev_rx_csum_fault+0x38/0x40 + __skb_checksum_complete+0xbc/0xd0 + nf_ip_checksum+0xc3/0xf0 + icmp_error+0x27d/0x310 [nf_conntrack_ipv4] + nf_conntrack_in+0x15a/0x510 [nf_conntrack] + ? __skb_checksum+0x68/0x330 + ipv4_conntrack_in+0x1c/0x20 [nf_conntrack_ipv4] + nf_hook_slow+0x48/0xc0 + ? skb_send_sock+0x50/0x50 + ip_rcv+0x301/0x360 + ? inet_del_offload+0x40/0x40 + __netif_receive_skb_core+0x432/0xb80 + __netif_receive_skb+0x18/0x60 + ? __netif_receive_skb+0x18/0x60 + netif_receive_skb_internal+0x45/0xe0 + napi_gro_receive+0xc5/0xf0 + mlx5e_handle_rx_cqe+0x48d/0x5e0 [mlx5_core] + ? 
enqueue_task_rt+0x1b4/0x2e0 + mlx5e_poll_rx_cq+0xd1/0x8c0 [mlx5_core] + mlx5e_napi_poll+0x9d/0x290 [mlx5_core] + net_rx_action+0x140/0x3a0 + __do_softirq+0xe4/0x2d4 + irq_exit+0xc5/0xd0 + do_IRQ+0x86/0xe0 + common_interrupt+0x8c/0x8c + </IRQ> + This bug is a further attempt to fix these splats, as there has been + previous fixes in LP #1840854 and a series of commits which landed in + 4.15.0-67 (LP #1847155) as a part of upstream -stable patches. - [HOW TO REPRODUCE]: - with scapy on the sender side please run the following commands: - 1) a=Ether(dst='ff:ff:ff:ff:ff:ff')/IP(dst='127.0.0.1')/ICMP()/Padding(load='\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe') + This bug will also fix the same problems on the new Mellanox CX6 and + Bluefield hardware, which has been enabled already via previous upstream + -stable patches which landed in LP #1847155. + + [Fix] + + This particular issue was fixed for Mellanox series 5 drivers in the + following commits: + + commit 0aa1d18615c163f92935b806dcaff9157645233a + Author: Saeed Mahameed <sae...@mellanox.com> + Date: Tue Mar 12 00:24:52 2019 -0700 + Subject: net/mlx5e: Rx, Fixup skb checksum for packets with tail padding + + This commit required a minor backport. + + This commit was selected for upstream -stable in 4.19.76 and 5.0.10. + This commit appears to be omitted from "Bionic update: upstream stable patchset 2019-10-07", which is LP #1847155, probably due to requiring a backport. + + commit db849faa9bef993a1379dc510623f750a72fa7ce + Author: Saeed Mahameed <sae...@mellanox.com> + Date: Fri May 3 13:14:59 2019 -0700 + Subject: net/mlx5e: Rx, Fix checksum calculation for new hardware + + This commit required a minor backport. 
+ + This commit was selected for upstream -stable in 5.1.21 and 5.2.4. + This commit has already been applied to the disco kernel, as part of stable updates. + + [Testcase] + + The following scapy script will reproduce this issue. Run from the + machine with the Mellanox series 5 NIC: + + 1) + a=Ether(dst='ff:ff:ff:ff:ff:ff')/IP(dst='127.0.0.1')/ICMP()/Padding(load='\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe') 2) sendp(a, iface='enp6s0f0') - 3) check the dmesg i the receiver side + 3) Check dmesg on the reciever side. The example uses localhost, so + check dmesg. - [ADDITIONAL INFO]: - This issue fixes upstream by the following set of patches: - net/mlx5e: Rx, Fix checksum calculation for new hardware --> db849faa9bef993a1379dc510623f750a72fa7ce - net/mlx5e: Rx, Check ip headers sanity - > 0318a7b7fcad9765931146efa7ca3a034194737c - net/mlx5e: Rx, Fixup skb checksum for packets with tail padding --> 0aa1d18615c163f92935b806dcaff9157645233a - net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded --> 5d0bb3bac4b9f6c22280b04545626fdfd99edc6b - mlx5: fix get_ip_proto() --> ef6fcd455278c2be3032a346cc66d9dd9866b787 - net/mlx5e: Allow reporting of checksum unnecessary --> b856df28f9230a47669efbdd57896084caadb2b3 - net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets --> fe1dc069990c1f290ef6b99adb46332c03258f38 - net/mlx5e: Set ECN for received packets using CQE indication --> f007c13d4ad62f494c83897eda96437005df4a91 - net/mlx5e: Add likely to the common RX checksum flow --> 63a612f984a1fae040ab6f1c6a0f1fdcdf1954b8 - net/mlx5e: CHECKSUM_COMPLETE offload for VLAN/QinQ packets --> f938daeee95eb36ef6b431bf054a5cc6cdada112 + I have built some test kernels, which are 
available here: + https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test + This kernel contains 0aa1d18615c163f92935b806dcaff9157645233a. - attached the /var/log/kern.log file. + and + + https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test-2 + This kernel contains db849faa9bef993a1379dc510623f750a72fa7ce. + + If you install the test kernels the issue is resolved. + + [Regression Potential] + + The changes are limited to the mlx5_core driver, and only modify how + packet checksums are calculated when padding is involved. + + Both patches have been accepted and published by upstream -stable, and + are widely accepted by the community. + + Because of this, I believe the risk of regression is low.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854842

Title:
  mlx5_core reports hardware checksum error for padded packets on Mellanox NICs

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  In Progress

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1854842

  [Impact]

  On machines equipped with Mellanox NICs, in this particular case,
  Mellanox 5 series NICs using the mlx5_core driver, there is a kernel
  splat when sending large IP packets which have padding at the end.

  enp6s0f0: hw csum failure
  CPU: 19 PID: 0 Comm: swapper/19 Not tainted 4.15.0-72-generic
  Call Trace:
   <IRQ>
   dump_stack+0x63/0x8e
   netdev_rx_csum_fault+0x38/0x40
   __skb_checksum_complete+0xbc/0xd0
   nf_ip_checksum+0xc3/0xf0
   icmp_error+0x27d/0x310 [nf_conntrack_ipv4]
   nf_conntrack_in+0x15a/0x510 [nf_conntrack]
   ? __skb_checksum+0x68/0x330
   ipv4_conntrack_in+0x1c/0x20 [nf_conntrack_ipv4]
   nf_hook_slow+0x48/0xc0
   ? skb_send_sock+0x50/0x50
   ip_rcv+0x301/0x360
   ? inet_del_offload+0x40/0x40
   __netif_receive_skb_core+0x432/0xb80
   __netif_receive_skb+0x18/0x60
   ? __netif_receive_skb+0x18/0x60
   netif_receive_skb_internal+0x45/0xe0
   napi_gro_receive+0xc5/0xf0
   mlx5e_handle_rx_cqe+0x48d/0x5e0 [mlx5_core]
   ? enqueue_task_rt+0x1b4/0x2e0
   mlx5e_poll_rx_cq+0xd1/0x8c0 [mlx5_core]
   mlx5e_napi_poll+0x9d/0x290 [mlx5_core]
   net_rx_action+0x140/0x3a0
   __do_softirq+0xe4/0x2d4
   irq_exit+0xc5/0xd0
   do_IRQ+0x86/0xe0
   common_interrupt+0x8c/0x8c
   </IRQ>

  This bug is a further attempt to fix these splats, as there have been
  previous fixes in LP #1840854 and a series of commits which landed in
  4.15.0-67 (LP #1847155) as a part of upstream -stable patches.

  This bug will also fix the same problems on the new Mellanox CX6 and
  Bluefield hardware, which has been enabled already via previous upstream
  -stable patches which landed in LP #1847155.
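To make the "hw csum failure" above more concrete, here is a simplified Python model of CHECKSUM_COMPLETE handling for a padded frame. It rests on an assumption about the mechanism, not on anything shown in this report: that the NIC's reported sum covers only the IP datagram, while the stack subtracts the tail padding's sum when it trims the packet to the IP total length. The 28-byte datagram, the 85 bytes of 0xfe padding (an arbitrary length chosen for illustration) and the function names `ones_sum`, `ones_sub` and `stack_trim_and_verify` are all hypothetical; this is a sketch, not kernel code.

```python
# Simplified model (assumptions, not kernel code): the NIC reports a
# ones' complement sum over the IP datagram only, and the stack subtracts
# the tail padding's sum when trimming the packet to the IP total length.

def ones_sum(data: bytes) -> int:
    """16-bit ones' complement sum (RFC 1071 style), folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # an odd trailing byte is padded with zero
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sub(a: int, b: int) -> int:
    """Ones' complement subtraction a - b, computed as a + ~b."""
    total = a + (~b & 0xFFFF)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def stack_trim_and_verify(frame: bytes, reported_sum: int, ip_len: int) -> bool:
    """Remove the padding's share from reported_sum, compare with a software sum."""
    assert ip_len % 2 == 0, "model only handles padding at an even offset"
    trimmed = ones_sub(reported_sum, ones_sum(frame[ip_len:]))
    return trimmed == ones_sum(frame[:ip_len])


# Hypothetical 28-byte datagram (arbitrary bytes) plus 85 bytes of 0xfe padding.
frame = bytes(range(1, 29)) + b"\xfe" * 85

# Reported sum covers the padding: trimming it off gives the right answer.
print(stack_trim_and_verify(frame, ones_sum(frame), 28))      # True
# Reported sum excludes the padding: the subtraction goes wrong, roughly
# the kind of disagreement that netdev_rx_csum_fault() complains about.
print(stack_trim_and_verify(frame, ones_sum(frame[:28]), 28))  # False
```

Under this model the failure only appears when the padding's sum is non-zero, which is why a frame with non-zero tail padding, such as the 0xfe bytes in the testcase, is a convenient trigger.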
[Fix] This particular issue was fixed for Mellanox series 5 drivers in the following commits: commit 0aa1d18615c163f92935b806dcaff9157645233a Author: Saeed Mahameed <sae...@mellanox.com> Date: Tue Mar 12 00:24:52 2019 -0700 Subject: net/mlx5e: Rx, Fixup skb checksum for packets with tail padding This commit required a minor backport. This commit was selected for upstream -stable in 4.19.76 and 5.0.10. This commit appears to be omitted from "Bionic update: upstream stable patchset 2019-10-07", which is LP #1847155, probably due to requiring a backport. commit db849faa9bef993a1379dc510623f750a72fa7ce Author: Saeed Mahameed <sae...@mellanox.com> Date: Fri May 3 13:14:59 2019 -0700 Subject: net/mlx5e: Rx, Fix checksum calculation for new hardware This commit required a minor backport. This commit was selected for upstream -stable in 5.1.21 and 5.2.4. This commit has already been applied to the disco kernel, as part of stable updates. [Testcase] The following scapy script will reproduce this issue. Run from the machine with the Mellanox series 5 NIC: 1) a=Ether(dst='ff:ff:ff:ff:ff:ff')/IP(dst='127.0.0.1')/ICMP()/Padding(load='\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe') 2) sendp(a, iface='enp6s0f0') 3) Check dmesg on the reciever side. The example uses localhost, so check dmesg. I have built some test kernels, which are available here: https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test This kernel contains 0aa1d18615c163f92935b806dcaff9157645233a. and https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test-2 This kernel contains db849faa9bef993a1379dc510623f750a72fa7ce. If you install the test kernels the issue is resolved. 
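The subject of commit 0aa1d18615c1, "Fixup skb checksum for packets with tail padding", suggests the driver corrects the reported checksum itself before the stack sees it. Continuing the same simplified model (the NIC's sum is assumed to exclude the padding, and the stack is assumed to subtract the padding when trimming), a hedged sketch of such a fixup adds the padding's ones' complement sum to the reported value, byte-swapped when the padding starts at an odd offset, so that the stack's later subtraction lands back on the datagram's sum. The helpers `block_sum`, `driver_fixup` and `stack_trim_and_verify` are hypothetical; this is not the mlx5_core implementation.

```python
# Hedged sketch, not mlx5_core code: a driver-side fixup that folds the
# tail padding into a checksum the NIC (by assumption) computed over the
# IP datagram only.

def fold(total: int) -> int:
    """Fold carries back into the low 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data, zero-padding an odd last byte."""
    if len(data) % 2:
        data += b"\x00"
    return fold(sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)))


def block_sum(data: bytes, offset: int) -> int:
    """Contribution of data to a larger sum when it starts at byte `offset`."""
    s = ones_sum(data)
    if offset % 2:  # odd start: each byte lands in the other half of its word
        s = ((s & 0xFF) << 8) | (s >> 8)
    return s


def driver_fixup(frame: bytes, hw_sum: int, ip_len: int) -> int:
    """Hypothetical fixup: add the bytes past ip_len into the reported sum."""
    return fold(hw_sum + block_sum(frame[ip_len:], ip_len))


def stack_trim_and_verify(frame: bytes, skb_csum: int, ip_len: int) -> bool:
    """Model of the stack trimming the padding and re-checking the sum."""
    pad = block_sum(frame[ip_len:], ip_len)
    trimmed = fold(skb_csum + (~pad & 0xFFFF))  # ones' complement subtraction
    return trimmed == ones_sum(frame[:ip_len])


# Hypothetical 28-byte datagram followed by 85 bytes of 0xfe padding.
frame = bytes(range(1, 29)) + b"\xfe" * 85
hw_sum = ones_sum(frame[:28])  # assumed: NIC sum excludes the padding
print(stack_trim_and_verify(frame, hw_sum, 28))                           # False
print(stack_trim_and_verify(frame, driver_fixup(frame, hw_sum, 28), 28))  # True
```

The byte swap in `block_sum` matters only when the IP total length is odd; with an even length, as in this usage example, the padding's sum can be added directly.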
  [Regression Potential]

  The changes are limited to the mlx5_core driver, and only modify how
  packet checksums are calculated when padding is involved.

  Both patches have been accepted and published by upstream -stable, and
  are widely accepted by the community.

  Because of this, I believe the risk of regression is low.