Hello Mohammad,

The kernel team has reviewed the patches and they have received two acks
from senior kernel developers:

https://lists.ubuntu.com/archives/kernel-team/2019-December/106516.html
https://lists.ubuntu.com/archives/kernel-team/2020-January/106624.html

From there, the patch was applied to the bionic master-next branch, which
means it will be included in this current SRU cycle:

https://lists.ubuntu.com/archives/kernel-team/2020-January/106643.html

The next step is for the kernel team to build the kernel update and
push it to -proposed. This will likely happen at the end of this week /
early next week. Once this happens, I will write again and ask you to
test the kernel in -proposed to make sure that it fixes the problem.

If it does, we can tag this bug as verified, and wait for the kernel to
be released.

I'll keep you informed of all progress, and I'll write back soon when it
is time to test the kernel in -proposed.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854842

Title:
  mlx5_core reports hardware checksum error for padded packets on
  Mellanox NICs

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1854842

  [Impact]

  On machines equipped with Mellanox NICs (in this particular case,
  Mellanox 5 series NICs using the mlx5_core driver), there is a kernel
  splat when large IP packets which have padding at the end are received.

  enp6s0f0: hw csum failure
  CPU: 19 PID: 0 Comm: swapper/19 Not tainted 4.15.0-72-generic
  Call Trace:
  <IRQ>
  dump_stack+0x63/0x8e
  netdev_rx_csum_fault+0x38/0x40
  __skb_checksum_complete+0xbc/0xd0
  nf_ip_checksum+0xc3/0xf0
  icmp_error+0x27d/0x310 [nf_conntrack_ipv4]
  nf_conntrack_in+0x15a/0x510 [nf_conntrack]
  ? __skb_checksum+0x68/0x330
  ipv4_conntrack_in+0x1c/0x20 [nf_conntrack_ipv4]
  nf_hook_slow+0x48/0xc0
  ? skb_send_sock+0x50/0x50
  ip_rcv+0x301/0x360
  ? inet_del_offload+0x40/0x40
  __netif_receive_skb_core+0x432/0xb80
  __netif_receive_skb+0x18/0x60
  ? __netif_receive_skb+0x18/0x60
  netif_receive_skb_internal+0x45/0xe0
  napi_gro_receive+0xc5/0xf0
  mlx5e_handle_rx_cqe+0x48d/0x5e0 [mlx5_core]
  ? enqueue_task_rt+0x1b4/0x2e0
  mlx5e_poll_rx_cq+0xd1/0x8c0 [mlx5_core]
  mlx5e_napi_poll+0x9d/0x290 [mlx5_core]
  net_rx_action+0x140/0x3a0
  __do_softirq+0xe4/0x2d4
  irq_exit+0xc5/0xd0
  do_IRQ+0x86/0xe0
  common_interrupt+0x8c/0x8c
  </IRQ>

  This bug is a further attempt to fix these splats, as there have been
  previous fixes in LP #1840854 and a series of commits which landed in
  4.15.0-67 (LP #1847155) as a part of upstream -stable patches.

  The fix for this bug also resolves the same problem on the new Mellanox
  CX6 and BlueField hardware, which was already enabled by earlier
  upstream -stable patches that landed in LP #1847155.

  [Fix]

  This particular issue was fixed for Mellanox series 5 drivers in the
  following commits:

  commit 0aa1d18615c163f92935b806dcaff9157645233a
  Author: Saeed Mahameed <sae...@mellanox.com>
  Date:   Tue Mar 12 00:24:52 2019 -0700
  Subject: net/mlx5e: Rx, Fixup skb checksum for packets with tail padding

  This commit required a minor backport.

  This commit was selected for upstream -stable in 4.19.76 and 5.0.10.
  It appears to have been omitted from "Bionic update: upstream stable
  patchset 2019-10-07" (LP #1847155), probably because it required a
  backport.

  commit db849faa9bef993a1379dc510623f750a72fa7ce
  Author: Saeed Mahameed <sae...@mellanox.com>
  Date:   Fri May 3 13:14:59 2019 -0700
  Subject: net/mlx5e: Rx, Fix checksum calculation for new hardware

  This commit required a minor backport.

  This commit was selected for upstream -stable in 5.1.21 and 5.2.4.
  This commit has already been applied to the disco kernel as part of
  stable updates.
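  To illustrate what the two commits change, here is a simplified Python
  model, written from the commit subjects and my reading of the upstream
  commit messages, so treat it as a sketch rather than the driver code. It
  assumes that on older mlx5 hardware the CQE checksum covers only the IP
  packet and not the tail padding, that the stack subtracts the padding's
  contribution when it trims the frame (so the driver must first add it
  back, which is what 0aa1d18615c1 does), and that newer hardware in a
  "full" checksum mode already sums every received byte, so db849faa9bef
  skips the fixup there. All function names are hypothetical; csum_add,
  csum_sub and block_sum only loosely mirror the kernel's csum_add(),
  csum_sub() and csum_block_add() helpers, using folded 16-bit values, and
  the frame is modelled starting at the IP header.

```python
"""Simplified model of the mlx5e RX checksum fixups (illustration only)."""


def ones_sum(data: bytes) -> int:
    """RFC 1071 16-bit one's-complement sum (not inverted), big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # odd trailing byte acts as the high byte of a word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_add(a: int, b: int) -> int:
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    return csum_add(a, ~b & 0xFFFF)  # subtract = add the one's complement


def block_sum(block: bytes, offset: int) -> int:
    """Contribution of `block` when it starts at `offset` in a larger buffer:
    a block at an odd offset contributes its sum byte-swapped."""
    s = ones_sum(block)
    return ((s & 0xFF) << 8) | (s >> 8) if offset % 2 else s


def same_csum(a: int, b: int) -> bool:
    """0x0000 and 0xFFFF both mean zero in one's-complement arithmetic."""
    return a % 0xFFFF == b % 0xFFFF


def hw_csum(frame: bytes, ip_len: int, csum_full: bool) -> int:
    """ASSUMED hardware behaviour: older parts sum only the IP packet,
    'full' mode parts sum every received byte, padding included."""
    return ones_sum(frame if csum_full else frame[:ip_len])


def driver_csum(frame: bytes, ip_len: int, csum_full: bool) -> int:
    """Hypothetical driver step producing the CHECKSUM_COMPLETE value."""
    csum = hw_csum(frame, ip_len, csum_full)
    if csum_full:
        return csum  # already covers the padding, so no fixup (db849faa9bef)
    # Add the padding's sum so the stack's later trim cancels it (0aa1d18615c1).
    return csum_add(csum, block_sum(frame[ip_len:], ip_len))


def stack_trim(csum: int, frame: bytes, ip_len: int) -> int:
    """Model of the stack trimming the tail padding and subtracting its sum."""
    return csum_sub(csum, block_sum(frame[ip_len:], ip_len))


if __name__ == "__main__":
    ip_packet = bytes(range(1, 29))   # hypothetical 28-byte IP packet
    frame = ip_packet + b"\xfe" * 18  # hypothetical tail padding
    want = ones_sum(ip_packet)
    for full in (False, True):
        got = stack_trim(driver_csum(frame, len(ip_packet), full), frame, len(ip_packet))
        print(full, same_csum(got, want))  # prints "False True" then "True True"
```

  In this model, skipping the fixup on older hardware, or applying it on
  top of a full-frame sum, leaves a value that no longer matches the IP
  packet, which is the kind of mismatch netdev_rx_csum_fault() reports as
  "hw csum failure".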

  [Testcase]

  The following scapy script will reproduce this issue. Run from the
  machine with the Mellanox series 5 NIC:

  1) Build the padded packet in scapy:
a=Ether(dst='ff:ff:ff:ff:ff:ff')/IP(dst='127.0.0.1')/ICMP()/Padding(load='\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe')

  2) sendp(a, iface='enp6s0f0')

  3) Check dmesg on the receiving side for "hw csum failure". The example
  uses localhost, so check dmesg on the local machine.
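  For anyone without scapy, a rough stdlib-only Python sketch that builds
  a comparable frame: broadcast Ethernet, IPv4 to 127.0.0.1, an ICMP echo
  request, then 0xfe tail padding. The zeroed source MAC, the source IP,
  the IP id/TTL values and the padding length are assumptions, not taken
  from the scapy line above. It only builds bytes; actually sending the
  frame would need a raw AF_PACKET socket bound to the interface, and root.
  tail_padding_len() (a hypothetical helper) shows how the padding can be
  spotted: it is whatever follows the length in the IPv4 Total Length field.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum (inverted one's-complement sum)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_padded_frame(pad_len: int) -> bytes:
    """Broadcast Ethernet / IPv4 / ICMP echo request, then pad_len bytes of 0xfe."""
    icmp = struct.pack("!BBHHH", 8, 0, 0, 0, 0)  # type 8 = echo request, checksum 0 for now
    icmp = icmp[:2] + struct.pack("!H", inet_checksum(icmp)) + icmp[4:]
    loopback = bytes([127, 0, 0, 1])
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + len(icmp),  # version 4, IHL 5, Total Length
                     1, 0, 64, 1, 0,           # id, flags/frag, TTL, proto 1 (ICMP), checksum 0
                     loopback, loopback)       # src (assumed) and dst
    ip = ip[:10] + struct.pack("!H", inet_checksum(ip)) + ip[12:]
    # dst broadcast, src zeroed (assumed), EtherType IPv4
    eth = b"\xff" * 6 + b"\x00" * 6 + struct.pack("!H", 0x0800)
    return eth + ip + icmp + b"\xfe" * pad_len


def tail_padding_len(frame: bytes) -> int:
    """Bytes left after the IPv4 packet, per its Total Length field."""
    (total_len,) = struct.unpack_from("!H", frame, 14 + 2)
    return len(frame) - 14 - total_len


if __name__ == "__main__":
    frame = build_padded_frame(18)  # padding length chosen for illustration
    print(len(frame), tail_padding_len(frame))  # prints "60 18"
```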

  I have built some test kernels, which are available here:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test
  This kernel contains 0aa1d18615c163f92935b806dcaff9157645233a.

  and

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test-2
  This kernel contains db849faa9bef993a1379dc510623f750a72fa7ce.

  If you install the test kernels, the issue is resolved.

  [Regression Potential]

  The changes are limited to the mlx5_core driver, and only modify how
  packet checksums are calculated when padding is involved.

  Both patches have been accepted and published by upstream -stable, and
  are widely accepted by the community.

  Because of this, I believe the risk of regression is low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1854842/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
