** Description changed:

- For the customer OpenStack deployment we deploy infra nodes on Dell R630
- servers. The servers have onboard Broadcom's NetXtreme II BCM57800 NIC
- (quad port: 2x1G ports, 2x10G ports). For each port in UP state, we
- observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.
+ [Impact]
  
- perf report shows function bnx2x_ptp_task taking up much of the CPUs
- time: https://pastebin.canonical.com/p/kfrpd6Pwh5/
+ * The PTP feature in bnx2x driver is implemented in a way that if the
+ NIC firmware takes some time to perform the timestamping - which is
+ observed as a bad register read in bnx2x_ptp_task() - then the ptp
+ worker function will reschedule itself indefinitely until the value read
+ from the register is meaningful. With that behavior, if an userspace
+ tool request a bad configured RX filter to bnx2x (or if NIC firmware has
+ any other issue in timestamping), the function bnx2x_ptp_task() will be
+ rescheduled forever and cause a unbound resource consumption. This
+ manifests as a kworker thread consuming 100% of CPU.
  
- Also, /var/log/syslog contains the following outputs every few seconds:
  
- [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
- [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
- [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
- [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
+ * The dmesg log will show the following message regarding other packets being 
skipped on timestamp routine due to a packet getting stuck in the timestamping 
"pipeline":
  
- So, the problem seems to be in a "timestampped" TX packet; the driver
- for some reason (to be yet understood) get an unexpected value from a
- register and then, it that same function, reschedule itself to try again
- this register read, read gets a bad value again, and so on infinitely.
+ "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
+ outstanding packet to timestamp, this packet will not be timestamped"
  
- This is showing in the system as the 100% CPU usage kthreads; the
- message "The device supports only a single outstanding packet to
- timestamp, this packet will not be timestamped" happens because the
- driver can only timestamp a single TX packet at a time, and given it's
- stuck trying, it cannot accept another packet in this "queue".
+ Also, by using ftrace user can notice that function bnx2x_ptp_task() is
+ being called a lot, and by enabling bnx2x PTP debugging log (ethtool -s
+ <iface> msglvl 16777216) it's possible to observe the following message
+ flooding the kernel log:
  
- The infinite loop appears to be:
+ "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
  
- static void bnx2x_ptp_task(struct work_struct *work)
- {
- struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
- int port = BP_PORT(bp);
- u32 val_seq;
- u64 timestamp, ns;
- struct skb_shared_hwtstamps shhwtstamps;
  
- /* Read Tx timestamp registers */
- val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
- NIG_REG_P0_TLLH_PTP_BUF_SEQID);
- if (val_seq & 0x10000) {
- [...]
- } else {
- DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
- /* Reschedule to keep checking for a valid timestamp value */
- schedule_work(&bp->ptp_task);
- }
+ * The  patch proposed in this SRU request is accepted upstream and is 
available currently (2019-07-03) in David Miller's linux-net tree:
+ git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
+ Besides fixing the issue, it also adds an ethtool statistics for accounting 
the ptp errors and reduces message flooding in case of errors.
  
- It appears that val_seq & 0x10000 is never true, so the task constantly
- reschedules itself immediately. Instrumenting the function shows that it
- is being called in excess of 100,000 times per second. The REG_RD call
- does appear to be expensive (as it's a register read from the device)
- and shows high in the perf report, but that by itself doesn't appear to
- be the root cause (i.e., it's not hanging forever in the REG_RD).
  
- The cause appears to be that the driver is not prepared to deal with the
- PTP request never being completed by the hardware. It's unclear why it
- isn't completing, but regardless, the driver should not loop forever
- here.
+ [Test case]
+ 
+ Reproducing the problem is not difficult; we've used chrony in Bionic to
+ trigger the problem. The steps are:
+ 
+ a) Install chrony on Bionic in a system with working NIC managed by
+ bnx2x;
+ 
+ b) Edit chrony configuration and add: "hwtimestamp *" to the top of its
+ conf file;
+ 
+ c) Restart chrony service
+ 
+ Check dmesg for the "[...]single outstanding packet" message and the
+ overall CPU workload using a tool like "top" to observe a kthread
+ consuming 100% of CPU.
+ 
+ 
+ [Regression potential]
+ 
+ The patch scope is restricted to bnx2x ptp handler, and was validated by
+ the driver maintainer. If there's any possibility of regressions, we
+ believe the worst would be an issue affecting the packet timestamping,
+ not messing with the regular xmit path for the driver.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Confirmed
Status in linux source package in FF-Series:
  Confirmed

Bug description:
  [Impact]

  * The PTP feature in bnx2x driver is implemented in a way that if the
  NIC firmware takes some time to perform the timestamping - which is
  observed as a bad register read in bnx2x_ptp_task() - then the ptp
  worker function will reschedule itself indefinitely until the value
  read from the register is meaningful. With that behavior, if an
  userspace tool request a bad configured RX filter to bnx2x (or if NIC
  firmware has any other issue in timestamping), the function
  bnx2x_ptp_task() will be rescheduled forever and cause a unbound
  resource consumption. This manifests as a kworker thread consuming
  100% of CPU.

  
  * The dmesg log will show the following message regarding other packets being 
skipped on timestamp routine due to a packet getting stuck in the timestamping 
"pipeline":

  "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
  outstanding packet to timestamp, this packet will not be timestamped"

  Also, by using ftrace user can notice that function bnx2x_ptp_task()
  is being called a lot, and by enabling bnx2x PTP debugging log
  (ethtool -s <iface> msglvl 16777216) it's possible to observe the
  following message flooding the kernel log:

  "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp
  yet"

  
  * The  patch proposed in this SRU request is accepted upstream and is 
available currently (2019-07-03) in David Miller's linux-net tree:
  git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
  Besides fixing the issue, it also adds an ethtool statistics for accounting 
the ptp errors and reduces message flooding in case of errors.


  [Test case]

  Reproducing the problem is not difficult; we've used chrony in Bionic
  to trigger the problem. The steps are:

  a) Install chrony on Bionic in a system with working NIC managed by
  bnx2x;

  b) Edit chrony configuration and add: "hwtimestamp *" to the top of
  its conf file;

  c) Restart chrony service

  Check dmesg for the "[...]single outstanding packet" message and the
  overall CPU workload using a tool like "top" to observe a kthread
  consuming 100% of CPU.

  
  [Regression potential]

  The patch scope is restricted to bnx2x ptp handler, and was validated
  by the driver maintainer. If there's any possibility of regressions,
  we believe the worst would be an issue affecting the packet
  timestamping, not messing with the regular xmit path for the driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to