Thanks Przemyslaw, good explanation in the bug description! I'm working on this one and will post status updates here as there is news.
Cheers,


Guilherme

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
       Status: Incomplete

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Ff-series)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Eoan)
       Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu Ff-series)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Disco)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Cosmic)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Bionic)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Xenial)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Cosmic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Disco)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Eoan)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Ff-series)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Xenial)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Cosmic)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Ff-series)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Eoan)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Disco)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Tags removed: bionic

** Tags added: bnx2x sts

** Description changed:

  For the customer OpenStack deployment we deploy infra nodes on Dell R630 servers.
  The servers have onboard Broadcom's NetXtreme II BCM57800 NIC (quad port: 2x1G ports, 2x10G ports).

  For each port in UP state, we observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.

  perf report shows function bnx2x_ptp_task taking up much of the CPU time:
  https://pastebin.canonical.com/p/kfrpd6Pwh5/

  Also, /var/log/syslog contains the following output every few seconds:

- [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
- [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
- [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
- [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
+ [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
+ [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
+ [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
+ [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped

  So, the problem seems to be in a "timestamped" TX packet; the driver, for some reason (yet to be understood), gets an unexpected value from a register and then, in that same function, reschedules itself to retry the register read. The read returns a bad value again, and so on indefinitely.
  This shows up in the system as kthreads using 100% CPU; the message "The device supports only a single outstanding packet to timestamp, this packet will not be timestamped" appears because the driver can only timestamp a single TX packet at a time, and since it is stuck retrying, it cannot accept another packet in this "queue".

  The infinite loop appears to be:

- static void bnx2x_ptp_task(struct work_struct *work)
- {
-         struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
-         int port = BP_PORT(bp);
-         u32 val_seq;
-         u64 timestamp, ns;
-         struct skb_shared_hwtstamps shhwtstamps;
+ static void bnx2x_ptp_task(struct work_struct *work)
+ {
+         struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
+         int port = BP_PORT(bp);
+         u32 val_seq;
+         u64 timestamp, ns;
+         struct skb_shared_hwtstamps shhwtstamps;

-         /* Read Tx timestamp registers */
-         val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
-                          NIG_REG_P0_TLLH_PTP_BUF_SEQID);
-         if (val_seq & 0x10000) {
-                 [...]
-         } else {
-                 DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
-                 /* Reschedule to keep checking for a valid timestamp value */
-                 schedule_work(&bp->ptp_task);
-         }
+         /* Read Tx timestamp registers */
+         val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
+                          NIG_REG_P0_TLLH_PTP_BUF_SEQID);
+         if (val_seq & 0x10000) {
+                 [...]
+         } else {
+                 DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
+                 /* Reschedule to keep checking for a valid timestamp value */
+                 schedule_work(&bp->ptp_task);
+         }

  It appears that val_seq & 0x10000 is never true, so the task constantly reschedules itself immediately. Instrumenting the function shows that it is being called in excess of 100,000 times per second. The REG_RD call does appear to be expensive (as it's a register read from the device) and shows high in the perf report, but that by itself doesn't appear to be the root cause (i.e., it's not hanging forever in the REG_RD).
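  As an illustration only, here is a small user-space sketch of what a bounded version of that polling could look like. It is not the driver code and not a proposed patch: fake_nic, fake_reg_rd(), poll_tx_timestamp() and the PTP_MAX_POLLS budget are all hypothetical names and values made up for this example; only the 0x10000 valid bit comes from the snippet above. The low-16-bit sequence mask is likewise an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTP_SEQID_VALID 0x10000u /* valid bit tested in the snippet above */
#define PTP_MAX_POLLS   8        /* hypothetical retry budget */

/* Hypothetical stand-in for the NIC: the register becomes valid after
 * `ready_after` reads, or never when ready_after is negative. */
struct fake_nic {
	int reads;
	int ready_after;
};

/* Hypothetical stand-in for REG_RD() on the PTP_BUF_SEQID register. */
static uint32_t fake_reg_rd(struct fake_nic *nic)
{
	nic->reads++;
	if (nic->ready_after >= 0 && nic->reads > nic->ready_after)
		return PTP_SEQID_VALID | 0x2a; /* valid bit plus a made-up sequence id */
	return 0; /* "There is no valid Tx timestamp yet" */
}

/* Poll for a valid Tx timestamp, but give up after PTP_MAX_POLLS reads
 * instead of retrying forever. Returns true and stores the (assumed)
 * low 16 bits in *seq_out when the valid bit was seen. */
static bool poll_tx_timestamp(struct fake_nic *nic, uint32_t *seq_out)
{
	for (int attempt = 0; attempt < PTP_MAX_POLLS; attempt++) {
		uint32_t val_seq = fake_reg_rd(nic);

		if (val_seq & PTP_SEQID_VALID) {
			*seq_out = val_seq & 0xffffu;
			return true;
		}
		/* A real driver would likely sleep or back off here
		 * rather than re-read immediately. */
	}
	return false; /* caller would drop the pending packet and free the slot */
}
```

  With a fake_nic whose ready_after is -1 (hardware that never completes), poll_tx_timestamp() returns false after exactly PTP_MAX_POLLS reads, so the single timestamp slot could be released instead of staying busy forever.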
  The cause appears to be that the driver is not prepared to deal with the PTP request never being completed by the hardware. It's unclear why it isn't completing, but regardless, the driver should not loop forever here.
-
-
- Additional info:
-
-
- ubuntu@infra-1:~$ uname -a
- Linux infra-1 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Lin
-
-
- ubuntu@infra-1:~$ lspci | grep Broadcom
- 01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
- 01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
- 01:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
- 01:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
-
-
- ubuntu@infra-1:~$ lspci -n | grep 01:00
- 01:00.0 0200: 14e4:168a (rev 10)
- 01:00.1 0200: 14e4:168a (rev 10)
- 01:00.2 0200: 14e4:168a (rev 10)
- 01:00.3 0200: 14e4:168a (rev 10)
-
-
- ubuntu@infra-1:~/deploy$ sudo lshw -c network
-   *-network:0
-        description: Ethernet interface
-        product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
-        vendor: Broadcom Inc. and subsidiaries
-        physical id: 0
-        bus info: pci@0000:01:00.0
-        logical name: eno1
-        version: 10
-        serial: 42:39:92:e0:66:b6
-        size: 10Gbit/s
-        capacity: 10Gbit/s
-        width: 64 bits
-        clock: 33MHz
-        capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
-        configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 phy 1.45 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
-        resources: irq:79 memory:95000000-957fffff memory:95800000-95ffffff memory:96030000-9603ffff memory:91a00000-91a7ffff
-   *-network:1
-        description: Ethernet interface
-        product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
-        vendor: Broadcom Inc. and subsidiaries
-        physical id: 0.1
-        bus info: pci@0000:01:00.1
-        logical name: eno2
-        version: 10
-        serial: 42:39:92:e0:66:b6
-        size: 10Gbit/s
-        capacity: 10Gbit/s
-        width: 64 bits
-        clock: 33MHz
-        capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
-        configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 phy 1.45 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
-        resources: irq:90 memory:94000000-947fffff memory:94800000-94ffffff memory:96020000-9602ffff memory:91a80000-91afffff
-   *-network:2
-        description: Ethernet interface
-        product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
-        vendor: Broadcom Inc. and subsidiaries
-        physical id: 0.2
-        bus info: pci@0000:01:00.2
-        logical name: eno3
-        version: 10
-        serial: 52:f2:aa:63:a5:3c
-        size: 1Gbit/s
-        capacity: 1Gbit/s
-        width: 64 bits
-        clock: 33MHz
-        capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
-        configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1Gbit/s
-        resources: irq:90 memory:93000000-937fffff memory:93800000-93ffffff memory:96010000-9601ffff memory:91b00000-91b7ffff
-   *-network:3
-        description: Ethernet interface
-        product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
-        vendor: Broadcom Inc. and subsidiaries
-        physical id: 0.3
-        bus info: pci@0000:01:00.3
-        logical name: eno4
-        version: 10
-        serial: 52:f2:aa:63:a5:3c
-        size: 1Gbit/s
-        capacity: 1Gbit/s
-        width: 64 bits
-        clock: 33MHz
-        capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
-        configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1Gbit/s
-        resources: irq:111 memory:92000000-927fffff memory:92800000-92ffffff memory:96000000-9600ffff memory:91b80000-91bfffff
-   *-network:0
-        description: Ethernet interface
-        physical id: 3
-        logical name: bond1.1166
-        serial: 42:39:92:e0:66:b6
-        capabilities: ethernet physical
-        configuration: autonegotiation=off broadcast=yes driver=802.1Q VLAN Support driverversion=1.8 duplex=full firmware=N/A link=yes multicast=yes
-   *-network:1
-        description: Ethernet interface
-        physical id: 4
-        logical name: bond1
-        serial: 42:39:92:e0:66:b6
-        capabilities: ethernet physical
-        configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 duplex=full firmware=2 link=yes master=yes multicast=yes
-   *-network:2
-        description: Ethernet interface
-        physical id: 5
-        logical name: broam
-        serial: 36:76:ae:d3:1d:3b
-        capabilities: ethernet physical
-        configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.65.10 link=yes multicast=yes
-   *-network:3
-        description: Ethernet interface
-        physical id: 6
-        logical name: brinternal
-        serial: ce:27:22:0d:8b:d1
-        capabilities: ethernet physical
-        configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.66.10 link=yes multicast=yes
-   *-network:4
-        description: Ethernet interface
-        physical id: 7
-        logical name: bond1.1171
-        serial: 42:39:92:e0:66:b6
-        capabilities: ethernet physical
-        configuration: autonegotiation=off broadcast=yes driver=802.1Q VLAN Support driverversion=1.8 duplex=full firmware=N/A link=yes multicast=yes
-   *-network:5
-        description: Ethernet interface
-        physical id: 8
-        logical name: bond0
-        serial: 52:f2:aa:63:a5:3c
-        capabilities: ethernet physical
-        configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 duplex=full firmware=2 link=yes master=yes multicast=yes
-   *-network:6
-        description: Ethernet interface
-        physical id: 9
-        logical name: brexternal
-        serial: 5e:e0:5c:1f:da:01
-        capabilities: ethernet physical
-        configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.71.10 link=yes multicast=yes
-
-
- ubuntu@infra-1:~$ modinfo bnx2x
- filename: /lib/modules/4.15.0-50-generic/kernel/drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko
- firmware: bnx2x/bnx2x-e2-7.13.1.0.fw
- firmware: bnx2x/bnx2x-e1h-7.13.1.0.fw
- firmware: bnx2x/bnx2x-e1-7.13.1.0.fw
- version: 1.712.30-0
- license: GPL
- description: QLogic BCM57710/57711/57711E/57712/57712_MF/57800/57800_MF/57810/57810_MF/57840/57840_MF Driver
- author: Eliezer Tamir
- srcversion: 5338D57FE057310DCD66774
- alias: pci:v000014E4d0000163Fsv*sd*bc*sc*i*
- alias: pci:v000014E4d0000163Esv*sd*bc*sc*i*
- alias: pci:v000014E4d0000163Dsv*sd*bc*sc*i*
- alias: pci:v00001077d000016ADsv*sd*bc*sc*i*
- alias: pci:v000014E4d000016ADsv*sd*bc*sc*i*
- alias: pci:v00001077d000016A4sv*sd*bc*sc*i*
- alias: pci:v000014E4d000016A4sv*sd*bc*sc*i*
- alias: pci:v000014E4d000016ABsv*sd*bc*sc*i*
- alias: pci:v000014E4d000016AFsv*sd*bc*sc*i*
- alias: pci:v000014E4d000016A2sv*sd*bc*sc*i*
- alias: pci:v00001077d000016A1sv*sd*bc*sc*i*
- alias: pci:v000014E4d000016A1sv*sd*bc*sc*i*
- alias: pci:v000014E4d0000168Dsv*sd*bc*sc*i*
- alias: pci:v000014E4d000016AEsv*sd*bc*sc*i*
- alias: pci:v000014E4d0000168Esv*sd*bc*sc*i*
- alias: pci:v000014E4d000016A9sv*sd*bc*sc*i*
- alias: pci:v000014E4d000016A5sv*sd*bc*sc*i*
- alias: pci:v000014E4d0000168Asv*sd*bc*sc*i*
- alias: pci:v000014E4d0000166Fsv*sd*bc*sc*i*
- alias: pci:v000014E4d00001663sv*sd*bc*sc*i*
- alias: pci:v000014E4d00001662sv*sd*bc*sc*i*
- alias: pci:v000014E4d00001650sv*sd*bc*sc*i*
- alias: pci:v000014E4d0000164Fsv*sd*bc*sc*i*
- alias: pci:v000014E4d0000164Esv*sd*bc*sc*i*
- depends: mdio,libcrc32c,ptp
- retpoline: Y
- intree: Y
- name: bnx2x
- vermagic: 4.15.0-50-generic SMP mod_unload
- signat: PKCS#7
- signer:
- sig_key:
- sig_hashalgo: md4
- parm: num_queues: Set number of queues (default is as a number of CPUs) (int)
- parm: disable_tpa: Disable the TPA (LRO) feature (int)
- parm: int_mode: Force interrupt mode other than MSI-X (1 INT#x; 2 MSI) (int)
- parm: dropless_fc: Pause on exhausted host ring (int)
- parm: mrrs: Force Max Read Req Size (0..3) (for debug) (int)
- parm: debug: Default debug msglevel (int)

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load