Paweł ran some more XDP tests yesterday and found a couple of issues.
One is a panic in the mlx5 driver when unloading the bpf program
(mlx5e_xdp_xmit); he will send a separate email for that problem.

The problem I wanted to discuss here is statistics for XDP context. The
short of it is that we need consistency in the counters across NIC
drivers and virtual devices. Right now stats are specific to a driver
with no clear accounting for the packets and bytes handled in XDP.

For example, virtio has some stats as device private data extracted via
ethtool:
$ ethtool -S eth2 | grep xdp
    ...
     rx_queue_3_xdp_packets: 5291
     rx_queue_3_xdp_tx: 0
     rx_queue_3_xdp_redirects: 5163
     rx_queue_3_xdp_drops: 0
    ...
     tx_queue_3_xdp_tx: 5163
     tx_queue_3_xdp_tx_drops: 0

And the standard counters appear to track bytes and packets for Rx, but
not Tx if the packet is forwarded in XDP.

Similarly, mlx5 has some counters (thanks to Jesper and Toke for helping
out here):

$ ethtool -S mlx5p1 | grep xdp
     rx_xdp_drop: 86468350180
     rx_xdp_redirect: 18860584
     rx_xdp_tx_xmit: 0
     rx_xdp_tx_full: 0
     rx_xdp_tx_err: 0
     rx_xdp_tx_cqe: 0
     tx_xdp_xmit: 0
     tx_xdp_full: 0
     tx_xdp_err: 0
     tx_xdp_cqes: 0
    ...
     rx3_xdp_drop: 86468350180
     rx3_xdp_redirect: 18860556
     rx3_xdp_tx_xmit: 0
     rx3_xdp_tx_full: 0
     rx3_xdp_tx_err: 0
     rx3_xdp_tx_cqes: 0
    ...
     tx0_xdp_xmit: 0
     tx0_xdp_full: 0
     tx0_xdp_err: 0
     tx0_xdp_cqes: 0
    ...

And no accounting in standard stats for packets handled in XDP.
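
To illustrate the naming mismatch between the two outputs above, here is a
minimal Python sketch that pulls the XDP counters out of `ethtool -S` style
text. The function name parse_ethtool_stats and its keyword argument are made
up for this example; the sample values are copied from the outputs above. It
only parses text and does not call ethtool itself.

```python
def parse_ethtool_stats(text, keyword="xdp"):
    """Return {counter_name: value} for `ethtool -S` style lines whose
    counter name contains `keyword`. Hypothetical helper, not a real API."""
    counters = {}
    for line in text.splitlines():
        # Each counter line looks like "     name: value".
        name, sep, value = line.strip().rpartition(":")
        if not sep or keyword not in name:
            continue  # skips "..." lines and the "NIC statistics:" header
        try:
            counters[name.strip()] = int(value)
        except ValueError:
            continue  # non-numeric value, ignore
    return counters


# Sample lines taken from the virtio and mlx5 outputs quoted above.
virtio_text = """
     rx_queue_3_xdp_redirects: 5163
     rx_queue_3_xdp_drops: 0
"""
mlx5_text = """
     rx3_xdp_drop: 86468350180
     rx3_xdp_redirect: 18860556
"""

virtio = parse_ethtool_stats(virtio_text)
mlx5 = parse_ethtool_stats(mlx5_text)
# The same events are reported under different names per driver, so a
# monitoring script needs a per-driver mapping to compare them.
print(sorted(virtio))
print(sorted(mlx5))
```

A script written against the virtio names finds nothing on mlx5 and vice
versa, which is the inconsistency being described here.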

And then, if I understand Jesper's data correctly, the i40e driver does
not have device-specific data:

$ ethtool -S i40e1  | grep xdp
[NOTHING]


But rather bumps the standard counters:

$ sudo ./xdp_rxq_info --dev i40e1 --action XDP_DROP

Running XDP on dev:i40e1 (ifindex:3) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      1       36,156,872  0
XDP-RX CPU      total   36,156,872

RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index    1:1   36,156,878  0
rx_queue_index    1:sum 36,156,878


$ ethtool_stats.pl --dev i40e1

Show adapter(s) (i40e1) statistics (ONLY that changed!)
Ethtool(i40e1   ) stat:   2711292859 (  2,711,292,859) <= port.rx_bytes /sec
Ethtool(i40e1   ) stat:      6274204 (      6,274,204) <= port.rx_dropped /sec
Ethtool(i40e1   ) stat:     42363867 (     42,363,867) <= port.rx_size_64 /sec
Ethtool(i40e1   ) stat:     42363950 (     42,363,950) <= port.rx_unicast /sec
Ethtool(i40e1   ) stat:   2165051990 (  2,165,051,990) <= rx-1.bytes /sec
Ethtool(i40e1   ) stat:     36084200 (     36,084,200) <= rx-1.packets /sec
Ethtool(i40e1   ) stat:         5385 (          5,385) <= rx_dropped /sec
Ethtool(i40e1   ) stat:     36089727 (     36,089,727) <= rx_unicast /sec


We really need consistency in the counters and at a minimum, users
should be able to track packet and byte counters for both Rx and Tx
including XDP.

It seems to me the Rx and Tx packet, byte and dropped counters returned
for the standard device stats (/proc/net/dev, ip -s li show, ...) should
include all packets managed by the driver regardless of whether they are
forwarded / dropped in XDP or go up the Linux stack. This also aligns
with mlxsw and the stats it shows, which are packets handled by the
hardware.
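
A rough sketch of that accounting rule, written in Python for brevity rather
than as driver code. The class and field names (ToyRxQueue, standard,
xdp_drop, ...) are invented for illustration and do not correspond to any
driver's structures; only the XDP verdict names are real. The point is that
the standard counters are bumped for every received packet, and the XDP
specifics are tracked on top of that.

```python
from dataclasses import dataclass


@dataclass
class RxCounters:
    """Stand-in for the standard Rx packet/byte counters of a device."""
    packets: int = 0
    bytes: int = 0


class ToyRxQueue:
    """Toy model of one Rx queue; not real driver code."""

    def __init__(self):
        self.standard = RxCounters()  # what /proc/net/dev would report
        # Add-on counters, as a driver might expose via ethtool -S.
        self.xdp_drop = 0
        self.xdp_tx = 0
        self.xdp_redirect = 0

    def receive(self, nbytes, verdict):
        # Standard counters cover every packet the driver handled,
        # regardless of what XDP decided to do with it.
        self.standard.packets += 1
        self.standard.bytes += nbytes

        # XDP specifics are extra detail, not a replacement.
        if verdict == "XDP_DROP":
            self.xdp_drop += 1
        elif verdict == "XDP_TX":
            self.xdp_tx += 1
        elif verdict == "XDP_REDIRECT":
            self.xdp_redirect += 1
        # "XDP_PASS" goes up the stack and needs no extra counter here.


q = ToyRxQueue()
q.receive(64, "XDP_DROP")
q.receive(64, "XDP_REDIRECT")
q.receive(128, "XDP_PASS")
print(q.standard, q.xdp_drop, q.xdp_redirect)
```

With this model the standard totals match what arrived on the wire, and the
private counters explain where the packets went.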

From there the private stats can include XDP specifics as desired --
like the drops and redirects -- but those should be add-ons, and even
here some consistency makes life easier for users.

The same standards should also be applied to virtual devices built on
top of the ports -- e.g., vlans. I have an API now that allows bumping
stats for vlan devices.

Keeping the basic xdp packets in the standard counters allows Paweł, for
example, to continue to monitor /proc/net/dev.
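
For reference, a minimal sketch of the kind of /proc/net/dev monitoring this
would keep working, in Python. The function name parse_proc_net_dev is made
up for this example, and the sketch assumes the usual layout of two header
lines followed by one "iface: <8 Rx fields> <8 Tx fields>" line per device.

```python
def parse_proc_net_dev(text):
    """Parse /proc/net/dev content into {ifname: {counter: value}}.
    Hypothetical helper; keeps only packet, byte and drop counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:
            continue  # unexpected layout, ignore this line
        stats[name.strip()] = {
            # Rx columns: bytes packets errs drop fifo frame compressed multicast
            "rx_bytes": fields[0],
            "rx_packets": fields[1],
            "rx_dropped": fields[3],
            # Tx columns: bytes packets errs drop fifo colls carrier compressed
            "tx_bytes": fields[8],
            "tx_packets": fields[9],
            "tx_dropped": fields[11],
        }
    return stats


if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as f:
            for ifname, counters in parse_proc_net_dev(f.read()).items():
                print(ifname, counters)
    except OSError:
        pass  # not on Linux, nothing to read
```

If XDP-handled packets are missing from these counters, a tool like this
under-reports traffic on any port running an XDP program.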

Can we get agreement on this? And from there, get updates to the mlx5
and virtio drivers?

David
