CC: Willem and some driver folks for more input, context: extending
https://lore.kernel.org/all/[email protected]/
to cover virtio stats.
On Tue, 27 Feb 2024 16:03:00 +0800 Xuan Zhuo wrote:
> +static const struct virtnet_stat_desc virtnet_stats_rx_basic_desc[] = {
> + VIRTNET_STATS_DESC(rx, basic, packets),
> + VIRTNET_STATS_DESC(rx, basic, bytes),
Covered.
> + VIRTNET_STATS_DESC(rx, basic, notifications),
> + VIRTNET_STATS_DESC(rx, basic, interrupts),
I haven't seen HW devices count interrupts coming from a specific
queue (there's usually a lot more queues than IRQs these days),
let's keep these in ethtool -S for now, unless someone has a HW use
case.
> + VIRTNET_STATS_DESC(rx, basic, drops),
> + VIRTNET_STATS_DESC(rx, basic, drop_overruns),
These are important, but we need to make sure we have a good definition
for vendors to follow...
drops I'd define as "sum of all packets which came into the device, but
never left it, including but not limited to: packets dropped due to
lack of buffer space, processing errors, explicitly set policies and
packet filters."
Call it hw-rx-drops ?
overruns is a bit harder to precisely define. I was thinking of
something more broad, like: "packets dropped due to transient lack of
resources, such as buffer space, host descriptors etc."
For context why not just go with virtio spec definition of "no
descriptors" - for HW devices, what exact point in the pipeline drops
depends on how back pressure is configured/implemented, and fetching
descriptors is high latency, so differentiating between "PCIe is slow"
and "host didn't post descriptors" is hard in practice.
Call it hw-rx-drop-overruns ?
> +static const struct virtnet_stat_desc virtnet_stats_tx_basic_desc[] = {
> + VIRTNET_STATS_DESC(tx, basic, packets),
> + VIRTNET_STATS_DESC(tx, basic, bytes),
> +
> + VIRTNET_STATS_DESC(tx, basic, notifications),
> + VIRTNET_STATS_DESC(tx, basic, interrupts),
> +
> + VIRTNET_STATS_DESC(tx, basic, drops),
These 5 same as rx.
> + VIRTNET_STATS_DESC(tx, basic, drop_malformed),
These I'd call hw-tx-drop-errors, "packets dropped because they were
invalid or malformed"?
> +static const struct virtnet_stat_desc virtnet_stats_rx_csum_desc[] = {
> + VIRTNET_STATS_DESC(rx, csum, csum_valid),
I think in kernel parlance that would translate to CHECKSUM_UNNECESSARY?
So let's call it rx-csum-unnecessary ?
I'd skip the hw- prefix for this one, it doesn't matter to the user if
the HW or SW counted it.
> + VIRTNET_STATS_DESC(rx, csum, needs_csum),
Hm, I think this is a bit software/virt device specific, presumably
rx-csum-partial for the kernel, up to you whether to make it ethtool -S
or netlink.
> + VIRTNET_STATS_DESC(rx, csum, csum_none),
> + VIRTNET_STATS_DESC(rx, csum, csum_bad),
These two make sense as is in netlink, should be fairly commonly
reported by devices. Maybe add a note in "bad" that packets with
bad csum are not discarded, but still delivered to the stack.
> +static const struct virtnet_stat_desc virtnet_stats_tx_csum_desc[] = {
> + VIRTNET_STATS_DESC(tx, csum, needs_csum),
> + VIRTNET_STATS_DESC(tx, csum, csum_none),
tx- version of what names we pick for rx-, netlink seems appropriate.
> +static const struct virtnet_stat_desc virtnet_stats_rx_gso_desc[] = {
> + VIRTNET_STATS_DESC(rx, gso, gso_packets),
> + VIRTNET_STATS_DESC(rx, gso, gso_bytes),
I used the term "GSO" in conversations about Rx and it often confuses
people. Let's use "GRO", so hw-gro-packets, and hw-gro-bytes ?
Or maybe coalesce? "hw-rx-coalesce" ? That's quite a bit longer..
Ah, and please mention in the doc that these counters "do not cover LRO
i.e. any coalescing implementation which doesn't follow GRO rules".
> + VIRTNET_STATS_DESC(rx, gso, gso_packets_coalesced),
hw-gro-wire-packets ?
No strong preference on the naming, but I find that saying -wire
makes it 100% clear to everyone what the meaning is.
> + VIRTNET_STATS_DESC(rx, gso, gso_bytes_coalesced),
The documentation in the virtio spec seems to be identical
to the one for gso_packets, which gotta be unintentional?
I'm guessing this is hw-gro-wire-bytes? I.e. headers counted
multiple times?
> +static const struct virtnet_stat_desc virtnet_stats_tx_gso_desc[] = {
> + VIRTNET_STATS_DESC(tx, gso, gso_packets),
> + VIRTNET_STATS_DESC(tx, gso, gso_bytes),
> + VIRTNET_STATS_DESC(tx, gso, gso_segments),
> + VIRTNET_STATS_DESC(tx, gso, gso_segments_bytes),
these 4 make sense as mirror of the Rx
> + VIRTNET_STATS_DESC(tx, gso, gso_packets_noseg),
> + VIRTNET_STATS_DESC(tx, gso, gso_bytes_noseg),
Not sure what these are :) unless someone knows what it is and that
HW devices report it, let's keep them in ethtool -S ?
> +static const struct virtnet_stat_desc virtnet_stats_rx_speed_desc[] = {
> + VIRTNET_STATS_DESC(rx, speed, packets_allowance_exceeded),
hw-rx-drop-ratelimits ?
"Allowance exceeded" is a bit of a mouthful to me, perhaps others
disagree. The description from the virtio spec is quite good.
> + VIRTNET_STATS_DESC(rx, speed, bytes_allowance_exceeded),
No strong preference whether to expose this as a standard stat or
ethtool -S, we don't generally keep byte counters for drops, so
this would be special.
> +static const struct virtnet_stat_desc virtnet_stats_tx_speed_desc[] = {
> + VIRTNET_STATS_DESC(tx, speed, packets_allowance_exceeded),
> + VIRTNET_STATS_DESC(tx, speed, packets_allowance_exceeded),
same as rx