On Thu, 5 Feb 2026 15:02:23 +0000
Bruce Richardson <[email protected]> wrote:
> TL;DR
> ------
>
> For a quick demo, apply patches, run e.g. testpmd and then in a separate
> terminal run:
>
> ./usertools/dpdk-telemetry-watcher.py -d1T eth.tx
>
> Output, updated once per second, will be traffic rate per port e.g.:
>
> Connected to application: "dpdk-testpmd"
> Time /ethdev/stats,0.opackets /ethdev/stats,1.opackets Total
> 16:29:12 5,213,119 5,214,304 10,427,423
>
>
> Fuller details
> --------------
>
> While we have the dpdk-telemetry.py CLI app for interactive querying of
> telemetry on the commandline, and a telemetry exporter script for
> sending telemetry to external tools for real-time monitoring, we don't
> have an app that can print real-time stats for DPDK apps on the
> terminal. This patchset adds such a script, developed with the help of
> GitHub Copilot to fill a need that I found in my testing. Submitting it
> here in the hopes that others find it of use.
>
> The script acts as a wrapper around the existing dpdk-telemetry.py
> script, and pipes the commands to that script and reads the responses,
> querying it once per second. It takes a number of flag parameters, such
> as the ones above:
> - "-d" for delta values, i.e. PPS rather than total packets
> - "-1" for single-line output, i.e. no scrolling up the screen
> - "-T" to display a total column
>
> Other flag parameters can be seen by looking at the help output.
>
> Beyond the flags, the script also takes a number of positional
> parameters, which refer to specific stats to display. These stats must
> be numeric values, and should take the form of the telemetry command to
> send, followed by a "." and the stat within the result which is to be
> tracked. As above, a stat would be e.g. "/ethdev/stats,0.opackets",
> where we send "/ethdev/stats,0" to telemetry and extract the "opackets"
> part of the result.
>
> However, specifying individual stats can be awkward, so some shortcuts
> are provided too for the common case of monitoring ethernet ports. Any
> positional arg starting with "eth" will be replaced by the set of
> equivalent values for each port, e.g. "eth.imissed" will track the
> imissed value on all ports in use in the app. The ipackets and opackets
> values, as common metrics, are also available as shortened values as
> just "rx" and "tx", so in the example above, "eth.tx" means to track the
> opackets stat for every ethdev port.
>
> Finally, the script also has reconnection support so you can leave it
> running while you start and stop your application in another terminal.
> The watcher will try and reconnect to a running instance every second.
>
> v4:
> - Updated docs following AI review
> - Converted one missed f-string to regular string
>
> v3:
> Updated following AI review
> - removed unnecessary f-string
> - added documentation in guides/tools
> - added release note entry
>
> v2:
> - improve reconnection handling, eliminating some crashes seen in testing.
>
> Bruce Richardson (7):
> usertools: add new script to monitor telemetry on terminal
> usertools/telemetry-watcher: add displaying stats
> usertools/telemetry-watcher: add delta and timeout opts
> usertools/telemetry-watcher: add total and one-line opts
> usertools/telemetry-watcher: add thousands separator
> usertools/telemetry-watcher: add eth name shortcuts
> usertools/telemetry-watcher: support reconnection
>
> doc/guides/rel_notes/release_26_03.rst | 7 +
> doc/guides/tools/index.rst | 1 +
> doc/guides/tools/telemetrywatcher.rst | 184 +++++++++++
> usertools/dpdk-telemetry-watcher.py | 435 +++++++++++++++++++++++++
> usertools/meson.build | 1 +
> 5 files changed, 628 insertions(+)
> create mode 100644 doc/guides/tools/telemetrywatcher.rst
> create mode 100755 usertools/dpdk-telemetry-watcher.py
>
> --
> 2.51.0
>
This didn't get merged, so it will need to be rebased.
You may also want to address the AI review comments below.
Review of [PATCH v4 1-7/7] usertools: dpdk-telemetry-watcher
============================================================
Nice tool — having a continuous monitoring wrapper around
dpdk-telemetry.py is a practical addition. Patches 1-6 are
clean and well-structured. Patch 7 (reconnection support)
has several correctness issues described below.
Patch 7/7: usertools/telemetry-watcher: support reconnection
------------------------------------------------------------
Error: monitor_stats `continue` on failed query causes
IndexError on next delta iteration.
When `query_telemetry` returns (process, None), the code
does `continue`, skipping `current_values.append(...)`.
At the end of the loop body, `prev_values = current_values`
stores a shorter list. On the next iteration,
`prev_values[i]` raises IndexError for the missing indices.
Suggested fix: when data is None, append prev_values[i]
(or 0) as the current_value so the list length is preserved:
    process, data = query_telemetry(process, command)
    if not data:
        current_values.append(prev_values[i] if i < len(prev_values) else 0)
        row += "N/A".rjust(25)
        continue
Error: BrokenPipeError not handled in query_telemetry.
When the DPDK application dies, the subprocess's stdin pipe
breaks. The initial `process.stdin.write()` / `.flush()`
before the reconnection loop then raises BrokenPipeError
before readline() can return an empty string, so the
reconnection logic never triggers.
Suggested fix: wrap the write+flush+readline in a
try/except (BrokenPipeError, OSError) and treat it the
same as an empty response — fall into the reconnection
loop.
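A minimal sketch of that wrapping, assuming the subprocess was opened
in text mode with stdin/stdout pipes. `send_command` is a hypothetical
helper name, not a function from the patch. Since BrokenPipeError is a
subclass of OSError, catching OSError alone would be enough; both are
listed here only to mirror the wording above.

```python
import subprocess


def send_command(process: subprocess.Popen, command: str):
    """Send one command; return the reply line, or None if the pipe is dead.

    Hypothetical helper: a None result is the caller's cue to enter the
    reconnection loop, exactly as it would for an empty readline().
    """
    try:
        process.stdin.write(command + "\n")
        process.stdin.flush()
        reply = process.stdout.readline()
    except (BrokenPipeError, OSError):
        # The child has gone away; treat this like an empty response.
        return None
    # readline() returns "" at EOF, which is also reported as None.
    return reply or None
```

With this shape, a dead child and a closed stdout both surface as None,
so the caller only needs one check before reconnecting.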
Warning: old subprocess not cleaned up on reconnection.
In query_telemetry, when readline() returns empty and
reconnection begins, the old process object is replaced
without calling process.terminate() or process.wait().
The dead subprocess accumulates as a zombie. Similarly,
create_telemetry_process now calls print_connected_app
which can fail and return None, leaking the just-created
Popen object.
Suggested fix: add a small helper to clean up a process
(terminate, close pipes, wait), and call it before setting
process = None in the reconnection path. In
create_telemetry_process, if print_connected_app fails,
terminate the process before returning None.
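One possible shape for such a helper is sketched below. The name
`cleanup_process` and the one-second timeout are assumptions made for
illustration; the patch may want different values.

```python
import subprocess


def cleanup_process(process, timeout=1.0):
    """Terminate a child, close its pipes and reap it (hypothetical helper)."""
    if process is None:
        return
    if process.poll() is None:
        # Still running: ask it to exit.
        process.terminate()
    for pipe in (process.stdin, process.stdout, process.stderr):
        if pipe is not None:
            try:
                pipe.close()
            except OSError:
                # Closing may flush buffered data into a broken pipe.
                pass
    try:
        # wait() reaps the child so it does not linger as a zombie.
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
```

Calling this before `process = None` in the reconnection path, and
before returning None from create_telemetry_process, would cover both
leaks described above.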
Warning: expand_shortcuts and validate_stats lose the
reconnected process handle.
Both functions update their local `process` variable via
query_telemetry's return value, but neither returns the
(possibly new) process to the caller. If a reconnection
happens during shortcut expansion or validation, the
caller in monitor_stats still holds the old dead process
object.
Suggested fix: have expand_shortcuts and validate_stats
return the process alongside their current return values,
or restructure so monitor_stats passes process by
reference (e.g., as a mutable container).
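A rough sketch of the first option follows. It assumes that the query
function returns a (process, data) pair, as described for
query_telemetry above, and that `data` is a dict of stat names to
values. The `query` parameter is a stand-in added here so the example
runs on its own; the real validate_stats may check more than this.

```python
def validate_stats(process, specs, query):
    """Return (process, valid_specs).

    `query` stands in for query_telemetry and must return (process, data);
    the dict layout of `data` is an assumption made for this sketch.
    """
    valid = []
    for spec in specs:
        # "/ethdev/stats,0.opackets" -> ("/ethdev/stats,0", "opackets")
        command, _, field = spec.rpartition(".")
        # Keep whatever handle the query hands back; it may be a new one.
        process, data = query(process, command)
        if data and isinstance(data.get(field), (int, float)):
            valid.append(spec)
    return process, valid
```

The caller would then write `process, stats = validate_stats(...)` so
that a reconnection during validation is not lost.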
Info: recursive call between create_telemetry_process and
print_connected_app.
create_telemetry_process calls print_connected_app, which
calls query_telemetry, which on disconnect calls
create_telemetry_process again. This indirect recursion
works in practice (Python has a high default recursion
limit and the retry loop in query_telemetry breaks the
chain), but it is fragile and hard to follow. Consider
separating the "connect" step from the "verify connection"
step to avoid the recursive dependency.
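As an illustration only, the split could look something like this.
All three callables (`spawn`, `verify`, `cleanup`) are hypothetical
placeholders for "start dpdk-telemetry.py", "issue one query without
any reconnection logic" and "terminate and reap the child"; none of
these names come from the patch.

```python
import time


def connect_and_verify(spawn, verify, cleanup):
    """Return a verified process handle, or None. Never recurses."""
    process = spawn()
    if process is None:
        return None
    if not verify(process):
        # Verification failed: do not leak the just-created child.
        cleanup(process)
        return None
    return process


def reconnect(spawn, verify, cleanup, attempts, delay=1.0, sleep=time.sleep):
    """Retry connect_and_verify up to `attempts` times in a flat loop."""
    for _ in range(attempts):
        process = connect_and_verify(spawn, verify, cleanup)
        if process is not None:
            return process
        sleep(delay)
    return None
```

Because `verify` never reconnects by itself, the only loop is the one
in `reconnect`, and the call chain stays flat.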
Reviewed-by: Stephen Hemminger <[email protected]>