On Thu,  5 Feb 2026 15:02:23 +0000
Bruce Richardson <[email protected]> wrote:

> TL;DR
> ------
> 
> For a quick demo, apply the patches, run e.g. testpmd, and then in a separate
> terminal run:
> 
>   ./usertools/dpdk-telemetry-watcher.py -d1T eth.tx
> 
> Output, updated once per second, will be traffic rate per port e.g.:
> 
> Connected to application: "dpdk-testpmd"
> Time       /ethdev/stats,0.opackets /ethdev/stats,1.opackets        Total
> 16:29:12                  5,213,119                5,214,304   10,427,423
> 
> 
> Fuller details
> --------------
> 
> While we have the dpdk-telemetry.py CLI app for interactive querying of
> telemetry on the commandline, and a telemetry exporter script for
> sending telemetry to external tools for real-time monitoring, we don't
> have an app that can print real-time stats for DPDK apps on the
> terminal. This patchset adds such a script, developed with the help of
> GitHub Copilot to fill a need that I found in my testing. Submitting it
> here in the hopes that others find it of use.
> 
> The script acts as a wrapper around the existing dpdk-telemetry.py
> script, and pipes the commands to that script and reads the responses,
> querying it once per second. It takes a number of flag parameters, such
> as the ones above:
>  - "-d" for delta values, i.e. PPS rather than total packets
>  - "-1" for single-line output, i.e. no scrolling up the screen
>  - "-T" to display a total column
> 
> Other flag parameters can be seen by looking at the help output.
> 
> Beyond the flags, the script also takes a number of positional
> parameters, which refer to specific stats to display. These stats must
> be numeric values, and should take the form of the telemetry command to
> send, followed by a "." and the stat within the result which is to be
> tracked. As above, a stat would be e.g. "/ethdev/stats,0.opackets",
> where we send "/ethdev/stats,0" to telemetry and extract the "opackets"
> part of the result.
> 
> However, specifying individual stats can be awkward, so some shortcuts
> are provided too for the common case of monitoring ethernet ports. Any
> positional arg starting with "eth" will be replaced by the set of
> equivalent values for each port, e.g. "eth.imissed" will track the
> imissed value on all ports in use in the app. The ipackets and opackets
> values, as common metrics, are also available as shortened values as
> just "rx" and "tx", so in the example above, "eth.tx" means to track the
> opackets stat for every ethdev port.
> 
> Finally, the script also has reconnection support so you can leave it
> running while you start and stop your application in another terminal.
> The watcher will try to reconnect to a running instance every second.
> 
> v4:
> - Updated docs following AI review
> - Converted one missed f-string to regular string
> 
> v3:
> Updated following AI review
> - removed unnecessary f-string
> - added documentation in guides/tools
> - added release note entry
> 
> v2:
> - improve reconnection handling, eliminating some crashes seen in testing.
> 
> Bruce Richardson (7):
>   usertools: add new script to monitor telemetry on terminal
>   usertools/telemetry-watcher: add displaying stats
>   usertools/telemetry-watcher: add delta and timeout opts
>   usertools/telemetry-watcher: add total and one-line opts
>   usertools/telemetry-watcher: add thousands separator
>   usertools/telemetry-watcher: add eth name shortcuts
>   usertools/telemetry-watcher: support reconnection
> 
>  doc/guides/rel_notes/release_26_03.rst |   7 +
>  doc/guides/tools/index.rst             |   1 +
>  doc/guides/tools/telemetrywatcher.rst  | 184 +++++++++++
>  usertools/dpdk-telemetry-watcher.py    | 435 +++++++++++++++++++++++++
>  usertools/meson.build                  |   1 +
>  5 files changed, 628 insertions(+)
>  create mode 100644 doc/guides/tools/telemetrywatcher.rst
>  create mode 100755 usertools/dpdk-telemetry-watcher.py
> 
> --
> 2.51.0
> 

This didn't get merged, so it will need to be rebased.
You may want to address these AI review comments.

Review of [PATCH v4 1-7/7] usertools: dpdk-telemetry-watcher
============================================================

Nice tool — having a continuous monitoring wrapper around
dpdk-telemetry.py is a practical addition. Patches 1-6 are
clean and well-structured. Patch 7 (reconnection support)
has several correctness issues described below.


Patch 7/7: usertools/telemetry-watcher: support reconnection
------------------------------------------------------------

Error: monitor_stats `continue` on failed query causes
  IndexError on next delta iteration.

  When `query_telemetry` returns (process, None), the code
  does `continue`, skipping `current_values.append(...)`.
  At the end of the loop body, `prev_values = current_values`
  stores a shorter list. On the next iteration,
  `prev_values[i]` raises IndexError for the missing indices.

  Suggested fix: when data is None, append prev_values[i]
  (or 0) as the current_value so the list length is preserved:

    process, data = query_telemetry(process, command)
    if not data:
        current_values.append(prev_values[i] if i < len(prev_values) else 0)
        row += "N/A".rjust(25)
        continue
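
  The invariant the fix relies on (one entry per stat, whether or
  not its query succeeded) can be checked in isolation. This is a
  minimal sketch: `collect_values` is a hypothetical stand-in for
  the loop body in monitor_stats, and `results` stands for the
  already-extracted values, with None marking a failed query.

```python
def collect_values(results, prev_values):
    """Build the current value list, keeping one entry per stat.

    Hypothetical helper: 'results' holds one extracted value per stat,
    or None where the query failed. A failed query reuses the previous
    value (or 0 if there is none) so the list never shrinks.
    """
    current_values = []
    for i, value in enumerate(results):
        if value is None:
            # Preserve the list length so prev_values[i] stays valid
            # on the next delta iteration.
            current_values.append(prev_values[i] if i < len(prev_values) else 0)
            continue
        current_values.append(value)
    return current_values
```

  Reusing the previous value makes the delta for that stat read as
  zero on the following iteration rather than as a spurious spike.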


Error: BrokenPipeError not handled in query_telemetry.

  When the DPDK application dies, the subprocess's stdin pipe
  breaks. The initial `process.stdin.write()` / `.flush()`
  before the reconnection loop then raises BrokenPipeError,
  so the code never reaches the empty readline() check and
  the reconnection logic never triggers.

  Suggested fix: wrap the write+flush+readline in a
  try/except (BrokenPipeError, OSError) and treat it the
  same as an empty response — fall into the reconnection
  loop.
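
  One possible shape for that wrapper is sketched below.
  `send_command` is a hypothetical helper, not a function in the
  patch; it assumes the process was opened in text mode with piped
  stdin and stdout. Note that BrokenPipeError is a subclass of
  OSError, so naming both is redundant but documents the intent.

```python
def send_command(process, command):
    """Send one command and return the raw response line.

    Hypothetical helper: returns None both when the pipe is broken and
    when readline() returns an empty string, so the caller can treat
    the two cases identically and enter its reconnection loop.
    """
    try:
        process.stdin.write(command + "\n")
        process.stdin.flush()
        response = process.stdout.readline()
    except (BrokenPipeError, OSError):
        # The subprocess has gone away; report it like an empty response.
        return None
    return response or None
```

  The caller then needs only a single `if response is None:` check
  to decide whether to reconnect.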


Warning: old subprocess not cleaned up on reconnection.

  In query_telemetry, when readline() returns empty and
  reconnection begins, the old process object is replaced
  without calling process.terminate() or process.wait().
  The dead subprocess accumulates as a zombie. Similarly,
  create_telemetry_process now calls print_connected_app
  which can fail and return None, leaking the just-created
  Popen object.

  Suggested fix: add a small helper to clean up a process
  (terminate, close pipes, wait), and call it before setting
  process = None in the reconnection path. In
  create_telemetry_process, if print_connected_app fails,
  terminate the process before returning None.
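
  A cleanup helper along these lines would cover both leaks. This
  is a sketch only: the name `cleanup_process` and the one-second
  timeout are assumptions, and it uses nothing beyond the standard
  subprocess.Popen interface.

```python
import subprocess


def cleanup_process(process, timeout=1.0):
    """Terminate a subprocess, close its pipes, and reap it.

    Hypothetical helper; safe to call with None or with a process
    that has already exited.
    """
    if process is None:
        return
    if process.poll() is None:
        process.terminate()
    for pipe in (process.stdin, process.stdout, process.stderr):
        if pipe is not None:
            try:
                pipe.close()
            except OSError:
                # Closing may flush buffered data into a dead pipe.
                pass
    try:
        # wait() reaps the child so it does not linger as a zombie.
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
```

  Calling this before `process = None` in the reconnection path,
  and before returning None from create_telemetry_process, keeps
  the number of live child processes at one or zero.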


Warning: expand_shortcuts and validate_stats lose the
  reconnected process handle.

  Both functions update their local `process` variable via
  query_telemetry's return value, but neither returns the
  (possibly new) process to the caller. If a reconnection
  happens during shortcut expansion or validation, the
  caller in monitor_stats still holds the old dead process
  object.

  Suggested fix: have expand_shortcuts and validate_stats
  return the process alongside their current return values,
  or restructure so monitor_stats passes process by
  reference (e.g., as a mutable container).
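
  The first option could look like the sketch below. The signature
  is hypothetical: `query` stands in for query_telemetry and is
  assumed to return a (process, data) pair, with data being the
  list of port ids for "/ethdev/list". The "eth." prefix match and
  the rx/tx aliases follow the cover letter's description, not the
  actual patch code.

```python
def expand_shortcuts(process, specs, query):
    """Expand "eth.<stat>" shortcuts into one spec per port.

    Hypothetical sketch: returns (process, expanded_specs) so that a
    reconnection inside 'query' is visible to the caller.
    """
    aliases = {"rx": "ipackets", "tx": "opackets"}
    expanded = []
    for spec in specs:
        if not spec.startswith("eth."):
            expanded.append(spec)
            continue
        field = spec[len("eth."):]
        field = aliases.get(field, field)
        # 'query' may hand back a new process after reconnecting.
        process, ports = query(process, "/ethdev/list")
        for port in ports or []:
            expanded.append(f"/ethdev/stats,{port}.{field}")
    return process, expanded
```

  The caller in monitor_stats would then write
  `process, stats = expand_shortcuts(process, stats, query_telemetry)`
  and never hold a stale handle.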


Info: recursive call between create_telemetry_process and
  print_connected_app.

  create_telemetry_process calls print_connected_app, which
  calls query_telemetry, which on disconnect calls
  create_telemetry_process again. This indirect recursion
  works in practice (Python has a high default recursion
  limit and the retry loop in query_telemetry breaks the
  chain), but it is fragile and hard to follow. Consider
  separating the "connect" step from the "verify connection"
  step to avoid the recursive dependency.
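
  A non-recursive split might be structured as follows. All three
  callables are hypothetical: `spawn` starts the subprocess,
  `query_once` sends a single command with no reconnection logic
  and returns the data or None, and `cleanup` disposes of a failed
  handle. Using "/eal/app_name" as the verification command is an
  assumption about what print_connected_app queries.

```python
def connect_and_verify(spawn, query_once, cleanup):
    """Return a verified process handle, or None on failure.

    Hypothetical sketch: because query_once never reconnects, this
    function cannot re-enter itself, and the retry loop stays in one
    place (the caller).
    """
    process = spawn()
    if process is None:
        return None
    info = query_once(process, "/eal/app_name")
    if info is None:
        # Verification failed: do not leak the just-created process.
        cleanup(process)
        return None
    return process
```

  The reconnection loop in query_telemetry can then call this once
  per retry interval without any indirect recursion.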


Reviewed-by: Stephen Hemminger <[email protected]>
