Aleksandr Iushmanov created FLINK-39394:
-------------------------------------------
Summary: Job overview metrics (busyness/backpressure/data skew)
are showing N/A when some nodes are finished
Key: FLINK-39394
URL: https://issues.apache.org/jira/browse/FLINK-39394
Project: Flink
Issue Type: Bug
Components: Runtime / Web Frontend
Reporter: Aleksandr Iushmanov
When a streaming job has a mix of RUNNING and FINISHED vertices (e.g., a
STATEMENT SET with bounded and unbounded sources), the job overview page in the
Flink Web UI shows "N/A" for backpressure, busyness, and
data skew metrics on all vertices — including the ones that are still running.
Root cause:
In job-overview.component.ts, mergeWithBackPressureAndSkew() uses forkJoin to
load subtask metrics for every vertex. For a FINISHED vertex, the REST endpoint
/jobs/{jid}/vertices/{vid}/subtasks/metrics returns an
empty array. loadMetricsWithAllAggregates() maps this to an empty object {},
and the code then accesses result.backPressuredTimeMsPerSecond.max, which
throws a TypeError because backPressuredTimeMsPerSecond is undefined. Since
forkJoin fails atomically, the outer catchError discards metrics for all
vertices, not just the finished one. The same pattern exists in
mergeWithWatermarks().
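The failure mode described above can be reproduced in isolation. This is a minimal sketch, not the actual Flink frontend code: the AggregatedMetric and MetricMap types and the readMaxUnsafe helper are hypothetical names introduced for illustration. Only the backPressuredTimeMsPerSecond key and the .max field come from the description.

```typescript
// Hypothetical shapes for illustration only; not the types used in Flink's web frontend.
interface AggregatedMetric {
  max: number;
}
type MetricMap = { [metricId: string]: AggregatedMetric };

// Mirrors the unguarded access described above: it assumes the key is always present.
function readMaxUnsafe(result: MetricMap): number {
  return result['backPressuredTimeMsPerSecond'].max;
}

// A FINISHED vertex yields an empty metrics array, which is mapped to an empty object.
const finishedVertexMetrics: MetricMap = {};

let caught: unknown = null;
try {
  readMaxUnsafe(finishedVertexMetrics);
} catch (e) {
  caught = e; // reading .max of undefined throws at runtime
}
console.log(caught instanceof TypeError);
```

The type annotations do not prevent the error: with default compiler settings, an index signature lets the lookup type-check even though the key is absent at runtime.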
Fix:
1. Guard against missing metric keys before accessing .max / .skew
2. Add per-node catchError inside the forkJoin so a single vertex failure
does not affect other vertices
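The two steps of the fix can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: Promise.all stands in for RxJS forkJoin (both fail as a whole when any input fails), a per-promise .catch stands in for a per-node catchError, and the names Aggregate, MetricMap, readMax, and loadAll are hypothetical.

```typescript
// Hypothetical shapes and helpers for illustration; not Flink's actual frontend code.
type Aggregate = { max: number; skew: number };
type MetricMap = { [metricId: string]: Aggregate | undefined };

// Step 1: guard the lookup so a missing key yields NaN instead of throwing.
function readMax(result: MetricMap, key: string): number {
  const metric = result[key];
  return metric === undefined ? NaN : metric.max;
}

// Step 2: isolate failures per vertex. Promise.all rejects as soon as one input
// rejects, so each loader gets its own catch and resolves to an empty fallback.
function loadAll(
  vertexIds: string[],
  load: (id: string) => Promise<MetricMap>
): Promise<MetricMap[]> {
  return Promise.all(
    vertexIds.map(id => load(id).catch((): MetricMap => ({})))
  );
}

// Usage with a fake loader: one vertex fails, the other returns metrics.
const fakeLoad = (id: string): Promise<MetricMap> =>
  id === 'finished'
    ? Promise.reject(new Error('no metrics'))
    : Promise.resolve({ backPressuredTimeMsPerSecond: { max: 250, skew: 0 } });

loadAll(['running', 'finished'], fakeLoad).then(results => {
  const maxima = results.map(r => readMax(r, 'backPressuredTimeMsPerSecond'));
  console.log(maxima); // the running vertex keeps its value; the failed one is NaN
});
```

Either step alone would avoid the blank overview in the reported case. Together they also cover failures other than a missing key, such as a failed request for a single vertex.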
How to reproduce:
Run a streaming job where one vertex is bounded (e.g., EXECUTE STATEMENT SET
with one INSERT from VALUES and one from an unbounded source). The bounded
source vertex and its downstream sink chain will transition
to FINISHED. Open the job overview — all vertices will show "N/A" for
backpressure, busyness, and data skew.