Aleksandr Iushmanov created FLINK-39394:
-------------------------------------------

             Summary: Job overview metrics (busyness/backpressure/data skew) are showing N/A when some nodes are finished
                 Key: FLINK-39394
                 URL: https://issues.apache.org/jira/browse/FLINK-39394
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Web Frontend
            Reporter: Aleksandr Iushmanov


When a streaming job has a mix of RUNNING and FINISHED vertices (e.g., a
STATEMENT SET with bounded and unbounded sources), the job overview page in the
Flink Web UI shows "N/A" for the backpressure, busyness, and data skew metrics
on all vertices, including the ones that are still running.

  Root cause:

  In job-overview.component.ts, mergeWithBackPressureAndSkew() uses forkJoin to
  load subtask metrics for every vertex. For a FINISHED vertex, the REST
  endpoint /jobs/{jid}/vertices/{vid}/subtasks/metrics returns an empty array.
  loadMetricsWithAllAggregates() maps this to an empty object {}, and the code
  then accesses result.backPressuredTimeMsPerSecond.max, which throws a
  TypeError because result.backPressuredTimeMsPerSecond is undefined. Since
  forkJoin fails atomically, the outer catchError discards the metrics for all
  vertices, not just the finished one. The same pattern exists in
  mergeWithWatermarks().
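
  The failure mode described above can be sketched in plain TypeScript. This is
  a minimal sketch, not the actual Flink Web UI code: the MetricAggregate shape
  and the toAggregateMap() and readBackPressureMax() helpers are hypothetical
  stand-ins for the response type, loadMetricsWithAllAggregates(), and the
  unguarded access.

```typescript
// Hypothetical shape of one aggregated metric entry; the real type may differ.
interface MetricAggregate {
  id: string;
  max: number;
  skew: number;
}

// Stand-in for loadMetricsWithAllAggregates(): index the response by metric id.
// An empty response (as returned for a FINISHED vertex) produces {}.
function toAggregateMap(response: MetricAggregate[]): Record<string, MetricAggregate> {
  const result: Record<string, MetricAggregate> = {};
  for (const entry of response) {
    result[entry.id] = entry;
  }
  return result;
}

// Unguarded access: throws a TypeError when the key is missing from the map.
function readBackPressureMax(response: MetricAggregate[]): number {
  const result = toAggregateMap(response);
  return result['backPressuredTimeMsPerSecond'].max;
}
```

  Calling readBackPressureMax([]) throws, while a response that contains a
  backPressuredTimeMsPerSecond entry returns that entry's max.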

  Fix:

  1. Guard against missing metric keys before accessing .max / .skew
  2. Add a per-node catchError inside the forkJoin so that a single vertex
     failure does not affect other vertices
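
  The two steps above could look roughly like the following. This is a sketch
  under stated assumptions rather than the actual patch: to keep it free of
  dependencies, Promise.all stands in for forkJoin (both fail as a whole when
  one input fails) and a per-promise .catch stands in for a per-node
  catchError. The Aggregate and VertexMetrics types, the helper names, and the
  numRecordsIn key are hypothetical.

```typescript
// Hypothetical shapes; the real types in the Flink Web UI may differ.
interface Aggregate {
  max: number;
  skew: number;
}
type Aggregates = Record<string, Aggregate | undefined>;

interface VertexMetrics {
  vertexId: string;
  backPressureMax?: number | undefined; // undefined would be rendered as "N/A"
  skew?: number | undefined;
}

// Fix 1: optional chaining yields undefined instead of throwing on a missing key.
function extractMetrics(vertexId: string, result: Aggregates): VertexMetrics {
  return {
    vertexId,
    backPressureMax: result['backPressuredTimeMsPerSecond']?.max,
    // 'numRecordsIn' is an assumed key for the skew metric.
    skew: result['numRecordsIn']?.skew
  };
}

// Fix 2: recover per vertex, so one rejected load cannot reject the combined result.
// Promise.all stands in for forkJoin, and .catch for a per-node catchError.
function loadAllVertexMetrics(
  vertexIds: string[],
  load: (vertexId: string) => Promise<Aggregates>
): Promise<VertexMetrics[]> {
  return Promise.all(
    vertexIds.map(vertexId =>
      load(vertexId)
        .then(result => extractMetrics(vertexId, result))
        .catch((): VertexMetrics => ({ vertexId }))
    )
  );
}
```

  With this structure, a vertex whose metrics are missing or fail to load
  falls back to "N/A" on its own, and the remaining vertices keep their values.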

  How to reproduce:

  Run a streaming job where one vertex is bounded (e.g., EXECUTE STATEMENT SET
  with one INSERT from VALUES and one from an unbounded source). The bounded
  source vertex and its downstream sink chain will transition to FINISHED. Open
  the job overview: all vertices will show "N/A" for backpressure, busyness,
  and data skew.


