mengw15 opened a new issue, #5363:
URL: https://github.com/apache/texera/issues/5363

   ### What happened?
   
   Execution results (`results`, `runtime_stats`, `console_logs` Iceberg 
tables) are written to a **global storage backend** (Iceberg catalog server + 
S3 / MinIO) that survives well beyond the lifetime of the Computing Unit (CU) 
that produced them. However, the result-read endpoints — the WebSocket and REST 
resources that the frontend talks to for the Result panel — are hosted by a 
Dropwizard app that lives **inside the CU pod**, and the Iceberg client that 
connects to the storage backend is currently only wired up there.
   
   When a user terminates the CU and later reopens the same workflow, the 
frontend has no live CU to talk to for results. The Result panel comes up empty 
**even though the underlying data is still intact** in the global storage.
   
   This is an asymmetry between data lifetime and access-path lifetime:
   
   ```
   Storage backend (global):  CU created ─── CU terminated ─── workflow reopened
                                                                 data still present ✓
   
   Retrieval path (in CU)  :  CU created ─── CU terminated ─── workflow reopened
                                             retrieval path gone ✗
   ```
   
   **Why it happens (architecture):**
   
   Texera deploys **two separate Dropwizard applications**:
   
   - **`TexeraWebApplication`** 
([amber/.../web/TexeraWebApplication.scala](https://github.com/apache/texera/blob/a820f6727/amber/src/main/scala/org/apache/texera/web/TexeraWebApplication.scala))
 — the brain-layer web server (dashboard, auth, workflow CRUD, admin, etc.). 
One per Texera deployment, persistent. **Does not import or use 
`IcebergCatalogInstance` / `IcebergDocument` / `DocumentFactory`** — the brain 
currently has no Iceberg client at all.
   - **`ComputingUnitMaster`** 
([amber/.../web/ComputingUnitMaster.scala](https://github.com/apache/texera/blob/a820f6727/amber/src/main/scala/org/apache/texera/web/ComputingUnitMaster.scala))
 — runs **inside each CU pod** (the container named `computing-unit-master` in 
[KubernetesClient.scala:138](https://github.com/apache/texera/blob/a820f6727/computing-unit-managing-service/src/main/scala/org/apache/texera/service/util/KubernetesClient.scala#L138)).
 Registers the result-read endpoints `WorkflowWebsocketResource` (line 116) and 
`WorkflowExecutionsResource` (line 182). Hosts `WorkflowExecutionService` / 
`ExecutionResultService` / `ExecutionStatsService` / `ExecutionConsoleService` 
/ `ResultExportService`, all of which obtain their Iceberg client via 
[`IcebergCatalogInstance`](https://github.com/apache/texera/blob/a820f6727/common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/IcebergCatalogInstance.scala),
 a Scala `object` (per-JVM singleton) that is **only instantiated in this JVM**.
   
   `ComputingUnitManagingResource @DELETE /{cuid}/terminate` 
([ComputingUnitManagingResource.scala:622-641](https://github.com/apache/texera/blob/a820f6727/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala#L622-L641))
 calls `KubernetesClient.deletePod(cuid)`, which tears down the entire 
`ComputingUnitMaster` JVM — taking the WebSocket endpoints, REST resources, and 
the `IcebergCatalogInstance` client with it. The storage backend on the other 
end (catalog server + S3) is untouched, but there is no longer any wired-up 
client able to talk to it on behalf of the user.
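
The lifetime mismatch described above can be modelled in a few lines. This is a minimal illustrative sketch, not Texera code: `GlobalStorage`, `ComputingUnit`, and `read_results` are hypothetical stand-ins for the storage backend, the CU pod that hosts the read endpoints, and the frontend's result fetch.

```python
from typing import Dict, List, Optional


class GlobalStorage:
    """Hypothetical stand-in for the Iceberg catalog + S3/MinIO backend."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}


class ComputingUnit:
    """Hypothetical stand-in for a CU pod; it owns the only storage client."""

    def __init__(self, storage: GlobalStorage) -> None:
        self._storage = storage  # the reader is wired up only inside the CU
        self.alive = True

    def execute(self, table: str, rows: List[dict]) -> None:
        # Results are written through to the global backend.
        self._storage.tables[table] = rows

    def read(self, table: str) -> List[dict]:
        return self._storage.tables[table]

    def terminate(self) -> None:
        # Models pod deletion: endpoints and client go away, the data does not.
        self.alive = False


def read_results(cu: Optional[ComputingUnit], table: str) -> Optional[List[dict]]:
    """Frontend's view: results are reachable only through a live CU."""
    if cu is None or not cu.alive:
        return None
    return cu.read(table)


storage = GlobalStorage()
cu = ComputingUnit(storage)
cu.execute("results", [{"id": 1}])
before = read_results(cu, "results")
cu.terminate()
after = read_results(cu, "results")
still_stored = "results" in storage.tables
print(before, after, still_stored)
```

After `terminate()`, the rows are still in `storage.tables`, but `read_results` has no path to them, which mirrors the empty Result panel.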
   
   Related: #4126 (Migrate to Result Service and MinIO for Execution Results), 
#5135 (Per-user Iceberg warehouse).
   
   ### How to reproduce?
   
   1. Create a workflow with an operator that produces a result table (e.g., 
Filter → sink).
   2. Execute on a CU; observe the rows in the Result panel.
   3. Terminate the CU (Dashboard → Computing Units → Terminate).
   4. Reopen the same workflow.
   5. The Result panel does not display the prior execution's results. The 
underlying data is still present in the global storage backend — verifiable via 
`s3 ls` — but the UI has no path to reach it without a live CU.
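
The check in step 5 could also be scripted. The following is a rough sketch that counts objects under a prefix with boto3's `list_objects_v2` paginator; the bucket name, prefix, and endpoint URL in the commented usage are placeholders, not Texera's actual storage layout.

```python
from typing import Any


def count_objects(s3_client: Any, bucket: str, prefix: str) -> int:
    """Count objects under `prefix` using the S3 ListObjectsV2 paginator."""
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching keys omit "Contents" entirely.
        total += len(page.get("Contents", []))
    return total


# Usage (assumed values; adjust to the deployment):
# import boto3
# s3 = boto3.client("s3", endpoint_url="http://localhost:9000")  # e.g. MinIO
# print(count_objects(s3, "my-warehouse-bucket", "path/to/results/"))
```

A non-zero count after the CU has been terminated would confirm that the data outlived its retrieval path.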
   
   ### Version
   
   1.1.0-incubating (Pre-release/Master)

