The GitHub Actions job "Required Checks" on texera.git/gh-readonly-queue/main/pr-5629-3dab771a2fe3ea5bf97c4c69cfbd761f9cd01e54 has failed.
Run started by GitHub user aicam (triggered by aicam).

Head commit for run:
2cdb1fe20c2fef594e0379e31b349fb4f2899475 / ali risheh <[email protected]>
fix(access-control-service): include port in computing unit pod URI and use 
Envoy Gateway for distributed CUs (#5629)

### What changes were proposed in this PR?

Make the in-cluster address of a computing unit come from a single
source of truth — the URI recorded when its pod is created — and ensure
that URI is complete (includes the port). This lets the gateway route a
user to a computing unit located **anywhere it can reach** (in the local
cluster, another cluster, or an external host), instead of being limited
to a reconstructed in-cluster address. See #5630.

Two related changes:

**1. Include the port in the generated pod URI**
(`computing-unit-managing-service`)

`KubernetesClient.generatePodURI` builds the address stored as the
computing unit's `uri` (via `setUri` in `ComputingUnitManagingResource`)
and returned to clients as `nodeAddresses`. The pod's container listens
on `KubernetesConfig.computeUnitPortNumber` (declared with
`withContainerPort(...)` in the same file), but the generated URI
omitted the port, so the persisted address was not directly connectable.
The port is now appended:

```scala
s"...svc.cluster.local:${KubernetesConfig.computeUnitPortNumber}"
```
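For illustration, here is a minimal sketch of a URI builder that appends the container port. The `computing-unit-<cuid>` prefix follows the naming shown in the routing flow. The `serviceName` and `namespace` segments are hypothetical stand-ins for the part elided as `...` above and are not the exact layout `generatePodURI` uses. The port is passed in here instead of being read from `KubernetesConfig.computeUnitPortNumber`, which keeps the sketch self-contained.

```scala
// Hypothetical sketch, not the actual KubernetesClient.generatePodURI.
// The DNS layout (pod name + service + namespace) is an assumption.
object PodUriSketch {

  /** Builds an in-cluster address for a computing unit, including the port
    * the container listens on (in Texera: KubernetesConfig.computeUnitPortNumber).
    */
  def generatePodUri(cuid: Int, serviceName: String, namespace: String, port: Int): String =
    s"computing-unit-$cuid.$serviceName.$namespace.svc.cluster.local:$port"
}
```

With the port included, the stored value can be handed to a client or proxy as-is. Without it, every consumer would have to know the port out of band.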

**2. Route using the recorded URI** (`access-control-service`)

`AccessControlResource` rebuilt the computing unit's address from
`KubernetesConfig` on every authorization request, duplicating the
construction logic in `generatePodURI` and pinning every CU to the local
cluster. It now reads the URI recorded for the unit and returns it as
the `Host` for the gateway to route to. If no URI has been recorded, the
unit is not routable and the request is **refused with `403`** (no
in-cluster fallback, per review).
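The decision described above can be sketched as a small pure function. All names and types here are hypothetical. `lookupRecordedUri` stands in for reading the unit's `uri` from `workflow_computing_unit`, and the user-permission check that `AccessControlResource` also performs is left out. Treating a blank string the same as a missing URI is an added assumption.

```scala
// Hypothetical sketch of the routing decision; not the actual AccessControlResource.
object RouteDecisionSketch {
  sealed trait Decision
  // Authorized: the gateway should forward to this Host.
  final case class Allow(host: String) extends Decision
  // Refused: ext-auth returns this HTTP status to the gateway.
  final case class Deny(status: Int) extends Decision

  def decide(cuid: Int, lookupRecordedUri: Int => Option[String]): Decision =
    lookupRecordedUri(cuid) match {
      // Use the recorded URI verbatim; nothing is rebuilt from KubernetesConfig.
      case Some(uri) if uri.trim.nonEmpty => Allow(uri.trim)
      // No usable recorded URI: the unit is not routable, so refuse with 403
      // (no in-cluster fallback).
      case _ => Deny(403)
    }
}
```

Because the lookup is injected as a function, the same logic applies whether the recorded URI points at a local pod, another cluster, or an external host.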

### Routing flow

The access-control service is the gateway's external authorizer; the
`Host` it returns is the upstream Envoy forwards the (upgraded)
connection to. Because that host comes from the unit's recorded URI, the
same gateway can reach computing units in different locations:

```mermaid
flowchart LR
    FE["Frontend<br/>(/wsapi?cuid=N)"] --> GW["Envoy Gateway"]
    GW -. "ext-auth: authorize + get Host" .-> ACS["access-control-service"]
    ACS -- "read recorded uri for CU N" --> DB[("workflow_computing_unit")]
    ACS -- "Host = recorded uri<br/>(or 403 if none)" --> GW
    GW == "dynamic forward proxy<br/>to returned Host" ==> R{Where the CU lives}
    R --> CU1["In-cluster CU pod<br/>computing-unit-N...svc.cluster.local:port"]
    R --> CU2["CU in another cluster"]
    R --> CU3["External / remote CU host:port"]
```
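Whichever location the CU lives in, the returned `Host` only works as an upstream if it names both a host and a port. The following hypothetical helper (not part of Texera) illustrates the difference. A URI in the fixed format splits into a `(host, port)` pair, while one in the old port-less format does not. The helper assumes the recorded URI has no scheme prefix and is not an IPv6 literal.

```scala
import scala.util.Try

// Hypothetical helper: splits "host:port" into its parts.
// Assumes no scheme (e.g. "http://") and no IPv6 literal in the URI.
object HostPortSketch {
  def split(uri: String): Option[(String, Int)] = {
    val idx = uri.lastIndexOf(':')
    // No colon, empty host, or empty port: not a complete address.
    if (idx <= 0 || idx == uri.length - 1) None
    else {
      val host = uri.substring(0, idx)
      Try(uri.substring(idx + 1).toInt).toOption
        .filter(p => p > 0 && p <= 65535) // valid TCP port range
        .map(p => (host, p))
    }
  }
}
```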

### Any related issues, documentation, discussions?

- Closes #5630.
- Builds on the Envoy Gateway / ext-auth routing introduced in #4191
(unified Envoy Gateway) and #3598 (access-control-service as the
ext-auth service for computing-unit traffic).

### How was this PR tested?
Tested on a live deployment.
<img width="1835" height="960" alt="Screenshot from 2026-06-13 13-31-00"
src="https://github.com/user-attachments/assets/d56a48f9-b99d-4d36-827a-0a4ce54995fd"
/>


### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/27579201394

With regards,
GitHub Actions via GitBox
