huan233usc opened a new pull request, #2656:
URL: https://github.com/apache/iceberg-rust/pull/2656

   ## Which issue does this PR close?
   
   Part of #1690 — client support for REST server-side scan planning.
   
   ## What changes are included in this PR?
   
   Adds a client implementation of the REST scan-planning protocol
   (`planTableScan` / `fetchPlanningResult` / `fetchScanTasks`). When a catalog
   advertises the planning endpoints, a table scan delegates planning to the
   server and consumes the returned `FileScanTask`s instead of reading manifests
   locally; otherwise it transparently falls back to native client-side
   planning.
   
   ## Design
   
   The whole feature hangs off a single seam — `TableScan::plan_files()` — so
   execution (`to_arrow`, the Arrow reader) is untouched and DataFusion needs no
   changes.
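
As a rough illustration of that seam, the sketch below shows a `plan_files()` that delegates to an optional planner and treats `FeatureUnsupported` as the signal to plan natively. It is a simplified, synchronous stand-in and not the code in this PR: the `ErrorKind` and `FileScanTask` shapes, the `plan_native` helper, and the `NoPlanningEndpoints` and `ServerPlanner` types are all hypothetical.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real iceberg-rust types.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ErrorKind {
    FeatureUnsupported,
    Unexpected,
}

#[derive(Debug, Clone, PartialEq)]
struct FileScanTask {
    data_file_path: String,
}

// Narrow capability trait: the only thing a scan needs from the catalog side.
trait ScanPlanner {
    fn plan_table_scan(&self, table: &str) -> Result<Vec<FileScanTask>, ErrorKind>;
}

struct TableScan {
    table: String,
    planner: Option<Arc<dyn ScanPlanner>>,
}

impl TableScan {
    // Placeholder for the existing manifest-based planning path.
    fn plan_native(&self) -> Vec<FileScanTask> {
        vec![FileScanTask {
            data_file_path: format!("{}/native.parquet", self.table),
        }]
    }

    fn plan_files(&self) -> Result<Vec<FileScanTask>, ErrorKind> {
        match &self.planner {
            // No planner injected: behave exactly as before.
            None => Ok(self.plan_native()),
            Some(planner) => match planner.plan_table_scan(&self.table) {
                // The server does not advertise planning: fall back silently.
                Err(ErrorKind::FeatureUnsupported) => Ok(self.plan_native()),
                // Success and every other error are passed through unchanged.
                other => other,
            },
        }
    }
}

// A planner whose catalog lacks the planning endpoints.
struct NoPlanningEndpoints;

impl ScanPlanner for NoPlanningEndpoints {
    fn plan_table_scan(&self, _table: &str) -> Result<Vec<FileScanTask>, ErrorKind> {
        Err(ErrorKind::FeatureUnsupported)
    }
}

// A planner that returns server-produced tasks.
struct ServerPlanner;

impl ScanPlanner for ServerPlanner {
    fn plan_table_scan(&self, table: &str) -> Result<Vec<FileScanTask>, ErrorKind> {
        Ok(vec![FileScanTask {
            data_file_path: format!("{table}/server.parquet"),
        }])
    }
}
```

Because only `FeatureUnsupported` triggers the fallback in this sketch, any other planner error still reaches the caller instead of being masked by a native re-plan.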
   
   ```
   Table::scan() ─► TableScan::plan_files()
      │ injected ScanPlanner present?
      ├─ no  ─► native manifest planning (unchanged)
      └─ yes ─► ScanPlanner::plan_table_scan(ScanPlanRequest)
                   │
           ┌───────┴─ endpoint negotiation: gate on the endpoints advertised
           │          by GET /v1/config (else FeatureUnsupported → native)
           ▼
      POST .../plan ──► COMPLETED ─────────────────────────────┐
                    └─► SUBMITTED ─► poll GET .../plan/{id}      │
                                     (exp backoff 1s→60s)  ──────┤ COMPLETED
      plan-tasks? ─► POST .../tasks (tokens may recurse) ────────┤
                                                                 ▼
      convert wire content-files ─► FileScanTask (public builders)
      + build a plan-scoped FileIO from vended `storage-credentials`
                                                                 ▼
      ServerScanPlan { tasks, file_io } ─► to_arrow() reads tasks through file_io
      (on Drop before completion: best-effort DELETE .../plan/{id})
   ```
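
The SUBMITTED branch of the diagram can be sketched as a polling loop with capped exponential backoff. This is a minimal synchronous illustration under stated assumptions: the diagram only gives the 1s and 60s bounds, so the doubling factor, the `max_polls` limit, and the names `PlanStatus`, `next_delay`, and `poll_until_completed` are hypothetical. The fetch and sleep steps are passed in as closures so the loop can be exercised without a server, and the best-effort `DELETE` on drop is not modelled.

```rust
use std::time::Duration;

// Hypothetical, reduced view of a planning response.
#[derive(Debug, PartialEq)]
enum PlanStatus {
    Submitted,
    // File paths stand in for the returned scan tasks.
    Completed(Vec<String>),
}

/// Doubles the current delay, never exceeding `max`.
/// The factor of two is an assumption; the PR only states 1s -> 60s.
fn next_delay(current: Duration, max: Duration) -> Duration {
    current.saturating_mul(2).min(max)
}

/// Polls `fetch` until the plan completes or `max_polls` attempts are used.
/// Returns `None` if the plan never completed within the attempt budget.
fn poll_until_completed(
    mut fetch: impl FnMut() -> PlanStatus,
    mut sleep: impl FnMut(Duration),
    max_polls: usize,
) -> Option<Vec<String>> {
    let mut delay = Duration::from_secs(1);
    let max = Duration::from_secs(60);
    for _ in 0..max_polls {
        match fetch() {
            PlanStatus::Completed(tasks) => return Some(tasks),
            PlanStatus::Submitted => {
                // Wait, then lengthen the wait for the next round.
                sleep(delay);
                delay = next_delay(delay, max);
            }
        }
    }
    None
}
```

In a real client `sleep` would wrap the runtime's timer and `fetch` would issue `GET .../plan/{id}`; injecting both keeps the backoff schedule easy to check in a unit test.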
   
   ### Components / flow
   
   1. **Injection seam** — a narrow `ScanPlanner` capability trait
      (`crates/iceberg/src/scan/planner.rs`). `Table`/`TableScanBuilder` carry 
      an optional `Arc<dyn ScanPlanner>`; `plan_files()` delegates to it and falls
      back to native planning on `ErrorKind::FeatureUnsupported`. The core
      `Catalog` trait is untouched.
   2. **Endpoint negotiation** — `CatalogConfig` now parses the `endpoints` 
      field of `GET /v1/config`; the scan-plan calls are gated by an
      `Endpoint::check`.
   3. **Wire DTOs** — request/response types for plan / fetch-planning-result /
      fetch-scan-tasks, plus a lean content-file shape (only the fields a
      `FileScanTask` needs).
   4. **State machine** — submit → poll-with-backoff → fan-out `fetchScanTasks`
      (plan-task tokens may produce more tasks), with a best-effort
      `DELETE .../plan/{id}` if the scan is dropped mid-flight.
   5. **Conversion** — wire content-files → `FileScanTask` via the public
      builders (no `DataFile` internals); the scan's own bound filter is used as
      the per-task row predicate, and is pushed down as Iceberg expression JSON
      when it can be encoded losslessly.
   6. **Credential vending** — `ScanPlanner::plan_table_scan` returns
      `ServerScanPlan { tasks, file_io }`; the planner builds a plan-scoped
      `FileIO` from the `storage-credentials` the server returns, and `to_arrow`
      reads data files through it.
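
The fan-out in step 4 can be pictured as a worklist: each plan-task token is exchanged for file scan tasks and, possibly, further tokens, until no tokens remain. The sketch below is an assumption-laden illustration rather than the PR's implementation: `FetchScanTasksResponse`, its field names, and `drain_plan_tasks` are hypothetical, file paths stand in for `FileScanTask`s, and the `fetch` closure stands in for `POST .../tasks`.

```rust
use std::collections::VecDeque;

// Hypothetical, reduced view of a fetch-scan-tasks response.
#[derive(Debug, Default, Clone)]
struct FetchScanTasksResponse {
    // File paths stand in for full file scan tasks.
    file_scan_tasks: Vec<String>,
    // Tokens that must be exchanged for more tasks.
    plan_tasks: Vec<String>,
}

/// Exchanges plan-task tokens for scan tasks until the worklist is empty.
/// Tokens returned by a fetch are queued, so nested tokens are followed too.
fn drain_plan_tasks(
    initial_tokens: Vec<String>,
    mut fetch: impl FnMut(&str) -> FetchScanTasksResponse,
) -> Vec<String> {
    let mut pending: VecDeque<String> = initial_tokens.into();
    let mut files = Vec::new();
    while let Some(token) = pending.pop_front() {
        let response = fetch(&token);
        files.extend(response.file_scan_tasks);
        pending.extend(response.plan_tasks);
    }
    files
}
```

An explicit queue avoids unbounded recursion when tokens nest deeply; this sketch processes tokens one at a time, whereas a real client could fetch them concurrently.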
   
   ### Alternative injection design
   
   For comparison, https://github.com/huan233usc/iceberg-rust/pull/2 implements
   the same feature with the planning capability placed on the core `Catalog`
   trait (`Catalog::plan_table_scan`, `Table` holding `Arc<dyn Catalog>`)
   instead of a narrow `ScanPlanner` trait. This PR uses the narrow-trait design
   because it keeps the central `Catalog` trait minimal and avoids giving every
   `Table` a back-reference to the full catalog.
   
   ## Are these changes tested?
   
   Yes — unit tests for the wire DTOs, endpoint codec, and expression-JSON
   serialization, conversion tests, and end-to-end `mockito` tests covering the
   completed-inline, submitted-then-polled, and recursive plan-task fan-out
   paths.
   

