smaheshwar-pltr opened a new pull request, #3511: URL: https://github.com/apache/iceberg-python/pull/3511
# Rationale for this change Split out of #3364 at the reviewers' request (to @rambleraptor's and @kevinjqliu's point that the architecture refactor and the new feature are easier to review separately). This is PR 1 of 2 — the `IncrementalAppendScan` feature lands in a follow-up PR based on this branch. **This PR is a pure refactor with no behavioural change.** The diff looks large, but it is almost entirely code being moved, not changed. # Changes - Introduce `BaseScan(ABC)` as a superclass of `TableScan`, holding the snapshot-independent surface: `table_metadata`, `io`, `row_filter`, `selected_fields`, `case_sensitive`, `options`, `limit`, the chaining helpers (`select` / `filter` / `with_case_sensitive` / `update`), and the format-converter sinks (`to_pandas` / `to_polars` / `to_duckdb` / `to_ray`) built on an abstract `to_arrow()`. - `TableScan` keeps everything snapshot-specific — `snapshot_id`, `catalog`, `table_identifier`, `snapshot()`, the snapshot-aware `projection()`, `use_ref()`, and abstract `count()` — so its existing public surface is unchanged. - Extract `ManifestGroupPlanner` from `DataScan` (the `_build_*` evaluators, `_check_sequence_number`, and the manifest-entry / file-scan-task planning) and route all of `DataScan`'s local planning — `scan_plan_helper()` and `_plan_files_local()` — through it, so the partition / metrics / residual pipeline lives in exactly one place. # Back-compatibility - `to_arrow_batch_reader` stays concrete on `DataScan` and is **not** made abstract on `BaseScan`, so external `TableScan` subclasses that were valid before still instantiate. - `to_pandas` / `to_polars` become concrete defaults on `BaseScan`, and `to_duckdb` / `to_ray` move up to `BaseScan` too. This **loosens** `TableScan`'s abstract contract (they were abstract on `TableScan` before) without breaking existing subclasses. - All docstrings and explanatory comments are preserved. # Are these changes tested? The existing `DataScan` unit and integration tests exercise the refactored planning path unchanged. A regression test is added for the no-current-snapshot planning path, which now flows through `ManifestGroupPlanner`. # Are there any user-facing changes? No. Some previously-`DataScan`-only converters (`to_duckdb` / `to_ray`) are now inherited by `TableScan` as well, which is purely additive. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
