SreeramGarlapati opened a new pull request, #2543:
URL: https://github.com/apache/iceberg-rust/pull/2543

   ### Summary
   
   - add `RewriteManifestsAction` exposed as `Transaction::rewrite_manifests()`
   - group live `DataFile` entries by partition tuple within the default 
partition spec, roll new manifests by `target_size_bytes`, and commit a 
`Replace` snapshot
   - preserve `sequence_number`, `file_sequence_number`, and (v3) 
`first_row_id` via `ManifestWriter::add_existing_file`
   - carry forward total-`*` summary keys; emit `manifests-created`, 
`manifests-replaced`, `manifests-kept`, `entries-processed`
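
For reviewers, a minimal sketch of the group-then-roll step described above. It is not the code in this PR: `Entry`, `estimated_bytes`, and `plan_manifests` are hypothetical names, the partition tuple is stringified for brevity, and the sketch assumes a manifest may continue across adjacent partition groups until `target_size_bytes` would be exceeded.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a live manifest entry (not the iceberg-rust type).
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    /// Partition tuple, stringified for this sketch.
    pub partition: Vec<String>,
    pub path: String,
    /// Carried through unchanged, as `add_existing_file` is described to do.
    pub sequence_number: i64,
    /// Assumed per-entry contribution to the manifest size.
    pub estimated_bytes: u64,
}

/// Groups entries by partition tuple, then packs them in group order,
/// starting a new manifest whenever the next entry would push the current
/// one past `target_size_bytes`.
pub fn plan_manifests(entries: Vec<Entry>, target_size_bytes: u64) -> Vec<Vec<Entry>> {
    // BTreeMap keeps the groups in a deterministic (sorted) order.
    let mut groups: BTreeMap<Vec<String>, Vec<Entry>> = BTreeMap::new();
    for e in entries {
        groups.entry(e.partition.clone()).or_default().push(e);
    }

    let mut manifests = Vec::new();
    let mut current: Vec<Entry> = Vec::new();
    let mut current_bytes = 0u64;

    for (_, group) in groups {
        for e in group {
            // Roll only if the current manifest already holds something,
            // so an oversized single entry still gets written.
            if !current.is_empty() && current_bytes + e.estimated_bytes > target_size_bytes {
                manifests.push(std::mem::take(&mut current));
                current_bytes = 0;
            }
            current_bytes += e.estimated_bytes;
            current.push(e);
        }
    }
    if !current.is_empty() {
        manifests.push(current);
    }
    manifests
}
```

Entries that share a partition tuple end up adjacent, and each entry keeps the `sequence_number` it arrived with; only the manifest boundaries change.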
   
   ### Why
   
   `apache/iceberg-rust` had no manifest-compaction primitive. Long-running 
streaming/append workloads accumulate small manifests, which inflates planning 
cost. Java ships `BaseRewriteManifests` for this; this PR is the Rust analog at 
the transaction-primitive layer (per the architectural guidance in #1453).
   
   ### Scope
   
   - format versions: v1, v2, v3
   - knobs match Java exactly: `target_size_bytes` (default 8 MiB, 
mirrors `commit.manifest.target-size-bytes`), plus inherited 
`snapshot_properties` / `commit_uuid` / `key_metadata` via builder
   - only the default partition spec is rewritten; manifests bound to other 
specs and DELETE manifests are kept verbatim
   - short-circuits to a no-op when there is nothing to merge
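
A rough sketch of the selection rule in the scope list above, for illustration only. `ManifestInfo`, `Content`, `Plan`, and `plan_rewrite` are hypothetical names rather than the types in this PR, and "nothing to merge" is assumed here to mean fewer than two rewritable manifests.

```rust
/// Hypothetical stand-ins; not the iceberg-rust types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Content {
    Data,
    Deletes,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ManifestInfo {
    pub path: String,
    pub spec_id: i32,
    pub content: Content,
}

#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Nothing to merge; no new snapshot would be committed.
    NoOp,
    /// `candidates` are rewritten; `kept` are carried over verbatim.
    Rewrite {
        candidates: Vec<ManifestInfo>,
        kept: Vec<ManifestInfo>,
    },
}

/// Splits manifests into rewrite candidates (data manifests bound to the
/// default spec) and everything else, which is kept as-is.
pub fn plan_rewrite(manifests: Vec<ManifestInfo>, default_spec_id: i32) -> Plan {
    let (candidates, kept): (Vec<_>, Vec<_>) = manifests
        .into_iter()
        .partition(|m| m.content == Content::Data && m.spec_id == default_spec_id);

    // Assumption: a single candidate (or none) leaves nothing to merge.
    if candidates.len() < 2 {
        return Plan::NoOp;
    }
    Plan::Rewrite { candidates, kept }
}
```

Under these assumptions, the lengths of `candidates` and `kept` correspond to the `manifests-replaced` and `manifests-kept` counters in the summary.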
   
   Out of scope (deferrable to follow-ups): `rewrite_if` predicate, 
`cluster_by`, custom `spec_id` / `staging_location`, `iceberg-datafusion` 
SQL-procedure layer.
   
   ### Tests
   
   - 6 inline unit tests (no-current-snapshot error, single-small-manifest 
no-op, multi-manifest merge preserves sequence numbers on v2, target-size rolls 
multiple manifests, v3 row-lineage preserved, summary + Replace operation)
   - `cargo test -p iceberg` (1302 passed, 0 failed)
   - `cargo clippy -p iceberg --all-targets -- -D warnings`
   - `cargo fmt --check`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

