laskoviymishka opened a new issue, #1215:
URL: https://github.com/apache/iceberg-go/issues/1215

   ## Goal
   
   Add `Transaction.DynamicPartitionOverwrite` — the iceberg-go equivalent of 
Java's `ReplacePartitions` (`BaseReplacePartitions`) and PyIceberg's 
`Transaction.dynamic_partition_overwrite`. Given an Arrow table, it detects the 
partitions present in the incoming data, atomically deletes the existing data 
in exactly those partitions, and appends the new data, leaving untouched 
partitions alone.
   
   ## Background
   
   A first attempt was made in #482, but it predates the partitioned-write and 
copy-on-write-overwrite machinery that has since landed, so most of that PR is 
now redundant:
   
   - Partitioned writes are native — `recordsToDataFiles` routes to the 
partitioned fanout / clustered writers.
   - Overwrite-by-filter already exists — `Transaction.Overwrite`, 
`performCopyOnWriteDeletion`, `mergeOverwrite`.
   - Transform-aware predicate projection exists — `Transform.Project` — which 
removes the identity-transform-only limitation of the original attempt.
   
   What remains is small and composable: derive a partition-matching predicate 
from the written data files, then drive the existing overwrite path. The plan is 
to land it in reviewable slices rather than in one drop.
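
As a rough illustration of the "derive from the written data files" step, the sketch below deduplicates the partition tuples reported by a batch of freshly written files. It is not iceberg-go code: `partitionTuple`, `tupleKey`, and `touchedPartitions` are hypothetical names, and the assumption that a partition tuple can be modeled as a map from partition field ID to value (with `nil` for a null partition) is mine.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// partitionTuple is a hypothetical stand-in for one data file's partition
// values: partition field ID -> value, with nil meaning a null partition.
type partitionTuple map[int]any

// tupleKey renders a tuple as a canonical string so tuples can be compared.
// Field IDs are sorted because Go map iteration order is not stable.
func tupleKey(t partitionTuple) string {
	ids := make([]int, 0, len(t))
	for id := range t {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	parts := make([]string, 0, len(ids))
	for _, id := range ids {
		// %T keeps differently typed values apart; %#v quotes strings.
		parts = append(parts, fmt.Sprintf("%d=%T:%#v", id, t[id], t[id]))
	}
	return strings.Join(parts, "|")
}

// touchedPartitions returns the distinct tuples in first-seen order.
func touchedPartitions(files []partitionTuple) []partitionTuple {
	seen := make(map[string]struct{}, len(files))
	var out []partitionTuple
	for _, t := range files {
		k := tupleKey(t)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	// Three written files, two of which landed in the same partition.
	files := []partitionTuple{
		{1000: "us", 1001: int32(19800)},
		{1000: "us", 1001: int32(19800)},
		{1000: nil, 1001: int32(19801)},
	}
	for _, t := range touchedPartitions(files) {
		fmt.Println(tupleKey(t))
	}
}
```

The string key is only a convenience for the sketch; a real implementation would more likely compare typed partition values directly.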
   
   ## Scope (decomposable across PRs)
   
   - **Phase 1 — Partition-match predicate.** A transform-aware helper that 
turns a set of touched partition tuples plus the partition spec into a 
`BooleanExpression` selecting exactly those partitions. Standalone and 
unit-tested, no transaction wiring.
   - **Phase 2 — `DynamicPartitionOverwrite` API.** Compose the partitioned 
write, touched-partition collection via `DataFile.Partition()`, the Phase 1 
predicate, and the existing copy-on-write overwrite. Resolve the deletion 
mechanism for non-identity transforms. Keep the unpartitioned and empty-table 
guards.
   - **Phase 3 — Happy-path & interop tests.** Multi-partition, null partition, 
copy-on-write vs merge-on-read, and a Spark round-trip for cross-engine parity.
   - **Phase 4 (optional) — Docs & example/CLI exposure.**
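
To make the Phase 1 idea concrete, here is a minimal, self-contained sketch of the predicate shape: one conjunction per touched partition tuple, joined by OR, with a null partition value becoming an `IS NULL` term. Everything in it (`expr`, `equalTo`, `isNull`, `allOf`, `anyOf`, `alwaysFalse`, `matchPartitions`) is a hypothetical stand-in rather than the iceberg-go `BooleanExpression` API, and it matches on partition field names directly, so it sidesteps the transform-aware part that the real helper has to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// expr is a hypothetical, minimal boolean expression tree.
type expr interface{ String() string }

type equalTo struct {
	field string
	value any
}
type isNull struct{ field string }
type allOf struct{ terms []expr } // conjunction
type anyOf struct{ terms []expr } // disjunction
type alwaysFalse struct{}

func join(terms []expr, sep string) string {
	s := make([]string, len(terms))
	for i, t := range terms {
		s[i] = t.String()
	}
	return strings.Join(s, sep)
}

func (e equalTo) String() string   { return fmt.Sprintf("%s = %#v", e.field, e.value) }
func (e isNull) String() string    { return e.field + " IS NULL" }
func (a allOf) String() string     { return "(" + join(a.terms, " AND ") + ")" }
func (o anyOf) String() string     { return join(o.terms, " OR ") }
func (alwaysFalse) String() string { return "FALSE" }

// matchPartitions builds an expression selecting exactly the given tuples.
// fields lists the partition field names in spec order; every tuple must
// carry one value per field, with nil standing for a null partition value.
func matchPartitions(fields []string, tuples [][]any) (expr, error) {
	if len(tuples) == 0 {
		// No touched partitions: nothing should be deleted.
		return alwaysFalse{}, nil
	}
	alts := make([]expr, 0, len(tuples))
	for _, tuple := range tuples {
		if len(tuple) != len(fields) {
			return nil, fmt.Errorf("tuple has %d values, spec has %d fields",
				len(tuple), len(fields))
		}
		terms := make([]expr, 0, len(fields))
		for i, v := range tuple {
			if v == nil {
				terms = append(terms, isNull{fields[i]})
			} else {
				terms = append(terms, equalTo{fields[i], v})
			}
		}
		alts = append(alts, allOf{terms})
	}
	return anyOf{alts}, nil
}

func main() {
	pred, err := matchPartitions(
		[]string{"region", "day"},
		[][]any{{"us", 19800}, {nil, 19801}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(pred)
	// prints: (region = "us" AND day = 19800) OR (region IS NULL AND day = 19801)
}
```

Returning a constant false expression for an empty input is a design choice of the sketch: it keeps a write that produced no files from matching, and therefore deleting, anything.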
   
   ## Parity references
   
   - Java: `org.apache.iceberg.BaseReplacePartitions`
   - PyIceberg: `Transaction.dynamic_partition_overwrite`
   
   Credit to @dttung2905 for the original implementation in #482.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

