mxm opened a new pull request, #16831:
URL: https://github.com/apache/iceberg/pull/16831

   This is the first of six commits factored out from #15996, implementing the 
ConvertEqualityDeletes maintenance task. The task converts equality-delete 
files written to a staging branch into deletion vectors (DVs) on a target 
branch. It runs inside a Flink job as a chain of operators: Planner, Reader, 
PK index, DV writer, Committer. This PR adds only the shared data model and 
key serialization that those operators exchange. The operators and public API 
follow in later PRs that depend on this one.
   
   The records are the messages passed between operators in the job graph:
   
     - ReadCommand: Planner to Reader. Wraps a ContentScanTask (a native 
FileScanTask, the FlinkAddedRowsScanTask wrapper for added DataFiles, or an 
EqualityDeleteFileScanTask) plus sequence numbers and a staging flag.
     - IndexCommand: Reader to PK index, keyed by primary key. ADD_DATA_ROW and 
ADD_STAGING_DATA_ROW index rows; RESOLVE_DELETE resolves deletes against the 
index; CLEAR_INDEX (broadcast) evicts stale keys after an external main commit.
     - DVPosition: PK index to DV writer, keyed by data file path. Identifies a 
row to mark as deleted and carries the data file's spec id and encoded 
partition, so the writer needs no manifest scan.
     - DVWriteResult: DV writer to Committer. Carries the DVs written or an abort.
     - EqualityConvertPlan: Plan for DV writer and Committer, per-cycle 
metadata.
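
   To make the IndexCommand semantics above concrete, here is a toy, in-memory 
sketch of how a PK index might react to each operation. It is not the code in 
this PR: the class name PkIndexSketch, the record fields, the String key, 
removing a key once its delete is resolved, and clearing the whole map on 
CLEAR_INDEX are all assumptions made for illustration. Sequence numbers and 
the staging flag are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of the PK index operator (not the PR's code).
public class PkIndexSketch {

  // Operation names taken from the description above.
  public enum Op { ADD_DATA_ROW, ADD_STAGING_DATA_ROW, RESOLVE_DELETE, CLEAR_INDEX }

  // Assumed shape: the real IndexCommand carries a serialized key, not a String.
  public record IndexCommand(Op op, String key, String dataFilePath, long position) {}

  // Assumed shape: the real DVPosition also carries the spec id and encoded partition.
  public record DVPosition(String dataFilePath, long position) {}

  private final Map<String, DVPosition> index = new HashMap<>();

  // Applies one command; returns a DVPosition only when a delete matches an indexed row.
  public Optional<DVPosition> apply(IndexCommand cmd) {
    switch (cmd.op()) {
      case ADD_DATA_ROW:
      case ADD_STAGING_DATA_ROW:
        // Remember where the row with this key lives.
        index.put(cmd.key(), new DVPosition(cmd.dataFilePath(), cmd.position()));
        return Optional.empty();
      case RESOLVE_DELETE:
        // Assumption: a resolved key is dropped from the index.
        return Optional.ofNullable(index.remove(cmd.key()));
      case CLEAR_INDEX:
        // Assumption: evict everything; the real task may evict only stale keys.
        index.clear();
        return Optional.empty();
      default:
        throw new IllegalStateException("Unknown op: " + cmd.op());
    }
  }
}
```

   In this sketch a RESOLVE_DELETE for an unknown key yields an empty result, 
which a downstream DV writer would simply ignore.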
   
   StructLikeSerializer encodes equality keys and partition tuples via 
Conversions.toByteBuffer. serializeKey prefixes each key with the partition 
spec id and the equality field ids, so rows under different specs or different 
equality-field sets never collide.
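
   The prefixing idea can be sketched as follows. This is a minimal 
illustration, not the actual StructLikeSerializer: the class name 
KeyPrefixSketch, the 4-byte integer layout, and the explicit field-count 
prefix are assumptions, and the value bytes are taken as already encoded 
(in the PR that encoding comes from Conversions.toByteBuffer).

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a spec-id and field-id prefix for equality keys.
public class KeyPrefixSketch {

  // Assumed layout: [specId][fieldCount][fieldId...][encoded values].
  public static byte[] serializeKey(int specId, int[] equalityFieldIds, byte[] encodedValues) {
    int prefixInts = 2 + equalityFieldIds.length;
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * prefixInts + encodedValues.length);
    buf.putInt(specId);
    // The count keeps the prefix unambiguous when field-id lists differ in length.
    buf.putInt(equalityFieldIds.length);
    for (int fieldId : equalityFieldIds) {
      buf.putInt(fieldId);
    }
    buf.put(encodedValues);
    return buf.array();
  }
}
```

   With this layout, identical value bytes produce different keys whenever the 
spec id or the equality field ids differ, which is the property the paragraph 
above describes.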
   
   The keyed-stream types (IndexCommand, DVPosition, SerializedEqualityValues) 
are Java records where possible, so Flink's record serializer handles them 
without falling back to Kryo (TestFlinkPojoTypes asserts this).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

