Guosmilesmile opened a new pull request, #14127:
URL: https://github.com/apache/iceberg/pull/14127

   This PR adds support for preserving row lineage information in Flink 
RewriteDataFiles and is split into two parts. It covers only the 
RewriteDataFiles used for streaming compaction.
   
   1. Adds readers in Flink for `_row_id` and `_last_updated_sequence_number`.
   
https://github.com/apache/iceberg/commit/8adfe2ca77e521373fd912e637e8a76911f7a772
   This change largely follows the approach in 
https://github.com/apache/iceberg/pull/12836
   
   2. When RewriteDataFiles executes the rewrite tasks for a PlannedGroup and 
the table supports RowLineage, it extends the schema with `ROW_ID` and 
`LAST_UPDATED_SEQUENCE_NUMBER`. It then reads these two fields and writes 
their values into the merged DataFiles, so the lineage information survives 
the rewrite.
   
https://github.com/apache/iceberg/commit/13ef2738d46e09876fb114f9583b985ec758fbde
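
The flow in part 2 can be illustrated with a minimal, self-contained sketch. It is not the code from this PR and uses no Iceberg or Flink classes: `LineageRewriteSketch`, `SourceFile`, `Row`, `rewriteSchema`, `read`, and `rewriteGroup` are hypothetical names. It assumes that a table supports row lineage from format version 3 onward, and that a source file which does not store lineage values yields `_row_id` as the file's first row ID plus the row position and `_last_updated_sequence_number` as the file's data sequence number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the rewrite flow; not Iceberg or Flink API.
class LineageRewriteSketch {
  static final String ROW_ID = "_row_id";
  static final String LAST_UPDATED_SEQUENCE_NUMBER = "_last_updated_sequence_number";

  /** A row as read for the rewrite: payload plus the two lineage values. */
  static final class Row {
    final String data;
    final long rowId;
    final long lastUpdatedSeq;

    Row(String data, long rowId, long lastUpdatedSeq) {
      this.data = data;
      this.rowId = rowId;
      this.lastUpdatedSeq = lastUpdatedSeq;
    }
  }

  /** A source data file whose lineage values are not stored and must be derived (assumption). */
  static final class SourceFile {
    final long firstRowId;
    final long dataSequenceNumber;
    final List<String> records;

    SourceFile(long firstRowId, long dataSequenceNumber, List<String> records) {
      this.firstRowId = firstRowId;
      this.dataSequenceNumber = dataSequenceNumber;
      this.records = records;
    }
  }

  /** Step 1: append the lineage columns only when the table supports row lineage (v3+ assumed). */
  static List<String> rewriteSchema(List<String> columns, int formatVersion) {
    List<String> result = new ArrayList<>(columns);
    if (formatVersion >= 3) {
      result.add(ROW_ID);
      result.add(LAST_UPDATED_SEQUENCE_NUMBER);
    }
    return result;
  }

  /** Step 2: read a file and materialize the lineage values for every row. */
  static List<Row> read(SourceFile file) {
    List<Row> rows = new ArrayList<>();
    for (int pos = 0; pos < file.records.size(); pos++) {
      rows.add(new Row(file.records.get(pos), file.firstRowId + pos, file.dataSequenceNumber));
    }
    return rows;
  }

  /** Step 3: merge all files of a group; each row keeps the lineage values it was read with. */
  static List<Row> rewriteGroup(List<SourceFile> group) {
    List<Row> merged = new ArrayList<>();
    for (SourceFile file : group) {
      merged.addAll(read(file));
    }
    return merged;
  }
}
```

The point of the sketch is that the merged output stores the lineage values explicitly. If the rewritten file left them to be derived again, every row would pick up a new ID and sequence number from the new file, and the lineage would be lost.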


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

