Guosmilesmile opened a new pull request, #14127: URL: https://github.com/apache/iceberg/pull/14127
This PR is split into two parts to support preserving row lineage information in Flink RewriteDataFiles. It only covers RewriteDataFiles for streaming compaction.

1. Adds readers in Flink for `_row_id` and `_last_updated_sequence_number`. https://github.com/apache/iceberg/commit/8adfe2ca77e521373fd912e637e8a76911f7a772 This change mainly follows https://github.com/apache/iceberg/pull/12836

2. When RewriteDataFiles executes rewrite tasks for a PlannedGroup and the table supports RowLineage, it extends the read schema with `ROW_ID` and `LAST_UPDATED_SEQUENCE_NUMBER`. It then reads those two fields from the source files and writes their values into the merged DataFiles, so the lineage survives the rewrite. https://github.com/apache/iceberg/commit/13ef2738d46e09876fb114f9583b985ec758fbde
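To illustrate what "preserving lineage" means for the rewrite in part 2, here is a minimal, self-contained sketch. It is not code from this PR and does not use the Iceberg or Flink APIs: `LineageSketch`, `Row`, `DataFile`, `readWithLineage` and `compact` are hypothetical names made up for the example. It assumes the row lineage inheritance rule as I understand it from the v3 table spec: a null `_row_id` is read as the file's first row ID plus the row position, and a null `_last_updated_sequence_number` is read as the file's data sequence number. The merged output then carries those values explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT the Iceberg or Flink API. It only illustrates the lineage semantics.
public class LineageSketch {

  // A row as stored in a data file; lineage fields stay null until they are materialized.
  static final class Row {
    final String data;
    final Long rowId;          // stands in for _row_id
    final Long lastUpdatedSeq; // stands in for _last_updated_sequence_number

    Row(String data, Long rowId, Long lastUpdatedSeq) {
      this.data = data;
      this.rowId = rowId;
      this.lastUpdatedSeq = lastUpdatedSeq;
    }
  }

  // A data file plus the file-level metadata that null lineage fields inherit from.
  static final class DataFile {
    final long firstRowId;
    final long dataSequenceNumber;
    final List<Row> rows;

    DataFile(long firstRowId, long dataSequenceNumber, List<Row> rows) {
      this.firstRowId = firstRowId;
      this.dataSequenceNumber = dataSequenceNumber;
      this.rows = rows;
    }
  }

  // Read a file with the lineage columns projected: nulls are filled from file metadata.
  static List<Row> readWithLineage(DataFile file) {
    List<Row> out = new ArrayList<>();
    for (int pos = 0; pos < file.rows.size(); pos++) {
      Row r = file.rows.get(pos);
      long rowId = r.rowId != null ? r.rowId : file.firstRowId + pos;
      long seq = r.lastUpdatedSeq != null ? r.lastUpdatedSeq : file.dataSequenceNumber;
      out.add(new Row(r.data, rowId, seq));
    }
    return out;
  }

  // Compact a group of files: every merged row carries explicit lineage values,
  // so they no longer depend on the metadata of the file that replaces the group.
  static List<Row> compact(List<DataFile> group) {
    List<Row> merged = new ArrayList<>();
    for (DataFile f : group) {
      merged.addAll(readWithLineage(f));
    }
    return merged;
  }

  public static void main(String[] args) {
    DataFile a = new DataFile(0L, 1L,
        Arrays.asList(new Row("a", null, null), new Row("b", null, null)));
    DataFile b = new DataFile(100L, 2L,
        Arrays.asList(new Row("c", null, null)));
    for (Row r : compact(Arrays.asList(a, b))) {
      System.out.println(r.data + " rowId=" + r.rowId + " seq=" + r.lastUpdatedSeq);
    }
  }
}
```

Without the explicit values, the merged file would get a new first row ID and sequence number on commit, and rows with null lineage fields would inherit those instead of their original ones. That is the loss the schema extension in part 2 is meant to avoid.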
