barronfuentes opened a new pull request, #12172: URL: https://github.com/apache/iceberg/pull/12172
This contribution attempts to resolve a bug in the RewriteTablePath action in Apache Iceberg. The current implementation of RewriteTablePath can replicate a portion of an Iceberg table's history, but it cannot do so repeatedly while preserving the history of previously replicated portions in the target table.

For example, assume an Iceberg table has two versions, A and B. Immediately after the first version (A) was created, the RewriteTablePath action was used to replicate the table for disaster recovery. After the second version (B) was added to the source table, RewriteTablePath was used again to replicate (append) only the second version to the target table. With the current behavior, the target table has all of the metadata and data files associated with both versions A and B, but only the records contained in the most recently replicated version (B) are available to queries.

This pull request aims to address this issue and make it possible to incrementally replicate Iceberg tables so that the contents of the target table are consistent with the source table.

The cause of the issue lies in how new manifest list files are created. The filter that prevents rewriting manifest files that are out of scope for replication is also applied to the contents of the rewritten manifest lists. As a result, the rewritten manifest lists omit references to manifests from earlier replications, which is why queries can only access records contained in the latest replication and not those from previous replications.
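To make the described cause concrete, here is a minimal toy model in Python. It is not Iceberg code and does not use the Iceberg API: `ManifestRef`, `rewrite_path`, `rewrite_manifest_list`, the `s3://` paths, and the per-manifest row counts are all hypothetical names and values chosen for illustration. It only sketches the difference between filtering the entries of a rewritten manifest list and keeping every entry while rewriting its path, which is one plausible reading of the behavior this pull request is after.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestRef:
    """Hypothetical stand-in for one entry in a manifest list."""
    path: str  # location of the manifest file
    rows: int  # rows reachable through this manifest (a simplification)


def rewrite_path(path, source_prefix, target_prefix):
    """Swap the source location prefix for the target location prefix."""
    if not path.startswith(source_prefix):
        raise ValueError(f"{path} is not under {source_prefix}")
    return target_prefix + path[len(source_prefix):]


def rewrite_manifest_list(entries, in_scope, source_prefix, target_prefix,
                          filter_entries):
    """Return the entries of a rewritten manifest list.

    filter_entries=True models the behavior described as the bug: the
    scope filter meant for deciding which manifest files to rewrite is
    also applied to the list's own entries, so older manifests vanish.
    filter_entries=False keeps every entry and only rewrites its path.
    """
    if filter_entries:
        kept = [e for e in entries if e.path in in_scope]
    else:
        kept = list(entries)
    return [
        ManifestRef(rewrite_path(e.path, source_prefix, target_prefix), e.rows)
        for e in kept
    ]


SOURCE = "s3://source/tbl"  # hypothetical source table location
TARGET = "s3://target/tbl"  # hypothetical target table location

# Manifest list of version B: it references the manifest written for
# version A as well as the one added by version B.
version_b = [
    ManifestRef(SOURCE + "/metadata/m-a.avro", rows=100),
    ManifestRef(SOURCE + "/metadata/m-b.avro", rows=50),
]

# The second replication run only covers files added by version B.
in_scope = {SOURCE + "/metadata/m-b.avro"}

filtered = rewrite_manifest_list(version_b, in_scope, SOURCE, TARGET, True)
complete = rewrite_manifest_list(version_b, in_scope, SOURCE, TARGET, False)

rows_filtered = sum(e.rows for e in filtered)  # 50: only B's records
rows_complete = sum(e.rows for e in complete)  # 150: A's and B's records
```

In this model, the filtered list leaves the target table able to see only the 50 rows from version B, even though the manifest for version A was already copied by the first run. Keeping the entry and rewriting only its path restores access to all 150 rows.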
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org