barronfuentes opened a new pull request, #12172:
URL: https://github.com/apache/iceberg/pull/12172

   This contribution attempts to resolve a bug in the RewriteTablePath action 
in Apache Iceberg. The current implementation of RewriteTablePath can 
replicate a portion of an Iceberg table's history, but it cannot do so 
repeatedly while preserving the history of previously replicated portions in 
the target table.
   
   For example, assume an Iceberg table has two versions, A and B. Immediately 
after the first version (A) was created, the RewriteTablePath action was used 
to replicate the table for disaster recovery. After the second version (B) was 
added to the source table, RewriteTablePath was used again to replicate 
(append) only the second version to the target table. With the current 
behavior, the target table then contains all of the metadata and data files 
associated with both versions A and B, but only the records contained in the 
most recently replicated version (B) are visible to queries. This pull request 
aims to address this issue and make it possible to replicate Iceberg tables 
incrementally, so that the contents of the target table are consistent with 
the source table.
   
   The issue is caused by how new Manifest List files are created. A filter 
that prevents rewriting Manifest Files that are out of scope for replication 
is also applied to the contents of the rewritten Manifest Lists. Because these 
historical references are excluded, queries can access only the records 
contained in the latest replication and not those from previous replications.
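
   To make the cause concrete, here is a minimal toy model of the behavior 
described above. It is not Iceberg code: `ManifestRef`, `rewrite_prefix`, 
`rewrite_manifest_list`, the `filter_entries` flag, and the `s3://` paths are 
all hypothetical names chosen for illustration. It assumes the Manifest List of 
version B references the Manifest Files of both A and B, and compares applying 
the scope filter to the list contents against keeping every reference and only 
rewriting its path prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestRef:
    """Hypothetical stand-in for one entry in a manifest list."""
    path: str
    added_in: str  # table version that first wrote this manifest


def rewrite_prefix(path: str, source_prefix: str, target_prefix: str) -> str:
    """Swap the source location prefix for the target one."""
    if not path.startswith(source_prefix):
        raise ValueError(f"{path} is not under {source_prefix}")
    return target_prefix + path[len(source_prefix):]


def rewrite_manifest_list(entries, in_scope, source_prefix, target_prefix,
                          filter_entries):
    """Return the manifest paths the rewritten manifest list would reference.

    filter_entries=True models the behavior described above: the scope filter
    is applied to the list contents, so older references are dropped.
    filter_entries=False keeps every reference and only rewrites its prefix.
    """
    kept = []
    for entry in entries:
        if filter_entries and entry.added_in not in in_scope:
            continue  # reference to a previously replicated manifest is lost
        kept.append(rewrite_prefix(entry.path, source_prefix, target_prefix))
    return kept


# Manifest list of version B on the source: it references manifests from A and B.
SRC, DST = "s3://source/tbl", "s3://target/tbl"
version_b = [
    ManifestRef(f"{SRC}/metadata/m-a.avro", added_in="A"),
    ManifestRef(f"{SRC}/metadata/m-b.avro", added_in="B"),
]

# Second, incremental run: only version B is in scope.
filtered = rewrite_manifest_list(version_b, {"B"}, SRC, DST, filter_entries=True)
unfiltered = rewrite_manifest_list(version_b, {"B"}, SRC, DST, filter_entries=False)

print(filtered)    # only m-b.avro is referenced, so version A's records are invisible
print(unfiltered)  # both manifests are referenced
```

   In this model the filtered list points only at the manifest written for B, 
which matches the symptom reported here: the files for A exist in the target 
but nothing in the current Manifest List refers to them. Keeping the 
out-of-scope references is one reading of the intended fix, not a description 
of the actual patch.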


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


