vaultah commented on PR #13720:
URL: https://github.com/apache/iceberg/pull/13720#issuecomment-3235133985

   @dramaticlly @stevenzwu 
   
   Let's say we have manifests A, B, and C, added by snapshot 1. We then create 
snapshot 2: its manifest list contains references to manifests A, B, and C, 
plus a reference to the new manifest D. 
   
   Now we use `rewrite_table_path` in incremental mode, starting from snapshot 
1. 
   
   Per your suggestion, it will rewrite just manifest D and the manifest list 
of snapshot 2. We assume that manifests added in snapshot 1 were already 
rewritten before, so we simply update their paths in the manifest list of 
snapshot 2.
   
   For manifest D, the rewritten manifest list will have the new path and the 
new length. For manifests A, B, and C, the rewritten manifest list will have 
new paths but their original lengths. In other words, the rewritten manifest 
list will be
   
   | manifest_path | manifest_length | ... |
   | --- | --- | --- |
   | newPath(D) | newLength(D)| |
   | newPath(C) | oldLength(C)| |
   | newPath(B) | oldLength(B)| |
   | newPath(A) | oldLength(A)| |
   
   
   Rewriting a manifest will almost certainly change its length, so in general 
`oldLength(A) != newLength(A)`. The length recorded for manifest A in the 
rewritten manifest list is therefore incorrect: it doesn't match the length of 
the actual physical file at `newPath(A)` that the entry references. The same 
applies to B and C. This is the scenario from 
https://github.com/apache/iceberg/issues/13719 that I'm trying to solve.
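
   To make the mismatch concrete, here is a minimal Python sketch of the 
scenario. It is not Iceberg code: `ManifestEntry`, `rewrite_path`, 
`rewrite_manifest_list`, and `find_length_mismatches` are hypothetical helpers, 
and the prefixes and byte counts are made-up numbers for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """One row of a manifest list (only the two fields discussed here)."""
    manifest_path: str
    manifest_length: int


def rewrite_path(path, source_prefix, target_prefix):
    """Swap the source prefix for the target prefix (hypothetical helper)."""
    if not path.startswith(source_prefix):
        raise ValueError(f"{path} is not under {source_prefix}")
    return target_prefix + path[len(source_prefix):]


def rewrite_manifest_list(entries, rewritten_now, source_prefix, target_prefix):
    """Model the incremental rewrite described above.

    `rewritten_now` maps the old path of each manifest rewritten in this run
    to its new length. Every other entry gets a new path but keeps the
    length recorded in the original manifest list.
    """
    result = []
    for entry in entries:
        new_path = rewrite_path(entry.manifest_path, source_prefix, target_prefix)
        new_length = rewritten_now.get(entry.manifest_path, entry.manifest_length)
        result.append(ManifestEntry(new_path, new_length))
    return result


def find_length_mismatches(entries, actual_sizes):
    """Return paths whose recorded length differs from the physical file size."""
    return [
        e.manifest_path
        for e in entries
        if actual_sizes[e.manifest_path] != e.manifest_length
    ]


# Made-up prefixes and sizes, for illustration only.
SRC, DST = "s3://old/tbl", "s3://new/tbl"
old_lengths = {"A": 7000, "B": 7100, "C": 7200, "D": 7300}
new_lengths = {"A": 7012, "B": 7112, "C": 7212, "D": 7312}

# Manifest list of snapshot 2 before the rewrite.
snapshot2 = [
    ManifestEntry(f"{SRC}/metadata/{name}.avro", old_lengths[name])
    for name in ("D", "C", "B", "A")
]

# Incremental run starting from snapshot 1: only manifest D is rewritten now.
rewritten = rewrite_manifest_list(
    snapshot2,
    rewritten_now={f"{SRC}/metadata/D.avro": new_lengths["D"]},
    source_prefix=SRC,
    target_prefix=DST,
)

# Sizes of the files that physically exist under the target prefix.
actual_sizes = {f"{DST}/metadata/{n}.avro": s for n, s in new_lengths.items()}

stale = find_length_mismatches(rewritten, actual_sizes)
print(stale)
```

   Under these assumptions, the entry for D is consistent, while the entries 
for C, B, and A carry lengths that no longer match the files they point to.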
   
   Please help me understand how correctness is maintained in your 
suggestion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

