dramaticlly commented on PR #12172: URL: https://github.com/apache/iceberg/pull/12172#issuecomment-2641532788
I looked into this. I think that right now, when `start_version` is provided, only the strict delta of manifest-list/manifest/data files is rewritten. So an incremental rewrite and copy does not produce the cumulative result of all deltas on read. I collected some details from the provided unit test examples to help visualize the problem.

1st snapshot: 4355148708777346214
2nd snapshot: 936971667881185972

- source table
```
+---------------------------------------------------------------------------------+----------------------------------------------------------+-------------------+
|manifest-list                                                                    |path                                                      |added_snapshot_id  |
+---------------------------------------------------------------------------------+----------------------------------------------------------+-------------------+
|1224/metadata/snap-936971667881185972-1-7a1b9514-44b9-418e-a6b3-dd724a5daa01.avro|1224/metadata/7a1b9514-44b9-418e-a6b3-dd724a5daa01-m0.avro|936971667881185972 |
|1224/metadata/snap-936971667881185972-1-7a1b9514-44b9-418e-a6b3-dd724a5daa01.avro|1224/metadata/a3362153-e2aa-4e7c-a04e-7ea9f525d0ce-m0.avro|4355148708777346214|
+---------------------------------------------------------------------------------+----------------------------------------------------------+-------------------+

+----------------------------------------------------------+------+-------------------+-----------------------------------------------------------------------+
|manifest-file                                             |status|snapshot_id        |file_path                                                              |
+----------------------------------------------------------+------+-------------------+-----------------------------------------------------------------------+
|1224/metadata/7a1b9514-44b9-418e-a6b3-dd724a5daa01-m0.avro|1     |936971667881185972 |1224/data/00000-16-2c43c06a-e56e-4c74-a498-393398ac66df-0-00001.parquet|
|1224/metadata/a3362153-e2aa-4e7c-a04e-7ea9f525d0ce-m0.avro|1     |4355148708777346214|1224/data/00000-2-f37fff9f-1075-4e06-85e1-fa7c3f22ac48-0-00001.parquet |
+----------------------------------------------------------+------+-------------------+-----------------------------------------------------------------------+
```
- target table
```
+---------------------------------------------------------------------------------+----------------------------------------------------------+------------------+
|manifest-list                                                                    |path                                                      |added_snapshot_id |
+---------------------------------------------------------------------------------+----------------------------------------------------------+------------------+
|4614/metadata/snap-936971667881185972-1-7a1b9514-44b9-418e-a6b3-dd724a5daa01.avro|4614/metadata/7a1b9514-44b9-418e-a6b3-dd724a5daa01-m0.avro|936971667881185972|
+---------------------------------------------------------------------------------+----------------------------------------------------------+------------------+

+----------------------------------------------------------+------+------------------+-----------------------------------------------------------------------+
|manifest-file                                             |status|snapshot_id       |file_path                                                              |
+----------------------------------------------------------+------+------------------+-----------------------------------------------------------------------+
|4614/metadata/7a1b9514-44b9-418e-a6b3-dd724a5daa01-m0.avro|1     |936971667881185972|4614/data/00000-16-2c43c06a-e56e-4c74-a498-393398ac66df-0-00001.parquet|
+----------------------------------------------------------+------+------------------+-----------------------------------------------------------------------+
```

So after an incremental copy that starts from the 1st snapshot, a read of the target table sees only the delta: the Iceberg metadata related to snapshot `936971667881185972`.
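
The effect can be sketched with a toy model. This is not Iceberg code: `Manifest`, `delta_copy`, and `visible_snapshots` are hypothetical names, and the filter only assumes that a strict-delta rewrite keeps the manifests added after the start version. The snapshot IDs and manifest file names are taken from the listings above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    path: str
    added_snapshot_id: int


FIRST = 4355148708777346214
SECOND = 936971667881185972

# In the source table, the manifest list of the 2nd snapshot is cumulative:
# it references the manifest added by the 1st snapshot as well as its own.
source_manifest_list = [
    Manifest("7a1b9514-44b9-418e-a6b3-dd724a5daa01-m0.avro", SECOND),
    Manifest("a3362153-e2aa-4e7c-a04e-7ea9f525d0ce-m0.avro", FIRST),
]


def delta_copy(manifest_list, snapshots_after_start):
    """Hypothetical strict-delta rewrite: keep only manifests whose
    added_snapshot_id is newer than the start version."""
    return [m for m in manifest_list if m.added_snapshot_id in snapshots_after_start]


def visible_snapshots(manifest_list):
    """Snapshot IDs whose added files are reachable through this manifest list."""
    return {m.added_snapshot_id for m in manifest_list}


# Incremental copy that starts after the 1st snapshot.
target_manifest_list = delta_copy(source_manifest_list, {SECOND})

print(visible_snapshots(source_manifest_list) == {FIRST, SECOND})  # True
print(visible_snapshots(target_manifest_list) == {SECOND})         # True
```

In this model the target manifest list no longer references the manifest added by the 1st snapshot, which matches the single-row target listing above.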