szehon-ho commented on PR #9335: URL: https://github.com/apache/iceberg/pull/9335#issuecomment-1910836939
Clarified with @hsiang-c. Will draw a diagram to illustrate the problem. Imagine the following graph:

```
Snapshot1 -> Manifest1 -> Entry1
Snapshot2 -> Manifest1 -> Entry1
Snapshot3 -> Manifest1 -> Entry1
```

Notice that all three snapshots point to the same manifest file. So, given the dedup mechanism, and assuming first-in-first-out, we will be left with only a single entry like:

| entry | as_of_snapshot |
| --- | --- |
| Entry1 | Snapshot1 |

This seems fine to me, as the name 'as_of_snapshot' makes it seem like we want to know when each entry first came into the picture. It does not seem necessary to change the behavior of the table to list every single snapshot that refers to the entry.

cc @RussellSpitzer @hsiang-c for thoughts.
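To make the first-in-first-out dedup concrete, here is a minimal Python sketch of the scenario above. It is only an illustration, not the actual Iceberg implementation: the data structures and the `dedup_entries` function are hypothetical, and it assumes dedup is keyed on the manifest file and that snapshots are visited oldest first.

```python
# Hypothetical model of the graph above: each snapshot lists the manifests
# it references, and each manifest lists its entries.
snapshot_manifests = [
    ("Snapshot1", ["Manifest1"]),
    ("Snapshot2", ["Manifest1"]),
    ("Snapshot3", ["Manifest1"]),
]
manifest_entries = {"Manifest1": ["Entry1"]}


def dedup_entries(snapshot_manifests, manifest_entries):
    """Return (entry, as_of_snapshot) rows, reading each manifest only once.

    Assumes first-in-first-out: the first snapshot that references a
    manifest is the one recorded for that manifest's entries.
    """
    seen_manifests = set()
    rows = []
    for snapshot, manifests in snapshot_manifests:
        for manifest in manifests:
            if manifest in seen_manifests:
                # Already read via an earlier snapshot, so skip it.
                continue
            seen_manifests.add(manifest)
            for entry in manifest_entries[manifest]:
                rows.append((entry, snapshot))
    return rows


rows = dedup_entries(snapshot_manifests, manifest_entries)
print(rows)
```

Under these assumptions only one row survives, pairing `Entry1` with `Snapshot1`, which matches the table above.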