javrasya opened a new issue, #10247: URL: https://github.com/apache/iceberg/issues/10247
### Apache Iceberg version

1.4.3

### Query engine

Spark

### Please describe the bug 🐞

We had to roll back our table because it had some broken snapshots. Downstream, we turn that table, which receives upserts, into a changelog stream and process it that way, using time boundaries.

As far as I can tell, the CDC procedure looks at the history of the table and does a kind of time-travel lookup to find the most recent snapshot ID as of the end timestamp we pass to it. But it only uses the history entries, which do not carry enough information to tell whether a snapshot is still in the lineage of the current main branch reference, so it can end up selecting one of the rolled-back snapshots.

Here is the problematic line, which calls a function in iceberg-core:

https://github.com/apache/iceberg/blob/426818bfe7fa93e8c677ebf886638d5c50db597b/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java#L530

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java#L350-L358

I think it should disregard snapshots that are no longer in the lineage of the main branch.
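To make the difference between the two lookups concrete, here is a minimal, self-contained sketch. It is a toy model and does not use Iceberg's API: the `SnapshotLineage` class, the `Snap` type, and both method names are hypothetical, and the "log" is simplified to a list of snapshots in commit order. It only illustrates the behavior described above, assuming the history-based lookup picks the last entry at or before the timestamp, and contrasts it with a lookup that walks parent pointers from the current snapshot.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the issue; hypothetical names, not Iceberg's API. */
final class SnapshotLineage {

  /** Minimal stand-in for a snapshot: id, parent id (null for the root), commit time. */
  static final class Snap {
    final long id;
    final Long parentId;
    final long timestampMillis;

    Snap(long id, Long parentId, long timestampMillis) {
      this.id = id;
      this.parentId = parentId;
      this.timestampMillis = timestampMillis;
    }
  }

  /**
   * History-style lookup (assumed behavior): scan the log in commit order and
   * keep the last snapshot committed at or before the timestamp, ignoring lineage.
   */
  static Long latestInLogAsOf(List<Snap> logInCommitOrder, long timestampMillis) {
    Long result = null;
    for (Snap s : logInCommitOrder) {
      if (s.timestampMillis <= timestampMillis) {
        result = s.id;
      }
    }
    return result;
  }

  /**
   * Lineage-aware lookup: walk parent pointers from the current snapshot and
   * return the first ancestor committed at or before the timestamp.
   */
  static Long latestAncestorAsOf(List<Snap> all, long currentId, long timestampMillis) {
    Map<Long, Snap> byId = new HashMap<>();
    for (Snap s : all) {
      byId.put(s.id, s);
    }
    Snap cursor = byId.get(currentId);
    while (cursor != null) {
      if (cursor.timestampMillis <= timestampMillis) {
        return cursor.id;
      }
      cursor = cursor.parentId == null ? null : byId.get(cursor.parentId);
    }
    return null; // no ancestor of the current snapshot is old enough
  }
}
```

Take snapshots 1 (t=100), 2 (t=200), 3 (t=300) and 4 (t=400) committed in a chain, then a rollback to 2 followed by snapshot 5 (t=500, parent 2). With an end timestamp of 350, the history-style lookup returns 3, which is no longer an ancestor of the current snapshot 5. The lineage-aware lookup returns 2.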