javrasya opened a new issue, #10247:
URL: https://github.com/apache/iceberg/issues/10247

   ### Apache Iceberg version
   
   1.4.3
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We had to roll back our table because it had some broken snapshots. The 
table receives upserts, and downstream we turn it into a changelog stream and 
process it that way, using time boundaries. As far as I can tell, the CDC 
procedure looks at the table history and does a kind of time-travel query to 
find the most recent snapshot ID as of the end timestamp we pass to it.
   
   However, it only uses the history entries, which do not carry enough 
information to tell whether a snapshot is still in the lineage of the current 
main branch reference. As a result, the lookup can resolve to a snapshot that 
was rolled back.
   
   Here is the problematic line, which calls a function in iceberg-core:
   
   
https://github.com/apache/iceberg/blob/426818bfe7fa93e8c677ebf886638d5c50db597b/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java#L530
   
   
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java#L350-L358
   
   I think it should disregard snapshots that are no longer in the lineage of 
the main branch.
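
To make the difference concrete, here is a minimal Python sketch of the two lookups. It is a simplified model and not Iceberg's API: `HistoryEntry`, `snapshot_as_of_history_only`, `snapshot_as_of_on_main`, `ancestors_of`, and the sample snapshot IDs and timestamps are all hypothetical names and values chosen for illustration. It assumes that a rollback appends a new history entry for the snapshot being restored and that every snapshot records its parent.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class HistoryEntry(NamedTuple):
    """Hypothetical stand-in for one entry in the table history log."""
    timestamp_ms: int
    snapshot_id: int


def ancestors_of(current_id: Optional[int],
                 parents: Dict[int, Optional[int]]) -> Set[int]:
    """Collect the current snapshot and all of its ancestors via parent links."""
    seen: Set[int] = set()
    node = current_id
    while node is not None and node not in seen:
        seen.add(node)
        node = parents.get(node)
    return seen


def snapshot_as_of_history_only(history: List[HistoryEntry],
                                timestamp_ms: int) -> Optional[int]:
    """Latest history entry at or before the timestamp, ignoring lineage."""
    result = None
    for entry in history:
        if entry.timestamp_ms <= timestamp_ms:
            result = entry.snapshot_id
    return result


def snapshot_as_of_on_main(history: List[HistoryEntry],
                           parents: Dict[int, Optional[int]],
                           current_id: int,
                           timestamp_ms: int) -> Optional[int]:
    """Same lookup, but skip snapshots that are not ancestors of the current one."""
    on_main = ancestors_of(current_id, parents)
    result = None
    for entry in history:
        if entry.timestamp_ms <= timestamp_ms and entry.snapshot_id in on_main:
            result = entry.snapshot_id
    return result


# Sample data (made up): snapshot 3 was broken, the table was rolled back
# to snapshot 2, and snapshot 4 was then committed on top of 2.
parents = {1: None, 2: 1, 3: 2, 4: 2}
history = [
    HistoryEntry(100, 1),
    HistoryEntry(200, 2),
    HistoryEntry(300, 3),  # broken snapshot, later rolled back
    HistoryEntry(400, 2),  # rollback to snapshot 2
    HistoryEntry(500, 4),
]

print(snapshot_as_of_history_only(history, 350))         # 3 (rolled-back snapshot)
print(snapshot_as_of_on_main(history, parents, 4, 350))  # 2
```

With an end timestamp of 350, the history-only lookup returns the rolled-back snapshot 3, while the lineage-aware lookup falls back to snapshot 2, which is the behavior I would expect from the CDC procedure.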
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

