cccs-jc commented on issue #8902: URL: https://github.com/apache/iceberg/issues/8902#issuecomment-1775412513
Yes, I would skip over all the files of a rewrite snapshot.

Also, I've noticed that when you calculate the `latestOffset` you call `skippedManifestIndexesFromSnapshot` here: https://github.com/apache/iceberg/blob/e2b56daf35724700a9b57dbeee5fe23f99c592c4/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java#L350 However, that function does not skip over rewrite snapshots. I think it should?

And here, I'm not sure why you add the current number of files to your `currentFileIndex` counter: https://github.com/apache/iceberg/blob/e2b56daf35724700a9b57dbeee5fe23f99c592c4/core/src/main/java/org/apache/iceberg/MicroBatches.java#L95
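
To make the first point concrete, here is a rough sketch of what I mean by skipping a rewrite snapshot when counting files toward an offset. It is not the actual `SparkMicroBatchStream` code: `SnapshotInfo`, `shouldProcess`, and `countStreamableFiles` are hypothetical names made up for illustration. The only thing taken from Iceberg is that rewrite (compaction) snapshots carry the `replace` operation.

```java
import java.util.Arrays;
import java.util.List;

public class RewriteSkipSketch {

  // Hypothetical stand-in for a table snapshot; NOT the Iceberg Snapshot interface.
  static final class SnapshotInfo {
    final long id;
    final String operation; // e.g. "append", "replace", "overwrite", "delete"
    final int addedFiles;

    SnapshotInfo(long id, String operation, int addedFiles) {
      this.id = id;
      this.operation = operation;
      this.addedFiles = addedFiles;
    }
  }

  // Iceberg labels rewrite/compaction snapshots with the "replace" operation.
  static final String REPLACE = "replace";

  // Assumption for this sketch: a rewrite snapshot adds no new rows,
  // so a streaming reader can ignore it completely.
  static boolean shouldProcess(SnapshotInfo snapshot) {
    return !REPLACE.equals(snapshot.operation);
  }

  // Counts the files that would contribute to the offset,
  // leaving out every file that belongs to a rewrite snapshot.
  static int countStreamableFiles(List<SnapshotInfo> snapshots) {
    int total = 0;
    for (SnapshotInfo snapshot : snapshots) {
      if (shouldProcess(snapshot)) {
        total += snapshot.addedFiles;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<SnapshotInfo> history = Arrays.asList(
        new SnapshotInfo(1L, "append", 3),
        new SnapshotInfo(2L, "replace", 1), // rewrite snapshot, skipped
        new SnapshotInfo(3L, "append", 2));
    System.out.println(countStreamableFiles(history)); // prints 5
  }
}
```

My concern is that if the offset calculation counts the file from snapshot 2 but the planning side skips it, the two sides would disagree on the file index.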