pvary commented on PR #10567: URL: https://github.com/apache/iceberg/pull/10567#issuecomment-2242800689
@zhongyujiang: Do I understand correctly that the issue happens when the following conditions are met:

- We have at least 3 FileScanTasks (FS1, FS2, FS3) to read
- We have a filter which filters out every record from FS1
- We have a failure after the reader has already skipped reading FS1 (file offset is not increased) and started to read FS2 (file offset is increased)

IIUC, after the state is restored we will start reading FS1, because the `fileOffset` stored in the state is 1 instead of 2.

I can see 2 ways to fix this:

1. Count every file in the `fileOffset`, even the ones which are skipped. This seems more natural to me, but the state needs to be converted.
2. Count only the non-skipped files in the `fileOffset`, even when restoring the state. This is the fix you have provided, and the state doesn't change in this case.

Do I understand the situation correctly?

Thanks,
Peter
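
To make the scenario above concrete, here is a minimal, self-contained sketch of the offset bookkeeping. It is not Iceberg code: `Task`, `offsetAt`, `restoreByPosition`, and `restoreSkippingFiltered` are hypothetical names invented for illustration, and the sketch assumes that `fileOffset` is a 1-based count of the files the reader has started.

```java
import java.util.List;

public class FileOffsetSketch {

    /** Hypothetical stand-in for a FileScanTask: a name plus whether any record passes the filter. */
    static final class Task {
        final String name;
        final boolean producesRecords;

        Task(String name, boolean producesRecords) {
            this.name = name;
            this.producesRecords = producesRecords;
        }
    }

    /**
     * Offset that would be stored while the reader is positioned on tasks.get(current).
     * countSkipped = true models option 1, false models the behaviour described in the scenario.
     */
    static int offsetAt(List<Task> tasks, int current, boolean countSkipped) {
        int offset = 0;
        for (int i = 0; i <= current; i++) {
            if (countSkipped || tasks.get(i).producesRecords) {
                offset++;
            }
        }
        return offset;
    }

    /** Restore that treats the stored offset as a plain position in the task list. */
    static Task restoreByPosition(List<Task> tasks, int fileOffset) {
        return tasks.get(fileOffset - 1);
    }

    /** Restore that counts only files producing records (option 2, assumed semantics). */
    static Task restoreSkippingFiltered(List<Task> tasks, int fileOffset) {
        int seen = 0;
        for (Task task : tasks) {
            if (task.producesRecords && ++seen == fileOffset) {
                return task;
            }
        }
        throw new IllegalStateException("fileOffset points past the available files");
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
            new Task("FS1", false), // every record filtered out
            new Task("FS2", true),
            new Task("FS3", true));
        int current = 1; // the failure happens while reading FS2

        int skippedNotCounted = offsetAt(tasks, current, false);
        int allCounted = offsetAt(tasks, current, true);

        // Mismatch: offset written without FS1, restored as a plain position.
        System.out.println(restoreByPosition(tasks, skippedNotCounted).name);
        // Option 1: skipped files are counted when writing, so a positional restore is correct.
        System.out.println(restoreByPosition(tasks, allCounted).name);
        // Option 2: stored state is unchanged, the restore skips filtered files too.
        System.out.println(restoreSkippingFiltered(tasks, skippedNotCounted).name);
    }
}
```

Under these assumptions the mismatched restore lands on FS1, while both options land on FS2. The difference is where the change goes: option 1 changes what is written to the state, and option 2 changes only how the existing state is interpreted on restore.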