kevinjqliu commented on issue #1506: URL: https://github.com/apache/iceberg-python/issues/1506#issuecomment-2654338470
Thanks! So we can narrow this down: the issue only occurs when the row_filter is applied.

Based on the logs above, it seems that
1. the same number of manifest files is scanned (113)
2. the number of file tasks and futures differs (114 vs 113)

So for some reason, scanning the same manifests returns a different number of data files.

Here are some questions to investigate:
- What is the row_filter you're using?
- Are the manifest files the same in terms of file_path?
- Are the data files the same in terms of file_path?
- What is the extra data file, and how does it compare to the row_filter?
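
For the file_path questions, a small helper along these lines might make the comparison easier. This is only a sketch: `diff_file_paths` and the sample paths are hypothetical and not part of pyiceberg. It counts paths rather than using plain sets, so a data file that is planned twice in one run also shows up.

```python
from collections import Counter
from typing import Dict, Iterable, List


def diff_file_paths(run_a: Iterable[str], run_b: Iterable[str]) -> Dict[str, List[str]]:
    """Compare two collections of file paths, keeping duplicates visible."""
    a, b = Counter(run_a), Counter(run_b)
    return {
        # Counter subtraction keeps only paths with a positive surplus.
        "only_in_a": sorted((a - b).elements()),
        "only_in_b": sorted((b - a).elements()),
        # Paths that occur more than once within a single run.
        "duplicated_in_a": sorted(p for p, n in a.items() if n > 1),
        "duplicated_in_b": sorted(p for p, n in b.items() if n > 1),
    }


# Placeholder paths standing in for the real output of the two runs.
paths_a = ["s3://bucket/data/f1.parquet", "s3://bucket/data/f2.parquet"]
paths_b = [
    "s3://bucket/data/f1.parquet",
    "s3://bucket/data/f2.parquet",
    "s3://bucket/data/f3.parquet",
]

result = diff_file_paths(paths_a, paths_b)
print(result)
```

Assuming the scans are run through pyiceberg, the data file paths for each run could be collected with something like `[task.file.file_path for task in table.scan(row_filter=...).plan_files()]` and passed in as `run_a` and `run_b`. The same helper works for manifest paths once they are gathered into a list.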