kevinjqliu commented on issue #1506:
URL: https://github.com/apache/iceberg-python/issues/1506#issuecomment-2654338470

   Thanks! So we can narrow this down: the issue only occurs when the 
row_filter is applied. 
   
   Based on the logs above, it seems that:
   1. the same number of manifest files is scanned (113) 
   2. a different number of file tasks and futures is produced (114 vs 113)
   
   So for some reason, scanning the same manifests returns a different number 
of data files. 
   
   Here are some questions to investigate:
   1. What is the row_filter you're using? 
   2. Are the manifest files the same in terms of file_path?
   3. Are the data files the same in terms of file_path? If not, what is the 
extra data file, and how does it compare to the row_filter?
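
To answer the file_path questions, one option is to collect the paths from each run and diff them. Below is a minimal sketch, not a definitive procedure: `diff_paths` is a hypothetical helper, and the sample paths are made up for illustration. It assumes the paths come from something like `[t.file.file_path for t in table.scan(row_filter=...).plan_files()]`, where `table` and the filter are whatever you are already using. A `Counter` is used instead of a `set` so that a file planned twice still shows up.

```python
from collections import Counter
from typing import Dict, Iterable, List


def diff_paths(run_a: Iterable[str], run_b: Iterable[str]) -> Dict[str, List[str]]:
    """Compare two collections of file paths, keeping duplicates visible.

    Hypothetical helper for debugging; not part of PyIceberg.
    """
    a, b = Counter(run_a), Counter(run_b)
    return {
        # Counter subtraction keeps only positive counts, so a path planned
        # twice in one run and once in the other is reported here.
        "only_in_a": sorted((a - b).elements()),
        "only_in_b": sorted((b - a).elements()),
        "duplicated_in_a": sorted(p for p, n in a.items() if n > 1),
        "duplicated_in_b": sorted(p for p, n in b.items() if n > 1),
    }


# Made-up paths standing in for the real ones, which would be gathered
# per run with something like (assumption about your setup):
#   paths = [t.file.file_path for t in table.scan(row_filter=my_filter).plan_files()]
run_with_114 = ["s3://bucket/a.parquet", "s3://bucket/b.parquet", "s3://bucket/b.parquet"]
run_with_113 = ["s3://bucket/a.parquet", "s3://bucket/b.parquet"]

for key, paths in diff_paths(run_with_114, run_with_113).items():
    print(key, paths)
```

If the extra entry turns out to be a duplicate rather than a distinct path, that would point at planning rather than at the row_filter evaluation itself. The same helper can be applied to the manifest file paths.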
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


