soumya-ghosh commented on issue #1053:
URL: https://github.com/apache/iceberg-python/issues/1053#issuecomment-2350947937

   > What if you just return all unique (data+delete) files?
   
   In that case, the output will not match Spark's. Would that be okay?
   
   I also found this [PR from Iceberg](https://github.com/apache/iceberg/pull/805), which notes:
   > These tables may contain duplicate rows. Deduplication can't be done through the current scan interface unless all of the work is done during scan planning on a single node. Duplicates are the trade-off for being able to process the metadata in parallel for large tables.
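
To make the trade-off concrete, here is a minimal sketch of what single-node deduplication could look like. It assumes each metadata row is a plain dict with a `file_path` key; that row shape and the `unique_files` helper are hypothetical and are not the PyIceberg or Spark API.

```python
from typing import Iterable


def unique_files(entries: Iterable[dict]) -> list[dict]:
    """Keep the first row seen for each file_path, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for entry in entries:
        path = entry["file_path"]
        if path not in seen:
            seen.add(path)
            unique.append(entry)
    return unique


# Hypothetical rows: the same data file is reachable through two manifests,
# so it appears twice when manifests are read independently.
rows = [
    {"file_path": "s3://bucket/data/a.parquet", "content": 0},
    {"file_path": "s3://bucket/data/b.parquet", "content": 0},
    {"file_path": "s3://bucket/data/a.parquet", "content": 0},
    {"file_path": "s3://bucket/delete/d.parquet", "content": 1},
]

deduped = unique_files(rows)
```

The `seen` set has to observe every row, which is why this kind of deduplication only works when all rows are collected in one place rather than processed in parallel.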
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
