soumya-ghosh commented on issue #1053: URL: https://github.com/apache/iceberg-python/issues/1053#issuecomment-2350947937
> What if you just return all unique (data+delete) files?

In this case, the output will not match Spark's. Will that be okay? Also found this [PR from Iceberg](https://github.com/apache/iceberg/pull/805):

> These tables may contain duplicate rows. Deduplication can't be done through the current scan interface unless all of the work is done during scan planning on a single node. Duplicates are the trade-off for being able to process the metadata in parallel for large tables.
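
For context, a minimal sketch of what "return all unique (data+delete) files" could look like on top of PyIceberg's scan planning, assuming the `plan_files()` / `FileScanTask` API; the catalog and table names below are placeholders, not part of the proposal:

```python
from pyiceberg.catalog import load_catalog

# Hypothetical catalog/table names, for illustration only.
catalog = load_catalog("default")
table = catalog.load_table("db.events")

# plan_files() can reference the same delete file from multiple data files,
# so deduplicate both data and delete files by file_path.
data_files = {}
delete_files = {}
for task in table.scan().plan_files():
    data_files.setdefault(task.file.file_path, task.file)
    for delete_file in task.delete_files:
        delete_files.setdefault(delete_file.file_path, delete_file)

print(f"unique data files: {len(data_files)}, unique delete files: {len(delete_files)}")
```

This only deduplicates the file listing; as the quoted PR notes, row-level duplicates across files would still need to be handled by the reader.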