kevinjqliu commented on issue #1637: URL: https://github.com/apache/iceberg-python/issues/1637#issuecomment-2652677058
hey @iyad-f sure thing.

Iceberg has the concept of sort order: https://iceberg.apache.org/spec/#sorting
An Iceberg table can declare that its data is sorted in a certain way so that the engine can read the data more efficiently. Write support for sort order is in #271.

In this issue, I want to explore read support. Given an Iceberg table that is sorted, can we efficiently leverage the sort order when reading?

I think there are two components to this.
1. Pruning manifests. Use the table's sort order to efficiently skip manifests based on their min/max values (I think this should be part of `_InclusiveMetricsEvaluator` above).
2. Pruning data. Push down the sort order to the [data file scan](https://github.com/apache/iceberg-python/blob/86b83e85754b32d864cde764364c2022a0bab92b/pyiceberg/io/pyarrow.py#L1379) (we should investigate whether this is supported in pyarrow).

Let me know if that's clear. Happy to chat more.
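To make the min/max pruning idea concrete, here is a minimal sketch in plain Python. It does not use pyiceberg's API: `FileStats`, `prune_by_bounds`, and `prune_sorted` are hypothetical names, and the example assumes a single integer sort column whose per-file ranges are ordered and do not overlap. Under that assumption, a binary search can find the matching window instead of checking every entry.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical stand-in for a file entry with min/max stats on one column."""
    path: str
    lower: int
    upper: int


def prune_by_bounds(files, lo, hi):
    """Keep files whose [lower, upper] range may contain a value in [lo, hi]."""
    return [f for f in files if f.upper >= lo and f.lower <= hi]


def prune_sorted(files, lo, hi):
    """Same result as prune_by_bounds, assuming files are ordered by the sort
    column and their ranges do not overlap, so binary search finds the window."""
    uppers = [f.upper for f in files]
    lowers = [f.lower for f in files]
    start = bisect_left(uppers, lo)   # first file with upper >= lo
    stop = bisect_right(lowers, hi)   # one past the last file with lower <= hi
    return files[start:stop]


files = [
    FileStats("a.parquet", 0, 9),
    FileStats("b.parquet", 10, 19),
    FileStats("c.parquet", 20, 29),
    FileStats("d.parquet", 30, 39),
]

# Files that may contain rows with 12 <= value <= 25.
selected = prune_sorted(files, 12, 25)
print([f.path for f in selected])
```

The linear check works for any layout. The binary-search variant is only valid when the sort order guarantees ordered, non-overlapping ranges, which is something a real implementation would have to verify rather than assume.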