kevinjqliu commented on issue #1637:
URL: 
https://github.com/apache/iceberg-python/issues/1637#issuecomment-2675118949

   > _InclusiveMetricsEvaluator does check for the lower and upper bounds of 
the file
   
   It does, but I don't think the sort order is currently applied on the read side at the metadata level. I haven't dug into the process yet, so bear with me. I would assume that setting a sort order on a table lets us skip even more data files. Currently the evaluator looks at each data file and evaluates its lower/upper bounds, but if the column is sorted we could apply a binary search instead. I'm not sure how this is done on the Spark/Java side, but I'm definitely interested to learn more.
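
   To make the idea concrete, here is a minimal sketch (not from the pyiceberg codebase) of replacing the per-file bounds check with a binary search. It assumes files are sorted by the column's lower bound with non-overlapping value ranges, which is the ideal case a table-level sort order would give us; `DataFile` is a hypothetical stand-in for a manifest entry's per-column metrics.

   ```python
   from bisect import bisect_left, bisect_right
   from dataclasses import dataclass


   @dataclass
   class DataFile:
       # Hypothetical stand-in for a manifest entry's column metrics.
       lower_bound: int
       upper_bound: int


   def files_possibly_containing(files: list[DataFile], value: int) -> list[DataFile]:
       """Return the files whose [lower, upper] range could contain `value`.

       Assumes `files` is sorted by lower_bound with non-overlapping ranges,
       so upper_bound is sorted too. A linear evaluator checks every file;
       here two binary searches find the matching slice in O(log n).
       """
       # First file whose upper_bound >= value could still contain it.
       lo = bisect_left([f.upper_bound for f in files], value)
       # Files past this point have lower_bound > value and cannot contain it.
       hi = bisect_right([f.lower_bound for f in files], value)
       return files[lo:hi]


   files = [DataFile(0, 9), DataFile(10, 19), DataFile(20, 29)]
   print(files_possibly_containing(files, 15))   # only the (10, 19) file survives
   ```

   With overlapping ranges (sort order not strictly enforced per file) the slice would just widen, so the result stays a superset of the matching files, same as the linear evaluator.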
   
   > looking for a way to optimize the data scan at the arrow level?
   
   Yeah, I wonder if PyArrow already lets us pass in a sort order.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

