cnoelle opened a new issue, #45553: URL: https://github.com/apache/arrow/issues/45553
### Describe the usage question you have. Please include as many useful details as possible.

I would like to stream data from a dataset that consists of multiple files and is partitioned by one column (time). It should be possible to sort the data by this time column in either ascending or descending order. Is this possible with the Dataset API?

The documentation of the `Dataset.sort_by()` method states that it returns an `InMemoryDataset` (https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html#pyarrow.dataset.Dataset.sort_by); in other words, it immediately reads all files into memory. When using a partitioned dataset and sorting on the partitioning column, I would expect that `sort_by()` only needs to determine the order of the required input files and can then read them lazily when I run `to_batches()` on the resulting dataset.

### Component(s)

Python