TiansuYu commented on issue #30: URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-2306607034
> A PyArrow Dataset can be initiated from a list of file paths: > > Create a FileSystemDataset from explicitly given files. The files must be located on the same filesystem given by the filesystem parameter. Note that in contrary of construction from a single file, passing URIs as paths is not allowed. I have the opposite idea in mind: a Pyarrow representation / dataset protocol should be something like a MemoryBuffer that offers a set of APIs available to every table implementation, which should include the methods that have mentioned in this thread and the google doc. Then other query engine can then load the dataset via this "InMemoryDataset" (thats kind of my mental model for Arrow) as intermediary (not sure we want to expose this directly like this, could be something thats hidden and low level). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org