TiansuYu commented on issue #30:
URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-2306607034

   > A PyArrow Dataset can be initiated from a list of file paths:
   > 
   > Create a FileSystemDataset from explicitly given files. The files must be 
located on the same filesystem given by the filesystem parameter. Note that in 
contrary of construction from a single file, passing URIs as paths is not 
allowed.
   
   I have the opposite idea in mind: a PyArrow representation / dataset 
protocol should be something like a MemoryBuffer that offers a set of APIs 
available to every table implementation, including the methods that 
have been mentioned in this thread and the Google doc. 
   
   Other query engines can then load the dataset via this "InMemoryDataset" 
(that's roughly my mental model for Arrow) as an intermediary. I'm not sure we 
want to expose it directly like this; it could be something hidden and low level. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org