Re: [I] Expose PyIceberg table as PyArrow Dataset [iceberg-python]

via GitHub Fri, 23 Aug 2024 01:45:48 -0700


TiansuYu commented on issue #30:
URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-2306607034


   > A PyArrow Dataset can be initiated from a list of file paths:
   > 
   > Create a FileSystemDataset from explicitly given files. The files must be 
located on the same filesystem given by the filesystem parameter. Note that in 
contrary of construction from a single file, passing URIs as paths is not 
allowed.
   
   I have the opposite idea in mind: a PyArrow representation / dataset 
protocol should be something like a MemoryBuffer that offers a set of APIs 
available to every table implementation, which should include the methods that 
have been mentioned in this thread and the Google doc. 
   
   Other query engines can then load the dataset via this "InMemoryDataset" 
(that's kind of my mental model for Arrow) as an intermediary (not sure we want 
to expose it directly like this; it could be something hidden and low-level). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

