Re: [I] [Feature Request] Speed up InspectTable.files() [iceberg-python]

via GitHub Fri, 18 Oct 2024 10:29:18 -0700


corleyma commented on issue #1229:
URL: 
https://github.com/apache/iceberg-python/issues/1229#issuecomment-2422931184


   > Most of the time is spent processing the manifests record-by-record and 
converting each record to a dict
   
   I haven't looked at this closely, but if memory serves, pyiceberg implements 
its own Avro reader/writer using Cython.  Concurrency is great, but I wonder 
whether we could make bigger gains by converting Avro records directly to 
PyArrow RecordBatches at that level, rather than building a dict per record.  
Processing the manifests could then probably be done with PyArrow compute 
functions (C++), which should be considerably faster.
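
   To make the idea concrete, here is a rough sketch and not pyiceberg's actual reader. It assumes the decoder can hand back each manifest entry as a plain tuple in field order. The helper names and the three-field subset (`file_path`, `record_count`, `file_size_in_bytes`) are illustrative assumptions. Values are appended straight into per-column lists, with no dict per record, and the filtering and aggregation then run in PyArrow compute kernels.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Illustrative subset of data-file fields; a real manifest entry has many more.
FIELDS: Tuple[str, ...] = ("file_path", "record_count", "file_size_in_bytes")


def records_to_columns(
    records: Iterable[Sequence[Any]],
    fields: Sequence[str] = FIELDS,
) -> Dict[str, List[Any]]:
    """Accumulate decoded records column-wise (hypothetical helper).

    Each record is assumed to be a tuple whose values follow `fields` order.
    """
    columns: Dict[str, List[Any]] = {name: [] for name in fields}
    for record in records:
        for name, value in zip(fields, record):
            columns[name].append(value)
    return columns


def columns_to_batch(columns: Dict[str, List[Any]]):
    """Build a pyarrow.RecordBatch from the column lists."""
    import pyarrow as pa  # imported here so the step above has no hard dependency

    return pa.RecordBatch.from_pydict(columns)


def total_bytes_over(batch, min_records: int):
    """Sum file sizes for files with at least `min_records` rows, in C++ kernels."""
    import pyarrow.compute as pc

    mask = pc.greater_equal(batch.column("record_count"), min_records)
    filtered = batch.filter(mask)
    return pc.sum(filtered.column("file_size_in_bytes")).as_py()
```

   In a real implementation the column builders would ideally live inside the Cython decoder itself, so that values never become Python objects at all; the sketch only shows the shape of the data flow.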


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


