corleyma commented on issue #1229: URL: https://github.com/apache/iceberg-python/issues/1229#issuecomment-2422931184
> Most of the time is spent processing the manifests record-by-record and converting each record to a dict

I haven't looked at this closely, but if memory serves, PyIceberg implements its own Avro reader/writer using Cython. Concurrency is great, but I wonder if we could make big gains by converting Avro records more directly into PyArrow RecordBatches somewhere at that level? Processing the manifests could then be done with PyArrow compute functions (implemented in C++), which would probably give a large performance gain.