rdblue opened a new pull request, #6283: URL: https://github.com/apache/iceberg/pull/6283
This improves Avro scan performance when using PyArrowFileIO by about 10x by ensuring that reads are buffered. While testing scan planning, I found that planning against S3 took about 130 seconds. Buffering the input stream brings the planning time for the same query down to 14 seconds. The problem was that the stream provided by PyArrow was not buffered, so nearly every read operation caused another request to S3.
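
To illustrate why buffering matters here, the following is a minimal sketch using only the Python standard library. It is not the change made in this PR: `CountingRawReader` and `read_small_fields` are hypothetical names, and the raw reader merely stands in for an unbuffered remote stream where each underlying read would cost one request. Wrapping it in `io.BufferedReader` lets many small reads be served from a single larger one.

```python
import io


class CountingRawReader(io.RawIOBase):
    """Hypothetical stand-in for an unbuffered remote stream.

    Every call to readinto() is counted as one simulated remote request.
    """

    def __init__(self, data: bytes):
        super().__init__()
        self._data = data
        self._pos = 0
        self.requests = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Each underlying read counts as one request to the remote store.
        self.requests += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_small_fields(stream, count: int, size: int = 8) -> None:
    """Issue many small reads, roughly as a decoder reading field by field would."""
    for _ in range(count):
        stream.read(size)


payload = bytes(64 * 1024)  # 64 KiB of zero bytes as dummy file content

# Unbuffered: every small read reaches the raw stream.
unbuffered = CountingRawReader(payload)
read_small_fields(unbuffered, count=1000)

# Buffered: small reads are served from an in-memory buffer (8 KiB is an arbitrary choice).
raw = CountingRawReader(payload)
buffered = io.BufferedReader(raw, buffer_size=8 * 1024)
read_small_fields(buffered, count=1000)

print(f"unbuffered requests: {unbuffered.requests}")
print(f"buffered requests:   {raw.requests}")
```

The buffer size trades memory for fewer round trips; the value above is only illustrative and would need tuning against real object-store latency.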
