rdblue opened a new pull request, #6283:
URL: https://github.com/apache/iceberg/pull/6283

   This improves Avro scan performance with PyArrowFileIO by about 10x by 
ensuring that reads are buffered. While testing scan planning, I found that 
planning against S3 took about 130 seconds. Buffering the input stream brings 
the planning time for the same query down to 14 seconds.
   
   The problem was that the stream provided by PyArrow was not buffered, so 
nearly every read operation triggered a separate request to S3.
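
   To illustrate why buffering matters here, the following is a minimal 
sketch using only the Python standard library. It is not the change made in 
this pull request and does not use PyArrow: `CountingRawStream` and 
`read_in_small_steps` are hypothetical names, with the raw stream standing in 
for an unbuffered remote stream where each underlying read would cost one 
request. The payload size, read size, and buffer size are arbitrary 
assumptions.

```python
import io


class CountingRawStream(io.RawIOBase):
    """Hypothetical stand-in for an unbuffered remote stream.

    Every call to readinto() is counted as one simulated remote request.
    """

    def __init__(self, data: bytes):
        super().__init__()
        self._data = data
        self._pos = 0
        self.requests = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # One underlying read corresponds to one simulated request.
        self.requests += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_in_small_steps(stream, step: int) -> int:
    """Read the stream to EOF in small reads, as a decoder typically does."""
    total = 0
    while True:
        chunk = stream.read(step)
        if not chunk:
            return total
        total += len(chunk)


payload = bytes(64 * 1024)  # 64 KiB of zero bytes as sample data

# Unbuffered: every small read reaches the underlying stream.
raw = CountingRawStream(payload)
unbuffered_total = read_in_small_steps(raw, 8)
unbuffered_requests = raw.requests

# Buffered: small reads are served from an 8 KiB in-memory buffer.
raw = CountingRawStream(payload)
buffered_total = read_in_small_steps(io.BufferedReader(raw, buffer_size=8 * 1024), 8)
buffered_requests = raw.requests

print(f"unbuffered requests: {unbuffered_requests}")
print(f"buffered requests:   {buffered_requests}")
```

   Both runs read the same bytes, but the buffered run touches the 
underlying stream far less often, which is the effect described above for 
reads against S3.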


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

