[I] How to best optimize reading from S3? [arrow-go]

via GitHub Wed, 12 Feb 2025 14:30:24 -0800


stevbear opened a new issue, #278:
URL: https://github.com/apache/arrow-go/issues/278


   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Hi!
   My use case is reading specific row groups from Parquet files in S3.
   I see that there is a BufferedStreamEnabled option.
   When I set BufferedStreamEnabled to false, the library seems to read all of
   a column's data for a row group at once, which unfortunately results in an
   OOM for us.
   When I set BufferedStreamEnabled to true, the library seems to read the row
   group page by page, which is not optimal for cloud usage.
   How can I improve this? I imagine the best approach would be to read
   multiple pages in a single read() call?
   
   ### Component(s)
   
   Parquet


