jraut-amz opened a new issue, #47890:
URL: https://github.com/apache/arrow/issues/47890

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The documentation (https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet21ArrowReaderProperties14set_pre_bufferEb) and the code comment (https://github.com/apache/arrow/blob/main/cpp/src/parquet/properties.h#L1079) both say that read coalescing (pre-buffering) is off (false) by default:
   
   ```cpp
     /// Enable read coalescing (default false).
     ///
     /// When enabled, the Arrow reader will pre-buffer necessary regions
     /// of the file in-memory. This is intended to improve performance on
     /// high-latency filesystems (e.g. Amazon S3).
     void set_pre_buffer(bool pre_buffer) { pre_buffer_ = pre_buffer; }
   ```
   
   However, the default is on/true:
   
   https://github.com/apache/arrow/blob/main/cpp/src/parquet/properties.h#L1003: `pre_buffer_(true),`
   
   This was introduced by commit https://github.com/apache/arrow/commit/d7017dd0dc567969c79d14aefc3d5a638e66270a#diff-562a81d45d101d71ec2673a6b981c63e8f5299df6f573b7841b67ede5d854936R839, which changed the default to true but did not update the documentation.
   
   This is particularly problematic given the open memory issue that occurs when pre-buffering is enabled (https://github.com/apache/arrow/issues/46935).
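
   Until the documentation and the default agree, callers can avoid depending on the default by setting it explicitly. The sketch below is a minimal illustration, assuming an Arrow C++ build whose `parquet/properties.h` provides `parquet::default_arrow_reader_properties()`, `ArrowReaderProperties::set_pre_buffer(bool)`, and the matching `pre_buffer()` getter. `MakeReaderProperties` is a hypothetical helper name, not part of Arrow.

   ```cpp
   #include <parquet/properties.h>

   // Hypothetical helper: start from Arrow's default reader properties and
   // pin pre-buffering to an explicit value, so behavior does not depend on
   // whether the library default is true (current code) or false (current docs).
   parquet::ArrowReaderProperties MakeReaderProperties(bool pre_buffer) {
     parquet::ArrowReaderProperties props =
         parquet::default_arrow_reader_properties();
     props.set_pre_buffer(pre_buffer);
     return props;
   }
   ```

   For example, `MakeReaderProperties(false)` yields properties with pre-buffering disabled (sidestepping #46935), which can then be passed to the Parquet Arrow reader, e.g. via `parquet::arrow::FileReaderBuilder::properties()`.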
   
   ### Component(s)
   
   C++, Parquet


-- 
This is an automated message from the Apache Git Service.