Ext3h opened a new issue, #48187:
URL: https://github.com/apache/arrow/issues/48187

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   `arrow::util::internal::ZSTDCodec::Compress` and 
`arrow::util::internal::ZSTDCodec::Decompress` each use a naked one-shot 
`ZSTD_compress` / `ZSTD_decompress` call, usually with quite small buffers, 
which is the absolute worst-case usage pattern for the ZSTD API.
   
   In the common use case of Parquet with an 8 kB block size, a new ZSTD 
context several MB in size is allocated for every single 8 kB block and then 
released immediately afterwards.
   
   `ZSTDCodec` should explicitly create a compression/decompression context and 
explicitly re-use the corresponding context for subsequent calls to 
`Compress()` / `Decompress()`.
   
   This mirrors the change already applied to the Rust implementation of Arrow: 
https://github.com/apache/arrow-rs/pull/8405 - the same observations about the 
performance impact of the missing context reuse also apply to the C++ 
implementation.
   
   In addition to the changes already applied to the Rust implementation, there 
is also `ZSTD_customMem` together with `ZSTD_createCCtx_advanced` and 
`ZSTD_createDCtx_advanced` to consider: `ZSTDCodec` can (and should) route its 
allocations through the existing Arrow memory pool rather than being left to 
hit the system's default heap.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
