Ext3h opened a new issue, #48187: URL: https://github.com/apache/arrow/issues/48187
### Describe the bug, including details regarding any error messages, version, and platform.

`arrow::util::internal::ZSTDCodec::Compress` and `arrow::util::internal::ZSTDCodec::Decompress` call the naked one-shot `ZSTD_compress` / `ZSTD_decompress` functions, usually on quite small buffers, which is the absolute worst-case scenario for the ZSTD API. In the common use case of Parquet with an 8 kB block size, a new ZSTD context several MB in size is allocated for every single 8 kB block and then released again immediately afterwards.

`ZSTDCodec` should explicitly create a compression/decompression context and reuse that context across subsequent calls to `Compress()` / `Decompress()`. This mirrors the change already applied to the Rust implementation of Arrow (https://github.com/apache/arrow-rs/pull/8405); the same observations about the performance impact of the missing context reuse also apply to the C++ implementation.

Beyond the changes already applied to the Rust implementation, there are also `ZSTD_customMem` and the `ZSTD_createCCtx_advanced` / `ZSTD_createDCtx_advanced` constructors to consider: `ZSTDCodec` can (and should) have its allocations routed through the existing memory pool rather than being left to hit the system's default heap.
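For illustration, a minimal sketch of the context-reuse idea. The class name and error handling are placeholders, not Arrow's actual codec layout; the point is that one long-lived `ZSTD_CCtx` / `ZSTD_DCtx` is fed to `ZSTD_compressCCtx()` / `ZSTD_decompressDCtx()` so the workspace survives across blocks:

```cpp
#include <zstd.h>

#include <cstddef>
#include <stdexcept>

// Hypothetical standalone codec, not Arrow's actual class: one context per
// direction, created once and reused for every block.
class ReusingZstdCodec {
 public:
  explicit ReusingZstdCodec(int level = 1)
      : level_(level), cctx_(ZSTD_createCCtx()), dctx_(ZSTD_createDCtx()) {}

  ~ReusingZstdCodec() {
    ZSTD_freeCCtx(cctx_);  // released once, not after every 8 kB block
    ZSTD_freeDCtx(dctx_);
  }

  ReusingZstdCodec(const ReusingZstdCodec&) = delete;
  ReusingZstdCodec& operator=(const ReusingZstdCodec&) = delete;

  // ZSTD_compressCCtx() keeps cctx_'s multi-MB workspace alive across calls,
  // instead of building and tearing down a context like ZSTD_compress() does.
  size_t Compress(const void* src, size_t src_len, void* dst, size_t dst_cap) {
    const size_t ret =
        ZSTD_compressCCtx(cctx_, dst, dst_cap, src, src_len, level_);
    if (ZSTD_isError(ret)) throw std::runtime_error(ZSTD_getErrorName(ret));
    return ret;
  }

  size_t Decompress(const void* src, size_t src_len, void* dst,
                    size_t dst_cap) {
    const size_t ret = ZSTD_decompressDCtx(dctx_, dst, dst_cap, src, src_len);
    if (ZSTD_isError(ret)) throw std::runtime_error(ZSTD_getErrorName(ret));
    return ret;
  }

 private:
  int level_;
  ZSTD_CCtx* cctx_;
  ZSTD_DCtx* dctx_;
};
```

Note that a single ZSTD context must not be used from multiple threads concurrently, so the contexts would need to be per codec instance or thread-local, depending on how `ZSTDCodec` instances are shared.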
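And a sketch of what the memory-pool wiring could look like, under some assumptions: `ZSTD_customMem` and the `*_advanced` constructors live in the `ZSTD_STATIC_LINKING_ONLY` section of `zstd.h`, ZSTD's free callback does not receive the allocation size, and the size-header trick below is just one possible way to bridge that to `arrow::MemoryPool::Free(ptr, size)`. `PoolAlloc`, `PoolFree`, and `CreatePoolBackedCCtx` are hypothetical names:

```cpp
#define ZSTD_STATIC_LINKING_ONLY  // exposes ZSTD_customMem and *_advanced
#include <zstd.h>

#include <arrow/memory_pool.h>

#include <cstddef>
#include <cstdint>
#include <cstring>

namespace {

// ZSTD's free callback gets no size, but MemoryPool::Free() wants one, so we
// stash the requested size in a small header in front of the returned block.
constexpr size_t kHeader = alignof(std::max_align_t);

void* PoolAlloc(void* opaque, size_t size) {
  auto* pool = static_cast<arrow::MemoryPool*>(opaque);
  uint8_t* buf = nullptr;
  if (!pool->Allocate(static_cast<int64_t>(size + kHeader), &buf).ok()) {
    return nullptr;  // ZSTD interprets nullptr as allocation failure
  }
  std::memcpy(buf, &size, sizeof(size));
  return buf + kHeader;
}

void PoolFree(void* opaque, void* address) {
  if (address == nullptr) return;
  auto* pool = static_cast<arrow::MemoryPool*>(opaque);
  uint8_t* buf = static_cast<uint8_t*>(address) - kHeader;
  size_t size;
  std::memcpy(&size, buf, sizeof(size));
  pool->Free(buf, static_cast<int64_t>(size + kHeader));
}

}  // namespace

// Every internal allocation of the returned context goes through the pool,
// so context memory shows up in the pool's accounting like any other buffer.
ZSTD_CCtx* CreatePoolBackedCCtx(arrow::MemoryPool* pool) {
  ZSTD_customMem mem{&PoolAlloc, &PoolFree, pool};
  return ZSTD_createCCtx_advanced(mem);
}
```

Combined with context reuse, the pool-backed contexts would be created once per codec rather than per block, so the callback overhead stays off the per-block hot path.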
### Component(s)

C++