Hi Yifan,

This looks very promising for customers aiming to improve Cassandra performance. I had a few questions on the user experience:
* How does a user enable this feature: via YAML config or through CQL DDL?
* If it’s CQL, is it applied at the keyspace or table level?
* Is the process for disabling the feature the same?

Thanks,
Himanshu

From: Yifan Cai <[email protected]>
Date: Thursday, September 4, 2025 at 7:00 PM
To: [email protected] <[email protected]>
Subject: RE: [EXTERNAL] [DISCUSS] CEP-54: ZSTD Compression with Dictionary Support

Noted with thanks. I agree that it does not need to be zstd specific. The additional dict information for CompressionInfo is the dictionary id, the dictionary bytes, and a checksum of the id and content. It should be common to other dictionary-based compression algorithms. In terms of implementation, I will keep this in mind.

- Yifan

On Thu, Sep 4, 2025 at 5:49 PM David Capwell <[email protected]> wrote:

Thanks for bringing this up! My first question when quickly looking at this is: can we make the CompressionInfo change agnostic to the algorithm, or have the format change based on the algorithm? LZ4 has a similar feature (though not as easy to use as zstd's), and new algorithms might come out which we want to include later on. It would be a shame to have the format tightly coupled to zstd only.

On Sep 4, 2025, at 1:50 PM, Yifan Cai <[email protected]> wrote:

Hi community,

We would like to propose CEP-54: ZSTD Compression with Dictionary Support for adoption by the community:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-54%3A+ZSTD+with+Dictionary+SSTable+Compression

This CEP proposes introducing ZSTD with dictionary compression for SSTables. This feature allows users who need it to achieve significant improvements in compression ratio and speed, leading to better performance and storage efficiency. This is an entirely opt-in feature.

The proposed ZSTD with dictionary support will enable organizations to achieve:
- Faster read/write performance.
- Reduced storage footprint.
- Increased storage device lifetime from fewer writes.

Key design principles:
- Zero impact on users who don't enable the feature.
- Initial emphasis on simplicity, supporting a single global dictionary per table and manual training, while maintaining extensibility for future automation.
- SSTable-attached dictionaries to ensure that operations like backup, restore, and streaming continue to work seamlessly.
- Graceful fallback to standard ZSTD compression when a dictionary isn't available.
- A critical design constraint to avoid a large number of unique dictionaries, which can hurt decompression speed.

This enhancement addresses the need for better storage efficiency and performance by leveraging ZSTD dictionaries, while maintaining complete backward compatibility and requiring no changes to existing deployments that do not enable the feature.

Thanks to Jon Haddad for bringing up the topic and providing feedback in shaping the design, and to Dinesh Joshi, Joey Lynch, Stefan Miklosovic, and Francisco Guerrero for providing design feedback.

Thanks in advance for your time and feedback. Please keep the discussion on this mailing list thread.

- Yifan
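
For readers following the CompressionInfo part of this thread, here is a minimal sketch of what an algorithm-agnostic dictionary section might look like: a dictionary id, the dictionary bytes, and a checksum over the id and content. It is only an illustration, not the CEP's actual on-disk format. The class name `CompressionDictionary`, the byte layout, the algorithm-name prefix, and the choice of CRC32 are all assumptions made for the example.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionDictionary:
    """Hypothetical container for illustration; not Cassandra's actual class."""

    algorithm: str  # e.g. "zstd"; stored so the layout is not tied to one algorithm
    dict_id: int
    content: bytes

    def checksum(self) -> int:
        # CRC32 over the id followed by the content (the checksum choice is an assumption).
        return zlib.crc32(struct.pack(">q", self.dict_id) + self.content)

    def serialize(self) -> bytes:
        name = self.algorithm.encode("utf-8")
        header = struct.pack(">H", len(name)) + name
        body = struct.pack(">qI", self.dict_id, len(self.content)) + self.content
        return header + body + struct.pack(">I", self.checksum())


def deserialize(buf: bytes) -> CompressionDictionary:
    """Parse the assumed layout and verify the checksum of id and content."""
    (name_len,) = struct.unpack_from(">H", buf, 0)
    offset = 2
    algorithm = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    dict_id, content_len = struct.unpack_from(">qI", buf, offset)
    offset += struct.calcsize(">qI")
    content = buf[offset:offset + content_len]
    offset += content_len
    (stored,) = struct.unpack_from(">I", buf, offset)
    parsed = CompressionDictionary(algorithm, dict_id, content)
    if len(content) != content_len or stored != parsed.checksum():
        raise ValueError("dictionary section is corrupt")
    return parsed
```

Because the algorithm name travels with the dictionary in this sketch, a reader could pick the matching decompressor (zstd today, possibly LZ4 or others later) without the layout itself changing.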
