Hi community,

We would like to propose *CEP-54: ZSTD Compression with Dictionary Support*
for adoption by the community:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-54%3A+ZSTD+with+Dictionary+SSTable+Compression

This CEP proposes introducing ZSTD with dictionary compression for
SSTables. This feature allows users who need it to achieve significant
improvements in compression ratio and speed, leading to better performance
and storage efficiency. This is an entirely opt-in feature.

The proposed ZSTD with dictionary support will enable organizations to
achieve:

- Faster read/write performance.
- Reduced storage footprint.
- Increased storage device lifetime from fewer writes.

Key design principles:

- Zero impact on users who don't enable the feature.
- Initial emphasis on simplicity, supporting a single global dictionary per
table and manual training, while maintaining extensibility for future
automation.
- SSTable-attached dictionaries to ensure that operations like backup,
restore, and streaming continue to work seamlessly.
- Graceful fallback to standard ZSTD compression when a dictionary isn't
available.
- A critical design constraint: avoid a large number of unique
dictionaries, which can hurt decompression speed.
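
To make the fallback principle above concrete, here is a minimal sketch; it
is not the CEP's implementation. It uses the preset-dictionary support in
Python's standard zlib module as a stand-in for ZSTD dictionaries, and the
names compress_chunk and decompress_chunk are hypothetical, not part of
Cassandra or the CEP.

```python
import zlib
from typing import Optional


def compress_chunk(data: bytes, dictionary: Optional[bytes] = None) -> bytes:
    """Compress with the dictionary if one is available, else fall back
    to plain compression (zlib here, standing in for standard ZSTD)."""
    if dictionary:
        comp = zlib.compressobj(zdict=dictionary)
    else:
        comp = zlib.compressobj()
    return comp.compress(data) + comp.flush()


def decompress_chunk(blob: bytes, dictionary: Optional[bytes] = None) -> bytes:
    """Decompress a chunk; the same dictionary used to compress it must be
    supplied, which is why attaching it to the SSTable matters."""
    if dictionary:
        decomp = zlib.decompressobj(zdict=dictionary)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    # Assumed sample data: a dictionary resembling typical rows of a table.
    dictionary = b'{"user_id": 0, "status": "active", "region": "us-east-1"}'
    row = b'{"user_id": 12345, "status": "active", "region": "us-east-1"}'

    with_dict = compress_chunk(row, dictionary)
    without_dict = compress_chunk(row)  # fallback path, no dictionary

    assert decompress_chunk(with_dict, dictionary) == row
    assert decompress_chunk(without_dict) == row
    print(len(row), len(with_dict), len(without_dict))
```

On small, similar rows like these, the dictionary path should produce a
noticeably smaller output than the fallback path, which is the effect the
proposal aims to exploit at the SSTable chunk level.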

This enhancement addresses the need for better storage efficiency and
performance by leveraging ZSTD dictionaries, while maintaining complete
backward compatibility and requiring no changes to existing deployments
that do not enable the feature.

Thanks to Jon Haddad for bringing up the topic and providing feedback that
shaped the design, and to Dinesh Joshi, Joey Lynch, Stefan Miklosovic, and
Francisco Guerrero for providing design feedback.

Thanks in advance for your time and feedback. Please keep the discussion on
this mailing list thread.

- Yifan
