This looks super cool, would love to see more details. On a general note, a pluggable storage layer allows other storage engines (and possibly datastores) to leverage Cassandra's distributed primitives (Dynamo, gossip, Paxos?, drivers, CQL etc). This could allow Cassandra to fill similar use cases as Dynomite from Netflix.
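
To make the idea concrete, here is a rough sketch of where a storage engine boundary could sit. Every name in it (StorageEngine, InMemoryEngine, Coordinator) is hypothetical and made up for illustration; none of it comes from the Cassandra codebase or from the design spec under discussion. The point is only that the distributed layer talks to a narrow interface and the engine keeps its own maintenance (compaction or otherwise) to itself.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical boundary between the distributed layer and local storage.
// These names are assumptions for illustration, not Cassandra APIs.
interface StorageEngine {
    void put(String partitionKey, String value);

    Optional<String> get(String partitionKey);

    // Engine-private housekeeping (e.g. compaction in an LSM engine).
    // The caller never needs to know what this does internally.
    void maintain();
}

// Trivial engine backed by a sorted in-memory map.
final class InMemoryEngine implements StorageEngine {
    private final Map<String, String> data = new ConcurrentSkipListMap<>();

    @Override
    public void put(String partitionKey, String value) {
        data.put(partitionKey, value);
    }

    @Override
    public Optional<String> get(String partitionKey) {
        return Optional.ofNullable(data.get(partitionKey));
    }

    @Override
    public void maintain() {
        // Nothing to compact for a plain map.
    }
}

// Stand-in for the layer that would handle replication, gossip, CQL etc.
// It depends only on the interface, so engines can be swapped freely.
final class Coordinator {
    private final StorageEngine engine;

    Coordinator(StorageEngine engine) {
        this.engine = engine;
    }

    void write(String key, String value) {
        engine.put(key, value);
    }

    Optional<String> read(String key) {
        return engine.get(key);
    }
}
```

Usage would be along the lines of `new Coordinator(new InMemoryEngine())`; swapping in a different engine would not touch the Coordinator at all. A real interface would obviously need far more than put/get (range reads, tombstones, and so on), so treat this as a sketch of the shape only.
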
Also, as Sankalp mentioned, we get some other benefits including better testability.

> In my experience with pluggable storage engines (in the MySQL world), the
> engine manages all storage that it "owns." The higher tiers in the
> architecture don't need to get involved unless multiple storage engines
> have to deal with compaction (or similar) issues over the entire database,
> e.g., every storage engine has read/write access to every piece of data,
> even if that data is owned by another storage engine.
>
> I don't know enough about Cassandra internals to have an opinion as to
> whether or not the above scenario makes sense in the Cassandra context. But
> "sharing" (processes or data) between storage engines gets pretty hairy,
> easily deadlocky (!), even in something as relatively straightforward as
> MySQL.

This would be an implementation detail, but given that tables in Cassandra don't know about each other (no joins, foreign keys etc... ignore MVs for the moment), storage engine interactions probably wouldn't be an issue.

> This was a long and old debate we had several times in the past. One of
> the difficulty of pluggable storage engine is that we need to manage the
> differences between the LSMT of native C* and RockDB engine for compaction,
> repair, streaming etc...
>
> Right now all the compaction strategies share the assumption that the data
> structure and layout on disk is fixed. With pluggable storage engine, we
> need to special case each compaction strategy (or at least the Abstract
> class of compaction strategy) for each engine.
> The current approach is one storage engine, many compaction strategies for
> different use-cases (TWCS for time series, LCS for heavy update...).
>
> With pluggable storage engine, we'll have a matrix of storage engine x
> compaction strategies.

Compaction is part of the storage engine, and if I understand Dikang's design spec, it is bypassed? Cassandra's current storage engine is a log-structured merge tree.
RocksDB does its own thing. Again, this is an implementation detail about where the storage engine interface line is drawn, but going by the compaction example above, I think it is a non-issue?

> And not even mentioning the other operations to handle like streaming and
> repair.

Streaming and repair would be a harder problem to solve than compaction imho.

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer