tl;dr: a generic trigger on TABLES that will mirror all writes to
facilitate data migrations between clusters or systems. What is necessary
to ensure full write mirroring/coherency?

When cassandra clusters have several "apps" aka keyspaces serving
applications colocated on them, but the app/keyspace bandwidth and size
demands begin impacting other keyspaces/apps, then one strategy is to
migrate the keyspace to its own dedicated cluster.

With backups/sstableloading, this will entail a delay and therefore a
"coherency" shortfall between the clusters. So typically one would employ a
"double write, read once":

- all updates are mirrored to both clusters
- writes come from the current most coherent.

Often two sstable loads are done:

1) first load
2) turn on double writes/write mirroring
3) a second load is done to finalize coherency
4) switch the app to point to the new cluster now that it is coherent

The double writes and read is the sticking point. We could do it at the app
layer, but if the app wasn't written with that, it is a lot of testing and
customization specific to the framework.

We could theoretically do some sort of proxying of the java-driver somehow,
but all the async structures and complex interfaces/apis would be difficult
to proxy. Maybe there is a lower level in the java-driver that is possible.
This also would only apply to the java-driver, and not
python/go/javascript/other drivers.

Finally, I suppose we could do a trigger on the tables. It would be really
nice if we could add to the cassandra toolbox the basics of a write
mirroring trigger that could be activated "fairly easily"... now I know
there are the complexities of inter-cluster access, and if we are even
using cassandra as the target mirror system (for example there is an
article on triggers write-mirroring to kafka:
https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).

And this starts to get into the complexities of hinted handoff as well. But
fundamentally this seems something that would be a very nice feature
(especially when you NEED it) to have in the core of cassandra.

Finally, is the mutation hook in triggers sufficient to track all incoming
mutations (outside of "shudder" other triggers generating data)

Reply via email to