Also we have 2.1.x and 2.2 clusters, so we can't use CDC since apparently that is a 3.8 feature.
Virtual tables are very exciting, so we could do some collating stuff (which I'd LOVE to do with our scheduling application, where we can split tasks into near-term/most frequent (hours to days), medium-term/less common (days to weeks), and long-term (years)), with the aim of avoiding having to do compaction at all and just truncating buckets as they "expire" for a nice O(1) compaction process.

On Fri, Oct 19, 2018 at 9:57 AM Carl Mueller <carl.muel...@smartthings.com> wrote:

> A new DC and then split is one way, but you have to wait for it to stream,
> and then how do you know the DC coherence is good enough to switch the
> targeted DC for local_quorum? And then once we split it we'd have downtime
> to "change the name" and other work that would distinguish it from the
> original cluster, from what I'm told by the people that do the DC /
> cluster setup and AWS provisioning. It is a tool in the toolchest...
>
> We might be able to get stats of the queries and updates impacting the
> cluster in a centralized manner with a trigger too.
>
> We will probably do a stream-to-kafka trigger, based on what is on the
> internet and since we have Kafka here already.
>
> I will look at CDC.
>
> Thank you, everybody!
>
>
> On Fri, Oct 19, 2018 at 3:29 AM Antonis Papaioannou <papai...@ics.forth.gr>
> wrote:
>
>> It reminds me of "shadow writes" described in [1].
>> During data migration the coordinator forwards a copy of any write
>> request regarding tokens that are being transferred to the new node.
>>
>> [1] Incremental Elasticity for NoSQL Data Stores, SRDS'17,
>> https://ieeexplore.ieee.org/document/8069080
>>
>>
>> > On 18 Oct 2018, at 18:53, Carl Mueller
>> > <carl.muel...@smartthings.com.INVALID> wrote:
>> >
>> > tl;dr: a generic trigger on tables that will mirror all writes to
>> > facilitate data migrations between clusters or systems. What is
>> > necessary to ensure full write mirroring/coherency?
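A rough sketch of the bucketed-truncation idea above, with hypothetical tier names and window sizes as placeholder assumptions (the real version would be time-bucketed tables dropped via TRUNCATE):

```python
# Hypothetical sketch: tasks are routed into time-window buckets per tier
# (near/medium/long horizons), and an entire expired bucket is dropped at
# once -- the analogue of truncating a time-bucketed table, with no
# per-row deletes, tombstones, or compaction work.

TIER_WINDOW_SECS = {
    "near": 3600,        # hours-to-days horizon: 1-hour buckets (assumed)
    "medium": 86400,     # days-to-weeks horizon: 1-day buckets (assumed)
    "long": 86400 * 30,  # years horizon: 30-day buckets (assumed)
}

class BucketedTaskStore:
    def __init__(self):
        self.buckets = {}  # (tier, window_index) -> list of tasks

    def _key(self, tier, due_ts):
        return (tier, int(due_ts) // TIER_WINDOW_SECS[tier])

    def add(self, tier, due_ts, task):
        self.buckets.setdefault(self._key(tier, due_ts), []).append(task)

    def expire(self, now_ts):
        # Drop whole buckets older than the current window: O(1) per
        # bucket, regardless of how many tasks each bucket holds.
        self.buckets = {
            (tier, idx): tasks
            for (tier, idx), tasks in self.buckets.items()
            if idx >= int(now_ts) // TIER_WINDOW_SECS[tier]
        }
```

The point is that expiry cost scales with the number of buckets, not the number of rows.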
>> >
>> > When Cassandra clusters have several "apps" (aka keyspaces) serving
>> > colocated applications, but one app/keyspace's bandwidth and size
>> > demands begin impacting the other keyspaces/apps, then one strategy is
>> > to migrate that keyspace to its own dedicated cluster.
>> >
>> > With backups/sstableloading, this will entail a delay and therefore a
>> > "coherency" shortfall between the clusters. So typically one would
>> > employ "double write, read once":
>> >
>> > - all updates are mirrored to both clusters
>> > - reads come from the currently most coherent cluster
>> >
>> > Often two sstable loads are done:
>> >
>> > 1) first load
>> > 2) turn on double writes/write mirroring
>> > 3) a second load is done to finalize coherency
>> > 4) switch the app to point to the new cluster now that it is coherent
>> >
>> > The double writes and reads are the sticking point. We could do it at
>> > the app layer, but if the app wasn't written with that in mind, it is a
>> > lot of testing and customization specific to the framework.
>> >
>> > We could theoretically do some sort of proxying of the java-driver
>> > somehow, but all the async structures and complex interfaces/APIs would
>> > be difficult to proxy. Maybe there is a lower level in the java-driver
>> > where this is possible. This also would only apply to the java-driver,
>> > and not the python/go/javascript/other drivers.
>> >
>> > Finally, I suppose we could do a trigger on the tables. It would be
>> > really nice if we could add to the Cassandra toolbox the basics of a
>> > write-mirroring trigger that could be activated "fairly easily"... now
>> > I know there are the complexities of inter-cluster access, and the
>> > question of whether we are even using Cassandra as the target mirror
>> > system (for example, there is an article on triggers write-mirroring to
>> > Kafka:
>> > https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
>> >
>> > And this starts to get into the complexities of hinted handoff as well.
>> > But fundamentally this seems like something that would be a very nice
>> > feature (especially when you NEED it) to have in the core of Cassandra.
>> >
>> > Finally, is the mutation hook in triggers sufficient to track all
>> > incoming mutations (outside of, "shudder", other triggers generating
>> > data)?
>>
>>
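For reference, the "double write, read once" migration pattern discussed in this thread could be sketched as below. This is a minimal stand-in, not real driver code: the two "clusters" are plain dicts, and `MirroringClient` and `cutover` are hypothetical names; a real version would wrap two driver sessions (or sit in a trigger) and flip reads to the new cluster only after the second sstable load closes the coherency gap:

```python
# Hypothetical sketch of "double write, read once":
# every write goes to both clusters; reads come from whichever cluster
# is currently considered coherent, until cutover.

class MirroringClient:
    def __init__(self, old_cluster, new_cluster):
        self.old = old_cluster
        self.new = new_cluster
        self.read_primary = self.old  # start reading from the old cluster

    def write(self, key, value):
        # Step 2 of the migration: mirror every update to both clusters.
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        # Reads come only from the currently coherent cluster.
        return self.read_primary.get(key)

    def cutover(self):
        # Step 4: once the new cluster is coherent, point reads at it.
        self.read_primary = self.new
```

The hard part the thread is wrestling with is exactly where this wrapper lives (app layer, driver proxy, or trigger) and how to guarantee the mirrored write stream misses nothing.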