A trigger-based approach has worked for us in the past to get once-only output of what’s happened - pushing this to Kafka and using Kafka Connect then allowed us to direct the stream to other endpoints.
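For anyone who hasn't written one, the hook itself is small. A rough sketch of the shape of such a trigger, against the 3.x ITrigger API (the class name, topic, broker address and the minimal payload are placeholders, not the exact code referred to above; the metadata accessors moved around in 4.0):

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.partitions.Partition;
import org.apache.cassandra.triggers.ITrigger;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaMirrorTrigger implements ITrigger {

    // One producer per trigger class; broker address and topic are placeholders.
    private static final KafkaProducer<String, String> PRODUCER = buildProducer();
    private static final String TOPIC = "cassandra-mirror";

    @Override
    public Collection<Mutation> augment(Partition update) {
        // Called on the write path with the incoming update. The payload here is
        // deliberately minimal (keyspace.table:partition-key); a real mirror would
        // serialize the rows/cells and handle Kafka delivery failures.
        String key = ByteBufferUtil.bytesToHex(update.partitionKey().getKey());
        String value = update.metadata().ksName + "." + update.metadata().cfName + ":" + key;
        PRODUCER.send(new ProducerRecord<>(TOPIC, key, value));
        // No extra mutations returned: the trigger only observes the write.
        return Collections.emptyList();
    }

    private static KafkaProducer<String, String> buildProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }
}

The jar goes into each node's triggers directory and is attached with CREATE TRIGGER ... USING the class name. The hard part is not the hook but the failure handling Jeff describes further down in the thread.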
CDC-based streaming has the issue of duplicates, which is technically fine if you
don’t care that much about repeat changes coming from replicas.

I agree with Ben. If the goal is just to move a keyspace from one cluster to
another that is active and can’t go down, his method will work for sure.

Also, is there a specific reason you need to split the cluster? Why not just
have another DC and keep it part of the cluster? Do you have more than a
hundred tables?

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

On Oct 18, 2018, 4:35 PM -0400, Ben Slater <ben.sla...@instaclustr.com>, wrote:
> I might be missing something, but we’ve done this operation on a few
> occasions by:
> 1) Commission the new cluster and join it to the existing cluster as a 2nd DC
> 2) Replicate just the keyspace that you want to move to the 2nd DC
> 3) Make app changes to read moved tables from the 2nd DC
> 4) Change the keyspace definition to remove the moved keyspace from the first DC
> 5) Split the 2 DCs into separate clusters (sever network connections, change seeds)
>
> If it’s just a table you’re moving and not a whole keyspace, then you can skip
> step 4 and drop the unneeded tables from either side after splitting. This
> might mean the new cluster needs to be temporarily bigger than the end-state
> during the migration process.
>
> Cheers
> Ben
>
> On Fri, 19 Oct 2018 at 07:04 Jeff Jirsa <jji...@gmail.com> wrote:
>
> > Could be done with CDC
> > Could be done with triggers
> > (Could be done with vtables — double writes or double reads — if they were
> > extended to be user facing)
> >
> > Would be very hard to generalize properly, especially handling failure
> > cases (write succeeds in one cluster/table but not the other), which are
> > often app specific
> >
> > --
> > Jeff Jirsa
> >
> > > On Oct 18, 2018, at 6:47 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > >
> > > Isn't this what CDC was designed for?
> > >
> > > https://issues.apache.org/jira/browse/CASSANDRA-8844
> > >
> > > On Thu, Oct 18, 2018 at 10:54 AM Carl Mueller
> > > <carl.muel...@smartthings.com.invalid> wrote:
> > >
> > > > tl;dr: a generic trigger on TABLES that will mirror all writes to
> > > > facilitate data migrations between clusters or systems. What is
> > > > necessary to ensure full write mirroring/coherency?
> > > >
> > > > When Cassandra clusters have several "apps" aka keyspaces serving
> > > > applications colocated on them, but the app/keyspace bandwidth and size
> > > > demands begin impacting other keyspaces/apps, then one strategy is to
> > > > migrate the keyspace to its own dedicated cluster.
> > > >
> > > > With backups/sstableloading, this will entail a delay and therefore a
> > > > "coherency" shortfall between the clusters. So typically one would
> > > > employ a "double write, read once" approach:
> > > >
> > > > - all updates are mirrored to both clusters
> > > > - reads come from the currently most coherent cluster
> > > >
> > > > Often two sstable loads are done:
> > > >
> > > > 1) first load
> > > > 2) turn on double writes/write mirroring
> > > > 3) a second load is done to finalize coherency
> > > > 4) switch the app to point to the new cluster now that it is coherent
> > > >
> > > > The double writes and reads are the sticking point. We could do it at
> > > > the app layer, but if the app wasn't written with that in mind, it is a
> > > > lot of testing and customization specific to the framework.
> > > >
> > > > We could theoretically do some sort of proxying of the java-driver
> > > > somehow, but all the async structures and complex interfaces/APIs would
> > > > be difficult to proxy. Maybe there is a lower level in the java-driver
> > > > where that is possible. This also would only apply to the java-driver,
> > > > and not the python/go/javascript/other drivers.
> > > >
> > > > Finally, I suppose we could do a trigger on the tables. It would be
> > > > really nice if we could add to the Cassandra toolbox the basics of a
> > > > write-mirroring trigger that could be activated "fairly easily"... now I
> > > > know there are the complexities of inter-cluster access, and the
> > > > question of whether we are even using Cassandra as the target mirror
> > > > system (for example, there is an article on triggers write-mirroring to
> > > > Kafka: https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
> > > >
> > > > And this starts to get into the complexities of hinted handoff as well.
> > > > But fundamentally this seems like something that would be a very nice
> > > > feature (especially when you NEED it) to have in the core of Cassandra.
> > > >
> > > > Finally, is the mutation hook in triggers sufficient to track all
> > > > incoming mutations (outside of "shudder" other triggers generating
> > > > data)?
> > >
> > > --
> > > Jonathan Ellis
> > > co-founder, http://www.datastax.com
> > > @spyced
>
> --
> Ben Slater
> Chief Product Officer, Instaclustr
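As an aside on the "double write, read once" piece of the original question: if the app already funnels all statements through a single session object, the wrapper can be fairly thin. A rough sketch against the DataStax java-driver 3.x API (the class name and contact points are made up for illustration; deciding what to do when only one cluster accepts a write is exactly the app-specific failure handling Jeff raises above):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

// Thin wrapper that mirrors every write to both clusters and keeps reads on
// whichever cluster is currently considered the coherent one.
public class DualWriteSession {

    private final Session oldSession; // current, authoritative cluster
    private final Session newSession; // target cluster still being back-filled

    public DualWriteSession(String oldContactPoint, String newContactPoint, String keyspace) {
        this.oldSession = Cluster.builder().addContactPoint(oldContactPoint).build().connect(keyspace);
        this.newSession = Cluster.builder().addContactPoint(newContactPoint).build().connect(keyspace);
    }

    public ResultSet write(Statement stmt) {
        // Mirror asynchronously to the new cluster, write synchronously to the old one.
        ResultSetFuture mirrored = newSession.executeAsync(stmt);
        ResultSet primary = oldSession.execute(stmt);
        // When only one side succeeds, the right response (retry, queue for replay,
        // fail the whole call) is app specific; here we simply surface the error.
        mirrored.getUninterruptibly();
        return primary;
    }

    public ResultSet read(Statement stmt) {
        // Reads stay on the old cluster until the second sstable load has caught up.
        return oldSession.execute(stmt);
    }
}

Once the second load closes the coherency gap (step 3 in the list quoted above), read() flips to the new cluster and the old-cluster writes can eventually be dropped.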