Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances
This seems like a lot of work to create an rsync alternative. I can't really say I see the point. I noticed your "rejected alternatives" mentions it with this note:

- However, it might not be permitted by the administrator or available in various environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through a sidecar facilitates smooth instance migration.

This feels more like NIH than solving a real problem, as what you've listed is a hypothetical, and one that's easily addressed.

Jon

On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <n.v.harikrishna.apa...@gmail.com> wrote:

> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. Depending on the volume of the storage service
> load, replacements (repair + bootstrap) may take anywhere from a few hours
> to days.
>
> Proposing a Sidecar based solution to address these challenges. This
> solution proposes transferring data from the old host (source) to the new
> host (destination) and then bringing up the Cassandra process at the
> destination, to enable fast instance migration. This approach would help to
> minimise node downtime, as it is based on a Sidecar solution for data
> transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances
Hi Jon,

Thanks for taking the time to read and reply to this proposal. Would encourage you to approach it from an attitude of seeking understanding on the part of the first-time CEP author, as this reply casts it off pretty quickly as NIH.

The proposal isn't mine, but I'll offer a few notes on where I see this as valuable:

– It's valuable for Cassandra to have an ecosystem-native mechanism of migrating data between physical/virtual instances outside the standard streaming path. As Hari mentions, the current ecosystem-native approach of executing repairs, decommissions, and bootstraps is time-consuming and cumbersome.

– An ecosystem-native solution is safer than a bunch of bash and rsync. Defining a safe protocol to migrate data between instances via rsync without downtime is surprisingly difficult, and even more so to do safely and repeatedly at scale. Enabling this process to be orchestrated by a control plane mechanizing official endpoints of the database and sidecar – rather than trying to move data around behind its back – is much safer than hoping one's cobbled together the right set of scripts to move data in a way that won't violate strong / transactional consistency guarantees. This complexity is exemplified by the "Migrating One Instance" section of the doc and its state machine diagram, which illustrate an approach to solving that problem.

– An ecosystem-native approach poses fewer security concerns than rsync. mTLS-authenticated endpoints in the sidecar for data movement eliminate the requirement for orchestration to occur via (typically) high-privilege SSH, which often allows some form of code execution or requires complex efforts to scope the SSH privileges of particular users. They also eliminate the need to manage and secure rsyncd processes on each instance when SSH is not used.

– An ecosystem-native approach is more instrumentable and measurable than rsync.
Support for data migration endpoints in the sidecar would allow for metrics reporting, stats collection, and alerting via mature and modern mechanisms rather than monitoring the output of a shell script.

I'll yield to Hari to share more, though today is a public holiday in India. I do see this CEP as solving an important problem.

Thanks,
– Scott

On Apr 8, 2024, at 10:23 AM, Jon Haddad wrote:

> This seems like a lot of work to create an rsync alternative. I can't really say I see the point. I noticed your "rejected alternatives" mentions it with this note:
>
> However, it might not be permitted by the administrator or available in various environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through a sidecar facilitates smooth instance migration.
>
> This feels more like NIH than solving a real problem, as what you've listed is a hypothetical, and one that's easily addressed.
>
> Jon
>
> On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <n.v.harikrishna.apa...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have filed CEP-40 [1] for live migrating Cassandra instances using the Cassandra Sidecar.
>>
>> When someone needs to move all or a portion of the Cassandra nodes belonging to a cluster to different hosts, the traditional approach of Cassandra node replacement can be time-consuming due to repairs and the bootstrapping of new nodes. Depending on the volume of the storage service load, replacements (repair + bootstrap) may take anywhere from a few hours to days.
>>
>> Proposing a Sidecar based solution to address these challenges. This solution proposes transferring data from the old host (source) to the new host (destination) and then bringing up the Cassandra process at the destination, to enable fast instance migration. This approach would help to minimise node downtime, as it is based on a Sidecar solution for data transfer and avoids repairs and bootstrap.
>>
>> Looking forward to the discussions.
>>
>> [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>>
>> Thanks!
>> Hari
New episode of the Apache Cassandra(R) Corner!
s3e2 - Otavio Santana

(You may have to download it to play)
https://drive.google.com/file/d/1RZLP-mpq01LtYBbQPiYS0YbP4WKf0nKG/view?usp=drive_link

It will remain in staging for 72 hours, going live (assuming no objections) by Thursday, April 11th.

If anyone should have any questions or comments, or if you or someone you know wants to be a guest, please let me know!

Thanks, everyone!

Aaron
Re: [DISCUSS] Modeling JIRA fix version for subprojects
Hi folks - sorry to have dropped the ball on responding to this thread. My 2 cents are as follows:

1. Having a separate JIRA project for each sub-project will add management overhead. This option, however, allows us to model unique workflows for the sub-project.
2. Managing the sub-project as part of the Cassandra JIRA project would imply less management overhead, but the sub-project would need to conform to the same workflows.

I would pick option 2 unless there is a strong reason and desire to manage a separate Jira project. We can always split out the Java Driver project if things don't work out. OTOH merging a Jira project is harder.

Thanks,
Dinesh

On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky wrote:

> CEP-8 proposes using separate Jira projects per Cassandra sub-project:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>
> > We suggest distinct Jira projects, one per driver, all to be created.
>
> I don't see any discussion changing that from the [DISCUSS] or vote threads:
> https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
> https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
> https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
>
> But looks like upon acceptance that was changed:
> https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
>
> > New issues will be tracked under the CASSANDRA project on Apache’s JIRA <https://issues.apache.org/jira/projects/CASSANDRA> under the component ‘Client/java-driver’.
>
> I'm in favor of using the same Jira as Cassandra proper. Committership is
> project-wide, so having a standardized process (same ticket flow, review
> rules, labels, etc.) is beneficial. But multiple votes happened based on
> the content of the CEP, so we should stick to what was voted on and move to
> a separate Jira.
>
> --
> Abe
Re: Is there appetite to maintain the gocql driver (in the drivers subproject) ?
If we take this on - are there any active contributors that can be raised as committers to maintain this project?

On Wed, Apr 3, 2024 at 2:36 PM Nate McCall wrote:

> We've talked through this before. Benjamin sussed out the main issue, IIRC.
> tl;dr:
> - The AUTHORS file lists everyone who ever made a commit (https://github.com/gocql/gocql/blob/master/AUTHORS)
> - The license is BSD-3 and explicitly says the copyright is owned by the authors (https://github.com/gocql/gocql/blob/master/LICENSE#L1)
> - We had a previous discussion about 6 years ago: https://www.mail-archive.com/dev@cassandra.apache.org/msg13008.html
>
> We can open an issue with LEGAL to see what they say at least?
>
> -N
>
> On Tue, Feb 6, 2024 at 10:25 AM Mick Semb Wever wrote:
>
>> The current sole maintainer of the gocql driver has stated the project is
>> essentially in attic mode and is asking for new maintainers.
>>
>> https://groups.google.com/g/gocql/c/v0FruczBb2w
>>
>> No one has suggested the repo be donated to the ASF yet, but before
>> anyone should raise any such suggestion we should check if we have folk in
>> the project that would be willing to help out with such a donation.