I completely missed that this ECS != EKS. My brain flipped a few bits there.
I have never run C* on ECS and I can't imagine I ever would. It seems like the most bespoke and difficult way to run it.

+1 to everything Patrick said.

Jon

On Thu, Jun 12, 2025 at 12:46 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I feel like I should weigh in here. :) Running C* on Kubernetes works best when you come from Kubernetes and add C*. It's a mentality thing. "Why can't I log into my nodes??" There is a book exactly for this topic: https://www.oreilly.com/library/view/managing-cloud-native/9781098111380/ At DataStax, we run Astra exclusively on k8s and have 1000s of clusters working away.
>
> I must admit, I never considered running Cassandra on ECS. EKS, for sure, but not ECS. If you have a procedure to use EC2 and it works, then transitioning to ECS seems like adding overhead to essentially the same thing. If you were making the move to k8s, then that's a complete overhaul of how you run infrastructure. ECS is not.
>
> Besides the process issues, the only other downside I can think of is resource limits. EC2 has much more leeway on resource usage, but you may want to review the limits on ECS, especially on container numbers.
>
> Patrick
>
> On Thu, Jun 12, 2025 at 11:37 AM Yu, Raymond L. <raymond...@disney.com> wrote:
>
>> Hi Daemeon,
>>
>> Firstly, apologies to everyone in the thread as I’m not comparing EC2 against K8s but specifically against Elastic Container Service backed by EC2.
>>
>> Thanks for the response. I agree that running persisted databases on K8s does not seem ideal. Although there has been a lot of work on K8ssandra, my personal opinion is that it takes a lot of K8s knowledge to reap the benefits and counter the added complexity.
>>
>> To point 3, I believe that one exists in AWS Keyspaces, but for us it doesn’t make sense to go with that as we already possess the ability to run our own clusters.
>>
>> To point 4, our team is actually trying to minimize complexity by sticking with deploying Cassandra on EC2 and relying on standard Ansible/Terraform/Python automation, as we don’t have a k8s footprint or even an ECS one. However, a customer team developed an ECS-based tool (somewhat of an in-house K8ssandra operator mimic) on their own and would like for us to use and support it. Our team is opposed to that idea, as, like you said, it does not seem to provide a performance improvement, adds complexity, and would pose a challenge given that our team’s experience is not in deploying Cassandra in ECS nor in supporting the in-house tooling needed to do so.
>>
>> In general, given an existing Cassandra EC2 footprint and experience deploying in that fashion, we’re looking for points for or against deploying Cassandra on ECS if in-house tooling is also needed, as outside opinions were desired.
>>
>> Best,
>> Raymond Yu
>>
>> *From: *daemeon reiydelle <daeme...@gmail.com>
>> *Date: *Thursday, June 12, 2025 at 9:42 AM
>> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
>> *Subject: *Re: Request for Thoughts on Deployments on AWS EC2 vs. ECS
>>
>> *This Message is From an External Sender*
>> Caution: Do not click links or open attachments unless you recognize the sender and know the content is safe.
>>
>> K8S has some key use cases where it is ideal, some that are more nuanced, and some that are anti-patterns. It is my opinion that services like C*, Hadoop, Kafka, and persisted distributed databases are certainly not ideal. I admit to a prejudice of having worked with K8S since before it was released (in fact before GCP existed). However, as the hyperscalers have moved to Ethernet-based storage presented to virtual machines, many of the aspects that are antipatterns for k8s are the same as antipatterns for even EC2s.
>>
>> My response:
>>
>> 1. Containers/k8s are a mitigation for the higher cost of systems admins managing physical (or pseudo-physical EC2) devices.
>> 2. That mitigation adds complexity: persisted storage in clustered machines, network overhead, etc.
>> 3. In the case of Hadoop, k8s, and various other distributed key-value or object stores, some of the cloud vendors provide the stores as a service. I am not aware that Cassandra as a service is on offer, am I correct?
>> 4. Therefore, what are you trying to accomplish by moving from EC2 to containers/k8s? Do you already have a substantial k8s footprint with experienced k8s resources, especially with resources skilled in persisted storage (e.g. Portworx or similar)?
>> 5. To what extent is this an effort to have fun with a new and shiny object vs. an actual, bona fide need to resolve a problem (like complex Terraform scripts to spin up additional C* nodes or even net new clusters) that you think Helm charts will fix?
>>
>> *Daemeon Reiydelle*
>> *email: **daeme...@gmail.com* <daeme...@gmail.com>
>> *San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*
>>
>> If builders built buildings the way programmers wrote programs, then the first woodpecker to come along would destroy civilization.
>>
>> On Thu, Jun 12, 2025 at 9:13 AM Yu, Raymond L. <raymond...@disney.com> wrote:
>>
>> Thank you all for your thoughts. They are greatly appreciated!
>>
>> It seems that some of your thoughts echo our worries about there being additional hidden nuances to implementing the same level of functionality and reliability in ECS and even K8s. We agree that the K8ssandra operator would be the most advantageous and desired aspect of switching to a container-based solution, specifically if we went with K8s.
>>
>> Going by that logic, with the ECS solution we’re comparing against, we’d essentially have to support an in-house operator of sorts, which has involved rewriting all the workflows necessary to try to match a portion of the functionality of the K8ssandra operator. There have been challenges as well on the customer team’s side with that, but there’d 100% also be challenges with our team supporting it if it were to be handed off to us. Since there are still worries in the Cassandra community about the maturity of the K8ssandra operator, we have significantly more worries about using an in-house one.
>>
>> *From: *Jon Haddad <j...@rustyrazorblade.com>
>> *Date: *Thursday, June 12, 2025 at 7:16 AM
>> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
>> *Subject: *Re: Request for Thoughts on Deployments on AWS EC2 vs. ECS
>>
>> *This Message is From an External Sender*
>> Caution: Do not click links or open attachments unless you recognize the sender and know the content is safe.
>>
>> I agree that managing Cassandra on Kubernetes can be challenging without prior experience, as understanding all the nuances of Kubernetes takes time.
>>
>> However, there are ways to address the rescheduling issues, node placement, and local disk concerns that were mentioned. You can pin pods to specific hosts to avoid rescheduling on different nodes, and you can use local disks or a combination of persistent disks with a local NVMe as a cache. Host networking or (I think) Cilium can help with the networking performance concerns. For most arguments against using Kubernetes, there's usually a workaround or setting that can address the issue.
>>
>> The main advantage of Kubernetes is the operator. While it has some quirks, it generally does a good job of managing your deployment, eliminating the need to write all your workflows. Building on Kubernetes as a standard offers the advantage of applying your knowledge across various environments once you're familiar with it.
>>
>> I wouldn't recommend jumping into Kubernetes and Cassandra simultaneously. Both are complex topics. I've worked with Cassandra for over a decade and Kubernetes on and off for five years, and I still encounter challenges, especially when my desired outcome differs from the operator's.
>>
>> Both versions are workable. Both have tradeoffs. For now, I'm also sticking to baking AMIs [3], but with more experience on K8s and a little more maturity from Cassandra, I'd think differently. For stateless apps, I'm 100% on board with K8s.
>>
>> Jon
>>
>> [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>> [2] https://lists.apache.org/thread/r0nhyyn6mbpy55fl90xqcj17v6w3wxg3
>> [3] https://github.com/rustyrazorblade/easy-cass-lab/tree/main/packer
>>
>> On Thu, Jun 12, 2025 at 6:17 AM Luciano Greiner <luciano.grei...@gmail.com> wrote:
>>
>> Quick correction on my previous message — I assumed you were referring to running Cassandra on Kubernetes, not purely ECS.
>>
>> Many of the same concerns still apply. ECS tasks can also be rescheduled or moved between instances, which poses risks for Cassandra’s rack awareness and replica distribution. Ensuring stable node identity and local storage is still tricky.
>>
>> Cassandra works best when it's tightly coupled to its hardware — ideally on dedicated VMs or bare metal — where you have full control over topology and disk performance.
>>
>> Luciano Greiner
>>
>> On Thu, Jun 12, 2025 at 10:13 AM Luciano Greiner <luciano.grei...@gmail.com> wrote:
>> >
>> > I usually advise against running Cassandra (or most databases) inside Kubernetes.
>> > It might look like it simplifies operations, but in my experience, it tends to introduce more complexity than it solves.
>> >
>> > With Cassandra specifically, Kubernetes may reschedule pods for reasons outside your control (e.g., node pressure, restarts, upgrades). This can lead to topology violations — for example, all replicas ending up in the same physical rack, defeating the purpose of proper rack and datacenter awareness.
>> >
>> > Another major issue is storage. Cassandra expects fast, local disks close to the compute layer. While Kubernetes StatefulSets can use PersistentVolumes, these are often network-attached and may not offer the performance or locality guarantees Cassandra needs. And if your pods get rescheduled, depending on your storage class and cloud provider, you may run into delays or errors reattaching volumes.
>> >
>> > Using an operator like K8ssandra doesn't necessarily eliminate these problems — it just adds another tool to manage within the puzzle.
>> >
>> > Luciano Greiner
>> >
>> > On Thu, Jun 12, 2025 at 6:20 AM Dor Laor via user <user@cassandra.apache.org> wrote:
>> > >
>> > > It's possible to manage Cassandra well both with VMs and containers. As you'd be running one container per VM, there is no significant advantage for containers. K8s provides nice tooling and some methodological enforcement that brings order to the setup, but if the team aren't top-notch experts in k8s, it's not worth the trouble and the limitations that come with it (networking outside the k8s cluster, etc.). It's good to have fewer layers. Most users run databases outside of containers.
>> > >
>> > > On Wed, Jun 11, 2025 at 11:36 PM Raymond Yu <rayyu...@gmail.com> wrote:
>> > >>
>> > >> Hi Cassandra community,
>> > >>
>> > >> I would like to ask for your expert opinions regarding a discussion we're having about deploying Cassandra on AWS EC2 vs.
>> > >> AWS ECS. For context, we have a small dedicated DB engineering team that is familiar with operating and supporting Cassandra on EC2 for many customer teams. However, one team has developed custom tooling for operating Cassandra on ECS (EC2-backed) and would like for us to migrate to it for their Cassandra needs, which has spawned this discussion (K8ssandra was considered, but that team did not want to use Kubernetes).
>> > >>
>> > >> Further context on our team and experience:
>> > >> - Small dedicated team supporting Cassandra (and other DBs)
>> > >> - Familiar with operating Cassandra on EC2
>> > >> - Familiar with standard IaC tools and languages (Ansible/Terraform/Python/etc.)
>> > >> - Only deploy in AWS
>> > >>
>> > >> Discussed points regarding staying with EC2:
>> > >> - Existing team experience and automation in deploying Cassandra on EC2
>> > >> - Simpler solution is easier to support and maintain
>> > >> - Almost all documentation we can find and use is specific to deploying on EC2
>> > >> - Third-party support is familiar with EC2 by default
>> > >> - Lower learning curve for engineers to onboard
>> > >> - More hands-on maintenance regarding OS upgrades
>> > >> - Less modern solution
>> > >>
>> > >> Discussed points regarding using the new ECS solution:
>> > >> - Containers are the more modern solution
>> > >> - Node autoheal feature in addition to standard C* operations via a control plane
>> > >> - Higher tool and architecture complexity that requires ramp-up in order to use and support effectively
>> > >> - We're on our own for potential issues with the tool itself after it would be handed off
>> > >> - No demonstrated performance gain over EC2-based clusters
>> > >> - Third-party support would be less familiar with dealing with ECS issues
>> > >> - Deployed on EC2 under the hood (one container per VM), so the underlying architecture is the same between both solutions
>> > >>
>> > >> Given that
>> > >> context, our team generally feels that there is little marginal benefit given the cost of ramping up on and supporting a custom tool, but there has also been a request for harder evidence and outside opinions on the topic. It has been hard to find documentation of this specific comparison of EC2 vs. ECS to reference. We'd love to hear your thoughts on our context, but are also interested in any general recommendations for one over the other. Thanks in advance!
>> > >>
>> > >> Best,
>> > >> Raymond Yu
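[Editor's note: Jon's suggested workarounds earlier in the thread (pinning pods to hosts, host networking, local disks) can be sketched as a Kubernetes StatefulSet fragment. This is only an illustrative sketch, not configuration from the thread; the `local-nvme` StorageClass name and image tag are assumed placeholders, and a real deployment would normally come from an operator such as K8ssandra rather than a hand-written manifest.]

```yaml
# Hypothetical sketch of Jon's workarounds; names marked below are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      hostNetwork: true              # avoid overlay-network overhead
      affinity:
        podAntiAffinity:             # spread replicas across zones ("racks")
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: cassandra
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: cassandra
        image: cassandra:4.1         # assumed image tag
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-nvme   # assumed local-PV StorageClass; a local
                                     # PersistentVolume also pins the pod to
                                     # the node that holds its data
      resources:
        requests:
          storage: 1Ti
```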