I feel like I should weigh in here. :) Running C* on Kubernetes works best when you come from Kubernetes and add C*, rather than the other way around. It's a mentality thing. "Why can't I log into my nodes??" There is a book on exactly this topic: https://www.oreilly.com/library/view/managing-cloud-native/9781098111380/

At DataStax, we run Astra exclusively on k8s and have 1000s of clusters working away.
I must admit, I never considered running Cassandra on ECS. EKS, for sure, but not ECS. If you have a procedure to use EC2 and it works, then transitioning to ECS seems like adding overhead to essentially the same thing. If you were making the move to k8s, that would be a complete overhaul of how you run infrastructure. ECS is not. Besides the process issues, the only other downside I can think of is resource limits. EC2 has much more leeway on resource usage, but you may want to review the limits on ECS, especially on container counts.

Patrick

On Thu, Jun 12, 2025 at 11:37 AM Yu, Raymond L. <raymond...@disney.com> wrote:

> Hi Daemeon,
>
> Firstly, apologies to everyone in the thread, as I'm not comparing EC2 against K8s but specifically against Elastic Container Service backed by EC2.
>
> Thanks for the response. I agree that running persisted databases on K8s does not seem ideal. Although there has been a lot of work on K8ssandra, my personal opinion is that it takes a lot of K8s knowledge to reap the benefits and counter the added complexity.
>
> To point 3, I believe one exists in AWS Keyspaces, but for us it doesn't make sense to go with that, as we already possess the ability to run our own clusters.
>
> To point 4, our team is actually trying to minimize complexity by sticking with deploying Cassandra on EC2 and relying on standard Ansible/Terraform/Python automation, as we don't have a k8s footprint or even an ECS one. However, a customer team developed an ECS-based tool (somewhat of an in-house K8ssandra operator mimic) on their own and would like for us to use and support it. Our team is opposed to that idea, as, like you said, it does not seem to provide a performance improvement, adds complexity, and would pose a challenge given that our team's experience is not in deploying Cassandra on ECS nor in supporting the in-house tooling needed to do so.
>
> In general, given an existing Cassandra EC2 footprint and experience deploying in that fashion, we're looking for points for or against deploying Cassandra on ECS if in-house tooling is also needed, as outside opinions were desired.
>
> Best,
> Raymond Yu
>
> From: daemeon reiydelle <daeme...@gmail.com>
> Date: Thursday, June 12, 2025 at 9:42 AM
> To: user@cassandra.apache.org <user@cassandra.apache.org>
> Subject: Re: Request for Thoughts on Deployments on AWS EC2 vs. ECS
>
> This Message is From an External Sender
> Caution: Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> K8S has some key use cases where it is ideal, some that are more nuanced, and some that are anti-patterns. It is my opinion that services like C*, Hadoop, Kafka, and persisted distributed databases are certainly not ideal. I admit to a prejudice, having worked with K8S since before it was released (in fact, before GCP existed). However, as the hyperscalers have moved to Ethernet-based storage presented to virtual machines, many of the aspects that are antipatterns for k8s are also antipatterns for EC2s.
>
> My response:
>
> 1. Containers/k8s are a mitigation for the higher cost of systems admins managing physical (or pseudo-physical EC2) devices.
> 2. That mitigation adds complexity: persisted storage in clustered machines, network overhead, etc.
> 3. In the case of Hadoop, k8s, and various other distributed key-value or object stores, some of the cloud vendors provide the stores as a service. I am not aware that Cassandra as a service is on offer; am I correct?
> 4. Therefore, what are you trying to accomplish by moving from EC2 to containers/k8s? Do you already have a substantial k8s footprint with experienced k8s resources, especially resources skilled in persisted storage (e.g. Portworx or similar)?
> 5.
> To what extent is this an effort to have fun with a new and shiny object vs. an actual, bona fide need to resolve a problem (like complex Terraform scripts to spin up additional C* nodes or even net-new clusters) that you think Helm charts will fix?
>
> Daemeon Reiydelle
> email: daeme...@gmail.com <daeme...@gmail.com>
> San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle
>
> If builders built buildings the way programmers wrote programs, then the first woodpecker to come along would destroy civilization.
>
> On Thu, Jun 12, 2025 at 9:13 AM Yu, Raymond L. <raymond...@disney.com> wrote:
>
> Thank you all for your thoughts. They are greatly appreciated!
>
> It seems that some of your thoughts echo our worries about there being additional hidden nuances to implementing the same level of functionality and reliability in ECS and even K8s. We agree that the K8ssandra operator would be the most advantageous and desired aspect of switching to a container-based solution, specifically if we went with K8s.
>
> Going by that logic, with the ECS solution we're comparing against, we'd essentially have to support an in-house operator of sorts that has involved rewriting all the workflows necessary to try to match a portion of the functionality of the K8ssandra operator. There have been challenges on the customer team's side with that as well, but there would certainly also be challenges for our team in supporting it if it were handed off to us. Since there are still worries in the Cassandra community about the maturity of the K8ssandra operator, we have significantly more worries about using an in-house one.
>
> From: Jon Haddad <j...@rustyrazorblade.com>
> Date: Thursday, June 12, 2025 at 7:16 AM
> To: user@cassandra.apache.org <user@cassandra.apache.org>
> Subject: Re: Request for Thoughts on Deployments on AWS EC2 vs.
ECS
>
> I agree that managing Cassandra on Kubernetes can be challenging without prior experience, as understanding all the nuances of Kubernetes takes time.
>
> However, there are ways to address the rescheduling issues, node placement, and local disk concerns that were mentioned. You can pin pods to specific hosts to avoid rescheduling on different nodes, and you can use local disks or a combination of persistent disks with a local NVMe as a cache. Host networking or (I think) Cilium can help with the networking performance concerns. For most arguments against using Kubernetes, there's usually a workaround or setting that can address the issue.
>
> The main advantage of Kubernetes is the operator. While it has some quirks, it generally does a good job of managing your deployment, eliminating the need to write all your workflows. Building on Kubernetes as a standard offers the advantage of applying your knowledge across various environments once you're familiar with it.
>
> I wouldn't recommend jumping into Kubernetes and Cassandra simultaneously. Both are complex topics. I've worked with Cassandra for over a decade and Kubernetes on and off for five years, and I still encounter challenges, especially when my desired outcome differs from the operator's.
>
> Both versions are workable. Both have tradeoffs. For now, I'm also sticking to baking AMIs [3], but with more experience on K8s and a little more maturity from Cassandra, I'd think differently. For stateless apps, I'm 100% on board with K8s.
> Jon
>
> [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://lists.apache.org/thread/r0nhyyn6mbpy55fl90xqcj17v6w3wxg3
> [3] https://github.com/rustyrazorblade/easy-cass-lab/tree/main/packer
>
> On Thu, Jun 12, 2025 at 6:17 AM Luciano Greiner <luciano.grei...@gmail.com> wrote:
>
> Quick correction on my previous message: I assumed you were referring to running Cassandra on Kubernetes, not purely ECS.
>
> Many of the same concerns still apply. ECS tasks can also be rescheduled or moved between instances, which poses risks for Cassandra's rack awareness and replica distribution. Ensuring stable node identity and local storage is still tricky.
>
> Cassandra works best when it's tightly coupled to its hardware, ideally on dedicated VMs or bare metal, where you have full control over topology and disk performance.
>
> Luciano Greiner
>
> On Thu, Jun 12, 2025 at 10:13 AM Luciano Greiner <luciano.grei...@gmail.com> wrote:
> >
> > I usually advise against running Cassandra (or most databases) inside Kubernetes. It might look like it simplifies operations, but in my experience, it tends to introduce more complexity than it solves.
> >
> > With Cassandra specifically, Kubernetes may reschedule pods for reasons outside your control (e.g., node pressure, restarts, upgrades). This can lead to topology violations, for example, all replicas ending up in the same physical rack, defeating the purpose of proper rack and datacenter awareness.
> >
> > Another major issue is storage. Cassandra expects fast, local disks close to the compute layer. While Kubernetes StatefulSets can use PersistentVolumes, these are often network-attached and may not offer the performance or locality guarantees Cassandra needs.
> > And if your pods get rescheduled, depending on your storage class and cloud provider, you may run into delays or errors reattaching volumes.
> >
> > Using an operator like K8ssandra doesn't necessarily eliminate these problems; it just adds another tool to manage within the puzzle.
> >
> > Luciano Greiner
> >
> > On Thu, Jun 12, 2025 at 6:20 AM Dor Laor via user <user@cassandra.apache.org> wrote:
> > >
> > > It's possible to manage Cassandra well both with VMs and containers. As you'd be running one container per VM, there is no significant advantage for containers. K8s provides nice tooling and some methodological enforcement that brings order to the setup, but if the team aren't top-notch experts in k8s, it's not worth the trouble and the limitations that come with it (networking outside the k8s cluster, etc.). It's good to have fewer layers. Most users run databases outside of containers.
> > >
> > > On Wed, Jun 11, 2025 at 11:36 PM Raymond Yu <rayyu...@gmail.com> wrote:
> > >>
> > >> Hi Cassandra community,
> > >>
> > >> I would like to ask for your expert opinions regarding a discussion we're having about deploying Cassandra on AWS EC2 vs. AWS ECS. For context, we have a small dedicated DB engineering team that is familiar with operating and supporting Cassandra on EC2 for many customer teams. However, one team has developed custom tooling for operating Cassandra on ECS (EC2-backed) and would like for us to migrate to it for their Cassandra needs, which has spawned this discussion (K8ssandra was considered, but that team did not want to use Kubernetes).
> > >>
> > >> Further context on our team and experience:
> > >> - Small dedicated team supporting Cassandra (and other DBs)
> > >> - Familiar with operating Cassandra on EC2
> > >> - Familiar with standard IaC tools and languages (Ansible/Terraform/Python/etc.)
> > >> - Only deploy in AWS
> > >>
> > >> Discussed points regarding staying with EC2:
> > >> - Existing team experience and automation in deploying Cassandra on EC2
> > >> - Simpler solution is easier to support and maintain
> > >> - Almost all documentation we can find and use is specific to deploying on EC2
> > >> - Third-party support is familiar with EC2 by default
> > >> - Lower learning curve for engineers to onboard
> > >> - More hands-on maintenance regarding OS upgrades
> > >> - Less modern solution
> > >>
> > >> Discussed points regarding using the new ECS solution:
> > >> - Containers are the more modern solution
> > >> - Node autoheal feature in addition to standard C* operations via a control plane
> > >> - Higher tool and architecture complexity that requires ramp-up in order to use and support effectively
> > >> - We're on our own for potential issues with the tool itself after it would be handed off
> > >> - No demonstrated performance gain over EC2-based clusters
> > >> - Third-party support would be less familiar with dealing with ECS issues
> > >> - Deployed on EC2 under the hood (one container per VM), so the underlying architecture is the same between both solutions
> > >>
> > >> Given that context, our team generally feels that there is little marginal benefit given the cost of ramping up on and supporting a custom tool, but there has also been a request for harder evidence and outside opinions on the topic. It has been hard to find documentation of this specific EC2 vs. ECS comparison to reference. We'd love to hear your thoughts on our context, but we are also interested in any general recommendations for one over the other. Thanks in advance!
> > >>
> > >> Best,
> > >> Raymond Yu
> >
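[Editor's note] The rack-awareness concern raised in the thread (replicas ending up in the same physical rack after a scheduler moves pods or ECS tasks) can be made concrete with a small sketch. This is a hedged toy model, not Cassandra's actual code: it only mimics the spirit of NetworkTopologyStrategy's rack-aware walk of the token ring for a single datacenter, and the node and rack names are invented for the example.

```python
# Toy model of rack-aware replica placement, in the spirit of Cassandra's
# NetworkTopologyStrategy (simplified, single-DC; not the real algorithm).
from collections import namedtuple

Node = namedtuple("Node", ["name", "rack"])

def pick_replicas(ring, start_index, rf):
    """Walk the ring clockwise from the primary replica's position,
    preferring nodes on racks that do not yet hold a replica."""
    replicas, seen_racks, skipped = [], set(), []
    n = len(ring)
    for i in range(n):
        node = ring[(start_index + i) % n]
        if node.rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(node.rack)
        else:
            skipped.append(node)  # fall back to these if racks run out
        if len(replicas) == rf:
            return replicas
    return (replicas + skipped)[:rf]

# Healthy topology: four nodes spread over three racks.
ring = [Node("n1", "rack1"), Node("n2", "rack1"),
        Node("n3", "rack2"), Node("n4", "rack3")]
print([n.name for n in pick_replicas(ring, 0, 3)])  # one node per rack
```

If a scheduler silently lands every node on the same physical rack, the same walk can only fall back to same-rack replicas, which is exactly the topology violation Luciano describes: rack awareness no longer buys any fault isolation even though the replication factor is unchanged.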