As a follow-up, do you have any thoughts on running Cassandra on ECS (Elastic Container Service) itself? We haven’t seen any examples or recommendations for or against it out there, but you’ve seen lots of different deployments in the wild.
From: Jon Haddad <j...@rustyrazorblade.com>
Date: Thursday, June 12, 2025 at 7:16 AM
To: user@cassandra.apache.org
Subject: Re: Request for Thoughts on Deployments on AWS EC2 vs. ECS

I agree that managing Cassandra on Kubernetes can be challenging without prior experience, as understanding all the nuances of Kubernetes takes time. However, there are ways to address the rescheduling issues, node placement, and local disk concerns that were mentioned. You can pin pods to specific hosts to avoid rescheduling on different nodes, and you can use local disks or a combination of persistent disks with a local NVMe as a cache. Host networking or (I think) Cilium can help with the networking performance concerns. For most arguments against using Kubernetes, there's usually a workaround or setting that can address the issue.
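To make that concrete, here is a minimal sketch of what the pinning, host networking, and local disk settings look like in a bare pod spec. The hostname, image tag, and disk path are placeholders only, and in practice an operator manages the equivalent through StatefulSets rather than bare pods:

apiVersion: v1
kind: Pod
metadata:
  name: cassandra-0
spec:
  # Pin the pod to one known host so the scheduler never moves it away
  # from its data; nodeAffinity or taints can express the same thing.
  nodeSelector:
    kubernetes.io/hostname: cassandra-node-1   # placeholder hostname
  # Use the host's network namespace to sidestep overlay networking.
  hostNetwork: true
  containers:
    - name: cassandra
      image: cassandra:4.1
      volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumes:
    # Local NVMe on the node; a local PersistentVolume is the more
    # durable way to express the same idea.
    - name: data
      hostPath:
        path: /mnt/nvme/cassandra   # placeholder path
        type: Directory

None of this is exotic; the point is just that the scheduler's freedom to move pods can be constrained when the workload demands it.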
The main advantage of Kubernetes is the operator. While it has some quirks, it generally does a good job of managing your deployment, eliminating the need to write all your workflows. Building on Kubernetes as a standard offers the advantage of applying your knowledge across various environments once you're familiar with it.

I wouldn't recommend jumping into Kubernetes and Cassandra simultaneously. Both are complex topics. I've worked with Cassandra for over a decade and Kubernetes on and off for five years, and I still encounter challenges, especially when my desired outcome differs from the operator's.

Both approaches are workable. Both have tradeoffs. For now, I'm also sticking to baking AMIs [3], but with more experience on K8s and a little more maturity from Cassandra, I'd think differently. For stateless apps, I'm 100% on board with K8s.

Jon

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
[2] https://lists.apache.org/thread/r0nhyyn6mbpy55fl90xqcj17v6w3wxg3
[3] https://github.com/rustyrazorblade/easy-cass-lab/tree/main/packer

On Thu, Jun 12, 2025 at 6:17 AM Luciano Greiner <luciano.grei...@gmail.com> wrote:

Quick correction on my previous message — I assumed you were referring to running Cassandra on Kubernetes, not purely ECS. Many of the same concerns still apply. ECS tasks can also be rescheduled or moved between instances, which poses risks for Cassandra's rack awareness and replica distribution. Ensuring stable node identity and local storage is still tricky.

Cassandra works best when it's tightly coupled to its hardware — ideally on dedicated VMs or bare metal — where you have full control over topology and disk performance.

Luciano Greiner

On Thu, Jun 12, 2025 at 10:13 AM Luciano Greiner <luciano.grei...@gmail.com> wrote:
>
> I usually advise against running Cassandra (or most databases) inside
> Kubernetes. It might look like it simplifies operations, but in my
> experience, it tends to introduce more complexity than it solves.
>
> With Cassandra specifically, Kubernetes may reschedule pods for
> reasons outside your control (e.g., node pressure, restarts,
> upgrades). This can lead to topology violations — for example, all
> replicas ending up in the same physical rack, defeating the purpose
> of proper rack and datacenter awareness.
>
> Another major issue is storage. Cassandra expects fast, local disks
> close to the compute layer. While Kubernetes StatefulSets can use
> PersistentVolumes, these are often network-attached and may not offer
> the performance or locality guarantees Cassandra needs. And if your
> pods get rescheduled, depending on your storage class and cloud
> provider, you may run into delays or errors reattaching volumes.
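>
> To make the locality point concrete, the usual Kubernetes answer is a
> local StorageClass with WaitForFirstConsumer binding, which ties each
> pod to the node that owns its disk. This is only a rough sketch with
> made-up names, not a recommendation:
>
> apiVersion: storage.k8s.io/v1
> kind: StorageClass
> metadata:
>   name: local-nvme                            # illustrative name
> provisioner: kubernetes.io/no-provisioner     # pre-provisioned local disks
> volumeBindingMode: WaitForFirstConsumer
> ---
> apiVersion: apps/v1
> kind: StatefulSet
> metadata:
>   name: cassandra
> spec:
>   serviceName: cassandra
>   replicas: 3
>   selector:
>     matchLabels:
>       app: cassandra
>   template:
>     metadata:
>       labels:
>         app: cassandra
>     spec:
>       containers:
>         - name: cassandra
>           image: cassandra:4.1
>           volumeMounts:
>             - name: data
>               mountPath: /var/lib/cassandra
>   # Each replica gets a stable identity and its own local claim.
>   volumeClaimTemplates:
>     - metadata:
>         name: data
>       spec:
>         accessModes: ["ReadWriteOnce"]
>         storageClassName: local-nvme
>         resources:
>           requests:
>             storage: 1Ti
>
> The coupling cuts both ways: if the node dies, the pod stays pinned
> to a volume that no longer exists until you delete the claim and
> rebuild the replica, which is the same failure domain as a dead EC2
> instance.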
>
> Using an operator like K8ssandra doesn't necessarily eliminate these
> problems — it just adds another tool to manage within the puzzle.
>
> Luciano Greiner
>
> On Thu, Jun 12, 2025 at 6:20 AM Dor Laor via user
> <user@cassandra.apache.org> wrote:
> >
> > It's possible to manage Cassandra well both with VMs and containers.
> > As you'd be running one container per VM, there is no significant
> > advantage for containers. K8s provides nice tooling and some
> > methodological enforcement, which brings order to the setup, but if
> > the team isn't made up of top-notch k8s experts, it's not worth the
> > trouble and the limitations that come with it (networking outside
> > the k8s cluster, etc.). It's good to have fewer layers. Most users
> > run databases outside of containers.
> >
> > On Wed, Jun 11, 2025 at 11:36 PM Raymond Yu <rayyu...@gmail.com> wrote:
> >>
> >> Hi Cassandra community,
> >>
> >> I would like to ask for your expert opinions regarding a discussion
> >> we're having about deploying Cassandra on AWS EC2 vs. AWS ECS. For
> >> context, we have a small dedicated DB engineering team that is
> >> familiar with operating and supporting Cassandra on EC2 for many
> >> customer teams. However, one team has developed custom tooling for
> >> operating Cassandra on ECS (EC2-backed) and would like for us to
> >> migrate to it for their Cassandra needs, which has spawned this
> >> discussion (K8ssandra was considered, but that team did not want to
> >> use Kubernetes).
> >>
> >> Further context on our team and experience:
> >> - Small dedicated team supporting Cassandra (and other DBs)
> >> - Familiar with operating Cassandra on EC2
> >> - Familiar with standard IaC tools and languages
> >>   (Ansible/Terraform/Python/etc.)
> >> - Only deploy in AWS
> >>
> >> Discussed points regarding staying with EC2:
> >> - Existing team experience and automation in deploying Cassandra on EC2
> >> - Simpler solution is easier to support and maintain
> >> - Almost all documentation we can find and use is specific to
> >>   deploying on EC2
> >> - Third-party support is familiar with EC2 by default
> >> - Lower learning curve for engineers to onboard
> >> - More hands-on maintenance regarding OS upgrades
> >> - Less modern solution
> >>
> >> Discussed points regarding using the new ECS solution:
> >> - Containers are the more modern solution
> >> - Node autoheal feature in addition to standard C* operations via a
> >>   control plane
> >> - Higher tool and architecture complexity that requires ramp-up in
> >>   order to use and support effectively
> >> - We're on our own for potential issues with the tool itself after
> >>   it would be handed off
> >> - No demonstrated performance gain over EC2-based clusters
> >> - Third-party support would be less familiar with dealing with ECS
> >>   issues
> >> - Deployed on EC2 under the hood (one container per VM), so the
> >>   underlying architecture is the same between both solutions
> >>
> >> Given that context, our team generally feels that there is little
> >> marginal benefit given the cost of ramp-up and supporting a custom
> >> tool, but there has also been a request for harder evidence and
> >> outside opinions on the topic. It has been hard to find
> >> documentation of this specific comparison of EC2 vs. ECS to
> >> reference. We'd love to hear your thoughts on our context, but are
> >> also interested in any general recommendations for one over the
> >> other. Thanks in advance!
> >>
> >> Best,
> >> Raymond Yu