I completely missed that ECS != EKS. My brain flipped a few bits there.

I have never run C* on ECS and I can't imagine I ever would.  It seems like
the most bespoke and difficult way to run it.

+1 to everything Patrick said.

Jon

On Thu, Jun 12, 2025 at 12:46 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I feel like I should weigh in here. :) Running C* on Kubernetes works best
> when you come from Kubernetes and add C*, not the other way around. It's a
> mentality thing. "Why can't I log into my nodes??" There is a book exactly
> for this topic:
> https://www.oreilly.com/library/view/managing-cloud-native/9781098111380/
> At Datastax, we run Astra exclusively on k8s and have 1000s of clusters
> working away.
>
> I must admit, I never considered running Cassandra on ECS. EKS, for sure,
> but not ECS. If you have a procedure to use EC2 and it works, then
> transitioning to ECS seems like adding overhead to essentially the same
> thing. If you were making the move to k8s, then that's a complete overhaul
> of how you run infrastructure. ECS is not.
>
> Besides the process issues, the only other downside I can think of is
> resource limits. EC2 has much more leeway on resource usage, but you may
> want to review the limits on ECS, especially on container numbers.
>
> Patrick
>
> On Thu, Jun 12, 2025 at 11:37 AM Yu, Raymond L. <raymond...@disney.com>
> wrote:
>
>> Hi Daemeon,
>>
>>
>>
>> Firstly, apologies to everyone in the thread as I’m not comparing EC2
>> against K8s but specifically against Elastic Container Service backed by
>> EC2.
>>
>>
>>
>> Thanks for the response. I agree that running persisted databases on K8s
>> does not seem ideal. Although there has been a lot of work on K8ssandra, my
>> personal opinion is that it takes a lot of K8s knowledge to reap the
>> benefits and counter the added complexity.
>>
>>
>>
>> To point 3, I believe that one exists in AWS Keyspaces, but for us it
>> doesn’t make sense to go with that as we already possess the ability to run
>> our own clusters.
>>
>>
>>
>> To point 4, our team is actually trying to minimize complexity by
>> sticking with deploying Cassandra on EC2 and relying on standard
>> Ansible/Terraform/Python automation, as we don’t have a k8s footprint or
>> even an ECS one. However, a customer team developed an ECS-based tool
>> (somewhat of an in-house K8ssandra operator mimic) on their own and would
>> like us to use and support it. Our team is opposed to that idea because, as
>> you said, it does not seem to provide a performance improvement, it adds
>> complexity, and it would pose a challenge given that our team's experience
>> is neither in deploying Cassandra on ECS nor in supporting the in-house
>> tooling needed to do so.
>>
>>
>>
>> In general, given an existing Cassandra EC2 footprint and experience
>> deploying in that fashion, we're looking for points for or against
>> deploying Cassandra on ECS when in-house tooling is also required, since
>> outside opinions were desired.
>>
>>
>>
>> Best,
>>
>> Raymond Yu
>>
>>
>>
>>
>>
>>
>>
>> From: daemeon reiydelle <daeme...@gmail.com>
>> Date: Thursday, June 12, 2025 at 9:42 AM
>> To: user@cassandra.apache.org <user@cassandra.apache.org>
>> Subject: Re: Request for Thoughts on Deployments on AWS EC2 vs. ECS
>>
>>
>> K8S has some key use cases where it is ideal, some that are more nuanced,
>> and some that are anti-patterns. It is my opinion that services like C*,
>> Hadoop, Kafka, and other persisted distributed databases are certainly not
>> ideal fits. I admit to a prejudice of having worked with K8S since before
>> it was released (in fact before GCP existed). However, as the hyperscalers
>> have moved to ethernet-based storage presented to virtual machines, many of
>> the antipatterns for k8s are now antipatterns for EC2 as well.
>>
>>
>>
>> My response:
>>
>>    1. containers/k8s are a mitigation for the higher cost of systems
>>    admins managing physical (or pseudophysical EC2) devices
>>    2. That mitigation adds complexity: persisted storage in clustered
>>    machines, network overhead, etc.
>>    3. In the case of hadoop, k8s, and various other distributed key
>>    value or object stores, some of the cloud vendors provide the stores as a
>>    service. I am not aware that Cassandra as a service is on offer, am I
>>    correct?
>>    4. Therefore, what are you trying to accomplish by moving from EC2 to
>>    containers/k8s? Do you already have a substantial k8s footprint with
>>    experienced k8s resources, especially with resources skilled in persisted
>>    storage (e.g. PortWorx or similar)?
>>    5. To what extent is this an effort to have fun with a new and shiny
>>    object vs. an actual, bona fide need to resolve a problem (like complex
>>    terraform scripts to spin up additional C* nodes or even net new
>>    clusters) that you think helm charts will fix?
>>
>>
>>
>> Daemeon Reiydelle
>>
>> email: daeme...@gmail.com
>>
>> San Francisco 1.415.501.0198 / Skype daemeon.c.m.reiydelle
>>
>>
>>
>> If builders built buildings the way programmers wrote programs, then the
>> first woodpecker to come along would destroy civilization.
>>
>>
>>
>>
>>
>> On Thu, Jun 12, 2025 at 9:13 AM Yu, Raymond L. <raymond...@disney.com>
>> wrote:
>>
>> Thank you all for your thoughts. They are greatly appreciated!
>>
>>
>>
>> It seems that some of your thoughts echo our worries about there being
>> additional hidden nuances to implementing the same level of functionality
>> and reliability in ECS and even K8s. We agree that the K8ssandra operator
>> would be the most advantageous and desired aspect of switching to a
>> container-based solution, specifically if we went with K8s.
>>
>>
>>
>> Going by that logic, with the ECS solution we’re comparing against, we’d
>> essentially have to support an in-house operator of sorts, which has involved
>> rewriting all the workflows necessary to try to match a portion of the
>> functionality of the K8ssandra operator. There have been challenges as well
>> on the customer team’s side with that, but there’d 100% also be challenges
>> with our team supporting it if it were to be handed off to us. Since there
>> are still worries in the Cassandra community about the maturity of the
>> K8ssandra operator, we have significantly more worries about using an
>> in-house one.
>>
>>
>>
>> From: Jon Haddad <j...@rustyrazorblade.com>
>> Date: Thursday, June 12, 2025 at 7:16 AM
>> To: user@cassandra.apache.org <user@cassandra.apache.org>
>> Subject: Re: Request for Thoughts on Deployments on AWS EC2 vs. ECS
>>
>>
>> I agree that managing Cassandra on Kubernetes can be challenging without
>> prior experience, as understanding all the nuances of Kubernetes takes time.
>>
>>
>>
>> However, there are ways to address the rescheduling issues, node
>> placement, and local disk concerns that were mentioned. You can pin pods to
>> specific hosts to avoid rescheduling on different nodes, and you can use
>> local disks or a combination of persistent disks with a local NVMe as a
>> cache. Host networking or (I think) Cilium can help with the networking
>> performance concerns. For most arguments against using Kubernetes, there's
>> usually a workaround or setting that can address the issue.
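>>
>> To sketch what I mean (a rough, untested example; the node name, image
>> tag, and paths are placeholders I made up), pinning a pod and using host
>> networking plus a local disk might look something like:
>>
>> apiVersion: v1
>> kind: Pod
>> metadata:
>>   name: cassandra-0                    # hypothetical pod name
>> spec:
>>   nodeSelector:
>>     kubernetes.io/hostname: node-a     # pin to one specific node
>>   hostNetwork: true                    # skip the overlay network
>>   containers:
>>     - name: cassandra
>>       image: cassandra:4.1
>>       volumeMounts:
>>         - name: data
>>           mountPath: /var/lib/cassandra
>>   volumes:
>>     - name: data
>>       hostPath:
>>         path: /mnt/nvme/cassandra      # local NVMe on the pinned node
>>
>> In practice the operator manages this for you via StatefulSets and
>> affinity rules; the point is just that each concern has a knob.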
>>
>>
>>
>> The main advantage of Kubernetes is the operator. While it has some
>> quirks, it generally does a good job of managing your deployment,
>> eliminating the need to write all your workflows. Building on Kubernetes as
>> a standard offers the advantage of applying your knowledge across various
>> environments once you're familiar with it.
>>
>>
>>
>> I wouldn't recommend jumping into Kubernetes and Cassandra
>> simultaneously. Both are complex topics. I've worked with Cassandra for
>> over a decade and Kubernetes on and off for five years, and I still
>> encounter challenges, especially when my desired outcome differs from the
>> operator's.
>>
>>
>>
>> Both approaches are workable.  Both have tradeoffs.  For now, I'm also
>> sticking to baking AMIs [3], but with more experience on K8s and a little
>> more maturity from Cassandra, I'd think differently.  For stateless apps,
>> I'm 100% on board with K8s.
>>
>>
>>
>> Jon
>>
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>
>> [2] https://lists.apache.org/thread/r0nhyyn6mbpy55fl90xqcj17v6w3wxg3
>>
>> [3] https://github.com/rustyrazorblade/easy-cass-lab/tree/main/packer
>>
>>
>>
>> On Thu, Jun 12, 2025 at 6:17 AM Luciano Greiner <
>> luciano.grei...@gmail.com> wrote:
>>
>> Quick correction on my previous message — I assumed you were referring
>> to running Cassandra on Kubernetes, not purely ECS.
>>
>> Many of the same concerns still apply. ECS tasks can also be
>> rescheduled or moved between instances, which poses risks for
>> Cassandra’s rack awareness and replica distribution. Ensuring stable
>> node identity and local storage is still tricky.
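>>
>> For what it's worth, ECS does have task placement constraints that can
>> partially mitigate this. A hypothetical CloudFormation-style sketch (the
>> family name and the "rack" attribute are inventions of mine, not from
>> this thread):
>>
>> TaskDefinition:
>>   Type: AWS::ECS::TaskDefinition
>>   Properties:
>>     Family: cassandra                  # hypothetical family name
>>     RequiresCompatibilities: [EC2]
>>     PlacementConstraints:
>>       - Type: memberOf
>>         Expression: "attribute:rack == rack-1"  # custom instance attribute
>>     ContainerDefinitions:
>>       - Name: cassandra
>>         Image: cassandra:4.1
>>         Memory: 8192                   # MiB
>>
>> This can keep a task on instances tagged for a given rack, but it still
>> doesn't give you the stable per-node identity that pinned VMs do.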
>>
>> Cassandra works best when it's tightly coupled to its hardware —
>> ideally on dedicated VMs or bare metal — where you have full control
>> over topology and disk performance.
>>
>> Luciano Greiner
>>
>> On Thu, Jun 12, 2025 at 10:13 AM Luciano Greiner
>> <luciano.grei...@gmail.com> wrote:
>> >
>> > I usually advise against running Cassandra (or most databases) inside
>> > Kubernetes. It might look like it simplifies operations, but in my
>> > experience, it tends to introduce more complexity than it solves.
>> >
>> > With Cassandra specifically, Kubernetes may reschedule pods for
>> > reasons outside your control (e.g., node pressure, restarts,
>> > upgrades). This can lead to topology violations — for example, all
>> > replicas ending up in the same physical rack, defeating the purpose of
>> > proper rack and datacenter awareness.
>> >
>> > Another major issue is storage. Cassandra expects fast, local disks
>> > close to the compute layer. While Kubernetes StatefulSets can use
>> > PersistentVolumes, these are often network-attached and may not offer
>> > the performance or locality guarantees Cassandra needs. And if your
>> > pods get rescheduled, depending on your storage class and cloud
>> > provider, you may run into delays or errors reattaching volumes.
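>> >
>> > As a minimal sketch (names, sizes, and paths invented for illustration),
>> > a statically provisioned local volume looks roughly like this; note how
>> > it ties the data to one specific node:
>> >
>> > apiVersion: v1
>> > kind: PersistentVolume
>> > metadata:
>> >   name: cassandra-data-0             # hypothetical name
>> > spec:
>> >   capacity:
>> >     storage: 500Gi
>> >   accessModes: [ReadWriteOnce]
>> >   storageClassName: local-storage    # no dynamic provisioner
>> >   local:
>> >     path: /mnt/nvme/cassandra        # local NVMe on the node
>> >   nodeAffinity:                      # required for local volumes
>> >     required:
>> >       nodeSelectorTerms:
>> >         - matchExpressions:
>> >             - key: kubernetes.io/hostname
>> >               operator: In
>> >               values: [node-a]
>> >
>> > You get locality, but if that node dies the volume and the pod are stuck
>> > together, which is exactly the operational wrinkle I mean.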
>> >
>> > Using an operator like K8ssandra doesn't necessarily eliminate these
>> > problems — it just adds another tool to manage within the puzzle.
>> >
>> > Luciano Greiner
>> >
>> > On Thu, Jun 12, 2025 at 6:20 AM Dor Laor via user
>> > <user@cassandra.apache.org> wrote:
>> > >
>> > > It's possible to manage Cassandra well both with VMs and containers.
>> > > As you'd be running one container per VM, there is no significant
>> > > advantage for containers. K8s provides nice tooling and some
>> > > methodological enforcement which brings order to the setup, but if the
>> > > team aren't top-notch experts in k8s, it's not worth the trouble and the
>> > > limitations that come with it (networking outside the k8s cluster,
>> > > etc.). It's good to have fewer layers. Most users run databases outside
>> > > of containers.
>> > >
>> > > On Wed, Jun 11, 2025 at 11:36 PM Raymond Yu <rayyu...@gmail.com>
>> > > wrote:
>> > >>
>> > >> Hi Cassandra community,
>> > >>
>> > >> I would like to ask for your expert opinions regarding a discussion
>> > >> we're having about deploying Cassandra on AWS EC2 vs. AWS ECS. For
>> > >> context, we have a small dedicated DB engineering team that is familiar
>> > >> with operating and supporting Cassandra on EC2 for many customer teams.
>> > >> However, one team has developed custom tooling for operating Cassandra
>> > >> on ECS (EC2-backed) and would like us to migrate to it for their
>> > >> Cassandra needs, which has spawned this discussion (K8ssandra was
>> > >> considered, but that team did not want to use Kubernetes).
>> > >>
>> > >> Further context on our team and experience:
>> > >> - Small dedicated team supporting Cassandra (and other DBs)
>> > >> - Familiar with operating Cassandra on EC2
>> > >> - Familiar with standard IaC tools and languages
>> > >>   (Ansible/Terraform/Python/etc.)
>> > >> - Only deploy in AWS
>> > >>
>> > >> Discussed points regarding staying with EC2:
>> > >> - Existing team experience and automation in deploying Cassandra on EC2
>> > >> - Simpler solution is easier to support and maintain
>> > >> - Almost all documentation we can find and use is specific to deploying
>> > >>   on EC2
>> > >> - Third-party support is familiar with EC2 by default
>> > >> - Lower learning curve for engineers to onboard
>> > >> - More hands-on maintenance regarding OS upgrades
>> > >> - Less modern solution
>> > >>
>> > >> Discussed points regarding using the new ECS solution:
>> > >> - Containers are the more modern solution
>> > >> - Node autoheal feature in addition to standard C* operations via a
>> > >>   control plane
>> > >> - Higher tool and architecture complexity that requires ramp-up in
>> > >>   order to use and support effectively
>> > >> - We're on our own for potential issues with the tool itself after it
>> > >>   is handed off
>> > >> - No demonstrated performance gain over EC2-based clusters
>> > >> - Third-party support would be less familiar with dealing with ECS
>> > >>   issues
>> > >> - Deployed on EC2 under the hood (one container per VM), so the
>> > >>   underlying architecture is the same between both solutions
>> > >>
>> > >> Given that context, our team generally feels that there is little
>> > >> marginal benefit given the cost of ramp-up and supporting a custom
>> > >> tool, but there has also been a request for harder evidence and outside
>> > >> opinions on the topic. It has been hard to find documentation of this
>> > >> specific comparison of EC2 vs. ECS to reference. We'd love to hear your
>> > >> thoughts on our context, but are also interested in any general
>> > >> recommendations for one over the other. Thanks in advance!
>> > >>
>> > >> Best,
>> > >> Raymond Yu
>>
>>
