In my experience it usually makes more sense to evacuate out of a degraded DC at the application layer rather than having Cassandra silently fail over to a remote quorum. When a DC is in a bad state, it's rarely "just a couple of Cassandra replicas are down." The same underlying problem (network partitions, bad routing, DNS issues, load balancer problems, AZ failures, power incidents, I could go on...) is usually affecting a lot more than just Cassandra.

What I can see is that REMOTE_QUORUM would behave like a dynamic reconfiguration of the replica group for a given request. That's fundamentally different from today's NTS contract, where the replica set per DC is static. That would open up all sorts of new ops headaches.

- What happens to replicas in the degraded DC? Would hints get stored?
- I'm not sure this would work with Paxos, since dynamic reconfiguration isn't supported. Two leaders, possibly?
- Which now brings up Accord and TCM considerations...
So my strong bias is, once a DC is degraded enough that Cassandra can't achieve LOCAL_QUORUM, that DC is probably not a place you want to keep serving user traffic from. At that point, I'd rather have the application tier evacuate and talk to a healthy DC, instead of the database internally changing its consistency behavior under the covers. It's just a ton of work for a problem with a simpler solution.

Patrick

On Mon, Dec 1, 2025 at 7:07 PM Jaydeep Chovatia <[email protected]> wrote:

> > For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>
> Thanks, Stefan, for sharing the details about the client-side failover mechanism. This technique is definitely useful when the entire region is down.
>
> This proposal targets the use case in which only a few nodes in the region fail, but not all. Use cases that prioritize availability over latency can opt into this option, and the server will automatically fulfill those requests from the remote region.
>
> Jaydeep
>
> On Mon, Dec 1, 2025 at 2:58 PM Štefan Miklošovič <[email protected]> wrote:
>
>> For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>>
>> https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/load_balancing/index.html#cross-datacenter-failover
>>
>> Interesting to see your perspective; I can see how doing something without application changes might seem appealing.
>>
>> I wonder what others think about this.
>>
>> Regards
>>
>> On Mon, Dec 1, 2025 at 11:39 PM Qc L <[email protected]> wrote:
>>
>>> Hi Stefan,
>>>
>>> Thanks for raising the question about using a custom LoadBalancingPolicy! Remote quorum and LB-based failover aim to solve similar problems, maintaining availability during datacenter degradation, so I did a comparison study of the server-side remote quorum solution versus relying on driver-side logic.
>>>
>>> - For our use cases, we found that client-side failover is expensive to operate and leads to fragmented behavior during incidents. We also prefer failover to be triggered only when needed, not automatically. A server-side mechanism gives us controlled, predictable behavior and makes the system easier to operate.
>>> - Remote quorum is implemented on the server, where the coordinator can intentionally form a quorum using replicas in a backup region when the local region cannot satisfy LOCAL_QUORUM or EACH_QUORUM. The database determines the fallback region and how replicas are selected, and ensures consistency guarantees still hold.
>>>
>>> By keeping this logic internal to the server, we provide unified, consistent behavior across all clients without requiring any application changes.
>>>
>>> Happy to discuss further if helpful.
>>>
>>> On Mon, Dec 1, 2025 at 12:57 PM Štefan Miklošovič <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> before going deeper into your proposal ... I just have to ask, have you tried e.g. a custom LoadBalancingPolicy in the driver, if you use that?
>>>>
>>>> I can imagine that if the driver detects a node to be down, then based on its "distance" / the DC it belongs to, you might start to create a different query plan which talks to a remote DC, and similar.
>>>>
>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html
>>>>
>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html
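[For concreteness: if I am reading the linked 4.17 manual correctly, the driver-side failover Štefan points at does not even require a custom policy, since DefaultLoadBalancingPolicy has built-in cross-datacenter failover options. A rough application.conf sketch follows; option names are as I recall them from the driver reference config, so verify against the linked page before relying on them.]

    datastax-java-driver {
      basic.load-balancing-policy {
        class = DefaultLoadBalancingPolicy
        # the DC the client treats as local, e.g. cluster1
        local-datacenter = cluster1
      }
      advanced.load-balancing-policy.dc-failover {
        # allow query plans to include up to N nodes from each remote DC
        max-nodes-per-remote-dc = 3
        # by default LOCAL_* consistency levels never fail over; as I understand it,
        # enabling this lets a LOCAL_QUORUM request be coordinated in a remote DC,
        # where "local" is then evaluated relative to that remote coordinator
        allow-for-local-consistency-levels = true
      }
    }

[That is roughly the client-side analogue of the server-side fallback proposed below.]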
>>>>
>>>> On Mon, Dec 1, 2025 at 8:54 PM Qc L <[email protected]> wrote:
>>>>
>>>>> Hello team,
>>>>>
>>>>> I'd like to propose adding a REMOTE_QUORUM consistency level to Cassandra for handling simultaneous host unavailability in the local data center, when the local data center cannot achieve the required quorum.
>>>>>
>>>>> Background
>>>>> NetworkTopologyStrategy is the most commonly used replication strategy at Uber, and we use LOCAL_QUORUM for reads/writes in many use cases. Our Cassandra deployment in each data center currently relies on a majority of replicas being healthy to consistently achieve local quorum.
>>>>>
>>>>> Current behavior
>>>>> When a local data center in a Cassandra deployment experiences outages, network isolation, or maintenance events, the EACH_QUORUM / LOCAL_QUORUM consistency levels will fail for both reads and writes if enough replicas in the local data center are unavailable. In this configuration, simultaneous host unavailability can temporarily prevent the cluster from reaching the required quorum for reads and writes. For applications that require high availability and a seamless user experience, this can lead to service downtime and a noticeable drop in overall availability. See attached Figure 1. [image: figure1.png]
>>>>>
>>>>> Proposed Solution
>>>>> To prevent this issue and ensure a seamless user experience, we can use the REMOTE_QUORUM consistency level as a fallback mechanism in scenarios where local replicas are unavailable. Remote quorum in Cassandra refers to a read or write operation that achieves quorum (a majority of replicas) in a remote data center, rather than relying solely on replicas within the local data center. See attached Figure 2. [image: figure2.png]
>>>>>
>>>>> We will add a feature to override the read/write consistency level on the server side. When local replicas are not available, the server will override the write consistency level from EACH_QUORUM to REMOTE_QUORUM. Note that implementing this change on the client side would require protocol changes in CQL, so we only add it on the server side, where it can be used only by server internals.
>>>>>
>>>>> For example, given the following Cassandra setup:
>>>>>
>>>>> CREATE KEYSPACE ks WITH REPLICATION = {
>>>>>   'class': 'NetworkTopologyStrategy',
>>>>>   'cluster1': 3,
>>>>>   'cluster2': 3,
>>>>>   'cluster3': 3
>>>>> };
>>>>>
>>>>> The selected approach for this design is to explicitly configure a backup data center mapping for the local data center, where each data center defines its preferred failover target. For example:
>>>>>
>>>>> remote_quorum_target_data_center:
>>>>>   cluster1: cluster2
>>>>>   cluster2: cluster3
>>>>>   cluster3: cluster1
>>>>>
>>>>> Implementation
>>>>> We propose the following features for Cassandra to address data center failure scenarios:
>>>>>
>>>>> 1. Introduce a new consistency level called REMOTE_QUORUM.
>>>>> 2. A feature to override the read/write consistency level on the server side (controlled by a feature flag). A nodetool command turns the server-side failover on/off.
>>>>>
>>>>> Why remote quorum is useful
>>>>> As shown in Figure 2, with a data center failure in one of the data centers, LOCAL_QUORUM and EACH_QUORUM will fail since two replicas are unavailable and cannot meet the quorum requirement in the local data center.
>>>>>
>>>>> During the incident, we can use nodetool to enable failover to remote quorum. With the failover enabled, reads and writes fail over to the remote data center, which avoids the availability drop.
>>>>>
>>>>> Any thoughts or concerns?
>>>>> Thanks in advance for your feedback,
>>>>>
>>>>> Qiaochu
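[To make the proposed backup-DC mapping concrete, here is a minimal, purely illustrative Java sketch of the selection logic a coordinator might apply when the override is enabled. None of this is existing Cassandra code; the class, method, and parameter names are hypothetical and only mirror the remote_quorum_target_data_center configuration shown in the proposal.]

    import java.util.Map;

    /** Hypothetical sketch of the proposed fallback selection; not Cassandra internals. */
    final class RemoteQuorumFallback {

        // Mirrors the proposed remote_quorum_target_data_center mapping.
        private static final Map<String, String> BACKUP_DC = Map.of(
                "cluster1", "cluster2",
                "cluster2", "cluster3",
                "cluster3", "cluster1");

        /**
         * Returns the data center in which the coordinator would form a quorum.
         * It falls back to the configured backup DC only when the operator has
         * enabled failover (e.g. via the proposed nodetool toggle) and the local
         * DC cannot currently satisfy LOCAL_QUORUM.
         */
        static String quorumDataCenter(String localDc,
                                       boolean localQuorumReachable,
                                       boolean failoverEnabled) {
            if (localQuorumReachable || !failoverEnabled) {
                return localDc;                                // normal LOCAL_QUORUM path
            }
            return BACKUP_DC.getOrDefault(localDc, localDc);   // proposed REMOTE_QUORUM path
        }
    }

[For example, quorumDataCenter("cluster1", false, true) would return "cluster2" under the mapping above, while it returns "cluster1" whenever the local DC is healthy or the failover toggle is off.]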
