Hello Jon,

Thank you for the insights. I feel that falling back to LOCAL_ONE is helpful for local failures in some scenarios.

REMOTE_QUORUM offers stronger durability guarantees under failure conditions by ensuring that each acknowledged write is persisted on a majority of replicas; for example, with 3 replicas per data center, at least two replicas must acknowledge the write. This avoids single-replica acknowledgements and significantly reduces the risk of cascading failures.
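To make that arithmetic concrete, here is a minimal sketch (plain Java, purely illustrative, not Cassandra internals) of how many acknowledgements each level would block for with RF = 3 per data center:

// Illustrative only -- not Cassandra code. With RF = 3 per data center,
// a quorum is floor(3/2) + 1 = 2, so the proposed REMOTE_QUORUM would block
// for 2 acks in the backup DC, while LOCAL_ONE blocks for just 1.
public class AckMath {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int rfPerDc = 3;
        System.out.println("LOCAL_ONE blocks for 1 replica");
        System.out.println("LOCAL_QUORUM blocks for " + quorum(rfPerDc) + " local replicas");
        System.out.println("REMOTE_QUORUM (proposed) blocks for " + quorum(rfPerDc) + " replicas in the backup DC");
    }
}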
On Tue, Dec 2, 2025 at 9:34 AM Jon Haddad <[email protected]> wrote:

> This proposal targets the use case in which only a few nodes in the region fail, but not all.
>
> If you're able to handle inconsistent results, why not use LOCAL_ONE? If you have a preference for consistency but can tolerate some inconsistency sometimes, DowngradingConsistencyRetryPolicy [1] might work for you. It's a bit contentious in the community (some folks absolutely hate its existence), but I've had a few use cases where it makes sense.
>
> Or is this the scenario where 3 nodes fail and they happen to contain the same token range? I'm curious what the math looks like. Are you using 256 vnodes by any chance?
>
> If you execute a query and all 3 replicas in your local DC are down, currently the driver throws an exception. I don't see how putting this logic on the server helps, since the driver would need to be aware of REMOTE_QUORUM to route to the remote replicas anyway. At that point you're just adding an extra hop between the app and the database.
>
> If you have a database outage so severe that all 3 replicas are unavailable, what's the state of the rest of the cluster? I'm genuinely curious how often you'd have a case where 3 nodes with intersecting ranges are offline and it's not a complete DC failure.
>
> I agree with Patrick and Stefan here. If you want to stay in the DC, doing it at the driver level gives you complete control. If your infra is failing that spectacularly, evacuate.
>
> I also wonder if witness replicas would help you, if you're seeing that much failure.
>
> [1] https://docs.datastax.com/en/drivers/java/4.17/com/datastax/oss/driver/api/core/retry/DowngradingConsistencyRetryPolicy.html
>
> On Mon, Dec 1, 2025 at 7:37 PM Patrick McFadin <[email protected]> wrote:
>
>> In my experience it usually makes more sense to evacuate out of a degraded DC at the application layer rather than having Cassandra silently fail over to a remote quorum. When a DC is in a bad state, it’s rarely “just a couple of Cassandra replicas are down”; the same underlying problem (network partitions, bad routing, DNS issues, load balancer problems, AZ failures, power incidents, I could go on...) usually affects far more than the database.
>>
>> What I can see is that REMOTE_QUORUM will behave like a dynamic reconfiguration of the replica group for a given request. That’s fundamentally different from today’s NTS contract, where the replica set per DC is static. That would open up all sorts of new ops headaches.
>> - What happens to the replicas in the degraded DC? Would hints get stored?
>> - I'm not sure this would work with Paxos since dynamic reconfiguration isn't supported. Two leaders possibly?
>> - Which now brings up Accord and TCM considerations...
>>
>> So my strong bias is: once a DC is degraded enough that Cassandra can’t achieve LOCAL_QUORUM, that DC is probably not a place you want to keep serving user traffic from. At that point, I’d rather have the application tier evacuate and talk to a healthy DC, instead of the database internally changing its consistency behavior under the covers. It's just a ton of work for a problem with a simpler solution.
>> Patrick
>>
>> On Mon, Dec 1, 2025 at 7:07 PM Jaydeep Chovatia <[email protected]> wrote:
>>
>>> > For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>>>
>>> Thanks, Stefan, for sharing the details about the client-side failover mechanism. This technique is definitely useful when the entire region is down.
>>>
>>> This proposal targets the use case in which only a few nodes in the region fail, but not all. Use cases that prioritize availability over latency can opt into this option, and the server will automatically fulfill those requests from the remote region.
>>>
>>> Jaydeep
>>>
>>> On Mon, Dec 1, 2025 at 2:58 PM Štefan Miklošovič <[email protected]> wrote:
>>>
>>>> For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>>>>
>>>> https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/load_balancing/index.html#cross-datacenter-failover
>>>>
>>>> Interesting to see your perspective; I can see how doing something without application changes might seem appealing.
>>>>
>>>> I wonder what others think about this.
>>>>
>>>> Regards
>>>>
>>>> On Mon, Dec 1, 2025 at 11:39 PM Qc L <[email protected]> wrote:
>>>>
>>>>> Hi Stefan,
>>>>>
>>>>> Thanks for raising the question about using a custom LoadBalancingPolicy! Remote quorum and LB-based failover aim to solve similar problems, maintaining availability during datacenter degradation, so I did a comparison study of the server-side remote quorum solution versus relying on driver-side logic.
>>>>>
>>>>> - For our use cases, we found that client-side failover is expensive to operate and leads to fragmented behavior during incidents. We also prefer failover to be triggered only when needed, not automatically. A server-side mechanism gives us controlled, predictable behavior and makes the system easier to operate.
>>>>> - Remote quorum is implemented on the server, where the coordinator can intentionally form a quorum using replicas in a backup region when the local region cannot satisfy LOCAL_QUORUM or EACH_QUORUM. The database determines the fallback region and how replicas are selected, and ensures consistency guarantees still hold.
>>>>>
>>>>> By keeping this logic internal to the server, we provide unified, consistent behavior across all clients without requiring any application changes.
>>>>>
>>>>> Happy to discuss further if helpful.
>>>>>
>>>>> On Mon, Dec 1, 2025 at 12:57 PM Štefan Miklošovič <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> before going deeper into your proposal ... I just have to ask, have you tried e.g. a custom LoadBalancingPolicy in the driver, if you use one?
>>>>>>
>>>>>> I can imagine that if the driver detects a node to be down, then based on its "distance" / the DC it belongs to, you might start to create a different query plan which talks to the remote DC, and similar.
>>>>>> >>>>>> >>>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html >>>>>> >>>>>> >>>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html >>>>>> >>>>>> On Mon, Dec 1, 2025 at 8:54 PM Qc L <[email protected]> wrote: >>>>>> >>>>>>> Hello team, >>>>>>> >>>>>>> I’d like to propose adding the remote quorum to Cassandra >>>>>>> consistency level for handling simultaneous hosts unavailability in the >>>>>>> local data center that cannot achieve the required quorum. >>>>>>> >>>>>>> Background >>>>>>> NetworkTopologyStrategy is the most commonly used strategy at Uber, >>>>>>> and we use Local_Quorum for read/write in many use cases. Our Cassandra >>>>>>> deployment in each data center currently relies on majority replicas >>>>>>> being >>>>>>> healthy to consistently achieve local quorum. >>>>>>> >>>>>>> Current behavior >>>>>>> When a local data center in a Cassandra deployment experiences >>>>>>> outages, network isolation, or maintenance events, the EACH_QUORUM / >>>>>>> LOCAL_QUORUM consistency level will fail for both reads and writes if >>>>>>> enough replicas in that the wlocal data center are unavailable. In this >>>>>>> configuration, simultaneous hosts unavailability can temporarily prevent >>>>>>> the cluster from reaching the required quorum for reads and writes. For >>>>>>> applications that require high availability and a seamless user >>>>>>> experience, >>>>>>> this can lead to service downtime and a noticeable drop in overall >>>>>>> availability. see attached figure1.[image: figure1.png] >>>>>>> >>>>>>> Proposed Solution >>>>>>> To prevent this issue and ensure a seamless user experience, we can >>>>>>> use the Remote Quorum consistency level as a fallback mechanism in >>>>>>> scenarios where local replicas are unavailable. Remote Quorum in >>>>>>> Cassandra >>>>>>> refers to a read or write operation that achieves quorum (a majority of >>>>>>> replicas) in the remote data center, rather than relying solely on >>>>>>> replicas >>>>>>> within the local data center. see attached Figure2. >>>>>>> >>>>>>> [image: figure2.png] >>>>>>> >>>>>>> We will add a feature to do read/write consistency level override on >>>>>>> the server side. When local replicas are not available, we will >>>>>>> overwrite >>>>>>> the server side write consistency level from each quorum to remote >>>>>>> quorum. >>>>>>> Note that, implementing this change in client side will require some >>>>>>> protocol changes in CQL, we only add this on server side which can only >>>>>>> be >>>>>>> used by server internal. >>>>>>> >>>>>>> For example, giving the following Cassandra setup >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> *CREATE KEYSPACE ks WITH REPLICATION = { 'class': >>>>>>> 'NetworkTopologyStrategy', 'cluster1': 3, 'cluster2': 3, 'cluster3': >>>>>>> 3};* >>>>>>> >>>>>>> The selected approach for this design is to explicitly configure a >>>>>>> backup data center mapping for the local data center, where each data >>>>>>> center defines its preferred failover target. For example >>>>>>> *remote_quorum_target_data_center:* >>>>>>> >>>>>>> >>>>>>> * cluster1: cluster2 cluster2: cluster3 cluster3: cluster1* >>>>>>> >>>>>>> Implementations >>>>>>> We proposed the following feature to Cassandra to address data >>>>>>> center failure scenarios >>>>>>> >>>>>>> 1. Introduce a new Consistency level called remote quorum. >>>>>>> 2. 
>>>>>>> 2. A feature to override the read/write consistency level on the server side (controlled by a feature flag), with a nodetool command to turn the server-side failover on and off.
>>>>>>>
>>>>>>> Why remote quorum is useful
>>>>>>> As shown in Figure 2, when we have a failure in one of the data centers, LOCAL_QUORUM and EACH_QUORUM will fail, since two replicas are unavailable and the quorum requirement cannot be met in the local data center.
>>>>>>>
>>>>>>> During the incident, we can use nodetool to enable failover to remote quorum. With failover enabled, reads and writes fail over to the remote data center, which avoids the availability drop.
>>>>>>>
>>>>>>> Any thoughts or concerns?
>>>>>>> Thanks in advance for your feedback,
>>>>>>>
>>>>>>> Qiaochu
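For concreteness, below is a rough sketch of the fallback decision described in the quoted proposal above. It is purely illustrative Java, not actual Cassandra internals; the backup-DC mapping and the enable flag stand in for the proposed remote_quorum_target_data_center configuration and the nodetool toggle, and all names are hypothetical.

// Illustrative sketch only -- not Cassandra internals.
// Shows the coordinator-side decision the proposal describes: if the local DC
// cannot assemble a quorum and the operator has enabled the failover flag,
// retarget the quorum to the configured backup data center.
import java.util.Map;

public class RemoteQuorumSketch {

    // Hypothetical mapping from the proposal: each DC names its failover target.
    static final Map<String, String> BACKUP_DC = Map.of(
            "cluster1", "cluster2",
            "cluster2", "cluster3",
            "cluster3", "cluster1");

    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Returns the data center in which the coordinator should assemble a quorum,
    // or throws if neither the local path nor the fallback path is available.
    static String quorumDc(String localDc, int rfPerDc, int aliveLocalReplicas,
                           boolean remoteQuorumEnabled) {
        if (aliveLocalReplicas >= quorum(rfPerDc)) {
            return localDc;                    // normal LOCAL_QUORUM path
        }
        if (remoteQuorumEnabled && BACKUP_DC.containsKey(localDc)) {
            return BACKUP_DC.get(localDc);     // proposed remote quorum fallback
        }
        throw new IllegalStateException("cannot achieve quorum in " + localDc);
    }

    public static void main(String[] args) {
        // RF = 3 in cluster1, only one local replica alive, failover flag on:
        // the quorum moves to cluster2.
        System.out.println(quorumDc("cluster1", 3, 1, true));
    }
}

A decision like this would still leave open the questions Patrick raised above, such as hinting for the degraded DC and the Paxos/Accord/TCM interactions.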
