In my experience it usually makes more sense to evacuate out of a degraded DC at the application layer rather than having Cassandra silently fail over to a remote quorum. When a DC is in a bad state, it's rarely "just a couple of Cassandra replicas are down." The same underlying problem (network partitions, bad routing, DNS issues, load balancer problems, AZ failures, power incidents, I could go on...) is usually affecting a lot more than just Cassandra.

What I can see is that REMOTE_QUORUM would behave like a dynamic reconfiguration of the replica group for a given request. That's fundamentally different from today's NTS contract, where the replica set per DC is static. That would open up all sorts of new ops headaches.

- What happens to replicas in the degraded DC? Would hints get stored?
- I'm not sure this would work with Paxos, since dynamic reconfiguration isn't supported. Two leaders, possibly?
- Which now brings up Accord and TCM considerations...
So my strong bias is, once a DC is degraded enough that Cassandra can't achieve LOCAL_QUORUM, that DC is probably not a place you want to keep serving user traffic from. At that point, I'd rather have the application tier evacuate and talk to a healthy DC, instead of the database internally changing its consistency behavior under the covers. It's just a ton of work for a problem with a simpler solution.

Patrick

On Mon, Dec 1, 2025 at 7:07 PM Jaydeep Chovatia <[email protected]> wrote:

> > For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>
> Thanks, Stefan, for sharing the details about the client-side failover mechanism. This technique is definitely useful when the entire region is down.
>
> This proposal targets the use case in which only a few nodes in the region fail, but not all. Use cases that prioritize availability over latency can opt into this option, and the server will automatically fulfill those requests from the remote region.
>
> Jaydeep
>
> On Mon, Dec 1, 2025 at 2:58 PM Štefan Miklošovič <[email protected]> wrote:
>
>> For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>>
>> https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/load_balancing/index.html#cross-datacenter-failover
>>
>> Interesting to see your perspective; I can see how doing something without application changes might seem appealing.
>>
>> I wonder what others think about this.
>>
>> Regards
>>
>> On Mon, Dec 1, 2025 at 11:39 PM Qc L <[email protected]> wrote:
>>
>>> Hi Stefan,
>>>
>>> Thanks for raising the question about using a custom LoadBalancingPolicy! Remote quorum and LB-based failover aim to solve similar problems, maintaining availability during datacenter degradation, so I did a comparison study of the server-side remote quorum solution versus relying on driver-side logic.
>>>
>>> - For our use cases, we found that client-side failover is expensive to operate and leads to fragmented behavior during incidents. We also prefer failover to be triggered only when needed, not automatically. A server-side mechanism gives us controlled, predictable behavior and makes the system easier to operate.
>>> - Remote quorum is implemented on the server, where the coordinator can intentionally form a quorum using replicas in a backup region when the local region cannot satisfy LOCAL_QUORUM or EACH_QUORUM. The database determines the fallback region and how replicas are selected, and ensures consistency guarantees still hold.
>>>
>>> By keeping this logic internal to the server, we provide unified, consistent behavior across all clients without requiring any application changes.
>>>
>>> Happy to discuss further if helpful.
>>>
>>> On Mon, Dec 1, 2025 at 12:57 PM Štefan Miklošovič <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> before going deeper into your proposal ... I just have to ask, have you tried e.g. a custom LoadBalancingPolicy in the driver, if you use that?
>>>>
>>>> I can imagine that if the driver detects a node to be down, then based on its "distance" / the DC it belongs to, you might start to create a different query plan which talks to a remote DC, and similar.
>>>>
>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html
>>>>
>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html
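[For concreteness: if I am reading the linked 4.17 manual correctly, the driver-side failover Štefan points at does not even require a custom policy, since DefaultLoadBalancingPolicy has built-in cross-datacenter failover options. A rough application.conf sketch follows; option names are as I recall them from the driver reference config, so verify against the linked page before relying on them.]

    datastax-java-driver {
      basic.load-balancing-policy {
        class = DefaultLoadBalancingPolicy
        # the DC the client treats as local, e.g. cluster1
        local-datacenter = cluster1
      }
      advanced.load-balancing-policy.dc-failover {
        # allow query plans to include up to N nodes from each remote DC
        max-nodes-per-remote-dc = 3
        # by default LOCAL_* consistency levels never fail over; as I understand it,
        # enabling this lets a LOCAL_QUORUM request be coordinated in a remote DC,
        # where "local" is then evaluated relative to that remote coordinator
        allow-for-local-consistency-levels = true
      }
    }

[That is roughly the client-side analogue of the server-side fallback proposed below.]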
>>>>
>>>> On Mon, Dec 1, 2025 at 8:54 PM Qc L <[email protected]> wrote:
>>>>
>>>>> Hello team,
>>>>>
>>>>> I'd like to propose adding a REMOTE_QUORUM consistency level to Cassandra for handling simultaneous host unavailability in the local data center, when the local data center cannot achieve the required quorum.
>>>>>
>>>>> Background
>>>>> NetworkTopologyStrategy is the most commonly used replication strategy at Uber, and we use LOCAL_QUORUM for reads/writes in many use cases. Our Cassandra deployment in each data center currently relies on a majority of replicas being healthy to consistently achieve local quorum.
>>>>>
>>>>> Current behavior
>>>>> When a local data center in a Cassandra deployment experiences outages, network isolation, or maintenance events, the EACH_QUORUM / LOCAL_QUORUM consistency levels will fail for both reads and writes if enough replicas in the local data center are unavailable. In this configuration, simultaneous host unavailability can temporarily prevent the cluster from reaching the required quorum for reads and writes. For applications that require high availability and a seamless user experience, this can lead to service downtime and a noticeable drop in overall availability. See attached Figure 1. [image: figure1.png]
>>>>>
>>>>> Proposed Solution
>>>>> To prevent this issue and ensure a seamless user experience, we can use the REMOTE_QUORUM consistency level as a fallback mechanism in scenarios where local replicas are unavailable. Remote quorum in Cassandra refers to a read or write operation that achieves quorum (a majority of replicas) in a remote data center, rather than relying solely on replicas within the local data center. See attached Figure 2. [image: figure2.png]
>>>>>
>>>>> We will add a feature to override the read/write consistency level on the server side. When local replicas are not available, the server will override the write consistency level from EACH_QUORUM to REMOTE_QUORUM. Note that implementing this change on the client side would require protocol changes in CQL, so we only add it on the server side, where it can be used only by server internals.
>>>>>
>>>>> For example, given the following Cassandra setup:
>>>>>
>>>>> CREATE KEYSPACE ks WITH REPLICATION = {
>>>>>   'class': 'NetworkTopologyStrategy',
>>>>>   'cluster1': 3,
>>>>>   'cluster2': 3,
>>>>>   'cluster3': 3
>>>>> };
>>>>>
>>>>> The selected approach for this design is to explicitly configure a backup data center mapping for the local data center, where each data center defines its preferred failover target. For example:
>>>>>
>>>>> remote_quorum_target_data_center:
>>>>>   cluster1: cluster2
>>>>>   cluster2: cluster3
>>>>>   cluster3: cluster1
>>>>>
>>>>> Implementation
>>>>> We propose the following features for Cassandra to address data center failure scenarios:
>>>>>
>>>>> 1. Introduce a new consistency level called REMOTE_QUORUM.
>>>>> 2. A feature to override the read/write consistency level on the server side (controlled by a feature flag). A nodetool command turns the server-side failover on/off.
>>>>>
>>>>> Why remote quorum is useful
>>>>> As shown in Figure 2, with a data center failure in one of the data centers, LOCAL_QUORUM and EACH_QUORUM will fail since two replicas are unavailable and cannot meet the quorum requirement in the local data center.
>>>>>
>>>>> During the incident, we can use nodetool to enable failover to remote quorum. With the failover enabled, reads and writes fail over to the remote data center, which avoids the availability drop.
>>>>>
>>>>> Any thoughts or concerns?
>>>>> Thanks in advance for your feedback,
>>>>>
>>>>> Qiaochu
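[To make the proposed backup-DC mapping concrete, here is a minimal, purely illustrative Java sketch of the selection logic a coordinator might apply when the override is enabled. None of this is existing Cassandra code; the class, method, and parameter names are hypothetical and only mirror the remote_quorum_target_data_center configuration shown in the proposal.]

    import java.util.Map;

    /** Hypothetical sketch of the proposed fallback selection; not Cassandra internals. */
    final class RemoteQuorumFallback {

        // Mirrors the proposed remote_quorum_target_data_center mapping.
        private static final Map<String, String> BACKUP_DC = Map.of(
                "cluster1", "cluster2",
                "cluster2", "cluster3",
                "cluster3", "cluster1");

        /**
         * Returns the data center in which the coordinator would form a quorum.
         * It falls back to the configured backup DC only when the operator has
         * enabled failover (e.g. via the proposed nodetool toggle) and the local
         * DC cannot currently satisfy LOCAL_QUORUM.
         */
        static String quorumDataCenter(String localDc,
                                       boolean localQuorumReachable,
                                       boolean failoverEnabled) {
            if (localQuorumReachable || !failoverEnabled) {
                return localDc;                                // normal LOCAL_QUORUM path
            }
            return BACKUP_DC.getOrDefault(localDc, localDc);   // proposed REMOTE_QUORUM path
        }
    }

[For example, quorumDataCenter("cluster1", false, true) would return "cluster2" under the mapping above, while it returns "cluster1" whenever the local DC is healthy or the failover toggle is off.]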
