+1
> On May 6, 2025, at 10:53 AM, Dmitry Konstantinov <netud...@gmail.com> wrote:
>
> +1 (nb)
>
> On Tue, 6 May 2025 at 17:32, Aleksey Yeshchenko <alek...@apple.com> wrote:
>> +1
>>
>>> On 5 May 2025, at 23:24, Blake Eggleston <bl...@ultrablake.com> wrote:
>>>
>>> Regarding how mutation tracking relates to existing backup systems that
>>> account for repaired vs unrepaired sstables: mutation tracking will continue
>>> to promote sstables to repaired once we know the data they contain has been
>>> fully reconciled. The main difference is that they won't be promoted as part
>>> of an explicit range repair, but by compaction, as they become eligible.
>>>
>>> (also +1 to finishing witnesses)
>>>
>>> On Mon, May 5, 2025, at 11:45 AM, Benedict Elliott Smith wrote:
>>>> Consistent backup/restore is a fundamentally hard and unsolved problem for
>>>> Cassandra today (without any of the mentioned features). In particular, any
>>>> backup/restore process today breaks the real-time guarantee of the
>>>> linearizability property (most notably for LWTs) across partitions.
>>>>
>>>> Fixing this should be relatively straightforward for Accord, and something
>>>> we intend to address in follow-up work. Fixing it for eventually consistent
>>>> (or Paxos/LWT) operations is, I think, achievable with or without mutation
>>>> tracking (probably easier with it). I'm not aware of any plans to tackle
>>>> this, though.
>>>>
>>>> Witness replicas should not particularly matter at all to any of the above.
>>>>
>>>>> On 5 May 2025, at 18:49, Jon Haddad <j...@rustyrazorblade.com> wrote:
>>>>>
>>>>> It took me a bit to wrap my head around how this works, but now that I
>>>>> think I understand the idea, it sounds like a solid improvement. Being
>>>>> able to achieve the same results as a full quorum while paying 1/3 less in
>>>>> storage is a *big deal*, and I know several teams that would be interested.
>>>>>
>>>>> One thing I'm curious about (and we can break it out into a separate
>>>>> discussion) is how all the functionality that requires coordination and
>>>>> global state (repaired vs unrepaired) will affect backups. Without a
>>>>> synchronization primitive to take a cluster-wide snapshot, how can we
>>>>> safely restore from eventually consistent backups without risking
>>>>> consistency issues due to out-of-sync repaired status?
>>>>>
>>>>> I don't think we need to block any of the proposed work on this - it's
>>>>> just something that's been nagging at me, and I don't know enough about
>>>>> the nuances of Accord, Mutation Tracking, or Witness Replicas to say
>>>>> whether it affects things or not. If it does, let's make sure we have it
>>>>> documented [1].
>>>>>
>>>>> Jon
>>>>>
>>>>> [1]
>>>>> https://cassandra.apache.org/doc/latest/cassandra/managing/operating/backups.html
>>>>>
>>>>>
>>>>>
>>>>> On Mon, May 5, 2025 at 10:21 AM Nate McCall <zznat...@gmail.com> wrote:
>>>>> This sounds like a modern feature that will benefit a lot of folks in
>>>>> cutting storage costs, particularly in large deployments.
>>>>>
>>>>> I'd like to see a note on the CEP about documentation overhead as this is
>>>>> an important feature to communicate correctly, but that's just a nit. +1
>>>>> on moving forward with this overall.
>>>>>
>>>>> On Sun, May 4, 2025 at 1:58 PM Jordan West <jw...@apache.org> wrote:
>>>>> I’m generally supportive. The concept is one whose benefits I can see,
>>>>> and I also think the current implementation adds a lot of complexity to
>>>>> the codebase for something stuck in experimental mode. It will be great
>>>>> to have a more robust version built on a better approach.
>>>>>
>>>>> On Sun, May 4, 2025 at 00:27 Benedict <bened...@apache.org> wrote:
>>>>> +1
>>>>>
>>>>> This is an obviously good feature for operators that are storage-bound in
>>>>> multi-DC deployments but want to retain their latency characteristics
>>>>> during node maintenance. Log replicas are the right approach.
>>>>>
>>>>> > On 3 May 2025, at 23:42, sc...@paradoxica.net wrote:
>>>>> >
>>>>> > Hey everybody, bumping this CEP from Ariel in case you'd like some
>>>>> > weekend reading.
>>>>> >
>>>>> > We’d like to finish witnesses and bring them out of “experimental”
>>>>> > status now that Transactional Metadata and Mutation Tracking provide
>>>>> > the building blocks needed to complete them.
>>>>> >
>>>>> > Witnesses are part of a family of approaches in replicated storage
>>>>> > systems to maintain or boost availability and durability while reducing
>>>>> > storage costs. Log replicas are a close relative. Both are used by
>>>>> > leading cloud databases – for instance, Spanner implements witness
>>>>> > replicas [1] while DynamoDB implements log replicas [2].
>>>>> >
>>>>> > Witness replicas are a great fit for topologies that replicate at
>>>>> > greater than RF=3, most commonly multi-DC/multi-region deployments.
>>>>> > Today in Cassandra, all members of a voting quorum replicate all data
>>>>> > forever. Witness replicas let users break this coupling: they allow one
>>>>> > to define voting quorums that are larger than the number of copies of
>>>>> > the data stored in perpetuity.
>>>>> >
>>>>> > Take a 3-DC cluster replicated at RF=3 in each DC as an example. In
>>>>> > this topology, Cassandra stores 9 copies of the database forever, which
>>>>> > is huge storage amplification. Witnesses allow users to keep a voting
>>>>> > quorum of 9 members (3 per DC) while reducing the durable replicas to 2
>>>>> > per DC, e.g. two durable replicas and one witness. This maintains the
>>>>> > availability properties of an RF=3×3 topology while reducing storage
>>>>> > costs by 33%, going from 9 copies to 6.
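>>>>> >
>>>>> > To make that arithmetic concrete, here is a small illustrative sketch
>>>>> > (plain Python; the function and numbers are only an illustration of the
>>>>> > copy counting above, not anything from the CEP):
>>>>> >
>>>>> >     def long_lived_copies(dcs, durable_per_dc):
>>>>> >         # Copies of the data retained in perpetuity across the cluster.
>>>>> >         return dcs * durable_per_dc
>>>>> >
>>>>> >     baseline = long_lived_copies(dcs=3, durable_per_dc=3)   # RF=3 in 3 DCs -> 9
>>>>> >     witnessed = long_lived_copies(dcs=3, durable_per_dc=2)  # 2 durable + 1 witness per DC -> 6
>>>>> >     savings = 1 - witnessed / baseline
>>>>> >     print(f"{baseline} -> {witnessed} copies ({savings:.0%} less long-term storage)")
>>>>> >     # 9 -> 6 copies (33% less long-term storage); the voting quorum stays at 9.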
>>>>> >
>>>>> > The role of a witness is to "witness" a write, persist it until it has
>>>>> > been reconciled among all durable replicas, and respond to read requests
>>>>> > for witnessed writes awaiting reconciliation. Note that witnesses don't
>>>>> > introduce a dedicated node role: whether a node is a durable replica or
>>>>> > a witness for a token depends only on its position in the ring.
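>>>>> >
>>>>> > A minimal sketch of that retention rule (hypothetical names, plain
>>>>> > Python, not the proposed implementation): a witnessed write only becomes
>>>>> > purgeable once every durable replica is known to have it.
>>>>> >
>>>>> >     from dataclasses import dataclass, field
>>>>> >
>>>>> >     @dataclass
>>>>> >     class WitnessedWrite:
>>>>> >         mutation_id: str
>>>>> >         durable_replicas: frozenset      # replicas that keep the data forever
>>>>> >         reconciled_to: set = field(default_factory=set)
>>>>> >
>>>>> >         def mark_reconciled(self, replica):
>>>>> >             self.reconciled_to.add(replica)
>>>>> >
>>>>> >         @property
>>>>> >         def purgeable(self):
>>>>> >             # Safety property: the witness may only drop the write once
>>>>> >             # it is known to exist on all durable replicas.
>>>>> >             return self.durable_replicas <= self.reconciled_to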
>>>>> >
>>>>> > This CEP builds on CEP-45: Mutation Tracking to establish the safety
>>>>> > property of the witness: guaranteeing that writes have been persisted
>>>>> > to all durable replicas before becoming purgeable. CEP-45's journal and
>>>>> > reconciliation design provide a great mechanism to ensure this while
>>>>> > avoiding the write amplification of incremental repair and
>>>>> > anticompaction.
>>>>> >
>>>>> > Take a look at the CEP if you're interested - happy to answer questions
>>>>> > and discuss further:
>>>>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking
>>>>> >
>>>>> > – Scott
>>>>> >
>>>>> > [1] https://cloud.google.com/spanner/docs/replication
>>>>> > [2] https://www.usenix.org/system/files/atc22-elhemali.pdf
>>>>> >
>>>>> >> On Apr 25, 2025, at 8:21 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>> >>
>>>>> >> Hi all,
>>>>> >>
>>>>> >> The CEP is available here:
>>>>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=353601959
>>>>> >>
>>>>> >> We would like to propose CEP-46: Finish Transient Replication/Witnesses
>>>>> >> for adoption by the community. CEP-46 would rename transient replication
>>>>> >> to witnesses and leverage CEP-45 mutation tracking to implement witnesses
>>>>> >> as log replicas, replacing the incremental-repair-based witnesses.
>>>>> >>
>>>>> >> For those not familiar with transient replication: the keyspace
>>>>> >> replication settings declare some replicas as transient, and when
>>>>> >> incremental repair runs, the transient replicas delete their data
>>>>> >> instead of moving it into the repaired set.
>>>>> >>
>>>>> >> With log replicas, nodes only materialize mutations in their local LSM
>>>>> >> for ranges where they are full replicas rather than witnesses. For
>>>>> >> witness ranges, a node writes mutations to its local mutation tracking
>>>>> >> log and participates in background and read-time reconciliation. This
>>>>> >> saves the compaction overhead of IR-based witnesses, which have to
>>>>> >> materialize and compact all mutations, even those applied to witness
>>>>> >> ranges.
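>>>>> >>
>>>>> >> Roughly, the per-range write path described above looks like the
>>>>> >> following sketch (hypothetical names in plain Python, just to show the
>>>>> >> routing; it is not the actual implementation):
>>>>> >>
>>>>> >>     def apply_mutation(node, mutation):
>>>>> >>         # Every tracked write is recorded in the local mutation tracking log.
>>>>> >>         node.mutation_log.append(mutation)
>>>>> >>         if node.is_full_replica(mutation.token):
>>>>> >>             # Full-replica ranges: also materialize the write in the
>>>>> >>             # local LSM (memtable/sstables), as today.
>>>>> >>             node.memtable.apply(mutation)
>>>>> >>         # Witness ranges stop at the log: no memtable write and no
>>>>> >>         # compaction, which is the overhead saved vs IR-based witnesses.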
>>>>> >>
>>>>> >> This would address one of the biggest issues with witnesses: the lack
>>>>> >> of monotonic reads. Implementation-complexity-wise, this would actually
>>>>> >> delete code compared to what would be required to complete IR-based
>>>>> >> witnesses, because most of the heavy lifting is already done by mutation
>>>>> >> tracking.
>>>>> >>
>>>>> >> Log replicas also make it much more practical to realize the cost
>>>>> >> savings of witnesses, because log replicas have easier-to-characterize
>>>>> >> resource consumption requirements (roughly write rate *
>>>>> >> recovery/reconfiguration time) and target a 10x improvement in write
>>>>> >> throughput. This makes it safer and easier to know how much capacity
>>>>> >> can be omitted.
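>>>>> >>
>>>>> >> As a back-of-the-envelope illustration of that sizing shape (the
>>>>> >> numbers below are made up; only the write rate * recovery time form
>>>>> >> comes from the paragraph above):
>>>>> >>
>>>>> >>     # Illustrative capacity estimate for a witness/log replica.
>>>>> >>     write_rate_mb_per_s = 50        # assumed ingest for witness ranges
>>>>> >>     recovery_window_s = 6 * 3600    # assumed recovery/reconfiguration time
>>>>> >>
>>>>> >>     # Log retained on the witness is roughly bounded by the writes that
>>>>> >>     # accumulate while waiting for reconciliation to complete.
>>>>> >>     log_storage_gb = write_rate_mb_per_s * recovery_window_s / 1024
>>>>> >>     print(f"~{log_storage_gb:.0f} GB of log storage to provision")  # ~1055 GB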
>>>>> >>
>>>>> >> Thanks,
>>>>> >> Ariel
>>>>> >
>>>
>>
>
>
>
> --
> Dmitry Konstantinov