> I see that implementing a Replicated Log 
> <https://martinfowler.com/articles/patterns-of-distributed-systems/#PatternSequenceForImplementingReplicatedLog>
> needs significant changes, particularly around how the two phases of Paxos 
> are implemented over the entire log. So would it be better to use Raft 
> instead?

There will be no changes required to our existing Paxos implementation; we can 
use it as is. Besides, Paxos is only used as a sequencer here. There is no 
need for Raft: both the existing LWTs (with Multi-Paxos) and Accord avoid 
being tied to a single leader, which is very much in the spirit of Cassandra.
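
To make that concrete, here is a rough sketch (in Java, with purely 
illustrative names; nothing below is taken from the CEP or the prototype) of 
what "Paxos as a sequencer" means: each epoch of the metadata log is an 
independent single-decree Paxos decision, in the style of an LWT 
compare-and-swap, so the two phases run per epoch rather than over the entire 
log, and no leader election is needed:

    import java.util.Optional;

    // Hypothetical primitive: one single-decree Paxos instance per epoch.
    interface PaxosRegister<V>
    {
        // Runs the two phases of Paxos for this single epoch; returns true
        // iff our proposal is the value that was decided.
        boolean proposeIfUnset(long epoch, V value);
        Optional<V> read(long epoch);
    }

    final class MetadataLog
    {
        private final PaxosRegister<String> register;
        private volatile long lastKnownEpoch = 0;

        MetadataLog(PaxosRegister<String> register)
        {
            this.register = register;
        }

        // Append `change` as the next log entry. If another proposal wins an
        // epoch, we observe its entry and retry at the next one; any node may
        // propose, so no single leader is required.
        long append(String change)
        {
            long epoch = lastKnownEpoch + 1;
            while (!register.proposeIfUnset(epoch, change))
            {
                lastKnownEpoch = epoch; // this epoch was decided for another proposal
                epoch++;
            }
            lastKnownEpoch = epoch;
            return epoch;
        }
    }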

> It might be of some use as a quick reference, allowing this CEP to be 
> compared with other systems that use a similar architecture.

I think introducing additional references to the CEP may make it more 
difficult to navigate, but of course people can refer to them from the mailing 
list discussion if they find them helpful. Many of the things described in the 
CEP are Cassandra-specific, and describe not what you have referred to as a 
"consistent core" but rather how it integrates with Cassandra. The remaining 
terms, I believe, are common knowledge; even so, there are very few 
distributed systems concepts mentioned in the CEP. We have strived to make the 
document self-contained. While some of the concepts are introduced slightly 
out of order, this was done deliberately, as we were searching for the best 
logical sequencing.

-- Alex

On Thu, Sep 1, 2022, at 6:14 AM, Unmesh Joshi wrote:
> Hi Sam,
> 
> Great to see this CEP. I have been documenting a few common 'patterns of 
> distributed systems', including a pattern called 'consistent core 
> <https://martinfowler.com/articles/patterns-of-distributed-systems/consistent-core.html>', 
> referring to the source code of various systems which use a linearizable 
> metadata store. I have also documented patterns like 'lease 
> <https://martinfowler.com/articles/patterns-of-distributed-systems/time-bound-lease.html>' 
> and 'state watch 
> <https://martinfowler.com/articles/patterns-of-distributed-systems/state-watch.html>', 
> which are commonly used by a consistent core. I also recently documented how 
> typical partition assignment and partition movement are implemented in 
> systems that use a consistent-core-based metadata store (systems like 
> YugabyteDB, CockroachDB, Kafka, etc.).
> It might be of some use as a quick reference, allowing this CEP to be 
> compared with other systems that use a similar architecture.
> A quick question about using the existing Paxos machinery: I see that 
> implementing a Replicated Log 
> <https://martinfowler.com/articles/patterns-of-distributed-systems/#PatternSequenceForImplementingReplicatedLog>
> needs significant changes, particularly around how the two phases of Paxos 
> are implemented over the entire log. So would it be better to use Raft 
> instead?
> 
> 
> Thanks,
> Unmesh
> 
> On 2022/08/23 08:50:27 Sam Tunnicliffe wrote:
> > Thanks! 
> > The core of the proposal is around sequencing metadata changes and 
> > ensuring that they're delivered to, and processed by, nodes in the right 
> > order and at the right time. The actual mechanisms for imposing that order 
> > and for maintaining the log are pretty simple to implement. We envision 
> > using the existing Paxos machinery by default, but swapping that for an 
> > alternative implementation would not be difficult.
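> > 
> > As a purely illustrative aside (these names are invented for the sketch 
> > and do not come from the prototype), the "right order and at the right 
> > time" part can be pictured as each node enacting log entries strictly in 
> > epoch order, buffering anything that arrives early:
> > 
> >     import java.util.SortedMap;
> >     import java.util.TreeMap;
> > 
> >     final class LogFollower
> >     {
> >         private final SortedMap<Long, Runnable> pending = new TreeMap<>();
> >         private long enactedEpoch = 0;
> > 
> >         // Entries may arrive in any order; an entry is applied only once
> >         // every earlier epoch has been enacted, so all nodes observe the
> >         // same sequence of metadata states.
> >         synchronized void deliver(long epoch, Runnable entry)
> >         {
> >             if (epoch <= enactedEpoch)
> >                 return; // duplicate or stale delivery
> >             pending.put(epoch, entry);
> >             while (pending.containsKey(enactedEpoch + 1))
> >                 pending.remove(++enactedEpoch).run();
> >         }
> >     }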
> > 
> > 
> > > On 22 Aug 2022, at 19:14, Derek Chen-Becker <de...@chen-becker.org> wrote:
> > > 
> > > This looks really interesting; thanks for putting this together! Just so 
> > > I'm clear on CEP nomenclature, having external management of metadata as 
> > > a non-goal doesn't preclude some future use, correct? Coincidentally, I'm 
> > > working on my ApacheCon talk on improving modularity in Cassandra and one 
> > > of the ideas I'm discussing is pluggably (?) replacing gossip with 
> > > something(s) that allow us to externalize some of the complexity of 
> > > maintaining consistency. I need to digest the proposal you've made, but I 
> > > don't see the two ideas being at odds on my first read. 
> > > 
> > > Cheers,
> > > 
> > > Derek
> > > 
> > > On Mon, Aug 22, 2022 at 6:45 AM Sam Tunnicliffe <s...@beobal.com> wrote:
> > > Hi,
> > > 
> > > I'd like to open discussion about this CEP: 
> > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
> > > 
> > > Cluster metadata in Cassandra comprises a number of disparate elements 
> > > including, but not limited to, distributed schema, topology and token 
> > > ownership. Following the general design principles of Cassandra, the 
> > > mechanisms for coordinating updates to cluster state have favoured 
> > > eventual consistency, with probabilistic delivery via gossip being a 
> > > prime example. Undoubtedly, this approach has benefits, not least in 
> > > terms of resilience, particularly in highly fluid distributed 
> > > environments. However, such environments are not the reality of most 
> > > Cassandra deployments, where the total number of nodes is relatively 
> > > small (i.e. in the low thousands) and the rate of change tends to be 
> > > low. 
> > > 
> > > Historically, a significant proportion of issues affecting operators and 
> > > users of Cassandra have been due, at least in part, to a lack of strongly 
> > > consistent cluster metadata. In response to this, we propose a design 
> > > which aims to provide linearizability of metadata changes whilst ensuring 
> > > that the effects of those changes are made visible to all nodes in a 
> > > strongly consistent manner. At its core, it is also pluggable, enabling 
> > > Cassandra-derived projects to supply their own implementations if desired.
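> > > 
> > > To give a feel for what pluggable means here, a hypothetical sketch (the 
> > > names below are illustrative, not the CEP's actual interfaces): the log 
> > > behind the metadata can sit behind a small interface that derived 
> > > projects implement themselves, whether backed by the built-in Paxos 
> > > machinery or by an external system:
> > > 
> > >     import java.util.function.BiConsumer;
> > > 
> > >     // Hypothetical plug-in point: any linearizable, totally ordered log
> > >     // of metadata transformations satisfies the contract.
> > >     interface MetadataLogService
> > >     {
> > >         // Atomically append a transformation, returning the epoch at
> > >         // which it was committed.
> > >         long commit(String transformation);
> > > 
> > >         // Replay every committed entry after `afterEpoch`, in order.
> > >         void replay(long afterEpoch, BiConsumer<Long, String> consumer);
> > >     }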
> > > 
> > > In addition to the CEP document itself, we aim to publish a working 
> > > prototype of the proposed design. Obviously, this does not implement the 
> > > entire proposal and there are several parts which remain only partially 
> > > complete. It does include the core of the system, along with a good deal 
> > > of test infrastructure, so it may serve both as an illustration of the 
> > > design and as a starting point for the real implementation. 
> > > 
> > > 
> > > 
> > > -- 
> > > +---------------------------------------------------------------+
> > > | Derek Chen-Becker |
> > > | GPG Key available at https://keybase.io/dchenbecker and |
> > > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC |
> > > +---------------------------------------------------------------+
> > > 
> > 
> >
