Sure, that works for me. From: Patrick McFadin <pmcfa...@gmail.com> Date: Wednesday, 22 September 2021 at 04:47 To: dev@cassandra.apache.org <dev@cassandra.apache.org> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions I would be happy to host a Zoom as I've done in the past. I can post a transcript and the recording after the call.
Instead of right after your talk Benedict, maybe we can set a time for next week and let everyone know the time? Patrick On Mon, Sep 20, 2021 at 11:05 AM bened...@apache.org <bened...@apache.org> wrote: > Hi Joey, > > Thanks for the feedback and suggestions. > > > I was wondering what do you think about having some extended Q&A after > your ApacheCon talk Wednesday > > I would love to do this. I’ll have to figure out how though – my > understanding is that I have a hard 40m for my talk and any Q&A, and I > expect the talk to occupy most of those 40m as I try to cover both > CEP-14 and CEP-15. I’m not sure what facilities are made available by > Hopin, but if necessary we can perhaps post some external video chat link? > > The time of day is also a question, as I think the last talk ends at > 9:20pm local time. But we can make that work if necessary. > > > It might help to have a diagram (perhaps I can collaborate with you > on this?) > > I absolutely agree. This is something I had planned to produce but it’s > been a question of time. In part I wanted to ensure we published long in > advance of ApacheCon, but now also with CEP-10, CEP-14 and CEP-15 in flight > it’s hard to get back to improving the draft. If you’d be interested in > collaborating on this that would be super appreciated, as this would > certainly help the reader. > > >I think that WAN is always paid during the Consensus Protocol, and then > in most cases execution can remain LAN except in 3+ datacenters where I > think you'd have to include at least one replica in a neighboring > datacenter… > > As designed the only WAN cost is consensus as Accord ensures every replica > receives a complete copy of every transaction, and is aware of any gaps. If > there are gaps there may be WAN delays as those are filled in.
This might > occur because of network outages, but is most likely to occur when > transactions are being actively executed by multiple DCs at once – in which > case there’ll be one further unidirectional WAN latency during execution > while the earlier transaction disseminates its result to the later > transaction(s). There are other similar scenarios we can discuss, e.g. if a > transaction takes the slow path and will execute after a transaction being > executed in another DC, that remote transaction needs to receive this > notification before executing. > > There might potentially be some interesting optimisations to make in > future, where with many queued transactions a single DC may nominate itself > to execute all outstanding queries and respond to the remote DCs that > issued them so as to eliminate the WAN latency for disseminating the result > of each transaction. But we’re getting way ahead of ourselves there 😊 > > There’s also no LAN cost on write, at least for responding to the client. > If there is a dependent transaction within the same DC then (as in the > above case) there will be a LAN penalty for the second transaction to > execute. > > > Relatedly I'm curious if there is any way that the client can > acquire the timestamp used by the transaction before sending the data > so we can make the operations idempotent and unrelated to the > coordinator that was executing them as the storage nodes are > vulnerable to disk and heap failure modes which makes them much more > likely to enter grey failure (slow). Alternatively, perhaps it would > make sense to introduce a set of optional dedicated C* nodes for > reaching consensus that do not act as storage nodes so we don't have > to worry about hanging coordinators (join_ring=false?)? > > So, in principle coordination can be performed by any node on the network > including a client – though we’d need to issue the client a unique id; this > can be done cheaply on joining.
This might be something to explore in > future, though there are downsides to having more coordinators too (more > likely to fail, and stall further transactions that depend on transactions > it is coordinating). > > However, with respect to idempotency, I expect Accord not to perpetuate > the problems of LWTs where the result of an earlier query is unknown. At > least success/fail will be maintained in a distributed fashion for some > reasonable time horizon, and there will also be protection against zombie > transactions (those proposed to a node that went into a failure spiral > before reaching healthy nodes, that somehow regurgitates it hours or days > later), so we should be able to provide practical precisely-once semantics > to clients. > > Whether this is done with a client-provided timestamp, or simply some > other arbitrary client-provided id that can be utilised to deduplicate > requests or query the status of a transaction is something we can explore > later. This is something we should explore in a dedicated discussion as > development of Accord progresses. > > > Should Algorithm 1 line 12 be PreAcceptOK from Et (not Qt) or should > line 2 read Qt instead of Et? > > So, technically as it reads today I think it’s correct. For Line 2 there > is always some Qt \subseteq Et. I think the problem here is that actually > there’s a bunch of valid things to do, including picking some arbitrary > subset of each rho in Pt so long as it contains some Qt. It’s hard to > convey the range of options precisely. Line 12 of course really wants to > execute only when some Ft has responded, but if no such response is > forthcoming it wants to execute on some Qt, but of course Ft \supseteq > Qt. Perhaps I should try to state the set inequalities here. I will think > about what I can do to improve the clarity, thanks. > > > It might make sense for participating members to wait for a minimum > detected clock skew before becoming eligible for electorate?
> > This is a great idea, thanks! > > > I don't really understand how temporarily down replicas will learn > of mutations they missed .. are we just leveraging some > external repair? > > Yes, precisely. Though in practice, any transaction they need to know about to > answer a Read etc., they can query a peer for. Beyond that, I expect to > deliver a real-time repair mechanism scoped (initially, at least) to Accord > transactions to ensure this happens promptly. > > > Relatedly since non-transactional reads wouldn't flow through > consensus (I hope) would it make sense for a restarting node to learn > the latest accepted time once and then be deprioritized for all reads > until it has accepted what it missed? Or is the idea that you would > _always_ read transactionally (and since it's a read only transaction > you can skip the WAN consensus and just go straight to fast path > reads)? > > I expect that tables will be marked transactional, and that every > operation that goes through them will be transactional. However I can > imagine offering weaker read semantics, particularly if you’re looking to > avoid paying the WAN price if you aren’t worried about consistency. I > haven’t really considered how we might marry the two within a table, and > I’m open to suggestions here. I expect that this dovetails with future > improvements to transactional cluster metadata. I think also in part this > kind of behaviour is limited today because repair is too unwieldy, and also > because we don’t have an “on but catching up” state. If we improve repair > for transactions the first part may be solved, and perhaps we can introduce > a new node state as part of improving our approach to cluster management. > > I could imagine having some bounded divergence in general, e.g.
I haven’t > corroborated my transaction history in Xms with a majority, or I haven’t > received Xms of the transaction history I’ve witnessed, so I’m going to > remove myself from the read set for non-transactional operations. But I > don’t envisage this landing in V1. > > * I know the paper says that we elide details of how the shards (aka > replica sets?) are chosen, but it seems that this system would have a > hard dependency on a strongly consistent shard selection system (aka > token metadata?) wouldn't it? In particular if the simple quorums > (which I interpreted to be replica sets in current C*, not sure if > that's correct) can change in non linearizable ways I don't think > Property 3.3 can hold. I think you hint at a solution to this in > section 5 but I'm not sure I grok it. > > Yes, it does. That’s something that’s in hand, and colleagues will be > reaching out to the list about in the next couple of months. I anticipate > this being a solved problem before Accord depends on it. There’s still a > bunch of complexity within Accord for applying topology changes safely > (which Section 5 nods to), but the membership decisions will be taken by > Cassandra – safely. > > > From: Joseph Lynch <joe.e.ly...@gmail.com> > Date: Monday, 20 September 2021 at 17:17 > To: dev@cassandra.apache.org <dev@cassandra.apache.org> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions > Benedict, > > Thank you very much for advancing this proposal, I'm extremely excited > to see flexible quorums used in this way and am looking forward to the > integration of Accord into Cassandra! I read the whitepaper and have a > few questions, but I was wondering what do you think about having some > extended Q&A after your ApacheCon talk Wednesday (maybe at the end of > the C* track)? It might be higher bandwidth than going back and forth > on email/slack (also given you're presenting on it that might be a > good time to discuss it)? 
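The practical precisely-once idea Benedict describes earlier in the thread, retaining success/fail outcomes against an arbitrary client-provided id for some reasonable time horizon so that retries can be answered without re-executing, might look roughly like the following Java sketch. None of these names exist in Accord; age-based eviction of old ids is elided.

```java
// Hypothetical sketch only -- none of these names exist in Accord.
// The client attaches an arbitrary unique id to each transaction, and
// the cluster retains completed outcomes for some time horizon, so a
// retried request returns the original outcome instead of re-executing.
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class TxnDeduplicator {
    enum Outcome { SUCCESS, FAILURE }

    // id -> outcome; age-based eviction is elided for brevity.
    private final ConcurrentHashMap<UUID, Outcome> completed = new ConcurrentHashMap<>();

    /** Returns the recorded outcome for a retried id, or null if unseen. */
    Outcome priorOutcome(UUID clientTxnId) {
        return completed.get(clientTxnId);
    }

    /** Records the first completion; a late duplicate cannot overwrite it. */
    void record(UUID clientTxnId, Outcome outcome) {
        completed.putIfAbsent(clientTxnId, outcome);
    }
}
```

In this sketch a retried request would first call priorOutcome and only execute the transaction if it returns null.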
> > Briefly > * It might help to have a diagram (perhaps I can collaborate with you > on this?) showing the happy path delay waiting in the reorder buffer > and the messages that are sent in a 2 and 3 datacenter deployment > during the PreAccept, Accept, Commit, Execute, Apply phases. In > particular it was hard for me to follow where exactly I was paying WAN > latency and where we could achieve progress with LAN only (I think > that WAN is always paid during the Consensus Protocol, and then in > most cases execution can remain LAN except in 3+ datacenters where I > think you'd have to include at least one replica in a neighboring > datacenter). In particular, it seems that Accord always pays clock > skew + WAN latency during the reorder buffer (as part of consensus) + > 2x LAN latency during execution (to read and then write). > * Relatedly I'm curious if there is any way that the client can > acquire the timestamp used by the transaction before sending the data > so we can make the operations idempotent and unrelated to the > coordinator that was executing them as the storage nodes are > vulnerable to disk and heap failure modes which makes them much more > likely to enter grey failure (slow). Alternatively, perhaps it would > make sense to introduce a set of optional dedicated C* nodes for > reaching consensus that do not act as storage nodes so we don't have > to worry about hanging coordinators (join_ring=false?)? > * Should Algorithm 1 line 12 be PreAcceptOK from Et (not Qt) or should > line 2 read Qt instead of Et? > * I think your claim about clock skew being <1ms in general is > accurate at least for AWS except for when machines boot for the first > time (I can send you some data shortly). It might make sense for > participating members to wait for a minimum detected clock skew before > becoming eligible for electorate?
> * I don't really understand how temporarily down replicas will learn > of mutations they missed; did I miss the part where a read replica > would recover all transactions between its last accepted time and > another replica's last accepted time? Or are we just leveraging some > external repair? > * Relatedly since non-transactional reads wouldn't flow through > consensus (I hope) would it make sense for a restarting node to learn > the latest accepted time once and then be deprioritized for all reads > until it has accepted what it missed? Or is the idea that you would > _always_ read transactionally (and since it's a read only transaction > you can skip the WAN consensus and just go straight to fast path > reads)? > * I know the paper says that we elide details of how the shards (aka > replica sets?) are chosen, but it seems that this system would have a > hard dependency on a strongly consistent shard selection system (aka > token metadata?) wouldn't it? In particular if the simple quorums > (which I interpreted to be replica sets in current C*, not sure if > that's correct) can change in non linearizable ways I don't think > Property 3.3 can hold. I think you hint at a solution to this in > section 5 but I'm not sure I grok it. > > Super interesting proposal and I am looking forward to all the > improvements this will bring to the project! > > Cheers, > -Joey > > On Mon, Sep 20, 2021 at 1:34 AM Miles Garnsey > <miles.garn...@datastax.com> wrote: > > > > If Accord can fulfil its aims, it sounds like a huge improvement to the > state of the art in distributed transaction processing. Congrats to all > involved in pulling the proposal together. > > > > I was holding off on feedback since this is quite in depth and I don’t > want to bike shed; I still haven’t spent as much time understanding this as > I’d like. > > > > Regardless, I’ll make the following notes in case they’re helpful.
My > feedback is more to satisfy my own curiosity and stimulate discussion than > to suggest that there are any flaws here. I applaud the proposed testing > approach and think it is the only way to be certain that the proposed > consistency guarantees will be upheld. > > > > General > > > > I’m curious if/how this proposal addresses issues we have seen when > scaling; I see reference to simple majorities of nodes - is there any plan > to ensure safety under scaling operations or DC (de)commissioning? > > > > What consistency levels will be supported under Accord? Will it simply > be a single CL representing a majority of nodes across the whole cluster? > (This at least would mitigate the issues I’ve seen when folks want to > switch from EACH_SERIAL to SERIAL). > > > > Accord > > > > > Accord instead assembles an inconsistent set of dependencies. > > > > > > Further explanation here would be good. Do we mean to say that the > dependencies may differ according to which transactions the coordinator has > witnessed at the time the incoming transaction is first seen? This would > make sense if some nodes had not fully committed a foregoing transaction. > > > > Is it correct to think of this step as assembling a dependency graph of > foregoing transactions which must be completed ahead of progressing the > incoming new transaction? > > > > Fast Path > > > > > A coordinator C proposes a timestamp t0 to at least a quorum of a fast > path electorate. If t0 is larger than all timestamps witnessed for all > prior conflicting transactions, t0 is accepted by a replica. If a fast path > quorum of responses accept, the transaction is agreed to execute at t0. > Replicas respond with the set of transactions they have witnessed that may > execute with a lower timestamp, i.e. those with a lower t0. > > > > What is t0 here? I’m guessing it is the Lamport clock time of the most > recent mutation to the partition?
May be worth clarifying because otherwise > the perception may be that it is the commencement time of the current > transaction which may not be the intention. > > > > Regarding the use of logical clocks in general - > > > > Do we have one clock-per-shard-per-node? Or is there a single clock for > all transactions on a node? > > What happens in network partitions? > > In a cross-shard transaction does maintaining simple majorities of > replicas protect you from potential inconsistencies arising when a > transaction W10 addressing partitions p1, p2 comes from a different > majority (potentially isolated due to a network partition) from earlier > writes W[1,9] to p1 only? > > It seems that this may cause a sudden change to the dependency graph for > partition p2 which may render it vulnerable to strange effects? > > Do we consider adversarial cases or any sort of Byzantine faults? > (That’s a bit out of left field, feel free to kick me.) > > Why do we prefer Lamport clocks to vector clocks or other types of > logical clock? > > > > Slow Path > > > > > This value is proposed to at least a simple majority of nodes, along > with the union of the dependencies received > > > > > > Related to the earlier point: when we say `union` here - what set are we > forming a union over? Is it a union of all dependencies t_n < t as seen by > all coordinators? I presume that the logic precludes the possibility that > these dependencies will conflict, since all foregoing transactions which > are in progress as dependencies must be non-conflicting with earlier > transactions in the dependency graph? > > > > In any case, further information about how the dependency graph is > computed would be interesting. > > > > > The inclusion of dependencies in the proposal is solely to facilitate > Recovery of other transactions that may be incomplete - these are stored on > each replica to facilitate decisions at recovery. > > > > > > Every replica? Or only those participating in the transaction?
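The fast-path rule quoted above under "Fast Path" can be sketched in a few lines, under the simplifying assumption that a timestamp is a (logical time, node id) pair compared lexicographically; that node-id tie-break is also what makes such a clock impose a total rather than merely partial order. This is an illustrative toy, not Accord's implementation: conflict detection and quorum counting are elided, and every transaction is assumed to conflict with every other.

```java
// Illustrative toy of the replica-side PreAccept check quoted above,
// NOT Accord's implementation. A timestamp is modelled as a
// (logical time, node id) pair compared lexicographically.
import java.util.ArrayList;
import java.util.List;

final class ReplicaSketch {
    record Timestamp(long time, int node) implements Comparable<Timestamp> {
        public int compareTo(Timestamp o) {
            int c = Long.compare(time, o.time);
            return c != 0 ? c : Integer.compare(node, o.node); // tie-break yields a total order
        }
    }

    record PreAcceptReply(boolean fastPathOk, List<Timestamp> lowerDeps) {}

    // Timestamps of previously witnessed transactions that conflict with
    // the incoming one (in this toy, all transactions conflict).
    private final List<Timestamp> witnessed = new ArrayList<>();

    /**
     * Accept t0 only if it is larger than every witnessed conflicting
     * timestamp; either way, report the witnessed transactions that may
     * execute with a lower timestamp.
     */
    PreAcceptReply preAccept(Timestamp t0) {
        boolean ok = witnessed.stream().allMatch(w -> w.compareTo(t0) < 0);
        List<Timestamp> deps = witnessed.stream().filter(w -> w.compareTo(t0) < 0).toList();
        witnessed.add(t0);
        return new PreAcceptReply(ok, deps);
    }
}
```

In the protocol as quoted, a coordinator takes the fast path only if a fast-path quorum of replicas replies with the equivalent of fastPathOk.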
> > > > > If C fails to reach fast path consensus it takes the highest t it > witnessed from its responses, which constitutes a simple Lamport clock > value imposing a valid total order. This value is proposed to at least a > simple majority of nodes, > > > > > > When speaking about the simple majority of nodes to whom the max(t) > value returned will be proposed - > > It sounds like this need not be the same majority from whom the original > sets of T_n and dependencies were obtained? > > Is there a proof to show that the dependencies created from the union of > the first set of replicas resolve to an acceptable dependency graph for an > arbitrary majority of replicas? (Especially given that a majority of > replicas is not a majority of nodes, given we are in a cross-shard scenario > here). > > What happens in cases where the replica set has changed due to (a) > scaling RF in a single DC (b) adding a whole new DC? > > Wikipedia <https://en.wikipedia.org/wiki/Lamport_timestamp> tells me > that Lamport clocks only impose partial, not total order. I’m guessing > we’re thinking of a different type of logical clock when we speak of > Lamport clocks here (but my expertise is sketchy on this topic). > > > > Recovery > > > > I would be interested in further exploration of the unhappy path (where > 'a newer ballot has been issued by a recovery coordinator to take over the > transaction’). I understand that this may be partially covered in the > pseudocode for `Recovery` but I’m struggling to reconcile the ’new ballot > has been issued’ language with the ‘any R in responses had X as Applied, > Committed, or Accepted’ language. > > > > Well done again and thank you for pushing the envelope in this area > Benedict. > > > > Miles > > > > > On 15 Sep 2021, at 11:33 pm, bened...@apache.org wrote: > > > > > >> I would kind of expect this work, if it pans out, to _replace_ the > current paxos implementation > > > > > > That’s a good point.
I think the clear direction of travel would be > total replacement of Paxos, but I anticipate that this will be > feature-flagged at least initially. So for some period of time we may > maintain both options, with the advanced CQL functionality disabled if you > opt for classic Paxos. > > > > > > I think this is a necessary corollary of a requirement to support live > upgrades – something that is non-negotiable IMO, but that I have also > neglected to discuss in the CEP. I will rectify this. An open question is > if we want to support live downgrades back to Classic Paxos. I kind of > expect that we will, though that will no doubt be informed by the > difficulty of doing so. > > > > > > Either way, this means the deprecation cycle for Classic Paxos is > probably a separate and future decision for the community. We could choose > to maintain it indefinitely, but I would vote to retire it the following > major version. > > > > > > A related open question is defaults – I would probably vote for new > clusters to default to Accord, and existing clusters to need to run a > migration command after fully upgrading the cluster. > > > > > > From: Sylvain Lebresne <lebre...@gmail.com> > > > Date: Wednesday, 15 September 2021 at 14:13 > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org> > > > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions > > > Fwiw, it makes sense to me to talk about CQL syntax evolution > separately. > > > > > > It's pretty clear to me that we _can_ extend CQL to make use of a > general > > > purpose transaction mechanism, so I don't think deciding if we want a > > > general purpose transaction mechanism has to depend on deciding on the > > > syntax. Especially since the syntax question can get pretty far on its > own > > > and could be a serious upfront distraction.
> > > > > > And as you said, there are even queries that can be expressed with the > > > current syntax that we refuse now and would be able to accept with > this, so > > > those could be "ground zero" of what this work would allow. > > > > > > But outside of pure syntax questions, one thing that I don't see > discussed > > > in the CEP (or did I miss it) is what the relationship of this new > > > mechanism with the existing paxos implementation would be? I would > kind of > > > expect this work, if it pans out, to _replace_ the current paxos > > > implementation (because 1) why not and 2) the idea of having 2 > > > serialization mechanisms that serialize separately sounds like a > nightmare > > > from the user POV) but it isn't stated clearly. If replacement is > indeed > > > the intent, then I think there needs to be a plan for the upgrade > path. If > > > that's not the intent, then what? > > > -- > > > Sylvain > > > > > > > > > On Wed, Sep 15, 2021 at 12:09 PM bened...@apache.org < > bened...@apache.org> > > > wrote: > > > > > >> Ok, so the act of typing out an example was actually a really good > > >> reminder of just how limited our functionality is today, even for > single > > >> partition operations. > > >> > > >> I don’t want to distract from any discussion around the underlying > > >> protocol, but we could kick off a separate conversation about how to > evolve > > >> CQL sooner than later if there is the appetite. There are no concrete > > >> proposals to discuss, it would be brainstorming. > > >> > > >> Do people also generally agree this work warrants a distinct CEP, or > would > > >> people prefer to see this developed under the same umbrella? 
> > >> > > >> > > >> From: bened...@apache.org <bened...@apache.org> > > >> Date: Wednesday, 15 September 2021 at 09:19 > > >> To: dev@cassandra.apache.org <dev@cassandra.apache.org> > > >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions > > >>> perhaps we can prepare these as examples > > >> > > >> There are grammatically correct CQL queries today that cannot be > executed, > > >> and this work will naturally remove those restrictions. I’m > certainly > > >> happy to specify one of these for the CEP if it will help the reader. > > >> > > >> I want to exclude “new CQL commands” or any other enhancement to the > > >> grammar from the scope of the CEP, however. This work will enable a > range > > >> of improvements to the UX, but I think this work is a separate, > long-term > > >> project of evolution that deserves its own CEPs, and will likely > involve > > >> input from a wider range of contributors and users. If nobody else > starts > > >> such CEPs, I will do so in due course (much further down the line). > > >> > > >> Assuming there is not significant dissent on this point I will update > the > > >> CEP to reflect this non-goal. > > >> > > >> > > >> > > >> From: C. Scott Andreas <sc...@paradoxica.net> > > >> Date: Wednesday, 15 September 2021 at 00:31 > > >> To: dev@cassandra.apache.org <dev@cassandra.apache.org> > > >> Cc: dev@cassandra.apache.org <dev@cassandra.apache.org> > > >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions > > >> Adding a few notes from my perspective as well – > > >> > > >> Re: the UX question, thanks for asking this. > > >> > > >> I agree that offering a set of example queries and use cases may help > make > > >> the specific use cases more understandable; perhaps we can prepare > these as > > >> examples to be included in the CEP.
> > >> > > >> I do think that all potential UX directions begin with the > specification > > >> of the protocol that will underlie them, as what can be expressed by > it may > > >> be a superset of what's immediately exposed by CQL. But at minimum > it's > > >> great to have a sense of the queries one might be able to issue to > focus a > > >> reading of the whitepaper. > > >> > > >> Re: "Can we not start using it as an external dependency, and later > > >> re-evaluate if it's necessary to bring it into the project or even > incubate > > >> it as another Apache project" > > >> > > >> I think it would be valuable to the project for the work to be > incubated > > >> in a separate repository as part of the Apache Cassandra project > itself, > > >> much like the in-JVM dtest API and Harry. This pattern worked well for > > >> those projects as they incubated as it allowed them to evolve outside > the > > >> primary codebase, but subject to the same project governance, set of > PMC > > >> members, committers, and so on. Like those libraries, it also makes > sense > > >> as the Cassandra project is the first (and, at this time, only) known > > >> intended consumer of the library, though there may be more in the > future. > > >> > > >> If the proposal is accepted, the time horizon envisioned for this > work's > > >> completion is ~9 months to a standard of production readiness. The > > >> contributors see value in the work being donated to and governed by > the > > >> contribution practices of the Foundation. Doing so ensures that it is > being > > >> developed openly and with full opportunity for review and > contribution of > > >> others, while also solidifying contribution of the IP to the project.
> > >> > > >> Spinning up a separate ASF incubation project is an interesting idea, > but > > >> I feel that doing so would introduce a far greater overhead in > process and > > >> governance, and that the most suitable governance and set of > committers/PMC > > >> members are those of the Apache Cassandra project itself. > > >> > > >> On Sep 14, 2021, at 3:53 PM, "bened...@apache.org" < > bened...@apache.org> > > >> wrote: > > >> > > >> > > >> Hi Paulo, > > >> > > >> First and foremost, I believe this proposal in its current form > focuses on > > >> the protocol details (HOW?) but lacks the bigger picture on how this > is > > >> going to be exposed to the user (WHAT)? > > >> > > >> In my opinion this CEP embodies a coherent distinct and complex piece > of > > >> work, that requires specialist expertise. You have after all just > suggested > > >> a month to read only the existing proposal 😊 > > >> > > >> UX is a whole other kind of discussion, that can be quite > opinionated, and > > >> requires different expertise. It is in my opinion helpful to break > out work > > >> that is not tightly coupled, as well as work that requires different > > >> expertise. As you point out, multi-key UX features are largely > independent > > >> of any underlying implementation, likely can be done in parallel, and > even > > >> with different contributors. > > >> > > >> Can we not start using it as an external dependency > > >> > > >> I would love to understand your rationale, as this is a surprising > > >> suggestion to me. This is just like any other subsystem, but we would > be > > >> managing it as a separate library primarily for modularity reasons. > The > > >> reality is that this option should anyway be considered unavailable. > This > > >> is a proposed contribution to the Cassandra project, which we can > either > > >> accept or reject. 
> > >> > > >> Isn't this a good chance to make the serialization protocol pluggable > > >> with clearly defined integration points > > >> > > >> It has recently been demonstrated to be possible to build a system > that > > >> can safely switch between different consensus protocols. However, > this was > > >> very sophisticated work that would require its own CEP, one that we > would > > >> be unable to resource. Even if we could, this would be insufficient. > This > > >> goal has never been achieved for a multi-shard transaction protocol > to my > > >> knowledge, and multi-shard transaction protocols are much more > divergent in > > >> implementation detail than consensus protocols. > > >> > > >> so we could easily switch implementations with different guarantees… > (ie. > > >> Apache Ratis) > > >> > > >> As far as I know, there are no other strict serializable protocols > > >> available to plug in today. Apache Ratis appears to be a > straightforward > > >> Raft implementation, and therefore it is a linearizable consensus > protocol. > > >> It is not a multi-shard transaction protocol at all, let alone strict > > >> serializable. It could be used in place of Paxos, but not Accord. > > >> > > >> > > >> > > >> From: Paulo Motta <pauloricard...@gmail.com> > > >> Date: Tuesday, 14 September 2021 at 22:55 > > >> To: Cassandra DEV <dev@cassandra.apache.org> > > >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions > > >> I can start with some preliminary comments while I get more > familiarized > > >> with the proposal: > > >> > > >> - First and foremost, I believe this proposal in its current form > focuses > > >> on the protocol details (HOW?) but lacks the bigger picture on how > this is > > >> going to be exposed to the user (WHAT)? Is exposing linearizable > > >> transactions to the user not a goal of this proposal? If not, I think > the > > >> proposal is missing the UX (ie.
what CQL commands are going to be > added > > >> etc) on how these transactions are going to be exposed. > > >> > > >> - Why do we need to bring the library into the project umbrella? Can > we not > > >> start using it as an external dependency, and later re-evaluate if > it's > > >> necessary to bring it into the project or even incubate it as another > > >> Apache project? I feel we may be importing unnecessary management > overhead > > >> into the project while only a small subset of contributors will be > involved > > >> with the core protocol. > > >> > > >> - Isn't this a good chance to make the serialization protocol > pluggable > > >> with clearly defined integration points, so we could easily switch > > >> implementations with different guarantees, trade-offs and performance > > >> considerations while leaving the UX intact? This would also allow us > to > > >> easily benchmark the protocol against alternatives (ie. Apache Ratis) > and > > >> validate the performance claims. I think the best way to do that > would be > > >> to define what the feature will look like to the end user (UX), > define the > > >> integration points necessary to support this feature, and use accord > as the > > >> first implementation of these integration points. > > >> > > >> On Tue, 14 Sep 2021 at 17:57, Paulo Motta < > > >> pauloricard...@gmail.com> > > >> wrote: > > >> > > >> Given the extensiveness and complexity of the proposal I'd suggest > leaving > > >> it a little longer (perhaps 4 weeks from the publish date?) for > people to > > >> get a bit more familiarized and have the chance to comment before > casting a > > >> vote. I glanced through the proposal - and it looks outstanding, very > > >> promising work guys! - but would like a bit more time to take a > deeper look > > >> and digest it before potentially commenting on it. > > >> > > >> On Tue, 14 Sep
2021 at 17:30, bened...@apache.org < > > >> bened...@apache.org> wrote: > > >> > > >> Has anyone had a chance to read the drafts, and has any feedback or > > >> questions? Does anybody still anticipate doing so in the near future? Or > > >> shall we move to a vote? > > >> > > >> From: bened...@apache.org <bened...@apache.org> > > >> Date: Tuesday, 7 September 2021 at 21:27 > > >> To: dev@cassandra.apache.org <dev@cassandra.apache.org> > > >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions > > >> Hi Jake, > > >> > > >>> What structural changes are planned to support an external dependency > > >> project like this > > >> > > >> To add to Blake’s answer, in case there’s some confusion over this, > the > > >> proposal is to include this library within the Apache Cassandra > project. So > > >> I wouldn’t think of it as an external dependency. This PMC and > community > > >> will still have the usual oversight over direction and development, > and > > >> APIs will be developed solely with the intention of their integration > with > > >> Cassandra. > > >> > > >>> Will this effort eventually replace consistency levels in C*? > > >> > > >> I hope we’ll have some very related discussions around consistency > levels > > >> in the coming months more generally, but I don’t think that is tightly > > >> coupled to this work. I agree with you both that we won’t want to > > >> perpetuate the problems you’ve highlighted though. > > >> > > >> Henrik: > > >>> I was referring to the property that Calvin transactions also need to > > >> be sent to the cluster in a single shot > > >> > > >> Ah, yes. In that case I agree, and I tried to point to this direction > in > > >> an earlier email, where I discussed the use of scripting languages > (i.e. > > >> transactionally modifying the database with some subset of arbitrary > > >> computation).
> > >> I think the JVM is particularly suited to offering quite powerful
> > >> distributed transactions in this vein, and it will be interesting to see
> > >> what we might develop in this direction in future.
> > >>
> > >> From: Jake Luciani <jak...@gmail.com>
> > >> Date: Tuesday, 7 September 2021 at 19:27
> > >> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > >> Great, thanks for the information.
> > >>
> > >> On Tue, Sep 7, 2021 at 12:44 PM Blake Eggleston
> > >> <beggles...@apple.com.invalid> wrote:
> > >>
> > >>> Hi Jake,
> > >>>
> > >>>> 1. Will this effort eventually replace consistency levels in C*? I ask
> > >>>> because one of the shortcomings of our paxos today is that it can
> > >>>> easily be mixed with non-serialized consistencies, and therefore users
> > >>>> commonly break consistency by, for example, reading at CL.ONE while
> > >>>> also using LWTs.
> > >>>
> > >>> This will likely require CLs to be specified at the schema level for
> > >>> tables using multi-partition transactions. I’d expect this to be
> > >>> available for other tables, but not required.
> > >>>
> > >>>> 2. What structural changes are planned to support an external
> > >>>> dependency project like this? Are there some high-level interfaces you
> > >>>> expect the project to adhere to?
> > >>>
> > >>> There will be some interfaces that need to be implemented in C* to
> > >>> support the library. You can find the current interfaces in the
> > >>> accord.api package, but these were written to support some initial
> > >>> testing, and are not intended for integration into C* as is. Things are
> > >>> pretty fluid right now and will be rewritten / refactored multiple times
> > >>> over the next few months.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Blake
> > >>>
> > >>>> On Sun, Sep 5, 2021 at 10:33 AM bened...@apache.org
> > >>>> <bened...@apache.org> wrote:
> > >>>>
> > >>>>> Wiki:
> > >>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> > >>>>> Whitepaper:
> > >>>>> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> > >>>>> Prototype: https://github.com/belliottsmith/accord
> > >>>>>
> > >>>>> Hi everyone, I’d like to propose this CEP for adoption by the
> > >>>>> community.
> > >>>>>
> > >>>>> Cassandra has benefitted from LWTs for many years, but application
> > >>>>> developers that want to ensure consistency for complex operations must
> > >>>>> either accept the scalability bottleneck of serializing all related
> > >>>>> state through a single partition, or layer a complex state machine on
> > >>>>> top of the database. These are sophisticated and costly activities
> > >>>>> that our users should not be expected to undertake. Since distributed
> > >>>>> databases are beginning to offer distributed transactions with fewer
> > >>>>> caveats, it is past time for Cassandra to do so as well.
> > >>>>>
> > >>>>> This CEP proposes the use of several novel techniques that build upon
> > >>>>> research (that followed EPaxos) to deliver (non-interactive) general
> > >>>>> purpose distributed transactions. The approach is outlined in the wiki
> > >>>>> page and in more detail in the linked whitepaper.
> > >>>>> Importantly, by adopting this approach we will be the _only_
> > >>>>> distributed database to offer global, scalable, strict serializable
> > >>>>> transactions in one wide-area round-trip. This would represent a
> > >>>>> significant improvement in the state of the art, both in the academic
> > >>>>> literature and in commercial or open source offerings.
> > >>>>>
> > >>>>> This work has been partially realised in a prototype. This partial
> > >>>>> prototype has been verified against Jepsen.io’s Maelstrom library and
> > >>>>> dedicated in-tree strict serializability verification tools, but much
> > >>>>> work remains before it is production capable and integrated into
> > >>>>> Cassandra.
> > >>>>>
> > >>>>> I propose including the prototype in the project as a new source
> > >>>>> repository, to be developed as a standalone library for integration
> > >>>>> into Cassandra. I hope the community sees the important value
> > >>>>> proposition of this proposal, and will adopt the CEP after this
> > >>>>> discussion, so that the library and its integration into Cassandra can
> > >>>>> be developed in parallel and with the involvement of the wider
> > >>>>> community.
> > >>>>
> > >>>> --
> > >>>> http://twitter.com/tjake
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
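[Editor's illustration] Blake's reply above points at integration interfaces in the prototype's accord.api package, and Paulo's suggestion is to define such integration points so that alternative implementations could be benchmarked behind them. A minimal sketch of what such a pluggable transaction-protocol seam *might* look like follows; every name here (Txn, TxnProtocol, LocalProtocol) is invented for illustration and does not come from the actual accord.api package, which at the time of the thread was explicitly described as in flux.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** A non-interactive transaction: given a snapshot, returns the writes to apply. */
interface Txn {
    Map<String, String> execute(Map<String, String> snapshot);
}

/** Hypothetical pluggable seam: Accord, Paxos-backed LWTs, etc. would sit behind it. */
interface TxnProtocol {
    CompletableFuture<Void> coordinate(Txn txn);
}

/** Toy single-node "protocol", only so the seam can be exercised locally. */
class LocalProtocol implements TxnProtocol {
    final Map<String, String> store = new HashMap<>();

    @Override
    public synchronized CompletableFuture<Void> coordinate(Txn txn) {
        // A real implementation would run consensus and replication here;
        // this one just applies the transaction's writes to local state.
        store.putAll(txn.execute(new HashMap<>(store)));
        return CompletableFuture.completedFuture(null);
    }
}
```

Benchmarking alternatives, as Paulo proposes, would then amount to swapping the TxnProtocol binding; the difficulty Benedict identifies is that no existing drop-in implementation provides multi-shard strict serializability behind such a seam.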