Re: [DISCUSS] CEP-15: General Purpose Transactions

bened...@apache.org Tue, 12 Oct 2021 15:21:21 -0700

Thanks Alex! I’ve hugely appreciated our exploration of the optimisation space 
of Accord, and for you to have taken the time to summarise it for everyone is 
particularly decent of you.

FWIW, I think there are likely some easy optimisations for providing snapshot 
isolation without an initial WAN round-trip for many transactions. Replicas 
will be tracking their progress with respect to the global log, and so may 
maintain a high watermark for applied transactions, so that if the latest 
timestamps on all replicas for the keys occur below their high watermark then 
the local replicas are consistent as of any timestamp we may select earlier 
than this (and depending how (or if) MVCC is implemented we may prefer to pick 
the latest timestamp, or an earlier one). If we later involve a shard that is 
not consistent up to this timestamp then we may need a WAN round-trip to ensure 
it is consistent (but this might not need to be global, only to the nearest DC 
that has a sufficiently high watermark).

I could imagine using this mechanism to guarantee serializable reads over the 
LAN by ensuring shards maintain MVCC history that goes far enough back to 
intersect with the lowest high watermark, so that we may always pick a 
consistent timestamp.
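
To make the watermark idea a little more concrete, here is a rough Python
sketch of how a coordinator might decide whether a LAN-only snapshot read is
possible. Everything in it is an assumption made for illustration
(ReplicaState, its fields, integer timestamps, both function names); none of
it comes from the Accord prototype. It also takes a conservative reading for
the non-MVCC case: the latest write any local replica holds for the keys must
sit below the lowest high watermark among them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class ReplicaState:
    # Hypothetical local-replica view; not an Accord data structure.
    high_watermark: int           # all transactions at or below this are applied
    latest_write: Dict[str, int]  # key -> timestamp of the latest write held


def local_snapshot_window(replicas: Iterable[ReplicaState],
                          keys: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (lowest, highest) usable snapshot timestamps for `keys`, or
    None if the local replicas cannot be shown consistent without help."""
    replicas, keys = list(replicas), list(keys)
    if not replicas:
        return None
    # Latest write to any requested key, as seen by any local replica.
    lowest = max((r.latest_write.get(k, 0) for r in replicas for k in keys),
                 default=0)
    # Every local replica has applied everything up to this point.
    highest = min(r.high_watermark for r in replicas)
    if lowest >= highest:
        return None  # would need a (possibly non-global) WAN round-trip
    return lowest, highest


def mvcc_snapshot_timestamp(replicas: Iterable[ReplicaState]) -> int:
    """With MVCC history reaching back to the lowest high watermark, that
    watermark is always a consistent timestamp to read at."""
    return min(r.high_watermark for r in replicas)
```

Without MVCC any timestamp inside the returned window reads the same data;
with MVCC the second function shows why a consistent timestamp always exists,
provided the history is retained far enough back.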

From: Alex Miller <millerde...@gmail.com>
Date: Tuesday, 12 October 2021 at 22:25
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I have, purely out of laziness, been engaging on this topic on ASF Slack as
opposed to dev@[1].  Benedict has been overly generous in answering
questions and considering future optimizations there, but it means that I
inadvertently forked the conversation on this topic.  To bring the
highlights of that conversation back to the dev list:

[1]: https://the-asf.slack.com/archives/CK23JSY2K/p1631611705108600

== Reduced Conflict Tracking

The Accord whitepaper specifies a transaction conflict as:

> We say that two transactions γ and τ conflict (γ ∼ τ) if their execution
is not commutative, so that either their response or the database state
would differ if their execution order were reversed.

This means the protocol subsequently tracks the full set of
read-after-write, write-after-write, and write-after-read conflicts.  This
is a superset of what is required for correctness.

Write-after-read conflicts may be ignored when the underlying storage is
multi-version, and I'm told the plan is that Accord would be implemented on
top of multiversioned storage.  A read submitted to a multi-versioned
database is unaffected by writes that occur later, and as such,
write-after-read conflicts don't need to be tracked.

Write-after-write conflicts may be ignored, as Accord assigns a single
write timestamp to all writes, and all writes appear atomically at a single
consistent version.  This means that Accord implements write snapshots[2],
and thus it is impossible to cause a cycle of transaction conflicts with
only writes, so they don't need to be recorded as conflicts.

Thus, Accord only needs to track read-after-write conflicts, which is a
nice reduction to the metadata overhead involved in tracking and
propagating transaction conflicts.
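
As a small illustration of the reduction argued above, the following Python
sketch classifies the conflicts between two transactions by their read and
write sets and keeps only the kind that still needs tracking. The Txn type
and both function names are hypothetical, and the sketch assumes exactly the
conditions stated above (multi-version storage, one write timestamp per
transaction).

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Txn:
    # Hypothetical transaction summary: just the keys read and written.
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def conflict_kinds(earlier: Txn, later: Txn) -> Set[str]:
    """All three classic conflict kinds between an earlier and a later txn."""
    kinds = set()
    if earlier.writes & later.reads:
        kinds.add("read-after-write")
    if earlier.writes & later.writes:
        kinds.add("write-after-write")
    if earlier.reads & later.writes:
        kinds.add("write-after-read")
    return kinds


def must_track(earlier: Txn, later: Txn) -> bool:
    """Under the assumptions above, only read-after-write needs recording."""
    return "read-after-write" in conflict_kinds(earlier, later)
```

For example, two blind writes to the same key produce only a
write-after-write conflict, so must_track returns False for them.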

[2]: Maysam Yabandeh and Daniel Gómez Ferro. 2012. A critique of snapshot
isolation. In Proceedings of the 7th ACM European conference on Computer
Systems (EuroSys ’12). Association for Computing Machinery, New York, NY,
USA, 155–168. DOI:https://doi.org/10.1145/2168836.2168853

== Read-Only Transaction Optimizations

As previously mentioned in this list, Calvin-derived designs end up in an
uncomfortable situation where strictly-serializable reads need to be
committed to disk as part of a batch to be assigned a serialization order,
and then wait for all previously scheduled transactions to finish before
performing the reads.  This brings me sadness in two different ways:
strictly serializable reads have high latency, and read-only transactions
involve writes to disk.  Stale read snapshots are offered as a way to avoid
these downsides, but they require the application to tolerate staleness.

The Accord whitepaper specifies journaling a read-only transaction to disk
as part of PreAccept to record both the existence of the transaction and
its conflicts.  As read-only transactions don't affect the database state,
it's okay to not have durable consensus on if they committed or not.
Read-only transactions have no side effects by definition, and one may rely
on clients to retry if the read-only transaction failed, thus PreAccept
doesn't need to durably record the existence of Read-Only transactions.
Nor does it need to track them for dependencies/conflict reasons, as such
information would only be needed to track write-after-read conflicts, which
we may omit as discussed above.

Additionally, as read-only transactions will always be aborted during
recovery, they may treat a majority quorum as a fastpath quorum, and never
need to proceed into a second round in order to "commit".

== Interactive Transactions

Across a few sub-threads in our slack thread, I think we worked out the
details in enough clarity that I'm agreeing with the belief that there's a
reasonable interactive transaction protocol hiding within Accord:

1. To obtain a consistent read snapshot, a client can either:
    a. Pick a timestamp, and wait  $WAN RTT + \epsilon$ for all concurrent
transactions to be committed or aborted.
    b. Contact a quorum from each partition the client wishes to read
from.  This will take about the same amount of time, so (a) might be
preferable unless $\epsilon$ is large (which it shouldn't be) or client
clocks are untrustworthy.
2. The client can then issue any arbitrary number of reads to replicas at
this version.  Any fully applied replica may be used.
3. To commit, the client contacts a quorum of replicas for each partition
that was read or written from, and provides the set of reads and writes for
the partition.
4. As part of PreAccept, each replica in a quorum will verify that no
writes were accepted between the read timestamp and the proposed commit
timestamp which intersect with the read set.  If there are any, the replica
votes to abort and retry the transaction.
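
Step 4 can be sketched in a few lines of Python. This is only an
illustration: the function name, the per-key list of accepted write
timestamps, and the choice to treat both ends of the interval as exclusive
are assumptions, not something specified in the whitepaper or the Slack
discussion.

```python
from typing import Dict, Iterable, List


def preaccept_vote(accepted_writes: Dict[str, List[int]],
                   read_set: Iterable[str],
                   read_ts: int,
                   commit_ts: int) -> bool:
    """Return True to accept, False to vote for abort-and-retry.

    accepted_writes maps each key to the timestamps of writes this
    (hypothetical) replica has accepted for it.
    """
    for key in read_set:
        for ts in accepted_writes.get(key, ()):
            # A write after the snapshot but before the proposed commit
            # invalidates what the client read for this key.
            if read_ts < ts < commit_ts:
                return False
    return True
```

A write at or before the read timestamp is already visible in the snapshot,
so it does not cause an abort; only writes strictly inside the window do.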

There are likely some details that I'm missing, and Benedict will leap
up with some corrections and cautions about incompleteness, but it's enough
that I'm feeling reasonably confident that there's a better version of this
that will actually work.  In particular, this protocol would mean that
transactions which encounter conflicts aren't logged to disk in most cases
(a minority of the quorum still might), unlike most commit-then-execute
designs which log the transaction to disk on every execution attempt.  This
means highly skewed interactive transaction workloads will have a much more
limited impact on the database.  (In doing some skimming across papers, I
discovered that vCorfu/CorfuDB[3] does interactive transactions on top of a
shared distributed log in a somewhat conceptually similar way.)

We also had a bit of discussion over implementation constraints on the
conflict checking.  Without supporting optimistic transactions, Accord only
needs to keep track of the read/write sets of transactions which are still
in flight.  To support optimistic transactions, Accord would need to
bookkeep the most recent timestamp at which the key was modified, for every
key.  There are some databases (e.g. CockroachDB, FoundationDB) which have a
similar need, and use similar data structures which could be copied.
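
One way to keep such per-key bookkeeping bounded is sketched below in Python.
It is a guess at the general shape, not a description of what CockroachDB or
FoundationDB actually do: the class name, the eviction policy and the
integer timestamps are all assumptions. The idea is to remember exact
timestamps for a limited number of keys and fall back to a conservative
floor for everything else, which can cause spurious aborts but never hides a
real conflict.

```python
from typing import Dict


class LastModifiedTracker:
    """Hypothetical bounded map of key -> most recent write timestamp."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.floor = 0                    # upper bound for all evicted keys
        self.by_key: Dict[str, int] = {}

    def record_write(self, key: str, ts: int) -> None:
        if ts > self.by_key.get(key, self.floor):
            self.by_key[key] = ts
        # Evict the oldest entries; a real structure would avoid this
        # linear scan, it is kept only for brevity.
        while len(self.by_key) > self.max_entries:
            oldest = min(self.by_key, key=self.by_key.get)
            self.floor = max(self.floor, self.by_key.pop(oldest))

    def last_modified(self, key: str) -> int:
        # Unknown keys are assumed to have been modified at the floor.
        return self.by_key.get(key, self.floor)

    def modified_since(self, key: str, read_ts: int) -> bool:
        return self.last_modified(key) > read_ts
```

An optimistic transaction that read a key at read_ts would be asked to retry
whenever modified_since(key, read_ts) is true.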

[3]: Michael Wei, Amy Tai, Christopher J. Rossbach, Ittai Abraham, Maithem
Munshed, Medhavi Dhawan, Jim Stabile, Udi Wieder, Scott Fritchie, Steven
Swanson, Michael J. Freedman, & Dahlia Malkhi (2017). vCorfu: A Cloud-Scale
Object Store on a Shared Log. In the 14th USENIX Symposium on Networked
Systems Design and Implementation (NSDI 17) (pp. 35–49). USENIX Association.

== Tradeoffs

Accord comes with a number of advantages and disadvantages, many of which
are inherited from being in the class of commit-then-execute protocols.

Accord relies on synchronized clocks.  Exceedingly poorly synchronized
clocks don't result in correctness violations though, only perceived
unavailability to a client or of a replica.  This avoids the ire and
pitfalls of most other clock-based designs, of which I enjoy the clickbait
title of "NewSQL database systems are failing to guarantee consistency, and
I blame Spanner"[4]. This also means that one can set a much more
aggressive bound on clock skew than most other databases can do, as very
rarely exceeding the clock skew bound will just be perceived the same as a
few message drops.  One would also hope that existing Cassandra users,
having already been warned about Last Writer Wins with poorly synchronized
clocks, would already have checked their NTP setup.

Committing a transaction before execution means the database is committed
to performing the deferred work of transaction execution.  In some fashion,
the expressiveness and complexity of the query language needs to be
constrained to place limitations on the execution time or resources. Fauna
invented FQL with a specific set of limitations, presumably for this reason.
CQL seems to already be a reasonably limited query language that doesn't
easily lend itself to succinctly expressing an inordinate amount of work,
which would make it already reasonably suited as a query language for
Accord.

Any query which can't pre-declare its read and write sets must attempt to
pre-execute enough of the query to determine them, and then submit the
transaction as optimistic on all values read during the partial execution
still being untouched.  Most notably, all workloads that utilize secondary
indexes are affected, and degrade from being guaranteed to commit, to being
optimistic and potentially requiring retries.  This transformed Calvin into
an optimistic protocol, and one that's significantly less efficient than
classic execute-then-commit designs.  Accord is similarly affected, though
the window of optimism would likely be smaller.  However, it seems like
most common ways to end up in this situation are already discouraged or
prevented.  CQL's own restrictions prevent many forms of queries which
result in unclear read and write sets.  In my highly limited Cassandra
experience, I've generally seen Secondary Indexes be cautioned against
already.
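
The pre-execute-then-submit pattern described above can be sketched as
follows, in Python against a toy in-memory store. ToyStore, the per-row
version counters and all function names are invented for illustration; a
real system would perform the check and the writes atomically inside the
transaction protocol rather than in client code.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


class ToyStore:
    """Hypothetical single-threaded store with a version counter per row."""

    def __init__(self, rows: Dict[str, Row]) -> None:
        self.rows = {k: dict(v) for k, v in rows.items()}
        self.versions = {k: 0 for k in rows}

    def write(self, key: str, row: Row) -> None:
        self.rows[key] = dict(row)
        self.versions[key] = self.versions.get(key, 0) + 1


def reconnoitre(store: ToyStore,
                predicate: Callable[[Row], bool]) -> Dict[str, int]:
    """Partial execution: find the matching keys and the versions seen."""
    return {k: store.versions[k] for k, r in store.rows.items() if predicate(r)}


def commit_if_unchanged(store: ToyStore, predicate: Callable[[Row], bool],
                        seen: Dict[str, int],
                        update: Callable[[Row], Row]) -> bool:
    """Apply `update` only if the reconnaissance result still holds."""
    if reconnoitre(store, predicate) != seen:
        return False  # the read/write set moved; the caller must retry
    for key in seen:
        store.write(key, update(store.rows[key]))
    return True


def run_optimistic(store: ToyStore, predicate: Callable[[Row], bool],
                   update: Callable[[Row], Row], max_attempts: int = 3) -> int:
    """Retry loop; returns the number of attempts that were needed."""
    for attempt in range(1, max_attempts + 1):
        seen = reconnoitre(store, predicate)
        if commit_if_unchanged(store, predicate, seen, update):
            return attempt
    raise RuntimeError("gave up after repeated conflicts")
```

The window of optimism is the gap between reconnoitre and
commit_if_unchanged: any write in that gap which changes the matching rows
forces another attempt.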

[4]:
http://dbmsmusings.blogspot.com/2018/09/newsql-database-systems-are-failing-to.html

== Conclusion

I thought Accord and Cassandra seemed remarkably well matched, as Accord's
weaknesses are already forbidden or anti-patterns in Cassandra.  Accord
suffers from the downsides of commit-then-execute databases less than
alternative designs, and seems to have optimizations available to remove
some weaknesses entirely, which makes me favorable towards the design in
general.  Most limitations that one would desire for a query language are
already present in CQL.  The leaderless consensus prioritizes availability
in the same way that Cassandra does, which would similarly make a Raft-like
design seem awkward.  Having a path towards efficient interactive
transactions later means that the commit-then-execute design doesn't feel
like it's placing strong limitations on what higher-level workloads could
be supported in the future.

So I'm +1 on the work, as it seems to be a general purpose and interesting
transaction protocol, but I'm also just here because I thought Benedict was
nice enough that I could trick him into discussing transaction processing
with me. ;)

On Mon, Oct 11, 2021 at 9:08 AM Aleksey Yeschenko <alek...@apache.org>
wrote:

> Lacking the most basic support for multi-partition transactions is a
> serious handicap. The CEP offers a concrete solution.
>
> It’s possible to solve multi-partition transactions in a myriad of other
> ways, I’m sure, but CEP-15 is what’s on offer for Cassandra at the moment,
> and I’m not seeing any alternative CEPs with folks lined up to implement
> them.
>
> The CEP is a clear and meaningful improvement over status quo. The
> engineers behind it are committed to doing the implementation work and can
> be trusted to stick around for maintenance. It’s been a month now, please,
> let’s get this going.
>
> > On 11 Oct 2021, at 13:43, bened...@apache.org wrote:
> >
> > For those who missed it, my talk discussing this CEP at ApacheCon is now
> available to view:  https://www.youtube.com/watch?v=YAE7E-QEAvk
> >
> >
> >
> > From: Oleksandr Petrov <oleksandr.pet...@gmail.com>
> > Date: Monday, 11 October 2021 at 10:11
> > To: dev <dev@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >> I support this proposal. From what I can understand, this proposal
> moves
> > us towards having the building blocks we need to correctly deliver some
> of
> > the most often requested features in Cassandra.
> >
> > Same here. I also support this proposal and believe it opens up many new
> > opportunities (while not limiting us / not narrowing our future options),
> > can help us implement features we've all wanted to have implemented for
> > years, and make significant improvements in the subsystems that were a
> > source of issues for a long time.
> >
> > I think it's also good to start with CAS batches: it's a great way to
> make
> > the feature available and work incrementally. After this lands, people
> will
> > be able to use Accord/MPT in different subsystems and get busy
> > implementing all sorts of other features and improvements on top of it.
> >
> >
> >
> >
> > On Sat, Oct 9, 2021 at 4:18 PM Joseph Lynch <joe.e.ly...@gmail.com>
> wrote:
> >
> >>> With the proposal hitting the one-month mark, the contributors are
> >> interested in gauging the developer community's response to the
> proposal.
> >>
> >> I support this proposal. From what I can understand, this proposal
> >> moves us towards having the building blocks we need to correctly
> >> deliver some of the most often requested features in Cassandra. For
> >> example it seems to unlock: batches that actually work, registers that
> >> offer fast compare and swap, global secondary indices that can be
> >> correctly maintained, and more. Therefore, given the benefit to the
> >> community, I support working towards that foundation that will allow
> >> us to build solutions in Cassandra that pay consensus closer to
> >> mutation instead of lazily at read/repair time.
> >>
> >> I think the feedback in this thread around interface (what statements
> >> will this facilitate and how will the library integrate with Cassandra
> >> itself), performance (how fast will these transactions be, will we
> >> offer bounded stale reads, etc ...), and implementation (how does this
> >> compare/contrast with other consensus approaches) has been
> >> informative, but at this point I think it makes sense to start trying
> >> to make incremental progress towards a functional integration to
> >> discover any remaining areas for improvement.
> >>
> >> Cheers and thank you!
> >> -Joey
> >>
> >>
> >>
> >> On Thu, Oct 7, 2021 at 10:51 AM C. Scott Andreas <sc...@paradoxica.net>
> >> wrote:
> >>>
> >>> Hi Jonathan,
> >>>
> >>> Following up on my message yesterday as it looks like our replies may
> >> have crossed en route.
> >>>
> >>> Thanks for bumping your message from earlier in our discussion. I
> >> believe we have addressed most of these questions on the thread, in
> >> addition to offering a presentation on this and related work at
> ApacheCon,
> >> a discussion hosted following that presentation at ApacheCon, and in ASF
> >> Slack. Contributors have further offered an opportunity to discuss
> >> specific questions via videoconference if it helps to speak live. I'd be
> >> happy to do so as well.
> >>>
> >>> Since your original message, discussion has covered a lot of ground on
> >> the related databases you've mentioned:
> >>> – Henrik has shared expertise related to MongoDB and its
> implementation.
> >>> – You've shared an overview of Calvin.
> >>> – Alex Miller has helped us review the work relative to other Paxos
> >> algorithms and identified a few great enhancements to incorporate.
> >>> – The paper discusses related approaches in FoundationDB, CockroachDB,
> >> and Yugabyte.
> >>> – Subsequent discussion has contrasted the implementation to DynamoDB,
> >> Google Cloud BigTable, and Google Cloud Spanner (noting specifically
> that
> >> the protocol achieves Spanner's 1x round-trip without requiring
> specialized
> >> hardware).
> >>>
> >>> In my reply yesterday, I've attempted to crystallize what becomes
> >> possible via CQL: one-shot multi-partition transactions in the first
> >> implementation and a 4x latency reduction on writes / 2x latency
> reduction
> >> on reads relative to today; along with the ability to build upon this
> work
> >> to enable interactive transactions in the future.
> >>>
> >>> I believe we've exercised the questions you've raised and am grateful
> >> for the ground we've covered. If you have further questions that are
> >> difficult to exercise via email, please let me know if you'd like to
> >> arrange a call (open-invite); we'd be happy to discuss live as well.
> >>>
> >>> With the proposal hitting the one-month mark, the contributors are
> >> interested in gauging the developer community's response to the
> proposal.
> >> We warrant our ability to focus durably on the project; execute this
> >> development on ASF JIRA in collaboration with other contributors; engage
> >> with members of the developer and user community on feedback,
> enhancements,
> >> and bugs; and intend to deliver it to completion at a standard of readiness
> >> suitable for production transactional systems of record.
> >>>
> >>> Thanks,
> >>>
> >>> – Scott
> >>>
> >>> On Oct 6, 2021, at 8:25 AM, C. Scott Andreas <sc...@paradoxica.net>
> >> wrote:
> >>>
> >>>
> >>>
> >>> Hi folks,
> >>>
> >>> Thanks for discussion on this proposal, and also to Benedict who’s been
> >> fielding questions on the list!
> >>>
> >>> I’d like to restate the goals and problem statement captured by this
> >> proposal and frame context.
> >>>
> >>> Today, lightweight transactions limit users to transacting over a
> single
> >> partition. This unit of atomicity has a very low upper limit in terms of
> >> the amount of data that can be CAS’d over; and doing so leads many to
> >> design contorted data models to cram different types of data into one
> >> partition for the purposes of being able to CAS over it. We propose that
> >> Cassandra can and should be extended to remove this limit, enabling
> users
> >> to issue one-shot transactions that CAS over multiple keys – including
> CAS
> >> batches, which may modify multiple keys.
> >>>
> >>> To enable this, the CEP authors have designed a novel, leaderless
> >> paxos-based protocol unique to Cassandra, offered a proof of its
> >> correctness, a whitepaper outlining it in detail, along with a prototype
> >> implementation to incubate development, and integrated it with Maelstrom
> >> from jepsen.io to validate linearizability as more specific test
> >> infrastructure is developed. This rigor is remarkable, and I’m thrilled
> to
> >> see such a degree of investment in the area.
> >>>
> >>> Even users who do not require the capability to transact across
> >> partition boundaries will benefit. The protocol reduces message/WAN
> >> round-trips by 4x on writes (4 → 1) and 2x on reads (2 → 1) in the
> common
> >> case against today’s baseline. These latency improvements coupled with
> the
> >> enhanced flexibility of what can be transacted over in Cassandra enable
> new
> >> classes of applications to use the database.
> >>>
> >>> In particular, 1xRTT read/write transactions across partitions enable
> >> Cassandra to be thought of not just as a strongly consistent database,
> but
> >> even a transactional database - a mode many may even prefer to use by
> >> default. Given this capability, Apache Cassandra has an opportunity to
> >> become one of – or perhaps the only – database in the industry that can
> >> store multiple petabytes of data in a single database; replicate it
> across
> >> many regions; and allow users to transact over any subset of it. These
> are
> >> capabilities that can be met by no other system I’m aware of on the
> market.
> >> Dynamo’s transactions are single-DC. Google Cloud BigTable does not
> support
> >> transactions. Spanner, Aurora, CloudSQL, and RDS have far lower
> scalability
> >> limits or require specialized hardware, etc.
> >>>
> >>> This is an incredible opportunity for Apache Cassandra - to surpass the
> >> scalability and transactional capability of some of the most advanced
> >> systems in our industry - and to do so in open source, where anyone can
> >> download and deploy the software to achieve this without cost; and for
> >> students and researchers to learn from and build upon as well (a team
> from
> >> UT-Austin has already reached out to this effect).
> >>>
> >>> As Benedict and Blake noted, the scope of what’s captured in this
> >> proposal is also not terminal. While the first implementation may extend
> >> today’s CAS semantics to multiple partitions with lower latency, the
> >> foundation is suitable to build interactive transactions as well — which
> >> would be remarkable and is something that I hadn’t considered myself at
> the
> >> onset of this project.
> >>>
> >>> To that end, the CEP proposes the protocol, offers a validated
> >> implementation, and the initial capability of extending today’s
> >> single-partition transactions to multi-partition; while providing the
> >> flexibility to build upon this work further.
> >>>
> >>> A simple example of what becomes possible when this work lands and is
> >> integrated might be:
> >>>
> >>> –––
> >>> BEGIN BATCH
> >>> UPDATE tbl1 SET value1 = newValue1 WHERE partitionKey = k1
> >>> UPDATE tbl2 SET value2 = newValue2 WHERE partitionKey = k2 AND
> >> conditionValue = someCondition
> >>> APPLY BATCH
> >>> –––
> >>>
> >>> I understand that this query is present in the CEP and my intent isn’t
> >> to recommend that folks reread it if they’ve given a careful reading
> >> already. But I do think it’s important to elaborate upon what becomes
> >> possible when this query can be issued.
> >>>
> >>> Users of Cassandra who have designed data models that cram many types
> of
> >> data into a single partition for the purposes of atomicity no longer
> need
> >> to. They can design their applications with appropriate schemas that
> >> wouldn’t leave Codd holding his nose. They’re no longer pushed into
> >> antipatterns that result in these partitions becoming huge and
> potentially
> >> unreadable. Cassandra doesn’t become fully relational in this CEP - but
> it
> >> becomes possible and even easy to design applications that transact
> across
> >> tables that mimic a large amount of relational functionality. And for
> users
> >> who are content to transact over a single table, they’ll find those
> >> transactions become up to 4x faster today due to the protocol’s
> reduction
> >> in round-trips. The library’s loose coupling to Apache Cassandra and
> >> ability to be incubated out-of-tree also enables other applications to
> take
> >> advantage of the protocol and is a nice step toward bringing modularity
> to
> >> the project. There are a lot of good things happening here.
> >>>
> >>> I know I’m listed as an author - but figured I should go on record to
> >> say “I support this CEP.” :)
> >>>
> >>> Thanks,
> >>>
> >>> – Scott
> >>>
> >>> On Oct 6, 2021, at 8:05 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>>
> >>>
> >>> The problem that I keep pointing out is that you've created this CEP
> for
> >>> Accord without first getting consensus that the goals and the tradeoffs
> >> it
> >>> makes to achieve those goals (and that it will impose on future work
> >> around
> >>> transactions) are the right ones for Cassandra long term.
> >>>
> >>> At this point I'm done repeating myself. For the convenience of anyone
> >>> following this thread intermittently, I'll quote my first reply on this
> >>> thread to illustrate the kind of discussion I'd like to have.
> >>>
> >>> -----
> >>>
> >>> The whitepaper here is a good description of the consensus algorithm
> >> itself
> >>> as well as its robustness and stability characteristics, and its
> >> comparison
> >>> with other state-of-the-art consensus algorithms is very useful. In the
> >>> context of Cassandra, where a consensus algorithm is only part of what
> >> will
> >>> be implemented, I'd like to see a more complete evaluation of the
> >>> transactional side of things as well, including performance
> >> characteristics
> >>> as well as the types of transactions that can be supported and at
> least a
> >>> general idea of what it would look like applied to Cassandra. This will
> >>> allow the PMC to make a more informed decision about what tradeoffs are
> >>> best for the entire long-term project of first supplementing and
> >> ultimately
> >>> replacing LWT.
> >>>
> >>> (Allowing users to mix LWT and AP Cassandra operations against the same
> >>> rows was probably a mistake, so in contrast with LWT we’re not looking
> >> for
> >>> something fast enough for occasional use but rather something within a
> >>> reasonable factor of AP operations, appropriate to being the only way
> to
> >>> interact with tables declared as such.)
> >>>
> >>> Besides Accord, this should cover
> >>>
> >>> - Calvin and FaunaDB
> >>> - A Spanner derivative (no opinion on whether that should be Cockroach
> or
> >>> Yugabyte, I don’t think it’s necessary to cover both)
> >>> - A 2PC implementation (the Accord paper mentions DynamoDB but I
> suspect
> >>> there is more public information about MongoDB)
> >>> - RAMP
> >>>
> >>> Here’s an example of what I mean:
> >>>
> >>> =Calvin=
> >>>
> >>> Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order
> >>> transactions, then replicas execute the transactions independently with
> >> no
> >>> further coordination. No SPOF. Transactions are batched by each
> sequencer
> >>> to keep this from becoming a bottleneck.
> >>>
> >>> Performance: Calvin paper (published 2012) reports linear scaling of
> >> TPC-C
> >>> New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines
> >>> with 7GB ram and 8 virtual cores). Note that TPC-C New Order is
> composed
> >>> of four reads and four writes, so this is effectively 2M reads and 2M
> >>> writes as we normally measure them in C*.
> >>>
> >>> Calvin supports mixed read/write transactions, but because the
> >> transaction
> >>> execution logic requires knowing all partition keys in advance to
> ensure
> >>> that all replicas can reproduce the same results with no coordination,
> >>> reads against non-PK predicates must be done ahead of time
> >> (transparently,
> >>> by the server) to determine the set of keys, and this must be retried
> if
> >>> the set of rows affected is updated before the actual transaction
> >> executes.
> >>>
> >>> Batching and global consensus adds latency -- 100ms in the Calvin paper
> >> and
> >>> apparently about 50ms in FaunaDB. Glass half full: all transactions
> >>> (including multi-partition updates) are equally performant in Calvin
> >> since
> >>> the coordination is handled up front in the sequencing step. Glass half
> >>> empty: even single-row reads and writes have to pay the full
> coordination
> >>> cost. Fauna has optimized this away for reads but I am not aware of a
> >>> description of how they changed the design to allow this.
> >>>
> >>> Functionality and limitations: since the entire transaction must be
> known
> >>> in advance to allow coordination-less execution at the replicas, Calvin
> >>> cannot support interactive transactions at all. FaunaDB mitigates this
> by
> >>> allowing server-side logic to be included, but a Calvin approach will
> >> never
> >>> be able to offer SQL compatibility.
> >>>
> >>> Guarantees: Calvin transactions are strictly serializable. There is no
> >>> additional complexity or performance hit to generalizing to multiple
> >>> regions, apart from the speed of light. And since Calvin is already
> >> paying
> >>> a batching latency penalty, this is less painful than for other
> systems.
> >>>
> >>> Application to Cassandra: B-. Distributed transactions are handled by
> the
> >>> sequencing and scheduling layers, which are leaderless, and Calvin’s
> >>> requirements for the storage layer are easily met by C*. But Calvin
> also
> >>> requires a global consensus protocol and LWT is almost certainly not
> >>> sufficiently performant, so this would require ZK or etcd (reasonable
> >> for a
> >>> library approach but not for replacing LWT in C* itself), or an
> >>> implementation of Accord. I don’t believe Calvin would require
> additional
> >>> table-level metadata in Cassandra.
> >>>
> >>> On Wed, Oct 6, 2021 at 9:53 AM bened...@apache.org <
> bened...@apache.org>
> >>> wrote:
> >>>
> >>> The problem with dropping a patch on Jira is that there is no
> opportunity
> >>> to point out problems, either with the fundamental approach or with the
> >>> specific implementation. So please point out some problems I can engage
> >>> with!
> >>>
> >>>
> >>> From: Jonathan Ellis <jbel...@gmail.com>
> >>> Date: Wednesday, 6 October 2021 at 15:48
> >>> To: dev <dev@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >>> On Wed, Oct 6, 2021 at 9:21 AM bened...@apache.org <
> bened...@apache.org>
> >>> wrote:
> >>>
> >>>> The goals of the CEP are stated clearly, and these were the goals we
> >> had
> >>>> going into the (multi-month) research project we undertook before
> >>> proposing
> >>>> this CEP. These goals are necessarily value judgements, so we cannot
> >>> expect
> >>>> that everyone will agree that they are optimal.
> >>>>
> >>>
> >>> Right, so I'm saying that this is exactly the most important thing to
> get
> >>> consensus on, and creating a CEP for a protocol to achieve goals that
> you
> >>> have not discussed with the community is the CEP equivalent of
> dropping a
> >>> patch on Jira without discussing its goals either.
> >>>
> >>> That's why our conversations haven't gone anywhere, because I keep
> saying
> >>> "we need to discuss the goals and tradeoffs", and I'll give an example of
> >> what
> >>> I mean, and you keep addressing the examples (sometimes very shallowly,
> >> "it
> >>> would be possible to X" or "Y could be done as an optimization") while
> >>> ignoring the request to open a discussion around the big picture.
> >>>
> >>>
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> co-founder, http://www.datastax.com
> >>> @spyced
> >>>
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>
> >
> > --
> > alex p
>
