Re: server side describe

2020-04-03 Thread bened...@apache.org
> scope creep.

I think it is unfair to label this scope creep; it would have to be newly 
considered for 4.0 for it to fall under that umbrella.

I don't personally mind if it lands, but this was discussed at length on 
multiple occasions over the past year, and only stalled because of a 
combination of a lack of etiquette and a lack of leadership from, e.g., the PMC in 
resolving various political questions over the course of events.

I also struggle to see how this would invalidate testing in any significant 
way. It doesn't modify any existing behaviour.


From: Joshua McKenzie 
Sent: 01 April 2020 19:24
To: dev@cassandra.apache.org 
Subject: Re: server side describe

This looks like a feature that'd potentially invalidate some testing that's
been done and we've been feature frozen for over a year and a half. Also:
scope creep.

My PoV is we hold off. If we get into a cadence of more frequent releases
we'll have it soon enough.

On Wed, Apr 1, 2020 at 3:03 PM  wrote:

> Hi,
> Normally I ping the person on the ticket or in Slack to ask him/her for
> a status update and whether I can help. Then he/she will probably give me a
> direction.
> If I can’t find the person anymore, I just use my best judgement and
> coordinate with people who might know better than me.
> So far this strategy has worked for me personally.
> Hope this helps
> Ekaterina
>
> Sent from my iPhone
>
> > On 1 Apr 2020, at 14:27, Jon Haddad  wrote:
> >
> > Hey folks,
> >
> > I was looking through our open JIRAs and realized we hadn't merged in
> > server side describe calls yet. The ticket died off a while ago, and I
> > pinged Chris about it yesterday. He's got a lot on his plate and won't be
> > able to work on it anytime soon. I still think we should include this in
> > 4.0.
> >
> > From a technical standpoint, the ticket doesn't say much after Robert
> > tossed an alternative patch out there. I don't mind reviewing and merging
> > either of them; it sounded like both are pretty close to done, and I think
> > from the perspective of updating drivers for 4.0 this will save quite a bit
> > of time, since driver maintainers won't have to add new CQL generation for
> > the various new options that have recently appeared.
> >
> > Questions:
> >
> > * Does anyone have an objection to getting this into 4.0? The patches
> > aren't too huge, I think they're low risk, and also fairly high reward.
> > * I don't have an opinion (yet) on Robert's patch vs Chris's, with regard
> > to which is preferable.
> > * Robert hasn't been around since soon after he put up his PR, at least as
> > far as I've seen. How have we dealt with abandoned patches before? If
> > we're going to add this in, the patch will need some cleanup. Is it
> > reasonable to continue someone else's work when they've disappeared?
> >
> > Jon
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


[DISCUSS] CEP-10: Cluster and Code Simulations

2021-06-03 Thread bened...@apache.org
Proposal for a mechanism to evaluate whole clusters, or individual classes, 
with a deterministically pseudorandom ordering of all thread and message events.

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations

Evaluating the correctness of distributed systems is hard, as I’m sure every 
developer on this list appreciates. As the project has matured, we have had to 
grapple more with the guarantees we provide users for features we develop, and 
the semantics we promise, particularly around edge-cases between two mechanisms 
or systems.

This work aims to dramatically reduce the project overhead necessary for 
delivering a bug-free Cassandra.

The premise is to intercept all relevant events that could be performed in a 
different order, i.e. primarily message delivery and thread events such as 
executor submission, signalling of threads, lock acquisition and release, and 
even volatile reads and writes (to a lesser extent). These events are then 
scheduled pseudo-randomly (with various restrictions to ensure a valid 
execution), or in some cases not evaluated at all (to simulate e.g. messages 
being lost). The result is a repeatable sequential evaluation of a 
multi-threaded, multi-actor system.
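To make the scheduling idea concrete, here is a small, hypothetical sketch in Python. It is not the CEP's implementation (which intercepts executors, locks and messaging inside Cassandra itself); every name below is invented for illustration. Events are queued instead of executed, a seeded `random.Random` picks which pending event runs next, and droppable events are occasionally discarded to mimic lost messages. Because every choice derives from the seed, rerunning with the same seed reproduces the same interleaving.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Event:
    """An intercepted action, e.g. a message delivery or an executor task (hypothetical)."""
    name: str
    action: Callable[[], None]
    droppable: bool = False  # only message deliveries may be "lost"


class DeterministicScheduler:
    """Hypothetical sketch: evaluates intercepted events one at a time,
    in an order chosen by a seeded pseudo-random generator."""

    def __init__(self, seed: int, drop_probability: float = 0.0) -> None:
        self.rng = random.Random(seed)  # every scheduling choice comes from this seed
        self.drop_probability = drop_probability
        self.pending: List[Event] = []
        self.trace: List[str] = []

    def submit(self, event: Event) -> None:
        # Instead of running immediately (or on a real thread), the event is queued.
        self.pending.append(event)

    def run(self) -> List[str]:
        while self.pending:
            # Pick any pending event; the same seed always yields the same picks.
            event = self.pending.pop(self.rng.randrange(len(self.pending)))
            if event.droppable and self.rng.random() < self.drop_probability:
                self.trace.append(f"drop {event.name}")  # simulate a lost message
                continue
            self.trace.append(f"run {event.name}")
            event.action()  # may submit follow-up events
        return self.trace


def simulate(seed: int) -> Tuple[List[str], List[str]]:
    """Three hypothetical nodes each send a request; delivering it schedules a reply."""
    scheduler = DeterministicScheduler(seed, drop_probability=0.25)
    delivered: List[str] = []

    def deliver_request(node: str) -> None:
        delivered.append(f"request from {node}")
        scheduler.submit(Event(f"reply-{node}",
                               lambda: delivered.append(f"reply to {node}"),
                               droppable=True))

    for node in ("A", "B", "C"):
        scheduler.submit(Event(f"request-{node}",
                               lambda n=node: deliver_request(n),
                               droppable=True))
    return scheduler.run(), delivered


if __name__ == "__main__":
    first = simulate(seed=7)
    assert first == simulate(seed=7)  # same seed: same interleaving and same drops
    print(first[0])
```

Replaying a failing seed replays the exact same execution, which is what makes a randomized failure debuggable. A real harness would also need to restrict which events are eligible at each step (the "various restrictions to ensure a valid execution" mentioned above); this sketch treats every pending event as eligible.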

This permits us to evaluate a much broader range of cluster behaviours without 
any additional development work, so that we can implement a broad range of 
property-based and related randomized acceptance tests without significant 
developer burden.

The work will apply just as readily to multi-threaded single classes as it will 
to whole clusters, and will come with a linearizability test for LWTs as well 
as a unit test for an existing multi-threaded bug that is otherwise hard to 
exhibit.

To achieve this, significant modifications will be required to the codebase, 
mostly cleaning up existing abstractions. Specifically, we will need to be able 
to mock executors, any blocking concurrency primitives, time, filesystem access 
and internode streaming.

The work is – in large part – already complete, with JIRA and PRs to follow in 
the coming weeks. Of course, the work is subject to the usual community input 
and review, so this does not preclude changes to the work (even significant 
ones, if they are warranted). I know a lot of incoming CEPs are likely to be 
backed up by significant off-list development as a result of the focus on a 
shippable 4.0. Hopefully this is just a temporary growing pain, particularly as 
we move towards a shippable trunk.

I hope this work will be of huge value to the project, particularly as we race 
to catch up on years of limited feature development.

JIRA and PRs will follow, but I wanted to kick-off discussion in advance.



Re: Obfuscation of passwords in audit logging, in or not in 4.0?

2021-06-03 Thread bened...@apache.org
I think it can be argued that this is a pretty serious bug for a newly 
introduced feature, and qualifies for inclusion in an RC, but I don’t 
personally have a strong opinion on whether this should happen.

I can’t imagine how this would be an _exception_ for inclusion in 4.0.1 though.

From: Mick Semb Wever 
Date: Thursday, 3 June 2021 at 22:45
To: dev@cassandra.apache.org 
Subject: Re: Obfuscation of passwords in audit logging, in or not in 4.0?
Thanks for raising this Stefan.



> While I humbly think this is 4.0-worthy, the process we have, as far
> as I know, is that there should be only critical fixes in 4.0 so I
> guess this will go to 4.0.1, right? Or does this qualify to go to 4.0
> still?
>


I believe the question here is whether this patch is worthy of an exception
to go into 4.0.x (i.e. 4.0.1).
At this point in time all improvements would by default be slated for 4.x
(i.e. 4.1).

It does not qualify as an RC-critical bug for 4.0.0.

Looking at the patch, it is simple, and one could almost consider it a
security fix on a new 4.0 feature, so I'd say it's a valid exception for
4.0.x.
Keen to hear what others think, and how we should go about requesting such
exceptions for non-bugs during each annual release cycle.


Re: [DISCUSSION] Should we mark DROP COMPACT STORAGE as experimental

2021-06-04 Thread bened...@apache.org
This seems reasonable to me, but it raises a question of roadmap. My 
understanding is that we are deprecating compact storage, and will remove it in 
a future release (or have already partially removed it? I forget). Do these 
issues then constitute a blocking issue for GA, or do we modify our roadmap, or 
do we stipulate that users must upgrade to a future patch version of 4.0 before 
going to 4.next/5.0?


From: Benjamin Lerer 
Date: Friday, 4 June 2021 at 09:53
To: dev@cassandra.apache.org 
Subject: [DISCUSSION] Should we mark DROP COMPACT STORAGE as experimental
Hi everybody,

There are a significant number of issues with DROP COMPACT STORAGE that can
be pretty surprising for users.
To name a few:
* Some hidden columns will show up, changing the resultset returned for
wildcard queries
* As COMPACT tables did not have primary key liveness, empty rows
inserted AFTER the ALTER will be returned whereas the ones inserted before
the ALTER will not.
* Also due to the lack of primary key liveness, the number of SSTables being
read will increase, resulting in slower queries
* After DROP COMPACT it becomes possible to ALTER the table in a way that
makes all the rows disappear
* There is a loss of functionality around null clustering when dropping
compact storage (CASSANDRA-16069)

In my opinion DROP COMPACT STORAGE is not ready for production use unless
users fully understand what they are doing.
Consequently, I am wondering whether we should mark it as experimental as
we did for the Materialized Views (CASSANDRA-13959).

What is your opinion?


Re: Apache Cassandra logo

2021-06-11 Thread bened...@apache.org
I’m onboard. Feels like the project is about the right age to have a mid-life 
crisis and try to spice things up a bit with a new logo.

From: Patrick McFadin 
Date: Friday, 11 June 2021 at 17:44
To: dev@cassandra.apache.org 
Subject: Re: Apache Cassandra logo
I'm going to call this out and take the hit.

I think it's time for the PMC to re-evaluate the current Cassandra logo.
For... reasons. We can put that in the post-4.0 bucket but it's a debate
that needs to be had.

Patrick

On Fri, Jun 11, 2021 at 12:20 AM Mick Semb Wever  wrote:

> > > None of those logos match the one used on the web site where Apache is
> in a different font and Cassandra is in a different font and in capitals.
> Is the logo on the website  now the preferred logo? Is there a version with
> black text? If so any chance it could be uploaded to apache.org/logos?
> >
> > For what it's worth, the page at www.apache.org features three projects
> > every hour.
> >
> > The logo used is the one the project registers at www.apache.org/logos/
>
>
> This will get resolved by 4.0 GA
>
> The website changes are still in progress and are getting tackled
> under CASSANDRA-16115. The folk there will be able to give better
> answers, if you would like to get involved there. Otherwise I have
> kept them up to date.
>
> Given the redesign and a few logo variants to choose from, it will be
> worth asking the PMC what we would like as the primary logo used in
> other such places (which I presume is the first logo listed?)


Re: Are we ready for 4.0.0 (GA) ?

2021-06-14 Thread bened...@apache.org
A rate of 4/30 is a true-bug rate of about 13%, which worries me with respect to our 
promise of shipping a bug-free GA. In past releases we have ensured no flaky 
tests, I think.

That said, I’ve not had the time to contribute to the fixing of flaky tests, so 
I’ll leave the decision to those who have, or otherwise have a strong opinion.


From: Ekaterina Dimitrova 
Date: Monday, 14 June 2021 at 20:51
To: dev@cassandra.apache.org 
Subject: Re: Are we ready for 4.0.0 (GA) ?
To give some context around the flaky tests, I pulled a quick report for the 
fixed ones during the past two months. It is attached for your reference.

To summarize, in two months 30 tickets for flaky tests were closed and only 4 
of them were Cassandra bugs (marked in red in the report); the rest of them were 
test fixes.

I think Butler, and running any new tests in a loop before adding them to our 
test suite, will help a lot. Also, Mick did a lot of work to stabilize Jenkins. 
Timeouts and resource issues are less common than before, which is a win! Thank 
you Mick!

Best regards,
Ekaterina


On Mon, 14 Jun 2021 at 13:08, Adam Holmberg <adam.holmb...@datastax.com> wrote:
To the point of "long-term observability over flakies":

I will mention here that we intend to deploy a tool called Butler that we
have developed and used internally for a while. It complements Jenkins to
present different views of test results, allowing developers to better
ascertain which tests are flaky vs failing vs new regressions. We
already have a server provisioned for public hosting. The application
requires a bit of work to generalize for this project. We've been putting
it off while focused on getting 4.0 over the line, but should be getting to
it soon after.

On Mon, Jun 14, 2021 at 11:33 AM Mick Semb Wever <m...@apache.org> wrote:

> Are we ready to cut 4.0.0 (GA) once the following tickets land?
>
>  CASSANDRA-16733 – Allow operators to disable 'ALTER ... DROP COMPACT
> STORAGE' statements
>  CASSANDRA-16669 – Password obfuscation for DCL audit log statements
>  CASSANDRA-16735 – Adding columns via ALTER TABLE can generate corrupt
> sstables
>
>
> A bit more background.
>
> 1. On our 4.0 GA board there are a few other tickets, which have priority but
> are not blockers for a GA release.
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=355&quickFilter=1661
>
>  CASSANDRA-16715 – WEBSITE - June 2021 updates
>  CASSANDRA-12519 – dtest failure in
> offline_tools_test.TestOfflineTools.sstableofflinerelevel_test
>  CASSANDRA-16681 – org.apache.cassandra.utils.memory.LongBufferPoolTest -
> tests are flaky
>  CASSANDRA-16689 – Flaky LeaveAndBootstrapTest
>
>
> 2. We also said we would get 5 green CI runs in a row. Progress on that
> front has been slow and risks delaying GA for our user base. It has had priority
> and there's been lots of momentum which is persisting: lots of flaky fixes
> committed; and the following are being discussed to keep pushing it in the
> right direction…
>  - Long-term observability over flakies
>  - Jenkins agent observability (infra stability)
>
> The past weeks have seen good progress on stability of ci-cassandra.a.o with
> the introduction of Docker CPU limits, and better monitoring of the
> agents so we can ensure we get the saturation and load we want. Dockerising
> the cqlshlib tests is also in progress.
>
> The alternative to a 4.0.0 GA release is a 4.0-rc2 release.
> Should the next release be: 4.0.0 (GA) or 4.0-rc2 ?
>


--
Adam Holmberg
e. adam.holmb...@datastax.com
w. www.datastax.com


Re: Are we ready for 4.0.0 (GA) ?

2021-06-15 Thread bened...@apache.org
That popularity line is a lot more stable than I would have expected, honestly, 
given the huge shifts in the database landscape in the intervening years. 
Though of course I’m sure we’d all rather it were trending upwards. I think the 
release of 4.0 is likely to have minimal impact on that, though – future 
project developments are going to determine the project’s success, I expect. 
Plus maybe a new logo 😊

Still, not disputing the need to ship GA soon.  We do have to cut another RC 
given the seriousness of CASSANDRA-16735 though, right?


From: Benjamin Lerer 
Date: Tuesday, 15 June 2021 at 11:14
To: dev@cassandra.apache.org 
Subject: Re: Are we ready for 4.0.0 (GA) ?
As the list of flaky tests was filtered out, I wanted to add some
information about the tests that revealed real issues. First, there was a
mistake: only 3 of the issues were revealed by flaky tests. The other one
was a user report.
Of the 3 remaining tickets, only 2 were 4.0 bugs: CASSANDRA-16238
<https://issues.apache.org/jira/browse/CASSANDRA-16238> and CASSANDRA-16668
<https://issues.apache.org/jira/browse/CASSANDRA-16668> (which was a pretty
hard-to-hit bug).
I totally agree that we found some real issues, but the cost is pretty high:
2 months of work for two 4.0 issues.

I had a look this morning at how many users reported bugs on the RC-2
release. Outside of the people deeply involved in this project there were
only 4 people reporting true issues, and all of the issues were relatively
minor.

I totally understand that we want to deliver a high quality product. I just
believe that we have to draw the line at some point.
The popularity of Cassandra has been going down for years (
https://db-engines.com/en/ranking_trend/system/Cassandra). The project
might need that release more than any bug fix we can do.

On Tue, 15 Jun 2021 at 07:00, Dinesh Joshi wrote:

> Based on the release lifecycle[1], we should keep cutting RCs until we
> no longer find any blocking issues.
>
> Dinesh
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=132320437
>
> >
> > On Jun 14, 2021, at 9:05 PM, Scott Andreas  wrote:
> >
> > A second RC is appropriate given the revert of CASSANDRA-15899
> > necessitated by the discovery of CASSANDRA-16735: Adding columns via ALTER
> > TABLE can generate corrupt sstables.
> >
> > Ekaterina and Benedict's statement regarding the true positive rate of
> > flaky tests also shows the value of resolving these, and that it would be
> > good to pay this down as far as we can reasonably do so without
> > unnecessarily withholding the release.
> >
> > I do think it's possible that an RC2 build is a candidate for nomination
> > as our GA release. I don't think the RC2 phase needs to be drawn-out, but
> > believe it would build confidence for the project to have positive feedback
> > from a release containing the fix for C-16735. If work paying down the
> > remaining flaky tests surfaces a similar true positive rate, a third build
> > might be warranted, and it would be to the benefit of our users - but I
> > don't think we're far off.
> >
> > I hope others are working to deploy the beta/RC builds and integrate +
> > deploy changes from trunk into the releases they're deploying, as heavy
> > contributors doing so provides us the best opportunity to catch these
> > issues before our users do.
> >
> > We're getting close.

Re: Are we ready for 4.0.0 (GA) ?

2021-06-15 Thread bened...@apache.org
I think, given your revised statements around the bugs discovered with the 
flaky tests, and given that these don’t seem to have been serious bugs, I’m 
comfortable with a two-week period post-RC2.


From: Benjamin Lerer 
Date: Tuesday, 15 June 2021 at 12:41
To: dev@cassandra.apache.org 
Subject: Re: Are we ready for 4.0.0 (GA) ?
>
> We do have to cut another RC given the seriousness of CASSANDRA-16735
> though, right?


I do not disagree with that. I would just like us to be more precise about
our expectations for releasing 4.0 GA, considering that we have already
deeply tested the code.

Would it make sense to say: "Let's give ourselves 1 or 2 weeks to test RC-2. If no
blocker shows up, we can release 4.0 GA"?


Re: Additions to Cassandra ecosystem page?

2021-06-22 Thread bened...@apache.org
Under Cloud Offerings, are we comfortable implicitly endorsing “API compatible” 
offerings that aren’t actually Cassandra, and also don’t (as far as I am aware) 
fully support Cassandra functionality? Should we at least mention that this is 
the case?


From: Melissa Logan 
Date: Tuesday, 22 June 2021 at 21:39
To: u...@cassandra.apache.org , 
dev@cassandra.apache.org 
Subject: Additions to Cassandra ecosystem page?
Hi all,

The Cassandra community recently updated its website and has added several
new entries to the Ecosystem page: https://cassandra.apache.org/ecosystem/.

If you have edits or know of other third-party Cassandra projects, tools,
products, etc that may be useful to others -- please get in touch and we'll
add to the next round of site updates in July.

Thanks!

Melissa
Apache Cassandra Contributor


Re: Additions to Cassandra ecosystem page?

2021-06-23 Thread bened...@apache.org
If we are going to include copycats, let’s (in all seriousness) at least be fun 
about it and put them under the heading “Copycats”

We should also include a disclaimer that they may not be feature compatible. 
Since due diligence on this is hard even for subject matter experts, it would 
be nicer still if we put a bit of detail explaining some of the differences 
before putting them on the website, but I doubt anyone has the time for that 
(so I still slightly prefer we don’t include them).



From: Ben Bromhead 
Sent: Wednesday, June 23, 2021 4:56:34 AM
To: Cassandra DEV 
Subject: Re: Additions to Cassandra ecosystem page?

There is certainly a lack of clarity in the grouping, as a number of those
services are not offering Apache Cassandra. I would suggest another
category along the lines of "Cassandra Protocol compatible offerings".

That way users can easily distinguish between ecosystem offerings where
"the driver works, but certain features might not", vs an actual Apache
Cassandra offering.

We could then also add things like Yugabyte and Scylla into that category.

On Wed, Jun 23, 2021 at 11:15 AM Jonathan Koppenhofer 
wrote:

> No major opinion on the "cloud offerings" piece, but I agree people should
> know what they are getting into, and be able to make an informed decision.
> However, if someone is going down that path, I would hope they do the
> due diligence to make sure it fits their requirements.
>
> One small update I would suggest: it seems like the DataStax Spring Boot entry
> would go in development frameworks as opposed to the sidecar section.


--

Ben Bromhead

Instaclustr | www.instaclustr.com<http://www.instaclustr.com> | @instaclustr
<http://twitter.com/instaclustr> | +64 27 383 8975


Re: Additions to Cassandra ecosystem page?

2021-06-23 Thread bened...@apache.org
+1

From: Brandon Williams 
Date: Wednesday, 23 June 2021 at 15:44
To: dev@cassandra.apache.org 
Subject: Re: Additions to Cassandra ecosystem page?
On Wed, Jun 23, 2021 at 9:38 AM Joshua McKenzie  wrote:
>
> The obvious core responsibility of the website should be to ASLv2
> permissively licensed Apache Cassandra and secondarily to CQL as a protocol
> IMO. I don't think we as a project should be tracking derivative works,
> forks, or other things built on top of the code-base and certainly not
> things with wildly varied licensing (AGPL, proprietary closed, etc).

I agree.  I don't see how it makes sense for us to promote less
compatible derivatives with more restrictive licensing.  Imitation may
be flattery but as you pointed out, we don't need to be the ones
advertising it.



Re: Additions to Cassandra ecosystem page?

2021-06-29 Thread bened...@apache.org
I don’t think it is intractable to come up with a definition that we use for 
inclusion.

1. List no alternative offerings at all.
2. List only those offerings that deploy precisely a released version of 
Cassandra.
3. List only those offerings that deploy precisely a released version of 
Cassandra with modifications that extend a list of public APIs.
4. List only those offerings that deploy precisely a released version of 
Cassandra with modifications that extend a list of public APIs, or are 
themselves published under ASL v2.

Listing a product on our website implicitly endorses that offering, and we 
should absolutely be restrictive about what we endorse. I’m -1 unconditionally 
endorsing competing products, and products that are not themselves clearly some 
derivative work that is accessible to the community under similar terms.

If we cannot agree on a set of conditions, options (1) and (2) are simple, easy 
to administer, adequately restrictive and not inconsistently permissive.

I don’t think this website is going to drive a lot of traffic to any of these 
businesses, so I doubt any of them should be upset at any loss of revenue. The 
question is simply one of principle for us as a project.



From: Benjamin Lerer 
Date: Tuesday, 29 June 2021 at 08:10
To: dev@cassandra.apache.org 
Subject: Re: Additions to Cassandra ecosystem page?
I feel that we are going in too restrictive a direction. I believe that
we have more to gain by being open and welcoming.
-1 for the strict approach and for the licences.

On Tue, 29 Jun 2021 at 00:40, Ben Bromhead wrote:

> On Thu, Jun 24, 2021 at 2:38 AM Joshua McKenzie 
> wrote:
>
> >
> > The obvious core responsibility of the website should be to ASLv2
> > permissively licensed Apache Cassandra and secondarily to CQL as a
> > protocol IMO. I don't think we as a project should be tracking derivative works,
> > forks, or other things built on top of the code-base and certainly not
> > things with wildly varied licensing (AGPL, proprietary closed, etc).
> >
> > To go that route we either become fully inclusive of everything or become
> > Kingmakers, and either way there's the consequence of inconsistent levels
> > of vetting, maintenance, and dilution of what it means to "be Cassandra".
> > There's plenty of other websites for other projects and everyone has
> > access to search engines.
> >
>
> This makes sense to me as a line in the sand to draw if we are going down a
> strict path.
>
> It would be up to whoever wants to be added to the list to demonstrate this
> is the case.
>
> There would still be some degree of honesty required as well on the service
> providers part.
>


Re: Additions to Cassandra ecosystem page?

2021-06-30 Thread bened...@apache.org
I disagree that disclaimers dissolve many duties, responsibilities or implied 
communicative acts.

Most people recognise disclaimers as a means of abdicating responsibility for 
the consequences of utilising an endorsement or other facility, not as a 
communicative act indicating a lack of actual endorsement.

Besides which, many here have communicated reasons they believe it is wrong to 
promote these other database offerings, which is a weaker criterion than 
endorsement.


From: Paulo Motta 
Date: Tuesday, 29 June 2021 at 19:14
To: Cassandra DEV 
Subject: Re: Additions to Cassandra ecosystem page?
> Listing a product on our website implicitly endorses that offering, and
> we should absolutely be restrictive about what we endorse. I’m -1
> unconditionally endorsing

I don't think listing a product on the website means implicitly endorsing
it if it's explicitly mentioned with a visible disclaimer that we're not
endorsing the listed products.

From my experience, an ecosystem page is an open wiki editable by anyone
with the sole objective of allowing external users to easily find anything
related to the project, and not a list of "unconditionally endorsed"
offerings.

Why not take a community-driven laissez-faire approach and just let people
list whatever they want if they feel part of the community, with the
explicit disclaimer that being on the list is not a project endorsement of
the offering? For instance, Apache Kafka uses very simple wording to convey
this: "Here is a list of tools *we have been* told about that integrate
with Kafka outside the main distribution. *We haven't tried them all, so
they may not work*!" [1]

I think it's fine to bikeshed how to categorize offerings, present the
list, word the disclaimer and even remove clear violations of good faith,
but I don't think we should be overly presumptuous and prescribe what is
allowed and forbidden on a public wiki of an open source project.

Two objective suggestions I'd like to make are:
- Give more visibility/prominence to
auxiliary/complementary/supplementary/non-competing/open-source
projects/products by listing them at the top of the page, and list
closed-source / SaaS / API-compatible offerings under their own category at
the bottom of the page, perhaps with an additional disclaimer that not all
features may be available in these offerings.
- There are 3 sub-offerings from a single vendor in the "Professional
Services" category, but I think it's sufficient to list each service
provider once per category, since the sub-offerings can be easily found by
visiting the service provider website.

Paulo
-

[1] https://spark.apache.org/third-party-projects.html

On Tue, 29 Jun 2021 at 04:48, Benjamin Lerer
wrote:

> If I have to choose between the four options that you proposed, I would
> choose (1) List no alternative offerings at all.
>
> On Tue, 29 Jun 2021 at 09:34, bened...@apache.org
> wrote:
>
> > I don’t think it is intractable to come up with a definition that we use
> > for inclusion.
> >
> > 1. List no alternative offerings at all.
> > 2. List only those offerings that deploy precisely a released version of
> > Cassandra.
> > 3. List only those offerings that deploy precisely a released version of
> > Cassandra with modifications that extend a list of public APIs.
> > 4. List only those offerings that deploy precisely a released version of
> > Cassandra with modifications that extend a list of public APIs, or are
> > themselves published under ASL v2.
> >
> > Listing a product on our website implicitly endorses that offering, and we
> > should absolutely be restrictive about what we endorse. I’m -1
> > unconditionally endorsing competing products, and products that are not
> > themselves clearly some derivative work that is accessible to the
> > community under similar terms.
> >
> > If we cannot agree on a set of conditions, options (1) and (2) are simple,
> > easy to administer, adequately restrictive and not inconsistently
> > permissive.
> >
> > I don’t think this website is going to drive a lot of traffic to any of
> > these businesses, so I doubt any of them should be upset at any loss of
> > revenue. The question is simply one of principle for us as a project.
> >
> >
> >
> > From: Benjamin Lerer 
> > Date: Tuesday, 29 June 2021 at 08:10
> > To: dev@cassandra.apache.org 
> > Subject: Re: Additions to Cassandra ecosystem page?
> > I feel that we are heading in too restrictive a direction. I believe that
> > we have more to win by being open and welcoming.
> > -1 for the strict approach and for the licences.
> >
> > On Tue, 29 Jun 2021 at 00:40, Ben Bromhead wrote:
> >

Re: [DISCUSS] Clarifying the CEP process

2021-07-08 Thread bened...@apache.org
That’s how I understand the process, yes. Voting to accept the CEP just 
indicates that the broad strokes painted by the CEP are acceptable to the 
community, and a patch can be brought forward with the expectation that it will 
be accepted once it meets the other criteria for acceptance.

From: Benjamin Lerer 
Date: Thursday, 8 July 2021 at 10:59
To: dev@cassandra.apache.org 
Subject: [DISCUSS] Clarifying the CEP process
Hi everybody,

CEPs are now a required step for important changes to the Cassandra code
base. Nevertheless, this process is new for all of us, and beyond creating a
CEP it is a bit unclear what needs to be done to get the CEP approved.

I will take as an example CEP-9: Make SSLContext creation pluggable, which
has been provided with a JIRA ticket (CASSANDRA-1) and a PR.

Sumanth and Stefan both raised some high level concerns on the JIRA ticket.
My understanding is that they should have been raised on the DISCUSSION
thread. Once those concerns are addressed or discussed, I believe that if
nobody raised more concerns we should trigger a VOTE.

Is my understanding correct? What criteria should be met before we trigger
a VOTE?

One other point of confusion is the agreement on the CEP versus the
agreement on the patch. In my opinion, agreeing on the CEP does not mean
that we agree on the patch. A patch is not required before we agree on the
CEP. Am I correct?

Thanks in advance for your feedback.


Re: [DISCUSS] Clarifying the CEP process

2021-07-08 Thread bened...@apache.org
I think that’s a bit extreme – it seems perfectly fine to comment on Jira, but 
high level discussions around scope, goals and potential confounders should 
ideally happen on the DISCUSS thread. It’s a difficult balancing act, choosing 
the venue for a discussion, so let’s not censure people unnecessarily.

Perhaps we should put together a cheat sheet listing the kinds of discussion
and the venue in which to raise each at different phases of a CEP's lifecycle.
This matters because the process is likely to be quite dynamic over a CEP’s
lifetime: once a vote passes, aspects of the CEP will probably be revisited as
a result of discussions (including high-level ones) on Jira and other venues,
but we won’t want to bring those discussions straight back to another DISCUSS
thread. More likely, that would wait until some consensus emerges amongst
those involved in the work, which could then be presented to the dev list for
further discussion.


From: Mick Semb Wever 
Date: Thursday, 8 July 2021 at 11:14
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Clarifying the CEP process
> My understanding is that they should have been raised on the DISCUSSION
> thread. Once those concerns are addressed or discussed, I believe that if
> nobody raised more concerns we should trigger a VOTE.
>
> Is my understanding correct?
>



Agreed, we shouldn't be commenting on Jira tickets or PRs until the CEP
has passed a vote. The Jira ticket and PR can be created as a PoC
to help explain and illustrate the CEP, but nothing more than that.

Thanks for raising this, Benjamin. It's important that we make the new CEP
process easy to adopt.


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Did anyone have any thoughts on this CEP, or shall I bring it forward for a 
vote also?

From: bened...@apache.org 
Date: Thursday, 3 June 2021 at 20:19
To: dev@cassandra.apache.org 
Subject: [DISCUSS] CEP-10: Cluster and Code Simulations
Proposal for a mechanism to evaluate whole clusters, or individual classes, 
with a deterministically pseudorandom ordering of all thread and message events.

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations

Evaluating the correctness of distributed systems is hard, as I’m sure every 
developer on this list appreciates. As the project has matured, we have had to 
grapple more with the guarantees we provide users for features we develop, and 
the semantics we promise, particularly around edge-cases between two mechanisms 
or systems.

This work aims to dramatically reduce the project overhead necessary for 
delivering a bug-free Cassandra.

The premise is to intercept all relevant events that could be performed in a 
different order, i.e. primarily message delivery and thread events such as 
executor submission, signalling of threads, lock acquisition and release, and 
even volatile reads and writes (to a lesser extent). These events are then 
scheduled pseudo-randomly (with various restrictions to ensure a valid 
execution), or in some cases not evaluated at all (to simulate e.g. messages 
being lost). The result is a repeatable sequential evaluation of a 
multi-threaded, multi-actor system.
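A minimal Java sketch of the premise above, under stated assumptions: intercepted events are queued rather than executed, and a seeded `java.util.Random` chooses which pending event runs next, occasionally dropping a message delivery. The names `DeterministicScheduler`, `submit` and `runToCompletion` are hypothetical illustrations, not the CEP-10 code, and the sketch omits the restrictions the proposal mentions for ensuring a valid execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Toy deterministic scheduler (hypothetical, not the CEP-10 implementation).
 * Intercepted events are queued instead of being run immediately; a seeded
 * Random picks which pending event runs next, and may drop message deliveries.
 */
final class DeterministicScheduler {
    /** An intercepted event: either a thread event or a message delivery. */
    record Event(String name, boolean isMessage, Runnable action) {}

    private final Random random;
    private final double dropProbability;
    private final List<Event> pending = new ArrayList<>();
    private final List<String> trace = new ArrayList<>();

    DeterministicScheduler(long seed, double dropProbability) {
        this.random = new Random(seed);
        this.dropProbability = dropProbability;
    }

    /** Called where the code would otherwise submit a task or send a message. */
    void submit(String name, boolean isMessage, Runnable action) {
        pending.add(new Event(name, isMessage, action));
    }

    /** Runs events one at a time, in a pseudorandom but seed-determined order. */
    List<String> runToCompletion() {
        while (!pending.isEmpty()) {
            Event next = pending.remove(random.nextInt(pending.size()));
            if (next.isMessage() && random.nextDouble() < dropProbability) {
                trace.add("drop " + next.name()); // simulate a lost message
                continue;
            }
            trace.add("run " + next.name());
            next.action().run(); // may submit further events
        }
        return trace;
    }
}
```

Because every choice comes from a single `Random(seed)`, rerunning with the same seed reproduces the same trace, which is what makes a failing run repeatable on another machine.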

This lets us evaluate a much broader range of cluster behaviours without 
any additional development work, so that we can implement a broad range of 
property-based and related randomized acceptance tests without significant 
developer burden.

The work will apply just as readily to multi-threaded single classes as it will 
to whole clusters, and will come with a linearizability test for LWTs as well 
as a unit test for an existing multi-threaded bug that is otherwise hard to 
exhibit.

To achieve this, significant modifications will be required to the codebase, 
mostly cleaning up existing abstractions. Specifically, we will need to be able 
to mock executors, any blocking concurrency primitives, time, filesystem access 
and internode streaming.
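As an illustration of what mocking time could look like (an assumption about its shape, not the proposed API), code would read the clock and schedule timeouts through an interface such as the hypothetical `TimeSource` below. Under simulation, `SimulatedTime` advances only when the simulator says so, and fires due timers in deadline order.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical time abstraction that both production and simulated code would use. */
interface TimeSource {
    long nanoTime();
    void schedule(long delayNanos, Runnable task);
}

/** Simulated time: the clock moves only when the simulator advances it. */
final class SimulatedTime implements TimeSource {
    private record Timer(long deadline, long seq, Runnable task) {}

    private long now = 0;
    private long nextSeq = 0; // tie-breaker: equal deadlines fire in submission order
    private final PriorityQueue<Timer> timers = new PriorityQueue<>(
            Comparator.comparingLong(Timer::deadline).thenComparingLong(Timer::seq));

    @Override
    public long nanoTime() {
        return now;
    }

    @Override
    public void schedule(long delayNanos, Runnable task) {
        timers.add(new Timer(now + delayNanos, nextSeq++, task));
    }

    /** Advances time to target, firing every timer due on the way, in deadline order. */
    void advanceTo(long target) {
        while (!timers.isEmpty() && timers.peek().deadline() <= target) {
            Timer timer = timers.poll();
            now = timer.deadline();
            timer.task().run(); // may schedule further timers
        }
        now = Math.max(now, target);
    }
}
```

A production implementation of the same interface would delegate to `System.nanoTime()` and a real scheduled executor; only the simulated one makes timeouts and races between timers reproducible.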

The work is – in large part – already complete, with JIRA and PRs to follow in 
the coming weeks. Of course, the work is subject to the usual community input 
and review, so this does not preclude changes to the work (even significant 
ones, if they are warranted). I know a lot of incoming CEPs are likely to be 
backed up by significant off-list development as a result of the focus on a 
shippable 4.0. Hopefully this is just a temporary growing pain, particularly as 
we move towards a shippable trunk.

I hope this work will be of huge value to the project, particularly as we race 
to catch up on years of limited feature development.

JIRA and PRs will follow, but I wanted to kick-off discussion in advance.


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Hi Benjamin,

The concurrency constructs listed are all _blocking_ concurrency primitives, 
i.e. they put threads to sleep and wake them up. Since the goal of this work is 
pseudorandom execution of the application, trapping thread events is a central 
feature.

The ability to mock the file system is only to ensure the execution is 
_deterministic_. Otherwise a cluster running billions of simulations would be 
almost useless, as you would not readily be able to reproduce the sequence on a 
local machine. The execution order is extremely brittle, with even a different 
patch release of the JVM being able to produce a different sequence of 
execution (in some cases, at least – no doubt many patch releases do not have 
ordering impacts).
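One concrete way filesystem access can break determinism: the JDK documents no ordering for a real directory listing (`File.list()` makes no guarantee about the order of the returned names), so code iterating over files may behave differently between machines. A minimal sketch of a hypothetical in-memory replacement whose listing order is always the same:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory file system for simulation. Files are kept in a sorted map,
 * so listing a directory always yields the same order on every run and machine.
 */
final class InMemoryFileSystem {
    private final TreeMap<String, byte[]> files = new TreeMap<>();

    void write(String path, byte[] contents) {
        files.put(path, contents.clone()); // defensive copy
    }

    byte[] read(String path) {
        byte[] contents = files.get(path);
        if (contents == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return contents.clone();
    }

    /** Names of files directly under dir, in lexicographic order. */
    List<String> list(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> names = new ArrayList<>();
        // keys sharing a prefix are contiguous in a sorted map, starting at the prefix itself
        for (String path : files.tailMap(prefix, true).keySet()) {
            if (!path.startsWith(prefix)) {
                break; // past the directory's key range
            }
            String rest = path.substring(prefix.length());
            if (!rest.contains("/")) {
                names.add(rest); // skip entries in nested subdirectories
            }
        }
        return names;
    }
}
```

Sorting is just one possible deterministic policy; a simulator could equally derive the listing order from its seed, so that order-dependent bugs are explored rather than hidden.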

The best example of this work is the LWT linearizability verifier that will be 
included with it, which is quite a simple test to put together with the 
simulator: you simply issue some LWT reads and writes to a cluster, and the 
simulator intercepts every message and thread (and in some specific relevant 
cases, memory access) event, and executes them in pseudorandom order. Each run 
exhibits unique behaviour, exploring different edge cases in the system. If we 
were to intercept only message events, we would fail to explore a wide variety 
of potentially erroneous states in the system, including even some that relate 
purely to message delivery (in the real world, for instance, a response can be 
received before the thread sending it has finished doing so).
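To give a flavour of what a linearizability check involves, here is a toy brute-force checker in the style of Wing and Gong, for a single register supporting reads and compare-and-set (the shape of an LWT). It searches for an ordering of completed operations that respects real-time order and sequential register semantics. The names are hypothetical, the search is exponential in history length, and this is not the LWT Verifier described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy brute-force linearizability checker for one register with READ and CAS (illustrative only). */
final class RegisterLinearizability {
    enum Kind { READ, CAS }

    /** One completed operation: what was asked, what was observed, and when. */
    record Op(Kind kind, int expected, int update, int observed, long invokedAt, long completedAt) {
        static Op read(int observed, long start, long end) {
            return new Op(Kind.READ, 0, 0, observed, start, end);
        }

        static Op cas(int expected, int update, boolean applied, long start, long end) {
            return new Op(Kind.CAS, expected, update, applied ? 1 : 0, start, end);
        }
    }

    static boolean isLinearizable(List<Op> history, int initialValue) {
        return search(new ArrayList<>(history), initialValue);
    }

    /** Tries every operation that may legally go next, backtracking on dead ends. */
    private static boolean search(List<Op> remaining, int value) {
        if (remaining.isEmpty()) return true;
        for (int i = 0; i < remaining.size(); i++) {
            Op op = remaining.get(i);
            if (!mayGoNext(op, remaining)) continue;
            Integer next = apply(op, value);
            if (next == null) continue; // observed result impossible from this state
            remaining.remove(i);
            boolean ok = search(remaining, next);
            remaining.add(i, op); // restore before trying the next candidate
            if (ok) return true;
        }
        return false;
    }

    /** Real-time order: op cannot go next if another pending op completed before op was invoked. */
    private static boolean mayGoNext(Op op, List<Op> remaining) {
        for (Op other : remaining) {
            if (other != op && other.completedAt() < op.invokedAt()) return false;
        }
        return true;
    }

    /** Sequential register semantics; returns the new value, or null if op's result is inconsistent. */
    private static Integer apply(Op op, int value) {
        return switch (op.kind()) {
            case READ -> op.observed() == value ? value : null;
            case CAS -> {
                boolean applies = value == op.expected();
                if (applies != (op.observed() == 1)) yield null;
                yield applies ? op.update() : value;
            }
        };
    }
}
```

For example, two overlapping compare-and-sets from 0 that both report success make `isLinearizable` return `false`, since no sequential order lets both apply; that is the kind of violation such a verifier exists to catch.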


From: Benjamin Lerer 
Date: Tuesday, 13 July 2021 at 09:50
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
Hi Benedict, Sam,

Could you describe some of the scenarios that this new framework will allow
us to test ? They might help me to understand the changes required.
The need for the changes around concurrency and file access is not obvious
to me. Consequently, I suspect that I do not fully understand the goal of
the proposal.

Thanks in advance

Benjamin


On Tue, 13 Jul 2021 at 10:37, Sam Tunnicliffe wrote:

> Spoiler alert: I am pretty familiar with the proposal and the off-list
> work that has been done toward it.
>
> From my perspective, I have no qualms about putting this CEP up for a
> vote. Having seen the potential (and to some degree, realised) benefit of
> this proposal, I am convinced of its value.
>
> Thanks,
> Sam
>
> > On 13 Jul 2021, at 09:20, bened...@apache.org wrote:
> >
> > Did anyone have any thoughts on this CEP, or shall I bring it forward
> > for a vote also?

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> Should the target release be 4.1 (not 4.0.x)?

No, in my opinion the target should be 4.0.x. We are reaching for a shippable 
trunk and this has no public API impacts. This work is IMO central to achieving 
a shippable trunk, either way. The only reason I do not target 3.x is that it 
would be too burdensome.

> My concern is that changing code and tests at the same time risks regressions…

I’ve never heard this position before. Would you care to elaborate? It is quite 
normal for us to update tests alongside changes to the code.

> And seconding Benjamin's comments… some documentation on how to write a test, 
> and a simple test example, that this CEP then allows us to write would help a 
> lot (a la "working backwards").

1) This work is to _enable_ the development of tests; the only test 
originally planned to arrive alongside it is the fairly sophisticated LWT 
Verifier. This is something we have sorely needed as a project, as we have had 
serious correctness violations for multiple years. This broad category of 
integrated correctness test is the main goal of the work, and is not easily 
condensed into an example snippet.
2) It is _possible_ that some simple and fluent APIs will be introduced in a 
later phase of this work, but they haven’t been designed yet, so I cannot share 
snippets.

In principle, however, you would be able to do something like:

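    // @Nemesis: hypothetical annotation (the API has not been designed yet)
    // asking the simulator to intercept reads and writes of this field
    @Nemesis volatile int x = 0;

    int foo() {
        x = x + 1;  // read-modify-write: not atomic, even though x is volatile
        return x;   // re-reads x, so it may observe the other call's increment
    }

    @Test
    void test() throws Exception {
        // submissions are thread events the simulator would intercept
        Future<Integer> f1 = executor.submit(() -> foo());
        Future<Integer> f2 = executor.submit(() -> foo());
        // fails in interleavings where neither call observes x == 1
        Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
    }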


From: Mick Semb Wever 
Date: Tuesday, 13 July 2021 at 10:28
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>
> To achieve this, significant modifications will be required to the codebase, 
> mostly cleaning up existing abstractions. Specifically, we will need to be 
> able to mock executors, any blocking concurrency primitives, time, filesystem 
> access and internode streaming.
>
> The work is – in large part – already complete, with JIRA and PRs to follow 
> in the coming weeks. Of course, the work is subject to the usual community 
> input and review, so this does not preclude changes to the work (even 
> significant ones, if they are warranted). I know a lot of incoming CEP are 
> likely to be backed up by significant off-list development as a result of the 
> focus on a shippable 4.0. Hopefully this is just a temporary growing pain, 
> particularly as we move towards a shippable trunk.
>
> I hope this work will be of huge value to the project, particularly as we 
> race to catch up on years of limited feature development.
>
> JIRA and PRs will follow, but I wanted to kick-off discussion in advance.
>



Should the target release be 4.1 (not 4.0.x)?

I'd be interested in seeing a rough timeline/plan of how the proposed
changes are to be defined in JIRAs and ordered.

I'd like to hear a bit more about the test plan: not so much how the CEP
itself improves the testability of the project, but rather what testing
needs to be in place before the CEP's changes are introduced (and, if it
already exists, where). My concern is that changing code and tests at the
same time risks regressions…

And seconding Benjamin's comments… some documentation on how to write
a test, and a simple test example, that this CEP then allows us to
write would help a lot (a la "working backwards").

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Ach, editing code in the email editor isn’t smart when editors all have 
different meanings for key combinations (I accidentally hit send), but you get 
the idea. The simulator would intercept these thread executions and the memory 
accesses to the annotated field, and evaluate them in orders that, in some 
cases, make the assertion fail.

This is obviously a toy example that is not very interesting, but the main real 
example we have is too complicated to demonstrate in a snippet. In my view, the 
long-term outcome of this work is likely to be the enablement of many unit 
tests that are a little more complicated than this, on less obvious code.

But that is not the headline goal of the CEP. By itself, the LWT Verifier 
demonstrates the power and utility of the work. I don’t believe it is terribly 
helpful to focus on secondary justifications like the example I gave. For me, 
the _ability_ to prove the correctness of difficult but critical systems is 
justification enough, whether or not we deliver a simple API as part of the CEP.
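To make the toy `foo()` example concrete without a simulator, the sketch below models each call as the three accesses to `x` that the simulator could interleave (read `x`, write `x + 1`, re-read `x` to return) and enumerates every interleaving of two calls. It is a plain-Java illustration with hypothetical names, not the proposed `@Nemesis` mechanism.

```java
import java.util.ArrayList;
import java.util.List;

/** Exhaustively enumerates interleavings of two foo() calls on a shared x (illustrative only). */
final class IncrementInterleavings {
    /** Result of one interleaving: the values returned by the two foo() calls. */
    record Outcome(String schedule, int first, int second) {
        boolean assertionHolds() {
            return first == 1 || second == 1;
        }
    }

    static List<Outcome> allOutcomes() {
        List<Outcome> outcomes = new ArrayList<>();
        enumerate("", 0, 0, outcomes);
        return outcomes;
    }

    /** schedule records which call ran each step so far ("A" or "B"); a and b count steps taken. */
    private static void enumerate(String schedule, int a, int b, List<Outcome> out) {
        if (a == 3 && b == 3) {
            out.add(replay(schedule));
            return;
        }
        if (a < 3) enumerate(schedule + "A", a + 1, b, out);
        if (b < 3) enumerate(schedule + "B", a, b + 1, out);
    }

    /** Executes one schedule step by step against a shared x that starts at 0. */
    private static Outcome replay(String schedule) {
        int x = 0;
        int[] local = new int[2];  // value each call read in step 0
        int[] result = new int[2]; // value each call returns in step 2
        int[] step = new int[2];
        for (char c : schedule.toCharArray()) {
            int t = c == 'A' ? 0 : 1;
            switch (step[t]++) {
                case 0 -> local[t] = x;     // read x
                case 1 -> x = local[t] + 1; // write x + 1
                case 2 -> result[t] = x;    // return x (re-reads the shared field)
                default -> throw new IllegalStateException("too many steps");
            }
        }
        return new Outcome(schedule, result[0], result[1]);
    }
}
```

For the schedule `AABBAB` (A reads and writes, B reads and writes, then both return), both calls return 2 and the assertion fails. Exhaustive enumeration like this only scales to tiny programs; seeded pseudorandom exploration trades completeness for reach.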



From: bened...@apache.org 
Date: Tuesday, 13 July 2021 at 10:43
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Perhaps it’s worth considering the roadmap we plan to develop, and whether 
such a facility would be welcome for proving the safety of those features? We 
can then worry about evolving the specifics of any API(s) together as we 
deploy the capability. Looking ahead, there are very few major features I 
wouldn’t want to see exercised with this approach, given the choice.

The LWT Verifier by itself is an integration test that covers many of the 
affected subsystems, including sstables, memtables and repair. But we will have 
the ability to introduce dedicated verification for each of these features and 
systems, and we will necessarily produce more robust code (repair is a great 
example of a brittle system that would have been impossible to produce with 
such an adversarial test system).


*Query side improvements:*

  * Storage Attached Index or SAI. The CEP can be found at
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
  * Add support for OR predicates in the CQL where clause
  * Allow aggregation by time intervals (CASSANDRA-11871) and allow UDFs
in the GROUP BY clause
  * Ability to read the TTL and WRITE TIME of an element in a collection
(CASSANDRA-8877)
  * Multi-Partition LWTs
  * Materialized views hardening: Addressing the different Materialized
Views issues (see CASSANDRA-15921 and [1] for some of the work involved)

*Security improvements:*

  * SSTables encryption (CASSANDRA-9633)
  * Add support for Dynamic Data Masking (CEP pending)
  * Allow the creation of roles that have the ability to assign arbitrary
privileges, or scoped privileges without also granting those roles access
to database objects.
  * Filter rows from system and system_schema based on users' permissions
(CASSANDRA-15871)

*Performance improvements:*

  * Trie-based index format (CEP pending)
  * Trie-based memtables (CEP pending)
  * Paxos improvements: Paxos / LWT implementation that would enable the
database to serve serial writes with two round-trips and serial reads with
one round-trip in the uncontended case

*Safety/Usability improvements:*

  * Guardrails. The CEP can be found at
https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
  * Add ability to track state in repair (CASSANDRA-15399)
  * Repair coordinator improvements (CASSANDRA-15399)
  * Make incremental backup configurable per keyspace and table
(CASSANDRA-15402)
  * Add ability to blacklist a CQL partition so all requests are ignored
(CASSANDRA-12106)
  * Add default and required keyspace replication options (CASSANDRA-14557)
  * Transactional Cluster Metadata: Use of transactions to propagate
cluster metadata
  * Downgrade-ability: Ability to downgrade in the event that
a serious issue has been identified

*Pluggability improvements:*

  * Pluggable schema manager (CEP pending)
  * Pluggable filesystem (CEP pending)
  * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be
found at
https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
  * Memtable API (CEP pending). The goal being to allow improvements such
as CASSANDRA-13981 to be easily plugged into Cassandra

*Memtable pluggable implementation:*

  * Enable Cassandra for Persistent Memory (CASSANDRA-13981)




From: bened...@apache.org 
Date: Tuesday, 13 July 2021 at 10:51
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations

Re: [VOTE] Release Apache Cassandra 4.0.0

2021-07-13 Thread bened...@apache.org
Do we support aarch64? If we don’t, we should continue with the release; if we 
do, we should (unfortunately) re-roll.

I’m genuinely unsure if we officially support it or not, though. I see an 
earlier related thread, but no vote and no conclusion about supported 
architectures.


From: Mick Semb Wever 
Date: Tuesday, 13 July 2021 at 12:45
To: dev@cassandra.apache.org 
Subject: Re: [VOTE] Release Apache Cassandra 4.0.0
>
> Hi, there is an issue with the [VOTE] version (Oracle JDK 1.8.0_291 on
> aarch64). Is it a known issue (for the 3.x)?
>
> The stack size specified is too small, Specify at least 328k
> Error: Could not create the Java Virtual Machine.
>
> It worked once I added `-Xss512k` to the JVM_OPTS (in
> cassandra-env.sh).
>


Thanks Cong!

Given it is a trivial fix, I say we continue with 4.0.0.
I've entered it as CASSANDRA-16798 for 4.0.1


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
No change is without risk; we have introduced serious regressions with bug 
fixes to patch releases. The overall risk to the release lifecycle is reduced 
significantly in my opinion, as we reduce the likelihood of introducing 
regressions, and can use the same test infrastructure across all of the 
actively developed releases, increasing our confidence in 4.0.x releases.

Furthermore, we introduced a significant performance regression in all lines of 
the software by increasing the number of LWT round-trips. Unless we intend to 
leave this regression for a further year without _any_ release offering a 
solution, we will need suitable verification mechanisms for whatever fixes we 
deliver.

My view is that it is unacceptable to leave such a significant regression 
unaddressed in all lines of software we intend to release for the foreseeable 
future.


From: Paulo Motta 
Date: Tuesday, 13 July 2021 at 13:21
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> No, in my opinion the target should be 4.0.x. We are reaching for a
shippable trunk and this has no public API impacts. This work is IMO
central to achieving a shippable trunk, either way. The only reason I do
not target 3.x is that it would be too burdensome.

In my limited view of the proposal, a major refactor of internal
concurrency APIs to support the testing facility potentially risks the
stability of a minor release, something we've been wanting to avoid with
our focus on stability. So I'd prefer this to go into trunk/4.1; otherwise
we will set a precedent for including non-bugfix changes in minor
versions, something I think we should avoid.

In the past we've been lenient about including seemingly harmless internal
changes that caused client impact, and we should be careful to avoid this in
the future. To prevent this, I think we should take a strict approach and
only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.

I'd go one step further and propose that any CEPs, which are generally
about new features, major API changes or internal refactorings, should only
be allowed in subsequent major versions, unless an explicit exception is
granted.

On Tue, 13 Jul 2021 at 07:11, bened...@apache.org <
bened...@apache.org> wrote:

> Perhaps it’s worth looking forward at the roadmap that we plan to develop,
> and consider whether such a facility would be welcome for proving their
> safety, and we can then worry about evolving the specifics of any API(s)
> together as we deploy the capability? Looking ahead, there are very few
> major features I wouldn’t want to see exercised with this approach, given
> the choice.
>
> The LWT Verifier by itself is an integration test that covers many of the
> affected subsystems, including sstables, memtables and repair. But we will
> have the ability to introduce dedicated verification for each of these
> features and systems, and we will necessarily produce more robust code
> (repair is a great example of a brittle system that would be impossible to
> produce with such an adversarial test system).
>
>
> *Query side improvements:*
>
>   * Storage Attached Index or SAI. The CEP can be found at
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>   * Add support for OR predicates in the CQL WHERE clause
>   * Allow aggregation by time intervals (CASSANDRA-11871) and allow UDFs
> in the GROUP BY clause
>   * Ability to read the TTL and WRITE TIME of an element in a collection
> (CASSANDRA-8877)
>   * Multi-Partition LWTs
>   * Materialized views hardening: Addressing the different Materialized
> Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
>
> *Security improvements:*
>
>   * SSTables encryption (CASSANDRA-9633)
>   * Add support for Dynamic Data Masking (CEP pending)
>   * Allow the creation of roles that have the ability to assign arbitrary
> privileges, or scoped privileges without also granting those roles access
> to database objects.
>   * Filter rows from system and system_schema based on users permissions
> (CASSANDRA-15871)
>
> *Performance improvements:*
>
>   * Trie-based index format (CEP pending)
>   * Trie-based memtables (CEP pending)
>   * Paxos improvements: Paxos / LWT implementation that would enable the
> database to serve serial writes with two round-trips and serial reads with
> one round-trip in the uncontended case
>
> *Safety/Usability improvements:*
>
>   * Guardrails. The CEP can be found at
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
>   * Add ability to track state in repair (CASSANDRA-15399)
>   * Repair coordinator improvements (CASSANDRA-15399)
>   * Make incremental backup configurable per keyspace and table
> (CASSANDRA-15402)
>   * Add

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Hmm. It occurs to me I’m not entirely sure how our new release process is going 
to work.

Will we be releasing 4.1 builds immediately, as part of shippable trunk? Or 
will 4.0 be our only active line of software for the next year?

Either way, I bet my bottom dollar there will come some regret if we introduce 
such divergence between the two most active branches we maintain, so early in 
their lifecycles. If we invest significant resources in improved testing using 
this framework (which I very much expect) then branches that are not compatible 
will not benefit, likely reducing their quality; and the risk of backports will 
increase, due to divergence.

Altogether, I think it would be a huge mistake. But if we will be shipping 
releases soon that can fix these aforementioned regressions, I won’t campaign 
for it.



From: bened...@apache.org 
Date: Tuesday, 13 July 2021 at 13:31
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
No change is without risk; we have introduced serious regressions with bug 
fixes to patch releases. The overall risk to the release lifecycle is reduced 
significantly in my opinion, as we reduce the likelihood of introducing 
regressions, and can use the same test infrastructure across all of the 
actively developed releases, increasing our confidence in 4.0.x releases.

Furthermore, we introduced a significant performance regression in all lines of 
the software by increasing the number of LWT round-trips. Unless we intend to 
leave this regression for a further year without _any_ release offering a 
solution, we will need suitable verification mechanisms for whatever fixes we 
deliver.

My view is that it is unacceptable to leave such a significant regression 
unaddressed in all lines of software we intend to release for the foreseeable 
future.


From: Paulo Motta 
Date: Tuesday, 13 July 2021 at 13:21
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> No, in my opinion the target should be 4.0.x. We are reaching for a
> shippable trunk and this has no public API impacts. This work is IMO
> central to achieving a shippable trunk, either way. The only reason I do
> not target 3.x is that it would be too burdensome.

In my limited view of the proposal, a major refactor of internal
concurrency APIs to support the testing facility potentially risks the
stability of a minor release, something we've been wanting to avoid with
our focus on stability. So I'd prefer this to go into trunk/4.1; otherwise
we will set a precedent for including non-bugfix changes in minor
versions, something I think we should avoid.

In the past we've been lenient about including seemingly harmless internal
changes that caused client impact, and we should be careful to avoid this in
the future. To prevent this, I think we should take a strict approach and
only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.

I'd go one step further and propose that any CEPs, which are generally
about new features, major API changes or internal refactorings, should only
be allowed in subsequent major versions, unless an explicit exception is
granted.

On Tue, 13 Jul 2021 at 07:11, bened...@apache.org <bened...@apache.org> wrote:

> Perhaps it’s worth looking forward at the roadmap that we plan to develop,
> and consider whether such a facility would be welcome for proving their
> safety, and we can then worry about evolving the specifics of any API(s)
> together as we deploy the capability? Looking ahead, there are very few
> major features I wouldn’t want to see exercised with this approach, given
> the choice.
>
> The LWT Verifier by itself is an integration test that covers many of the
> affected subsystems, including sstables, memtables and repair. But we will
> have the ability to introduce dedicated verification for each of these
> features and systems, and we will necessarily produce more robust code
> (repair is a great example of a brittle system that would be impossible to
> produce with such an adversarial test system).
>
>
> *Query side improvements:*
>
>   * Storage Attached Index or SAI. The CEP can be found at
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>   * Add support for OR predicates in the CQL WHERE clause
>   * Allow aggregation by time intervals (CASSANDRA-11871) and allow UDFs
> in the GROUP BY clause
>   * Ability to read the TTL and WRITE TIME of an element in a collection
> (CASSANDRA-8877)
>   * Multi-Partition LWTs
>   * Materialized views hardening: Addressing the different Materialized
> Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
>
> *Security improvements:*
>
>   * SSTables encryption (CASSANDRA-9633)
>   * Add support for Dynamic Data Masking (CEP pending)
>   * Allow the creation of roles that hav

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> This is a fair point.  But a CEP isn't required to solve this.

I think the work contained in this CEP is necessary to solve this problem
safely, and I have some empirical evidence in favour of this assertion.


From: Brandon Williams 
Date: Tuesday, 13 July 2021 at 13:39
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
On Tue, Jul 13, 2021 at 7:31 AM bened...@apache.org  wrote:
> Furthermore, we introduced a significant performance regression in all lines 
> of the software by increasing the number of LWT round-trips. Unless we intend 
> to leave this regression for a further year without _any_ release offering a 
> solution, we will need suitable verification mechanisms for whatever fixes we 
> deliver.
>
> My view is that it is unacceptable to leave such a significant regression 
> unaddressed in all lines of software we intend to release for the foreseeable 
> future.

This is a fair point.  But a CEP isn't required to solve this.



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Nothing we’re discussing constitutes a feature. We’re discussing stability 
enhancements, and important bug fixes.

I think this disagreement is to some extent founded on our different premises 
about what a patch release should contain, and this seems to be the fault of 
incompletely specified documentation.

1. The release lifecycle only forbids feature work from being developed in a
patch release, and only expressly includes bug fixes. Note that the document
even has a comment by the author suggesting that features may be backported to
a patch release from trunk (not something I agree with, but it demonstrates the
ambiguity of the definition).
2. There seems to be some conflation of the size of a change with its
admissibility under the release lifecycle – I don’t think there are any criteria
for this, and it’s open to the community’s case-by-case assessment. Whatever we
do to fix the bug in question will necessarily be a very significant piece of
work itself, for instance.

My interpretation of the release lifecycle document is that it is acceptable to 
include this work in a patch release. My belief about its impact is that it 
would contribute positively to the stability of the project’s 4.0 releases over 
the lifecycle, and improve project velocity.

With respect to whether we can ship a fix to 12126 without validation, I would 
be strongly opposed to this, and certainly would not produce a patch myself in 
this way. Not only would it be burdensome (given the divergences in the 
codebase), but I would not consider it acceptably safe (given the divergence).


From: Jeremiah D Jordan 
Date: Tuesday, 13 July 2021 at 14:15
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
I tend to agree with Paulo that a major refactoring of some internal interfaces
sounds like something to be explicitly avoided in a patch release.  I thought
this was the type of change we all agreed we should stop letting into patch
releases, and that we would attempt to release more often (once a year) so
changes that only go to trunk would get out faster?  Do we really want to
break that promise to ourselves before we even release 4.0?  To me “I think we
need this feature released faster” is not a reason to put it in 4.0; it could
be a reason to release 4.1 sooner.  This is where having a releasable trunk
helps: if we decided as a project that some change was worth releasing a new
major early, the effort of doing that release is much smaller when trunk is
releasable.

Any fix we make in 4.0 would be merged forward into trunk and could be fully 
verified there?  Probably not the best, but would give more confidence in a fix 
than otherwise without adding other major changes to 4.0?

-Jeremiah

> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
>
>>
>> Furthermore, we introduced a significant performance regression in all
>> lines of the software by increasing the number of LWT round-trips. Unless
>> we intend to leave this regression for a further year without _any_ release
>> offering a solution, we will need suitable verification mechanisms for
>> whatever fixes we deliver.
>>
>> My view is that it is unacceptable to leave such a significant regression
>> unaddressed in all lines of software we intend to release for the
>> foreseeable future.
>
>
> I would like to expand a bit on this as I believe it might be important for
> people to have the full picture. The fix for  CASSANDRA-12126
> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
> regression by increasing the number of LWT round-trips. Nevertheless, the
> patch introduced a flag to allow users to revert to the previous behavior
> (previous performance + consistency issue).
>
> Also, the patch did not address all Paxos consistency issues. There are
> still some issues during topology changes (and possibly in some other scenarios).
>
> My understanding of Benedict's proposal is to fix Paxos once and for all
> without any performance regression.
>
> That goal makes total sense to me. "Where do we do that?" is a trickier
> question.
>
> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org wrote:
>
>> Hmm. It occurs to me I’m not entirely sure how our new release process is
>> going to work.
>>
>> Will we be releasing 4.1 builds immediately, as part of shippable trunk?
>> Or will 4.0 be our only active line of software for the next year?
>>
>> Either way, I bet my bottom dollar there will come some regret if we
>> introduce such divergence between the two most active branches we maintain,
>> so early in their lifecycles. If we invest significant resources in
>> improved testing using this framework (which I very much expect) then
>> branches that are not compatible will not benefit, likely re

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> I do think adding the ability to do “Cluster and Code Simulations” is a new 
> feature.

I don’t. I understand a feature to be a user-visible change, such as new
functionality, and it was on this basis that I endorsed the release lifecycle
document. I do not believe that all improvements to patch releases should stop,
as I do not believe this produces the highest quality outcome.




From: Jeremiah D Jordan 
Date: Tuesday, 13 July 2021 at 14:41
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
I do not think fixing CASSANDRA-12126 is a new feature.  I do think adding
the ability to do “Cluster and Code Simulations” is a new feature.

-Jeremiah

> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>
> Nothing we’re discussing constitutes a feature. We’re discussing stability 
> enhancements, and important bug fixes.
>
> I think this disagreement is to some extent founded on our different premises 
> about what a patch release should contain, and this seems to be the fault of 
> incompletely specified documentation.
>
> 1. The release lifecycle only forbids feature work from being developed in a
> patch release, and only expressly includes bug fixes. Note that the document
> even has a comment by the author suggesting that features may be backported
> to a patch release from trunk (not something I agree with, but it
> demonstrates the ambiguity of the definition).
> 2. There seems to be some conflation of the size of a change with its
> admissibility under the release lifecycle – I don’t think there are any
> criteria for this, and it’s open to the community’s case-by-case assessment.
> Whatever we do to fix the bug in question will necessarily be a very
> significant piece of work itself, for instance.
>
> My interpretation of the release lifecycle document is that it is acceptable 
> to include this work in a patch release. My belief about its impact is that 
> it would contribute positively to the stability of the project’s 4.0 releases 
> over the lifecycle, and improve project velocity.
>
> With respect to whether we can ship a fix to 12126 without validation, I 
> would be strongly opposed to this, and certainly would not produce a patch 
> myself in this way. Not only would it be burdensome (given the divergences in 
> the codebase), but I would not consider it acceptably safe (given the 
> divergence).
>
>
> From: Jeremiah D Jordan 
> Date: Tuesday, 13 July 2021 at 14:15
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I tend to agree with Paulo that a major refactoring of some internal
> interfaces sounds like something to be explicitly avoided in a patch release.
> I thought this was the type of change we all agreed we should stop letting
> into patch releases, and that we would attempt to release more often (once a
> year) so changes that only go to trunk would get out faster?  Do we really
> want to break that promise to ourselves before we even release 4.0?  To me
> “I think we need this feature released faster” is not a reason to put it in
> 4.0; it could be a reason to release 4.1 sooner.  This is where having a
> releasable trunk helps: if we decided as a project that some change was
> worth releasing a new major early, the effort of doing that release is
> much smaller when trunk is releasable.
>
> Any fix we make in 4.0 would be merged forward into trunk and could be fully 
> verified there?  Probably not the best, but would give more confidence in a 
> fix than otherwise without adding other major changes to 4.0?
>
> -Jeremiah
>
>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
>>
>>>
>>> Furthermore, we introduced a significant performance regression in all
>>> lines of the software by increasing the number of LWT round-trips. Unless
>>> we intend to leave this regression for a further year without _any_ release
>>> offering a solution, we will need suitable verification mechanisms for
>>> whatever fixes we deliver.
>>>
>>> My view is that it is unacceptable to leave such a significant regression
>>> unaddressed in all lines of software we intend to release for the
>>> foreseeable future.
>>
>>
>> I would like to expand a bit on this as I believe it might be important for
>> people to have the full picture. The fix for  CASSANDRA-12126
>> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
>> regression by increasing the number of LWT round-trips. Nevertheless, the
>> patch introduced a flag to allow users to revert to the previous behavior
>> (previous performance + consistency issue).
>>
>> Also, the patch did not address all Paxos consistency iss

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
My point is that we all have different premises we are working from. I don’t 
think you can convince me that I am mistaken about how I interpret the word 
feature. The release lifecycle document we voted on is ambiguous, and we all 
clearly take it to mean different things.

From: Jeremiah D Jordan 
Date: Tuesday, 13 July 2021 at 15:06
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
Just because it is a feature for users who are developers does not mean it is 
not a new feature?  Adding this capability is adding new functionality to what 
developers can do with Apache Cassandra.  How is that not a new feature?

Semver has been brought up a lot in conversations around what can go where.  If 
we look at how semver defines such things:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

This change to me sounds like 2.  Adding new functionality in a backwards 
compatible manner.  I guess our issue here is that we have never actually done 
MINOR releases in the C* project, we only make MAJOR releases and PATCH 
releases.  So we need to decide where things that in semver would go in a MINOR 
version should go.  In my mind it was always that such things should only go to 
a MAJOR, as it seems less safe to relax what goes in a PATCH and allow them 
there.
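Read literally, the three semver rules quoted above amount to a simple version-bump function. A minimal sketch, assuming a hypothetical `Change` classification (nothing like it exists in the Cassandra build); the disagreement in this thread is over which category a given change belongs to, which no such function can settle:

```java
// Hypothetical illustration of the semver rules quoted above; the Change
// enum and bump() helper are not part of any Cassandra tooling.
final class SemverSketch {
    enum Change { INCOMPATIBLE_API_CHANGE, BACKWARDS_COMPATIBLE_FEATURE, BACKWARDS_COMPATIBLE_BUG_FIX }

    // Returns the next version string for a release containing the given kind of change.
    static String bump(int major, int minor, int patch, Change change) {
        switch (change) {
            case INCOMPATIBLE_API_CHANGE:
                return (major + 1) + ".0.0";                    // MAJOR: reset minor and patch
            case BACKWARDS_COMPATIBLE_FEATURE:
                return major + "." + (minor + 1) + ".0";        // MINOR: reset patch
            case BACKWARDS_COMPATIBLE_BUG_FIX:
                return major + "." + minor + "." + (patch + 1); // PATCH
            default:
                throw new AssertionError(change);
        }
    }

    public static void main(String[] args) {
        System.out.println(bump(4, 0, 0, Change.BACKWARDS_COMPATIBLE_BUG_FIX)); // 4.0.1
        System.out.println(bump(4, 0, 0, Change.BACKWARDS_COMPATIBLE_FEATURE)); // 4.1.0
        System.out.println(bump(4, 0, 0, Change.INCOMPATIBLE_API_CHANGE));      // 5.0.0
    }
}
```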

-Jeremiah

> On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
>
>> I do think adding the ability to do “Cluster and Code Simulations” is a new 
>> feature.
>
> I don’t. I understand a feature to be a user-visible change, such as new
> functionality, and it was on this basis that I endorsed the release lifecycle
> document. I do not believe that all improvements to patch releases should
> stop, as I do not believe this produces the highest quality outcome.
>
>
>
>
> From: Jeremiah D Jordan 
> Date: Tuesday, 13 July 2021 at 14:41
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I do not think fixing CASSANDRA-12126 is a new feature.  I do think
> adding the ability to do “Cluster and Code Simulations” is a new feature.
>
> -Jeremiah
>
>> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>>
>> Nothing we’re discussing constitutes a feature. We’re discussing stability 
>> enhancements, and important bug fixes.
>>
>> I think this disagreement is to some extent founded on our different 
>> premises about what a patch release should contain, and this seems to be the 
>> fault of incompletely specified documentation.
>>
>> 1. The release lifecycle only forbids feature work from being developed in a
>> patch release, and only expressly includes bug fixes. Note that the
>> document even has a comment by the author suggesting that features may be
>> backported to a patch release from trunk (not something I agree with, but it
>> demonstrates the ambiguity of the definition).
>> 2. There seems to be some conflation of the size of a change with its
>> admissibility under the release lifecycle – I don’t think there are any
>> criteria for this, and it’s open to the community’s case-by-case
>> assessment. Whatever we do to fix the bug in question will necessarily be a
>> very significant piece of work itself, for instance.
>>
>> My interpretation of the release lifecycle document is that it is acceptable 
>> to include this work in a patch release. My belief about its impact is that 
>> it would contribute positively to the stability of the project’s 4.0 
>> releases over the lifecycle, and improve project velocity.
>>
>> With respect to whether we can ship a fix to 12126 without validation, I 
>> would be strongly opposed to this, and certainly would not produce a patch 
>> myself in this way. Not only would it be burdensome (given the divergences 
>> in the codebase), but I would not consider it acceptably safe (given the 
>> divergence).
>>
>>
>> From: Jeremiah D Jordan 
>> Date: Tuesday, 13 July 2021 at 14:15
>> To: Cassandra DEV 
>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>> I tend to agree with Paulo that a major refactoring of some internal 
>> interfaces sounds like something to be explicitly avoided in a patch 
>> release.  I thought this was the type of change we all agreed we should stop 
>> letting into patch releases, and that we would attempt to release more
>> often (once a year) so changes that only go to trunk would get out faster?
>> Do we really want to break that promise to ourselves before we even
>> release 4.0?  To me “I think we need this feature released faster” is not a 
>

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> If we're talking about introducing wrapper APIs that are compatible 
> w/existing concurrent classes

Unfortunately we’re not, as we often don’t use interfaces. Semaphore, 
CountDownLatch etc are concrete classes. We have quite a hodge-podge of 
concurrent API usages, and many of them are not readily mockable as they stand.
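Because `CountDownLatch` and `Semaphore` are concrete JDK classes, code that constructs them directly cannot be handed a controllable substitute. A minimal sketch of the kind of shim being discussed, assuming a hypothetical `Latch` interface (not Cassandra's actual API): production code depends only on the interface, and a single-threaded test driver supplies an implementation that never parks a real thread.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical shim, not Cassandra's actual API: application code depends on
// the Latch interface rather than on the concrete CountDownLatch class.
final class LatchShimSketch {
    interface Latch {
        void countDown();
        long count();
        void await() throws InterruptedException;
    }

    // Production implementation: a thin delegate to the JDK class.
    static final class JdkLatch implements Latch {
        private final CountDownLatch delegate;
        JdkLatch(int count) { delegate = new CountDownLatch(count); }
        public void countDown() { delegate.countDown(); }
        public long count() { return delegate.getCount(); }
        public void await() throws InterruptedException { delegate.await(); }
    }

    // Simulated implementation for a single-threaded test driver: it must never
    // block a real thread, so it reports would-be blocking instead. A real
    // simulator would hand control back to its scheduler at this point.
    static final class SimulatedLatch implements Latch {
        private long remaining;
        SimulatedLatch(int count) { remaining = count; }
        public void countDown() { if (remaining > 0) remaining--; }
        public long count() { return remaining; }
        public void await() {
            if (remaining > 0)
                throw new IllegalStateException("would block with " + remaining + " outstanding");
        }
    }

    // Application code written against the interface works with either implementation.
    static void completeTasks(Latch latch, int tasks) {
        for (int i = 0; i < tasks; i++)
            latch.countDown();
    }

    public static void main(String[] args) throws InterruptedException {
        Latch real = new JdkLatch(2);
        completeTasks(real, 2);
        real.await(); // returns at once: the count has reached zero

        Latch simulated = new SimulatedLatch(2);
        completeTasks(simulated, 1);
        System.out.println(simulated.count()); // 1
    }
}
```

In production only the JDK-backed class would be instantiated, which is why a shim of this shape is usually considered low risk.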

The majority of this work is cleaning the codebase, in all honesty. There is a 
lot of ugliness in there, and a lot of inconsistent behaviour.
- We use four different Futures APIs, I think (Future, ListenableFuture, 
CompletableFuture, Netty’s Future), for instance. To minimise churn, I implement
three of the four in a single interface, and standardise on this for our 
Executors; this is a breaking change, and necessary to support mocking for all 
of these use cases without rewriting the application code. In this case, we use 
as a basis the futures we already introduced as part of the internode 
networking rewrite.
- To mock our executors I introduce factories, but the current hierarchy is a 
mess of inconsistency, so even discounting the above breaking change this 
necessitated introducing a new interface hierarchy to implement, and 
overhauling the internals for consistency.
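Routing executor creation through factories means a simulation can substitute executors whose scheduling it controls. A minimal sketch, assuming a hypothetical `ExecutorFactory` interface (the patch's real hierarchy is not shown in this thread): the simulated factory queues every task and runs them in a seeded pseudo-random order, so a given seed always reproduces the same interleaving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Hypothetical factory hierarchy, not the one in the actual patch.
final class ExecutorFactorySketch {
    interface ExecutorFactory {
        Executor pooled(String name, int threads);
    }

    // Production: real JDK thread pools (name is unused in this sketch).
    static final class JdkExecutorFactory implements ExecutorFactory {
        public Executor pooled(String name, int threads) {
            return Executors.newFixedThreadPool(threads);
        }
    }

    // Simulation: every executor feeds one shared queue and nothing runs until
    // the driver calls runAll(), which picks tasks in a seeded random order.
    static final class SimulatedExecutorFactory implements ExecutorFactory {
        private final List<Runnable> pending = new ArrayList<>();
        private final Random random;

        SimulatedExecutorFactory(long seed) { random = new Random(seed); }

        public Executor pooled(String name, int threads) {
            return pending::add;
        }

        void runAll() {
            // Tasks may enqueue further tasks; keep going until the queue drains.
            while (!pending.isEmpty())
                pending.remove(random.nextInt(pending.size())).run();
        }
    }

    // Schedules interleaved "read" and "write" tasks and records the order they ran in.
    static List<String> trace(long seed) {
        SimulatedExecutorFactory factory = new SimulatedExecutorFactory(seed);
        Executor reads = factory.pooled("read", 4);
        Executor writes = factory.pooled("write", 4);
        List<String> order = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            int n = i;
            reads.execute(() -> order.add("read-" + n));
            writes.execute(() -> order.add("write-" + n));
        }
        factory.runAll();
        return order;
    }

    public static void main(String[] args) {
        System.out.println(trace(42));                   // one particular interleaving
        System.out.println(trace(42).equals(trace(42))); // true: same seed, same order
    }
}
```

All scheduling decisions pass through one seeded `Random`, so replaying a failing seed replays its interleaving.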

PRs will land soon for people to look at, but honestly we’re getting into an 
unnecessary tangle over target release. I think it would be a mistake to push 
this to a later release, because it is valuable and it will bring pain by 
creating divergence - but the question a CEP is meant to answer is _if_ the 
community wants a piece of work.

Since it’s become an explicit point of contention, we can perhaps disaggregate 
a vote on _when_ to happen in parallel, once discussion on _if_ wraps up.


From: Joshua McKenzie 
Date: Tuesday, 13 July 2021 at 17:34
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
So stepping back from the feature vs. bug and release cycle debate (a valuable
one, but not the original purpose of this thread):

From the CEP:

>
> - Refactor internal APIs around concurrency to support mock
>   implementations that are able to control execution, including
> - SimpleCondition, Semaphore, CountDownLatch, BlockingQueue, etc.
> - Executors, futures, starting threads, etc. - including important
>   improvements to consistency of approach in the codebase
> - The use of currentTimeMillis and nanoTime
> - The replacement of java.io.File with a wrapper on java.nio.file.Path
>   providing an ergonomic API, and some improvements to consistency of
>   file handling
> - Support for alternative streaming implementations
> - Improvements to the dtest API to support necessary functionality
>
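The `java.io.File` replacement in the list above can be pictured as a thin wrapper that holds a `java.nio.file.Path` and exposes `File`-like convenience methods. A minimal sketch, assuming a hypothetical `WrappedFile` class (not the wrapper the CEP introduces); one side benefit of holding a `Path` is that it could, in principle, come from a non-default `FileSystem`, such as an in-memory one used in tests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PathWrapperSketch {
    // Hypothetical java.io.File-style wrapper backed by java.nio.file.Path.
    static final class WrappedFile {
        private final Path path;

        WrappedFile(Path path) { this.path = path; }

        WrappedFile resolve(String child) { return new WrappedFile(path.resolve(child)); }
        boolean exists() { return Files.exists(path); }
        String name() { return path.getFileName().toString(); }

        // Unlike java.io.File.mkdirs(), failure raises an exception instead of returning false.
        WrappedFile createDirectories() {
            try {
                Files.createDirectories(path);
                return this;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        void write(String text) {
            try {
                Files.write(path, text.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Child names in sorted order; the directory stream is closed by try-with-resources.
        List<String> listNames() {
            try (Stream<Path> children = Files.list(path)) {
                return children.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        WrappedFile dir = new WrappedFile(Files.createTempDirectory("sketch")).resolve("data").createDirectories();
        dir.resolve("b.db").write("x");
        dir.resolve("a.db").write("y");
        System.out.println(dir.listNames()); // [a.db, b.db]
    }
}
```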
If we're talking about introducing wrapper APIs that are compatible
w/existing concurrent classes with just a bimorphic call based on whether
we're testing or not, that's a very low risk change IMO. I'd expect any and
all invasive / new / possibly bugged changes to occur in "Introduction of a
simulator package", not in the basic interfaces we're shimming between
things.

Cleaning up inconsistency of our time units and calls of various concurrent
objects is bugfixing so should be fair game any time.
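Routing `System.currentTimeMillis()` and `System.nanoTime()` through an injectable clock is what lets a simulator control time, and it also forces time units to be explicit at each call site. A minimal sketch, assuming a hypothetical `Clock` interface (deliberately not `java.time.Clock`, which has no `nanoTime()` equivalent); it is not the interface the CEP introduces.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical clock abstraction, not the one introduced by the CEP.
final class ClockSketch {
    interface Clock {
        long currentTimeMillis();
        long nanoTime();
    }

    // Production: delegate straight to the JVM.
    static final class SystemClock implements Clock {
        public long currentTimeMillis() { return System.currentTimeMillis(); }
        public long nanoTime() { return System.nanoTime(); }
    }

    // Simulated time only moves when the test driver advances it.
    static final class SimulatedClock implements Clock {
        private long nanos;
        public long currentTimeMillis() { return TimeUnit.NANOSECONDS.toMillis(nanos); }
        public long nanoTime() { return nanos; }
        void advance(long amount, TimeUnit unit) { nanos += unit.toNanos(amount); }
    }

    // Example consumer: a timeout check that no longer calls System.* directly,
    // with the unit carried explicitly rather than implied by the variable name.
    static boolean expired(Clock clock, long startNanos, long timeout, TimeUnit unit) {
        return clock.nanoTime() - startNanos >= unit.toNanos(timeout);
    }

    public static void main(String[] args) {
        SimulatedClock clock = new SimulatedClock();
        long start = clock.nanoTime();
        clock.advance(499, TimeUnit.MILLISECONDS);
        System.out.println(expired(clock, start, 500, TimeUnit.MILLISECONDS)); // false
        clock.advance(1, TimeUnit.MILLISECONDS);
        System.out.println(expired(clock, start, 500, TimeUnit.MILLISECONDS)); // true
    }
}
```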

~Josh


On Tue, Jul 13, 2021 at 10:26 AM Benjamin Lerer  wrote:

> >
> > "Where do we do that?" is a trickier question.
>
>
> Sorry, I was not really clear with that comment. What I was wondering is if
> we should create a minor version to address that issue (e.g. 4.1).
>
> I am also against making the change in the 4.0 branch.
>
> On Tue, 13 Jul 2021 at 16:09, bened...@apache.org wrote:
>
> > My point is that we all have different premises we are working from. I
> > don’t think you can convince me that I am mistaken about how I interpret
> > the word feature. The release lifecycle document we voted on is
> > ambiguous, and we all clearly take it to mean different things.
> >
> > From: Jeremiah D Jordan 
> > Date: Tuesday, 13 July 2021 at 15:06
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > Just because it is a feature for users who are developers does not mean it
> > is not a new feature?  Adding this capability is adding new functionality
> > to what developers can do with Apache Cassandra.  How is that not a new
> > feature?
> >
> > Semver has been brought up a lot in conversations around what can go
> > where.  If we look at how semver defines such things:
> >
> > MAJOR version when you make incompatible API changes,
> > MINOR version when you add functionality in a backwards compatible manner, and
> > PATCH version when you make backwards compatible b

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-14 Thread bened...@apache.org
> I think CEPs would benefit from describing their compatibility and
> stability impacts, rather than trying to tie themselves to a
> version, regardless of what context a specific version provides.

Yes, we should perhaps remove target version from the template, and introduce 
guidance on describing stability impact etc.

Regarding waivers, I’m not sure we’ve really agreed as a community what the 
criteria are for determining if work goes into a patch release – so I’m not 
sure it would be right to call it a waiver. But I agree that scheduling the 
release to contain some work should be a mixture of project roadmap planning 
(distinct from CEP), and Jira/dev list discussion near the point of merge.

The question is if there is still value in the CEP pages maintaining the 
endeavour’s goal for when the work will be ready, but perhaps this can be 
communicated in normal date format, and used to inform project roadmap planning.


From: Mick Semb Wever 
Date: Wednesday, 14 July 2021 at 10:41
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>
> PRs will land soon for people to look at, but honestly we’re getting into
> an unnecessary tangle over target release. I think it would be a mistake to
> push this to a later release, because it is valuable and it will bring pain
> by creating divergence - but the question a CEP is meant to answer is _if_
> the community wants a piece of work.
>
> Since it’s become an explicit point of contention, we can perhaps
> disaggregate a vote on _when_ to happen in parallel, once discussion on
> _if_ wraps up.



Totally agree, can we remove the "Target Version" from the CEP, so the vote
is based on the _if_ ?

Some further thoughts…

I think CEPs would benefit from describing their compatibility and
stability impacts, rather than trying to tie themselves to a
version, regardless of what context a specific version provides.

Rather than a subsequent vote on the CEP trying to get it into 4.0.x, what
about requests for waivers on each JIRA ticket as it is ready to land? I
suspect that, once we see it, it will be easier to reach agreement on such
waivers for much of the work than to rely on the only other position we
currently have to stand by, which is the categories defined by SemVer. (A lot
of people are really keen to see us practice PATCH-only patch versions.) This
also ties back to my request to see a "rough timeline/plan of how the proposed
changes are to be defined in JIRAs and ordered."

It's worth noting that the code divergence will happen between two branches
no matter what, e.g. 3.11, and next April is really not far away at all. Is
it really a problem if the LWT fix is also pushed back to 4.1 (though I
understand this is a bigger discussion) for the sake of driving home that
we are now a project serious about stability?

All in all, I am betting this discussion will be a lot more productive a)
when we see more of the work involved and its impact, and b) in a month or
two when we have better witnessed the stability of 4.0.0 and what has gone
into 4.0.1 and 4.0.2.


Re: [VOTE] Release Apache Cassandra 4.0.0 (take2)

2021-07-17 Thread bened...@apache.org
-1 (binding)

From: Dinesh Joshi 
Date: Saturday, 17 July 2021 at 05:38
To: dev@cassandra.apache.org 
Subject: Re: [VOTE] Release Apache Cassandra 4.0.0 (take2)
Thanks for the heads up Jon. Please ping the list once you have filed the jira.

Dinesh

> On Jul 16, 2021, at 5:28 PM, Jon Meredith  wrote:
>
> -1 nb.
>
> I'll open a JIRA with details later tonight or first thing tomorrow.
>
> I've confirmed that the serialization and deserialization of FWD_FRM
> on 4.0 nodes when communicating with pre-4.0 nodes is incorrect and
> includes an incorrect single-byte address length. Additionally the
> logic for whether to use the same messageid when forwarding or not
> needs to include the base message id as well as the forwarding ids. If
> there was a single node to forward onto, the forwarded request was not
> being sent with the correct messageId.
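Jon's second point concerns a decision of the form "can every forwarded copy reuse one message id?". A hypothetical sketch (the method names and the reuse rule are illustrative assumptions, not the actual 4.0 messaging code) of why comparing the forwarding ids only among themselves fails for a single forward target:

```java
import java.util.Arrays;

// Hypothetical illustration only; names do not come from Cassandra's messaging code.
final class ForwardIdSketch {
    // Buggy: compares the forwarding ids only with each other. With a single
    // forward target the check passes trivially, even when that id differs
    // from the base message id.
    static boolean canReuseBaseIdBuggy(long baseId, long[] forwardIds) {
        return Arrays.stream(forwardIds).allMatch(id -> id == forwardIds[0]);
    }

    // Fixed: every forwarding id must also equal the base message id.
    static boolean canReuseBaseId(long baseId, long[] forwardIds) {
        return Arrays.stream(forwardIds).allMatch(id -> id == baseId);
    }

    public static void main(String[] args) {
        long baseId = 100;
        long[] oneTarget = {101};
        System.out.println(canReuseBaseIdBuggy(baseId, oneTarget)); // true: id 101 would be lost
        System.out.println(canReuseBaseId(baseId, oneTarget));      // false: per-target id must be sent
    }
}
```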
>
>> On Fri, Jul 16, 2021 at 3:05 PM Jon Meredith  wrote:
>>
>> I'd like to request an extension to the vote. There's a possible issue
>> with 4.0 instances serializing FWD_FRM message parameters to pre-4.0
>> nodes that I'm investigating and need a little more time.
>>
>>> On Thu, Jul 15, 2021 at 9:45 AM Sam Tunnicliffe  wrote:
>>>
>>> +1
>>>
 On 13 Jul 2021, at 23:13, Mick Semb Wever  wrote:

 Proposing the test build of Cassandra 4.0.0 for release.

 sha1: 924bf92fab1820942137138c779004acaf834187
 Git: 
 https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.0-tentative
 Maven Artifacts:
 https://repository.apache.org/content/repositories/orgapachecassandra-1242/org/apache/cassandra/cassandra-all/4.0.0/

 The Source and Build Artifacts, and the Debian and RPM packages and
 repositories, are available here:
 https://dist.apache.org/repos/dist/dev/cassandra/4.0.0/

 The vote will be open for 72 hours (longer if needed). Everyone who
 has tested the build is invited to vote. Votes by PMC members are
 considered binding. A vote passes if there are at least three binding
 +1s and no -1s.

 [1]: CHANGES.txt:
 https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.0-tentative
 [2]: NEWS.txt: 
 https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.0-tentative

 -
 To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
 For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread bened...@apache.org
+1. I haven’t looked in detail at the API that’s been proposed, but I’m very 
much in favour of the work to support this, and the introduction of the newly 
proposed implementations.

In particular, really happy to see somebody finally finish up C-7282! I look 
forward to seeing how the different approaches compare.


From: Branimir Lambov 
Date: Tuesday, 20 July 2021 at 11:11
To: dev@cassandra.apache.org 
Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
Proposal for a mechanism for plugging in memtable implementations:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations

The proposal supports using custom memtable implementations to support
development and testing of improved alternatives, but also enables a
broader definition of "memtable" to better support more advanced use cases
like persistent memory. To this end, memtable implementations are given
control over flushing and storing data in the commit log, enabling
solutions that implement their own durability mechanisms and live much
longer than their classical counterparts. Taken to the extreme, this also
enables memtables that never flush (in other words, alternative storage
engines) in a minimally-invasive manner.

I am curious to hear your thoughts on the proposal.

Regards,
Branimir


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread bened...@apache.org
I think it would be a mistake to combine the Memtable with the CommitLog; several
systems use CommitLog-like functionality, and in the medium term I think these
would benefit from a unified system that Memtables may opt to register with.
It might make sense to give the Memtable the choice over whether a Memtable
write is persisted to this shared facility, but that’s different from merging
the two conceptually.

I may look into producing a CEP on this evolution sometime in the next few
months; for now this is just a heads-up about my thoughts on the topic, and an
invitation to reach out if you plan your own evolution of this stuff.
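To make the distinction concrete, here is a hypothetical sketch of a shared log that memtables (and other subsystems) may register with, where each memtable decides whether its writes are persisted to that log. `SharedLog`, `Memtable`, and the string-based entries are illustrative assumptions, not Cassandra classes or an API proposed in CEP-11.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Cassandra.
public class SharedLogSketch {

    /** One append-only log shared by every subsystem that registers with it. */
    static final class SharedLog {
        private final List<String> entries = new ArrayList<>();
        private final Set<String> clients = new HashSet<>();

        synchronized void register(String clientId) {
            clients.add(clientId);
        }

        /** Appends on behalf of a registered client and returns the entry's position. */
        synchronized long append(String clientId, String payload) {
            if (!clients.contains(clientId))
                throw new IllegalStateException(clientId + " has not registered");
            entries.add(clientId + ":" + payload);
            return entries.size() - 1;
        }

        synchronized int size() {
            return entries.size();
        }
    }

    /** A memtable that may opt in to the shared log, but does not own it. */
    static final class Memtable {
        private final String id;
        private final SharedLog log;
        private final boolean persistWrites;
        private final List<String> rows = new ArrayList<>();

        Memtable(String id, SharedLog log, boolean persistWrites) {
            this.id = id;
            this.log = log;
            this.persistWrites = persistWrites;
            if (persistWrites)
                log.register(id); // opting in is the memtable's choice
        }

        void write(String row) {
            if (persistWrites)
                log.append(id, row); // durability comes from the shared facility
            rows.add(row);           // buffered in memory in either case
        }

        int rowCount() {
            return rows.size();
        }
    }

    public static void main(String[] args) {
        SharedLog log = new SharedLog();
        log.register("other-subsystem"); // hypothetical non-memtable user of the log
        log.append("other-subsystem", "record");

        Memtable durable = new Memtable("table-a", log, true);
        Memtable ephemeral = new Memtable("table-b", log, false);
        durable.write("k1=v1");
        ephemeral.write("k2=v2");

        System.out.println(log.size()); // 2: one entry from the other subsystem, one from table-a
    }
}
```

The log is a separate facility here; the memtable only chooses whether to use it, which is the difference from merging the two concepts.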

From: Joshua McKenzie 
Date: Tuesday, 20 July 2021 at 18:36
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
+1 to the idea.

In general, I think we need to make up our mind as to whether we consider
the Memtable and CommitLog one logical entity (As stated in the CEP:
"Conceptually
these two pieces of the storage engine form one component — the LSM buffer
of Cassandra, and as such it makes a lot of sense to bundle them together. "),
or whether we want to further untangle those two components from an
architectural perspective, a road we started down with the
pluggable storage engine work.

The interface as drafted codifies the idea that a Memtable should have an
opinion about how a CommitLog does its business (default boolean
writesShouldSkipCommitLog()), which makes sense if our design goal is to
keep those two things interdependent. I advocate for further separating
them, but suspect that's a debate better had on JIRA or Slack than on the CEP
thread; I just figured I'd bring it up since it's not yet clear to me whether
that's a pre- or post-CEP discussion (specific details of interfaces, etc.).
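For illustration, a write path that consults the memtable before appending to the commit log might look like the sketch below. Only the method names `writesShouldSkipCommitLog()` and `writesAreDurable()` come from the CEP-11 draft quoted in this thread; their defaults, the `WritePath` class, and both memtable classes are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the two method names are taken from the CEP-11 draft.
public class CommitLogOptOutSketch {

    interface Memtable {
        void put(String key, String value);

        // Assumed default: rely on the commit log for durability.
        default boolean writesShouldSkipCommitLog() { return false; }

        // Assumed default: the memtable itself is not durable.
        default boolean writesAreDurable() { return false; }
    }

    /** Hypothetical conventional memtable: buffers in memory, relies on the commit log. */
    static final class ClassicMemtable implements Memtable {
        final List<String> rows = new ArrayList<>();
        public void put(String key, String value) { rows.add(key + "=" + value); }
    }

    /** Hypothetical memtable with its own persistence (e.g. persistent memory). */
    static final class SelfDurableMemtable implements Memtable {
        final List<String> rows = new ArrayList<>();
        public void put(String key, String value) { rows.add(key + "=" + value); }
        @Override public boolean writesShouldSkipCommitLog() { return true; }
        @Override public boolean writesAreDurable() { return true; }
    }

    /** The write path asks the memtable before appending to the commit log. */
    static final class WritePath {
        final List<String> commitLog = new ArrayList<>();

        void apply(Memtable memtable, String key, String value) {
            if (!memtable.writesShouldSkipCommitLog())
                commitLog.add(key + "=" + value);
            memtable.put(key, value);
        }
    }

    public static void main(String[] args) {
        WritePath path = new WritePath();
        path.apply(new ClassicMemtable(), "a", "1");
        path.apply(new SelfDurableMemtable(), "b", "2");
        System.out.println(path.commitLog); // [a=1]
    }
}
```

In this shape the commit log decision lives in the memtable interface, which is the coupling being questioned here; the alternative would move that decision into the write pipeline.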

Lots of quality work obviously went into this from a bunch of folks -
thanks Branimir!

~Josh


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread bened...@apache.org
> memtable-as-a-commitlog-index

Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually there 
was a paper that did this a long time ago), and it could be very nice (if for 
no other benefit than reducing heap utilisation). I don’t think this requires 
that they be modelled as the same concept, however, only that the Memtable must 
be able to receive an address into a commit log entry and to adopt partial 
ownership over the entry’s lifecycle.
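The requirement described here (the memtable receives an address into a commit log entry and adopts partial ownership of the entry's lifecycle) can be illustrated with reference counting. A minimal sketch; `LogSegment` and `IndexingMemtable` are hypothetical names, the log is an in-memory list rather than a file, and reference counting is just one possible way to model shared ownership, not necessarily the CASSANDRA-7282 design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: these are not Cassandra's commit log or memtable classes.
public class CommitLogIndexSketch {

    /** An append-only log segment whose lifetime is shared by several owners. */
    static final class LogSegment {
        private final List<String> entries = new ArrayList<>();
        private int refCount = 1; // the commit log itself holds the first reference

        int append(String value) {
            entries.add(value);
            return entries.size() - 1; // the "address" handed to the memtable
        }

        String read(int offset) { return entries.get(offset); }

        void retain() { refCount++; }

        /** Drops one reference; true means no owner needs the segment any more. */
        boolean release() { return --refCount == 0; }
    }

    /** A memtable that indexes log addresses instead of copying the data. */
    static final class IndexingMemtable {
        private final TreeMap<String, Integer> index = new TreeMap<>();
        private final LogSegment segment;

        IndexingMemtable(LogSegment segment) {
            this.segment = segment;
            segment.retain(); // adopt partial ownership of the segment's lifecycle
        }

        void put(String key, int offset) { index.put(key, offset); }

        String get(String key) {
            Integer offset = index.get(key);
            return offset == null ? null : segment.read(offset);
        }

        /** Called once the memtable has flushed: give up its claim on the segment. */
        boolean discard() {
            index.clear();
            return segment.release();
        }
    }

    public static void main(String[] args) {
        LogSegment segment = new LogSegment();
        IndexingMemtable memtable = new IndexingMemtable(segment);

        int offset = segment.append("v1"); // the write lands in the log once...
        memtable.put("k1", offset);        // ...and the memtable keeps only its address

        System.out.println(memtable.get("k1")); // v1
        System.out.println(memtable.discard()); // false: the log still holds a reference
        System.out.println(segment.release());  // true: the segment can now be recycled
    }
}
```

The data is stored once, in the log; the memtable holds only a sorted index of addresses, which is where the heap saving would come from.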


From: Branimir Lambov 
Date: Wednesday, 21 July 2021 at 14:28
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> In general, I think we need to make up our mind as to whether we
  consider the Memtable and CommitLog one logical entity [...], or
  whether we want to further untangle those two components from an
  architectural perspective which we started down that road on with
  the pluggable storage engine work.

This CEP is intentionally not attempting to answer this question. FWIW
I do not see them as separable (there's evidence of this in the
codebase), but there are valid secondary uses of the commit log that
are served well enough by the current architecture.

It is important, however, to let the memtable implementation opt out,
to permit it to provide its own solution for data persistence.

We should revisit this in the future, especially if Benedict's shared
log facility and my plans for a memtable-as-a-commitlog-index
evolve.

Regards,
Branimir

On Wed, Jul 21, 2021 at 1:34 PM Michael Burman  wrote:

> Hi,
>
> It is nice to see these going forward (and a great use of CEP), so thanks
> for the proposal. I have my reservations about linking the memtable to the
> CommitLog and flushing; abstractions should not leak from one to
> another. And I don't see the reasoning why they should be linked; it doesn't
> seem to add anything other than tight coupling of components, reducing reuse
> and making things unnecessarily complicated. Also, the streaming notions seem
> weird to me - how are they related to the memtable? Why should the memtable
> care about behavior outside its responsibility?
>
> Some miscellaneous quotes and comments (with some thoughts split or
> duplicated across different parts):
>
> > Tight coupling between CFS and memtable will be reduced: flushing
> functionality is to be extracted, controlling memtable memory and period
> expiration will be handled by the memtable.
>
> Why is flushing control bad to do in CFS and better in the memtable? Doing
> it outside the memtable would allow controlling the flushing regardless of
> how the actual memtable is implemented. For example, let's say someone
> wanted to implement HBase's Accordion in Cassandra. It shouldn't matter
> what the memtable implementation is, as the compaction of different
> memtables could be beneficial to all implementations. Or the flushing could
> push the memtable to a proper cache instead of only to disk.
>
> Or if we had a per-table caching structure, we could control the flushing of
> memtables and the cache structure separately. Some data benefits from LRU
> and some from MRW (most-recently-written) caching strategies, but both
> could benefit from the same memtable implementation; it's the data and how
> it's used that could control how the flushing should work. For example, time
> series data behaves quite differently in terms of data access from
> something more "random".
>
> Or even "total memory control", which would check which tables need more
> memory for their writes and which do not. Or ensure that memory doesn't
> grow past a boundary, manually maintaining how much is dedicated to
> caching and how much to memtables waiting to be flushed. Or delay flushing
> because the disks can't keep up, etc. These are not to be implemented in
> this CEP, but pushing this strategy into the memtable would prevent many of
> these features.
>
> > Beyond thread-safety, the concurrency constraints of the memtable are
> intentionally left unspecified.
>
> I like this. I could see use-cases where a single-threaded implementation
> could actually outperform some concurrent data structures. But it also
> raises a question: is this proposal going to take an angle
> towards per-range memtables? There are certainly benefits to splitting the
> memtables, as it would reduce the "n" in the operations, thus providing less
> overhead in lookups and writes. Although, taking it one step backwards, I
> could also see the benefit of having a commitlog per range, which would
> allow higher utilization of NVMe drives with larger queue depths. And why
> not per-range SSTables for faster scale-outs? That is a bit outside the
> scope of the CEP, but I just want to ensure that the implementation does not
> block such improvements.
>
> Interfaces:
>
> > boolean writesAreDurable()
> > boolean writesShouldSkipCommitLog()
>
> The placement of these methods inside the memtable implementation just feels
> incredibly wrong to me. The writing pipeline should have these configured
> and the

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread bened...@apache.org
I would love to help out with this in any way that I can, FYI. It would definitely
be one of the more impactful performance improvements to the codebase, given the
benefits to compaction and memory behaviour.


[VOTE] CEP-10: Cluster and Code Simulations

2021-07-26 Thread bened...@apache.org
Proposing CEP-10 (Cluster and Code Simulations) for adoption.

Proposal: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations
Discussion: 
https://lists.apache.org/thread.html/rc908165994b15a29ef9c17b0b1205b2abc5bd38228b5a0117e442104%40%3Cdev.cassandra.apache.org%3E

The vote will be open for 72 hours.
Votes by PMC members are considered binding. A
vote passes if there are at least three binding +1s and no binding vetoes.
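The passing rule stated in this vote email can be expressed as a small predicate. A minimal sketch; `VoteTally`, `Ballot`, and `Vote` are hypothetical names, and the code encodes only the wording above (at least three binding +1s and no binding veto), not any other voting rule.

```java
import java.util.List;

// Hypothetical helper encoding the CEP vote rule quoted above.
public class VoteTally {
    enum Vote { PLUS_ONE, ZERO, MINUS_ONE }

    static final class Ballot {
        final Vote vote;
        final boolean binding; // true for PMC members

        Ballot(Vote vote, boolean binding) {
            this.vote = vote;
            this.binding = binding;
        }
    }

    /** Passes with at least three binding +1s and no binding -1 (veto). */
    static boolean passes(List<Ballot> ballots) {
        long bindingPlusOnes = ballots.stream()
                .filter(b -> b.binding && b.vote == Vote.PLUS_ONE)
                .count();
        boolean bindingVeto = ballots.stream()
                .anyMatch(b -> b.binding && b.vote == Vote.MINUS_ONE);
        return bindingPlusOnes >= 3 && !bindingVeto;
    }

    public static void main(String[] args) {
        List<Ballot> ballots = List.of(
                new Ballot(Vote.PLUS_ONE, true),
                new Ballot(Vote.PLUS_ONE, true),
                new Ballot(Vote.PLUS_ONE, true),
                new Ballot(Vote.MINUS_ONE, false)); // a non-binding -1 is not a veto here
        System.out.println(passes(ballots)); // true
    }
}
```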


[RESULT] [VOTE] CEP-10: Cluster and Code Simulations

2021-07-30 Thread bened...@apache.org
The vote passes, with 6 +1s and no -1s.


From: Blake Eggleston 
Date: Wednesday, 28 July 2021 at 20:53
To: dev@cassandra.apache.org 
Subject: Re: [VOTE] CEP-10: Cluster and Code Simulations
+1

> On Jul 27, 2021, at 9:21 PM, Scott Andreas  wrote:
>
> +1 nb
>
> 
> From: Sam Tunnicliffe 
> Sent: Tuesday, July 27, 2021 12:54 AM
> To: dev@cassandra.apache.org
> Subject: Re: [VOTE] CEP-10: Cluster and Code Simulations
>
> +1
>
>> On 26 Jul 2021, at 11:51, bened...@apache.org wrote:
>>
>> Proposing the CEP-10 (Cluster and Code Simulations) for adoption
>>
>> Proposal: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations
>> Discussion: 
>> https://lists.apache.org/thread.html/rc908165994b15a29ef9c17b0b1205b2abc5bd38228b5a0117e442104%40%3Cdev.cassandra.apache.org%3E
>>
>> The vote will be open for 72 hours.
>> Votes by PMC members are considered binding. A
>> vote passes if there are at least three binding +1s and no binding vetoes.
>


Re: [DISCUSS] Jira state for second reviewer

2021-08-02 Thread bened...@apache.org
Perhaps “Awaiting Second Review”?

From the flow, it looks like this would be more accurate, as a second reviewer
could have been assigned without the review having gotten underway yet? It’s
unclear to me what you would do in this case: would it return to Patch
Available, or sit in Needs Second Reviewer?
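The question can be made concrete by writing the proposed flow (Patch Available -> Review in Progress <-> Needs Reviewer* -> Ready To Commit) as a transition table. A hypothetical sketch: the transitions are one reading of the proposal plus Ekaterina's change allowing Needs Reviewer after Patch Available, and the real Jira workflow has more states than these four.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the review flow discussed in this thread.
public class ReviewWorkflow {
    enum State { PATCH_AVAILABLE, REVIEW_IN_PROGRESS, NEEDS_REVIEWER, READY_TO_COMMIT }

    private static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);

    static {
        // Needs Reviewer is also reachable directly from Patch Available.
        NEXT.put(State.PATCH_AVAILABLE, EnumSet.of(State.REVIEW_IN_PROGRESS, State.NEEDS_REVIEWER));
        // Needs Reviewer is optional, so review can also go straight to Ready To Commit.
        NEXT.put(State.REVIEW_IN_PROGRESS, EnumSet.of(State.NEEDS_REVIEWER, State.READY_TO_COMMIT));
        // The "<->" in the proposal: Needs Reviewer can return to Review in Progress.
        NEXT.put(State.NEEDS_REVIEWER, EnumSet.of(State.REVIEW_IN_PROGRESS, State.READY_TO_COMMIT));
        NEXT.put(State.READY_TO_COMMIT, EnumSet.noneOf(State.class));
    }

    static boolean canTransition(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.NEEDS_REVIEWER, State.REVIEW_IN_PROGRESS)); // true
        System.out.println(canTransition(State.NEEDS_REVIEWER, State.PATCH_AVAILABLE));    // false
    }
}
```

Under these assumptions there is no way back to Patch Available and no state meaning "second reviewer assigned, review not yet started", which is the gap the question points at.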

From: Brandon Williams 
Date: Monday, 2 August 2021 at 14:57
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Jira state for second reviewer
+1

On Mon, Aug 2, 2021 at 8:40 AM Ekaterina Dimitrova
 wrote:
>
> Hi everyone,
>
> While triaging tickets last week, we realized that the new state works well
> with only one caveat. The expectation is that Patch Available is used when
> there is no reviewer available, and Needs Reviewer when we need a
> second reviewer. The name Needs Reviewer might be confusing, though, and
> someone may also use it when a first reviewer is needed, which makes
> triaging a bit harder. Benjamin suggested renaming Needs Reviewer to
> Needs 2nd Reviewer to make its usage more explicit. Any thoughts
> or objections here?
>
> Best regards,
> Ekaterina
>
> On Thu, 8 Jul 2021 at 4:54, Benjamin Lerer  wrote:
>
> > That sounds good to me. Thanks a lot Brandon and Ekaterina for taking care
> > of that.
> >
> > On Wed, 7 Jul 2021 at 23:47, Ekaterina Dimitrova wrote:
> >
> > > Hey everyone,
> > > Considering the latest report of patches which need a reviewer, I think
> > > this new Jira state is a great addition.
> > > I took it one step further today and asked for it to be available after
> > > PATCH AVAILABLE too. This is already implemented. I hope Brandon doesn’t
> > > mind my intervention. The reason for that decision was that sometimes we
> > > already have a first reviewer assigned who is not yet working on a review,
> > > but this shouldn’t stop us from already looking for a second reviewer.
> > >
> > > Best regards,
> > > Ekaterina
> > >
> > > On Thu, 1 Jul 2021 at 9:41, Benjamin Lerer  wrote:
> > >
> > > > +1
> > > >
> > > > On Thu, 1 Jul 2021 at 05:58, Caleb Rackliffe <calebrackli...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > > On Jun 30, 2021, at 4:38 PM, Brandon Williams wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Since our project governance requires two committers, which in some
> > > > > > circumstances may mean two committers need to review, I'd like to
> > > > > > add another state to our Jira so that it is possible to find tickets
> > > > > > that need a second reviewer, which it currently is not.
> > > > > >
> > > > > > On Slack, Paulo Motta suggested this:
> > > > > >
> > > > > > Patch Available -> Review in Progress <-> Needs Reviewer* -> Ready To Commit
> > > > > >
> > > > > > Where "Needs Reviewer" is an optional state that can then move back to
> > > > > > "Review in Progress" and carry on.  This would affect all tickets in
> > > > > > the project, so I'm curious if there are any thoughts or objections?
> > > > > >
> > > > > > Kind Regards,
> > > > > > Brandon
> > > > > >
> > > > > >


Re: [DISCUSS] Jira state for second reviewer

2021-08-02 Thread bened...@apache.org
I was proposing replacing “Needs Second Reviewer” with “Awaiting Second
Review”, as this encapsulates both the need for an additional reviewer _and_
the pending status of the review beginning.

I don’t think it is reasonable to assume that once a reviewer is found they
will move it into “In Review”, nor would that be very helpful, as we would
not know which tickets were actively under review as opposed to pending review
by an agreed second reviewer.

From: Ekaterina Dimitrova 
Date: Monday, 2 August 2021 at 15:15
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Jira state for second reviewer
Thank you all.
On Benedict’s question, my understanding is that the idea of Needs Second
Reviewer is to indicate that we need to find a second reviewer. I suspect that
when we find one, he/she will move it to “In Review” and provide status
updates in the ticket. I am open to better suggestions.
I guess “Awaiting Second Review” could be added to show that we have
reviewers but the second review has not started yet? I would personally
probably skip adding it and rely on people following up on their
assignments. If we incorporate the alert suggestions that were made some
time ago, I think it would be better to send an alert/reminder to the
reviewers once a ticket has been in review for a particular amount of time.
But we could probably also do both for more visibility, if we as a
community want to.


Re: [DISCUSS] Jira state for second reviewer

2021-08-02 Thread bened...@apache.org
So, I don’t feel strongly about this at all; I just think it will be more
confusing this way and so lead to more inconsistent usage, as it will be
unclear what this second reviewer should do if they don’t start reviewing
immediately. Some tickets will remain in “Needs Second Reviewer” when they no
longer need one, and others will be in “In Review” when they aren’t.

It will also be more burdensome to find out the true state of a ticket: if the 
new reviewer transitions a ticket to “In Review” but doesn’t in fact start 
review, you now need to ask a human being whether they’re really reviewing something
or not; there’s no way to find out by yourself. If the “Awaiting Second Review”
state is interpreted as perhaps only needing a second reviewer, a report can 
easily distinguish this by listing the contents of the Reviewers column.
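The report described above can be sketched as two filters over the same state. A hypothetical sketch: the `Ticket` type, the ticket keys, and the reviewer names are placeholders (real data would come from a Jira query, not shown), and it assumes two named reviewers means the ticket is only waiting for the second review to start.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical report over tickets; not a Jira API.
public class SecondReviewReport {
    static final String AWAITING = "Awaiting Second Review"; // state name proposed in this thread

    static final class Ticket {
        final String key;
        final String state;
        final List<String> reviewers; // contents of the Reviewers column

        Ticket(String key, String state, List<String> reviewers) {
            this.key = key;
            this.state = state;
            this.reviewers = reviewers;
        }
    }

    /** In the state but with fewer than two reviewers named: still needs a volunteer. */
    static List<String> needsReviewer(List<Ticket> tickets) {
        return tickets.stream()
                .filter(t -> AWAITING.equals(t.state) && t.reviewers.size() < 2)
                .map(t -> t.key)
                .collect(Collectors.toList());
    }

    /** In the state with two reviewers named: only waiting for the review to start. */
    static List<String> awaitingStart(List<Ticket> tickets) {
        return tickets.stream()
                .filter(t -> AWAITING.equals(t.state) && t.reviewers.size() >= 2)
                .map(t -> t.key)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Ticket> tickets = List.of(
                new Ticket("TICKET-1", AWAITING, List.of("reviewer-a")),
                new Ticket("TICKET-2", AWAITING, List.of("reviewer-a", "reviewer-b")),
                new Ticket("TICKET-3", "Patch Available", List.of()));
        System.out.println(needsReviewer(tickets)); // [TICKET-1]
        System.out.println(awaitingStart(tickets)); // [TICKET-2]
    }
}
```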

But, I don’t anticipate losing any sleep over whatever we decide here.

From: Ekaterina Dimitrova 
Date: Monday, 2 August 2021 at 15:37
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Jira state for second reviewer
My only worry is that if we incorporate both things in one state, people
won’t be able to immediately find tickets to assign for review. They will
have to go and check whether a ticket needs a reviewer or whether the second
reviewer just hasn’t started the review yet. That is why I suggested having
both “Needs Second Reviewer” and “Awaiting Second Review”, as indeed we can’t
expect that people will immediately start a review when they assign
themselves as a reviewer; I totally agree with that. My only point is that we
need a state that carries exactly one meaning, “we need a person to help with
review”, and nothing else. Otherwise triaging will again be harder. IMHO this
will help us produce good reports and easily identify spots that need
attention/help.
I don’t disagree with you; I just think this is one additional point we
have to consider separately.


Re: [DISCUSS] CASSANDRA-16767, CASSANDRA-16768, and CASSANDRA-16769 for 3.11.x

2021-08-10 Thread bened...@apache.org
Hi Scott,

I wonder if it’s possible that too few people who saw your email consider 
themselves sufficiently involved in this part of the codebase.  People tend to 
keep quiet about stuff they don’t participate in deeply, which is why I haven’t 
responded, and I wonder if this might explain the tumbleweed. More generally, I
wonder how we might track whether areas of the codebase are adequately covered
by active contributors.

To answer your question, I don’t personally believe it is problematic to add 
additional features to command line tools in a patch version; they’re not
scary systems where new features introduce much risk of high-impact bugs.
Others have stricter interpretations of the rules, but if they haven’t spoken
up yet, I’d say you’re clear to post some patches. You might want to first
make sure there’s somebody able and willing to review them, though.


From: Scott Carey 
Date: Monday, 9 August 2021 at 20:12
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CASSANDRA-16767, CASSANDRA-16768, and CASSANDRA-16769 
for 3.11.x
Thank you, Brandon, for answering my questions on Slack, for providing early
feedback on these ideas more than a month before I created the patches, and
for replying here.

Does anyone else have any comments or opinions?  Can a decision be reached
one way or another?  It is my understanding that we'll need more than one
+1 to move forward here.

I understand that the 4.0 release was a busy time, and that many probably
saw this, thought about replying, but got too busy and never did.
However, in light of the recent discussions around attracting new
contributors, I would like to highlight that being left in limbo with no
resolution is worse than being told "no", especially for new contributors.




On Fri, Jul 2, 2021 at 1:23 PM Brandon Williams  wrote:

> On Tue, Jun 29, 2021 at 5:49 PM Scott Carey  wrote:
> >
> > I'd like to discuss the inclusion of the above tickets for a 3.11.x
> > release.  These are not a pure 'bug fix' so I'll need a waiver to get
> > them into 3.11.x  (and implicitly, 4.0.x).
> >
> > The first two are straightforward oversights:  neither *nodetool
> > garbagecollect* nor *nodetool scrub* currently accepts a *--user-defined*
> > parameter list of SSTables in the same way that *nodetool compact* does.
> >
> > This is an operational problem for large tables.
> >
> > I often need to scrub just one file that is corrupted for some reason,
> > and not scrub an entire 1TB+ of data for a table on a node.  This renders
> > 'nodetool scrub' operationally useless for large tables.
>
> I think that given not having user defined options for these
> compaction types is clearly an oversight, and that the alternative of
> deleting the large 1TB+ sstable and then repairing is a cure worse
> than the disease, this should be added to 3.11.x and 4.0.x. I am +1
> here.
>
> > For *garbagecollect* it is often operationally easy to identify which
> > tables are likely to be full of bloat, and operationally useful to do this
> > task in small increments.  The existing order in which garbagecollect
> > processes SSTables prevents it from being useful in any incremental
> > fashion -- if you stop it and later restart, it will first process the
> > SSTables you just garbage collected.
> >
> > The third ticket adds an option to *nodetool garbagecollect*,
> > *--oldest-fraction*, that selects a fraction of the oldest table data in
> > bytes and garbagecollects only the SSTables that 'cover' that percentage
> > of data.  Operationally, this lends itself to easy automation -- for
> > example, running this once a week on 10% of a table's data would imply
> > that there is no data on disk that has been overwritten within the last
> > 10 weeks.  This caps data bloat in ways neither LCS nor STCS can
> > currently achieve without regular major compactions or full-pass
> > garbagecollect.
>
> This is a less obvious thing to be added, and I personally lack the
> operational experience to comment on how much relief this would
> provide firsthand, so I'll leave that to others.  But it does make
> sense to me and since it isn't heavily modifying anything my
> inclination is that this could be an acceptable addition as well.
>
> Kind Regards,
> Brandon
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>
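A minimal Java sketch of the *--oldest-fraction* selection described above follows. It is not the patch from the ticket: the `SSTableInfo` record and its `generation` ordering key are hypothetical, assuming each rewrite gives a file a newer generation so that SSTables just processed sort last rather than first. It orders SSTables oldest-first and keeps adding them until their combined size covers the requested fraction of the table's bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OldestFractionSelector {
    // Hypothetical stand-in for an SSTable; "generation" is assumed to grow each time a file is (re)written.
    record SSTableInfo(String name, long generation, long sizeBytes) {}

    /**
     * Picks SSTables oldest-first until their combined size reaches
     * {@code fraction} of the table's total bytes on disk.
     */
    static List<SSTableInfo> select(List<SSTableInfo> sstables, double fraction) {
        if (!(fraction > 0.0 && fraction <= 1.0))
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        List<SSTableInfo> oldestFirst = new ArrayList<>(sstables);
        oldestFirst.sort(Comparator.comparingLong(SSTableInfo::generation));

        long total = 0;
        for (SSTableInfo s : oldestFirst) total += s.sizeBytes();
        double target = fraction * total;

        List<SSTableInfo> selected = new ArrayList<>();
        long covered = 0;
        for (SSTableInfo s : oldestFirst) {
            if (covered >= target) break; // enough bytes are already "covered"
            selected.add(s);
            covered += s.sizeBytes();
        }
        return selected;
    }
}
```

With a fraction of 0.1 run once a week, roughly ten runs are needed to cover the whole table, which is where the 10-week figure above comes from.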


[DISCUSS] CEP 14: Paxos Improvements

2021-08-18 Thread bened...@apache.org
RE: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements

I’m proposing this CEP for approval by the project. The goal is to both improve 
the performance of LWTs and to ensure their correctness across a range of 
scenarios such as range movements. This work builds upon the Simulator CEP that has 
been recently adopted, and patches will follow in the coming weeks.

If you have any concerns or questions please raise them here for discussion.


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-19 Thread bened...@apache.org
Hi Jeremy,

That’s a great question, and the answer is that we shouldn’t compare the two as 
they aren’t in conflict. The goal of this work is only to improve the existing 
Paxos implementation – the characteristics are identical besides being faster, 
so this is a simple and safe upgrade route for users in the short to medium 
term.

Watch this space for a follow up discussion very soon about what we can do to 
modernise transactions in Cassandra more generally, and what this might mean 
for how we perform consensus. A comparative discussion of EPaxos and other 
related work is very well suited to that topic, in my opinion.


From: Jeremy Hanna 
Date: Thursday, 19 August 2021 at 00:58
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
It sounds like a great improvement!

Just for those who had followed the development of ePaxos* that Blake and 
others worked on but was never committed, it would be nice to briefly compare 
the two.

https://issues.apache.org/jira/browse/CASSANDRA-6246

> On Aug 19, 2021, at 9:18 AM, Scott Andreas  wrote:
>
> Benedict, thank you for sharing this CEP!
>
> Adding some notes on why I support this proposal:
>
> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
> reads is a huge improvement. This latency reduction may be sufficient to 
> allow many users of Cassandra who operate in a single datacenter, 
> availability zone, or region to migrate to a multi-region topology.
>
> - The Cluster Simulation work described in CEP-10 provides a toolchain for 
> probabilistically-exhaustive validation and simulation of transactional 
> correctness, allowing assertion of linearizability in the presence of 
> adversarial thread scheduling and message ordering over an unbounded number 
> of simulated clusters and transactions.
>
> - Some use cases may see a superlinear increase in LWT performance due to a 
> reduction in contention afforded by fewer message round-trips. E.g., halving 
> latency shortens the interval during which competing transactions may 
> conflict, reducing contention and improving throughput beyond a level that 
> would be afforded by the latency reduction alone.
>
> - Better safety during range movements: Electorate verification during range 
> movements provides a stronger assertion of linearizability via assurance of 
> the set of instances voting on a transaction.
>
> – Scott
>
> 
> From: bened...@apache.org 
> Sent: Wednesday, August 18, 2021 2:31 PM
> To: dev@cassandra.apache.org
> Subject: [DISCUSS] CEP 14: Paxos Improvements
>
> RE: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
>
> I’m proposing this CEP for approval by the project. The goal is to both 
> improve the performance of LWTs and to ensure their correctness across a 
> range of scenarios such as range movements. This work builds upon the Simulator 
> CEP that has been recently adopted, and patches will follow in the coming 
> weeks.
>
> If you have any concerns or questions please raise them here for discussion.
>
>
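The round-trip arithmetic in Scott's first point can be made concrete with a back-of-the-envelope Java sketch. The 70 ms inter-region round-trip time is a hypothetical figure, and the model deliberately ignores local processing, queueing and contention, so it only shows the proportional effect of halving the number of round trips.

```java
final class LwtLatency {
    /** Latency of an operation that needs {@code roundTrips} sequential wide-area round trips. */
    static long latencyMs(int roundTrips, long rttMs) {
        return roundTrips * rttMs;
    }

    /** Formats the before/after latency for one kind of operation. */
    static String describe(String op, int roundTripsBefore, int roundTripsAfter, long rttMs) {
        return String.format("%s: %d ms -> %d ms",
            op, latencyMs(roundTripsBefore, rttMs), latencyMs(roundTripsAfter, rttMs));
    }
}
```

With the hypothetical 70 ms round trip, `describe("write", 4, 2, 70)` gives `write: 280 ms -> 140 ms` and `describe("read", 2, 1, 70)` gives `read: 140 ms -> 70 ms`.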


Re: [VOTE] CEP-11: Pluggable memtable implementations

2021-08-19 Thread bened...@apache.org
+1

From: Brandon Williams 
Date: Thursday, 19 August 2021 at 17:16
To: dev@cassandra.apache.org 
Subject: Re: [VOTE] CEP-11: Pluggable memtable implementations
+1

On Thu, Aug 19, 2021 at 11:11 AM Branimir Lambov  wrote:
>
> Hello everyone,
>
> I am proposing the CEP-11 (Pluggable memtable implementations) for adoption
>
> Discussion thread:
> https://lists.apache.org/thread.html/rb5e950f882196764744c31bc3c13dfbf0603cb9f8bc2f6cfb976d285%40%3Cdev.cassandra.apache.org%3E
>
>
> The vote will be open for 72 hours.
> Votes by PMC members are considered binding.
> A vote passes if there are at least three binding +1s and no binding vetoes.



Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-19 Thread bened...@apache.org
> Why not throw an exception?

So this is essentially just a reporting mechanism for when an operation 
encounters state that should be impossible – this will have been left behind by 
prior operations, so the damage is already done and there’s no reason to throw 
an exception and fail the current one.

I should also make clear this _isn’t_ a guarantee of spotting violations, but 
it’s quite sensitive and much better than nothing. In a real system the most 
likely cause of this kind of impossible state would be e.g. mixing SERIAL with 
LOCAL_SERIAL, which is not safe unless you perform a really intricate dance, 
but we can distinguish this case from real bugs.
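The "report, don't throw" behaviour described above might look roughly like the following Java sketch. Everything in it is hypothetical: the class name, the method, and the invariant it checks (that a replica's promised ballot for a key never moves backwards) are illustrative choices, not the CEP's actual detection logic, and, as noted above, such checks can spot some violations but guarantee nothing.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical reporter: names and the invariant checked are illustrative only.
final class LinearizabilityViolationReporter {
    private static final Logger LOG =
        Logger.getLogger(LinearizabilityViolationReporter.class.getName());

    private final AtomicLong violations = new AtomicLong();

    /**
     * Illustrative invariant: a replica's promised ballot for a key should never move
     * backwards. Returns true if the observation is consistent, false (after logging)
     * if it reveals "impossible" state.
     */
    boolean checkPromiseMonotonic(String key, long previouslyObserved, long nowObserved) {
        if (nowObserved >= previouslyObserved)
            return true;
        violations.incrementAndGet();
        // Report rather than throw: the bad state was left behind by an earlier
        // operation, so failing the current one would not repair anything.
        LOG.warning(() -> String.format(
            "Possible linearizability violation on %s: promised ballot went from %d to %d; "
            + "please file a bug report", key, previouslyObserved, nowObserved));
        return false;
    }

    long violationCount() {
        return violations.get();
    }
}
```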

> Also, way to sell the next discussion Benedict :D

:D


From: Patrick McFadin 
Date: Thursday, 19 August 2021 at 21:48
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
I'm curious about this: "We will introduce mechanisms to spot and log
linearizability violations for the user to file as bug reports" Why not
throw an exception? Maybe it's just I don't quite see how this will be
detected. I think this is very interesting though.

Also, way to sell the next discussion Benedict :D

Patrick

On Thu, Aug 19, 2021 at 1:50 AM bened...@apache.org 
wrote:

> Hi Jeremy,
>
> That’s a great question, and the answer is that we shouldn’t compare the
> two as they aren’t in conflict. The goal of this work is only to improve
> the existing Paxos implementation – the characteristics are identical
> besides being faster, so this is a simple and safe upgrade route for users
> in the short to medium term.
>
> Watch this space for a follow up discussion very soon about what we can do
> to modernise transactions in Cassandra more generally, and what this might
> mean for how we perform consensus. A comparative discussion of EPaxos and
> other related work is very well suited to that topic, in my opinion.
>
>
> From: Jeremy Hanna 
> Date: Thursday, 19 August 2021 at 00:58
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> It sounds like a great improvement!
>
> Just for those who had followed the development of ePaxos* that Blake and
> others worked on but was never committed, it would be nice to briefly
> compare the two.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6246
>
> > On Aug 19, 2021, at 9:18 AM, Scott Andreas  wrote:
> >
> > Benedict, thank you for sharing this CEP!
> >
> > Adding some notes on why I support this proposal:
> >
> > - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x
> on reads is a huge improvement. This latency reduction may be sufficient to
> allow many users of Cassandra who operate in a single datacenter,
> availability zone, or region to migrate to a multi-region topology.
> >
> > - The Cluster Simulation work described in CEP-10 provides a toolchain
> for probabilistically-exhaustive validation and simulation of transactional
> correctness, allowing assertion of linearizability in the presence of
> adversarial thread scheduling and message ordering over an unbounded number
> of simulated clusters and transactions.
> >
> > - Some use cases may see a superlinear increase in LWT performance due
> to a reduction in contention afforded by fewer message round-trips. E.g.,
> halving latency shortens the interval during which competing transactions
> may conflict, reducing contention and improving throughput beyond a level
> that would be afforded by the latency reduction alone.
> >
> > - Better safety during range movements: Electorate verification during
> range movements provides a stronger assertion of linearizability via
> assurance of the set of instances voting on a transaction.
> >
> > – Scott
> >
> > 
> > From: bened...@apache.org 
> > Sent: Wednesday, August 18, 2021 2:31 PM
> > To: dev@cassandra.apache.org
> > Subject: [DISCUSS] CEP 14: Paxos Improvements
> >
> > RE:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> >
> > I’m proposing this CEP for approval by the project. The goal is to both
> improve the performance of LWTs and to ensure their correctness across a
> range of scenarios such as range movements. This work builds upon the Simulator
> CEP that has been recently adopted, and patches will follow in the coming
> weeks.
> >
> > If you have any concerns or questions please raise them here for
> discussion.
> >
> >
>


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread bened...@apache.org


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread bened...@apache.org
> My initial testing suggested it was not required (when the new DC is not 
> serving reads).

The problem is that today there’s no way to reliably exclude the new DC from 
serving reads, that I know of? If you can, then yes you would only need to 
ensure repair were run prior to activating reads from this DC.

> Perhaps the CL mechanism could be pluggable

I think this is unlikely, at least any time soon, particularly as we start to 
consider things like consensus. Quorums are quite intricately woven into any 
implementation, and it would be quite hard to fully generalise them. In 
practice we can probably accommodate any simple vote-threshold quorum (one 
where each member of an electorate has a vote of equal weight, and consensus is 
reached once a threshold of votes is crossed) and support at least one level of 
nesting (so that DCs may logically vote as a block based on some quorum within 
a DC) in any topology without a plugin system, and I suspect this will 
be more than enough for any system in the foreseeable future.
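A simple vote-threshold quorum with one level of nesting, as described above, can be sketched in Java. This is an illustration under stated assumptions, not Cassandra code: the class and method names are hypothetical, every replica and every DC is given equal weight, and majority is used as the threshold at both levels (a quorum-of-quorums).

```java
import java.util.Map;
import java.util.Set;

final class NestedQuorum {
    /** Simple vote-threshold quorum: every voter has equal weight. */
    static boolean thresholdReached(int votesFor, int threshold) {
        return votesFor >= threshold;
    }

    /** Majority threshold for n equally weighted voters. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    /**
     * One level of nesting: each DC votes as a block if a majority of its replicas
     * acknowledged, and the operation succeeds if a majority of DCs voted for it.
     */
    static boolean quorumOfQuorums(Map<String, Integer> replicasPerDc,
                                   Map<String, Set<String>> acksPerDc) {
        int dcVotes = 0;
        for (Map.Entry<String, Integer> dc : replicasPerDc.entrySet()) {
            int acks = acksPerDc.getOrDefault(dc.getKey(), Set.of()).size();
            if (thresholdReached(acks, majority(dc.getValue())))
                dcVotes++;
        }
        return thresholdReached(dcVotes, majority(replicasPerDc.size()));
    }
}
```

With three DCs of three replicas each, acknowledgements from two replicas in each of two DCs are enough, while all three replicas of a single DC are not.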

> I wonder if it should be a ‘default CL’ which can additionally be overridden 
> by queries?

There are some practicalities that probably prohibit us from eliminating 
user-provided CLs, but I would like to see them phased out as far as possible, 
as they are very hard to verify. To support this flexibility more generally, 
I’d prefer to see tables offer multiple consensus schemes, potentially with 
different qualities (perhaps even named by the user), for cases such as 
fast-and-inconsistent-reads. This still permits their properties to be vetted 
by the database while offering flexibility to the user, and lets users declare 
at the operator level what meeting this concept requires. It also means the 
database can maintain these properties through any topology change.

But we’ll probably have people using legacy CLs for another decade, so we’re 
going to have to support people querying with those CLs, but we might want to 
encourage people to disable them on their clusters and migrate to safer setups.

From: Miles Garnsey 
Date: Friday, 20 August 2021 at 12:51
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
Many thanks for this detailed response Benedict. I look forward to seeing the 
forthcoming proposals in relation to schema change safety when LWTs are in use.

We have been following almost exactly the scale-by-one workaround you 
described - I am grateful for the additional validation. The only divergence is 
that we have not been advising a repair in between each node addition. My 
initial testing suggested it was not required (when the new DC is not serving reads). But if you 
are aware of issues that arise at scale then I’d love to hear your experience, 
as we are still in the planning phase for that project.

Regarding CLs (off topic)

> To respond to Mick: we could introduce an EACH_SERIAL which would permit this 
> to be done in one go. This isn’t a super complicated piece of work, and I’d 
> be happy to help review a contribution here. However, in my view we should be 
> reconsidering how quorums are decided more comprehensively. This is very 
> off-topic, but there are other more sensible quorums for multi-region setups 
> (such as quorum-of-quorums), but also there’s a wide range of useful quorums 
> we don’t support, particularly heterogeneous ones supporting lower write 
> failure tolerance than read failure tolerance (for instance). Today we 
> support only the most extreme versions of this, and all of our quorums must 
> be mixed manually by clients which is error prone. In my opinion we should be 
> moving towards specifying quorums on a per-table basis for reads and writes, 
> so that clients do not specify their consistency levels. This way the 
> database can configure arbitrary quorums, and also guarantee that these 
> quorums provide the desired consistency.
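The "lower write failure tolerance than read failure tolerance" idea can be checked with the standard quorum-intersection rule for simple threshold quorums: with n replicas, every read quorum of size r overlaps every write quorum of size w exactly when r + w > n. The Java sketch below applies that textbook rule; the class name and the `validate` check a table-level setting might run are hypothetical, and intersection alone is necessary for reads to observe acknowledged writes but is not by itself sufficient for linearizability.

```java
final class QuorumIntersection {
    /** A read quorum of size r and a write quorum of size w always share a replica iff r + w > n. */
    static boolean intersects(int n, int r, int w) {
        return r + w > n;
    }

    /** How many replicas may be down while a quorum of size q can still be assembled. */
    static int failuresTolerated(int n, int q) {
        return n - q;
    }

    /** Hypothetical table-level check: reject read/write quorums that would not intersect. */
    static void validate(int n, int r, int w) {
        if (r < 1 || w < 1 || r > n || w > n)
            throw new IllegalArgumentException("quorum sizes must be between 1 and " + n);
        if (!intersects(n, r, w))
            throw new IllegalArgumentException("read quorum " + r + " and write quorum " + w
                + " need not overlap among " + n + " replicas");
    }
}
```

For example, with n = 5, w = 4 and r = 2 the quorums still intersect, while writes tolerate only one unavailable replica and reads tolerate three.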

I agree with your points here. I’d add that the geographical location of DCs 
can be relevant.
Perhaps the CL mechanism could be pluggable (in the same way that authn/z both 
are) so that we can experiment in this area at higher velocity? (I appreciate 
this is an invasive change.)
A colleague and I are considering whether we might be able to look at the 
EACH_QUORUM idea in the shorter term. We will share more if we have the 
bandwidth to undertake the work.
I also agree that CLs defined for tables are a worthy enhancement; I wonder if 
it should be a ‘default CL’ which can additionally be overridden by queries?

In any event I feel I’ve hijacked your thread enough, but thank you again for 
the warm welcome and the interesting discussion!

> On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
>
> Hello and welcome!
>
> So this is a really complicated topic, unfortunately, but the simple answer 
> is that as currently formulated this work won’t address this 

Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-25 Thread bened...@apache.org
I’ll move this to a vote in a day or so, assuming no further discussion.

From: Jeff Jirsa 
Date: Monday, 23 August 2021 at 06:46
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements


> On Aug 22, 2021, at 7:25 PM, Miles Garnsey  wrote:
>
> 
>>
>> The problem is that today there’s no way to reliably exclude the new DC from 
>> serving reads, that I know of? If you can, then yes you would only need to 
>> ensure repair were run prior to activating reads from this DC.
>
> We think we have a way to do this using certain settings in the Java driver.
>
> Agree on your other points!

I don’t see how

Your best chance is with snitch games

And those don’t guarantee correctness if a single replica GC pauses and forces 
a speculative retry




Re: [VOTE] CEP-11: Pluggable memtable implementations

2021-08-25 Thread bened...@apache.org
FYI, as formulated in the project governance document (though, as ever, its 
clarity could be improved), for CEP votes the votes of committers have equal 
weight to those of the PMC.



From: Branimir Lambov 
Date: Tuesday, 24 August 2021 at 12:25
To: dev@cassandra.apache.org 
Subject: Re: [VOTE] CEP-11: Pluggable memtable implementations
Vote passes with 7 binding and 4 non-binding +1 votes and no vetoes.

Thank you all. JIRA ticket will be opened soon.

Regards,
Branimir

On Fri, Aug 20, 2021 at 10:41 AM Sam Tunnicliffe  wrote:

> +1
>
> > On 19 Aug 2021, at 17:10, Branimir Lambov  wrote:
> >
> > Hello everyone,
> >
> > I am proposing the CEP-11 (Pluggable memtable implementations) for
> adoption
> >
> > Discussion thread:
> >
> https://lists.apache.org/thread.html/rb5e950f882196764744c31bc3c13dfbf0603cb9f8bc2f6cfb976d285%40%3Cdev.cassandra.apache.org%3E
> >
> >
> > The vote will be open for 72 hours.
> > Votes by PMC members are considered binding.
> > A vote passes if there are at least three binding +1s and no binding
> vetoes.
>
>
>
>


[VOTE] CEP-14: Paxos Improvements

2021-08-27 Thread bened...@apache.org
Hi everyone, I’m proposing this CEP for approval.

Proposal: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
Discussion: 
https://lists.apache.org/thread.html/r1af3da2d875ef93479e3874072ee651f406b96c915759c7968d3266e%40%3Cdev.cassandra.apache.org%3E

The vote will be open for 72 hours.
Votes by committers are considered binding.
A vote passes if there are at least three binding +1s and no binding vetoes.
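The pass rule stated above is mechanical enough to express as a small Java sketch. The `CepVote` type and its fields are hypothetical; `binding` is assumed to be true for any committer on a CEP vote, per the governance clarification earlier in this archive, and the 72-hour voting window is not modelled.

```java
import java.util.List;

final class CepVote {
    enum Ballot { PLUS_ONE, ZERO, MINUS_ONE }

    // "binding" is assumed true for committers (and therefore PMC members) on a CEP vote.
    record Vote(String voter, boolean binding, Ballot ballot) {}

    /** Passes with at least three binding +1s and no binding -1 (veto). */
    static boolean passes(List<Vote> votes) {
        long bindingPlusOnes = votes.stream()
            .filter(v -> v.binding() && v.ballot() == Ballot.PLUS_ONE)
            .count();
        boolean bindingVeto = votes.stream()
            .anyMatch(v -> v.binding() && v.ballot() == Ballot.MINUS_ONE);
        return bindingPlusOnes >= 3 && !bindingVeto;
    }
}
```

A non-binding -1 does not block the vote, whereas a single binding -1 does, regardless of how many +1s are cast.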



Re: [VOTE] CEP-14: Paxos Improvements

2021-08-31 Thread bened...@apache.org
With 10 +1 votes and no -1 votes, the vote passes. Thanks everyone!

From: Caleb Rackliffe 
Date: Wednesday, 1 September 2021 at 04:45
To: dev@cassandra.apache.org 
Subject: Re: [VOTE] CEP-14: Paxos Improvements
+1

On Mon, Aug 30, 2021 at 5:12 AM Sam Tunnicliffe  wrote:

> +1
>
> > On 27 Aug 2021, at 20:48, bened...@apache.org wrote:
> >
> > Hi everyone, I’m proposing this CEP for approval.
> >
> > Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> > Discussion:
> https://lists.apache.org/thread.html/r1af3da2d875ef93479e3874072ee651f406b96c915759c7968d3266e%40%3Cdev.cassandra.apache.org%3E
> >
> > The vote will be open for 72 hours.
> > Votes by committers are considered binding.
> > A vote passes if there are at least three binding +1s and no binding
> vetoes.
> >
>
>
>
>


Re: [DISCUSS] CASSANDRA-15234

2021-09-02 Thread bened...@apache.org
Thanks for bringing this to the list Ekaterina!

It’s worth noting that the two don’t have to be in conflict: we could offer two 
template yaml with the parameters grouped differently, for users to decide for 
themselves.

The proposals primarily define parameter names differently, with my proposal 
going by kind->place, and the other proposal maintaining (mostly) the existing 
name form (which is a bit more like place->kind). While the example yaml groups 
by kind, you can convert nested definitions into a ‘dot’ form (e.g. 
limits.concurrency.reads) for use in a different grouping.
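Converting nested definitions into the 'dot' form mentioned above is a straightforward recursive flattening. The Java sketch below is a hypothetical helper operating on nested `Map`s (such as a YAML loader might produce); it is not part of Cassandra's config loading, and the sample keys and values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConfigFlattener {
    /** Flattens nested maps into dot-separated keys, e.g. limits.concurrency.reads. */
    static Map<String, Object> flatten(Map<String, ?> nested) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", nested, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> node, Map<String, Object> out) {
        for (Map.Entry<String, ?> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map<?, ?> child) {
                // Assumes nested config sections are keyed by strings, as YAML mappings usually are
                @SuppressWarnings("unchecked")
                Map<String, ?> section = (Map<String, ?>) child;
                flattenInto(key, section, out);
            } else {
                out.put(key, e.getValue()); // leaf value: keep as-is under its dotted key
            }
        }
    }
}
```

The reverse direction (splitting dotted keys back into nested sections) would let the same parameters be presented under a different grouping.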

One advantage of grouping parameters together is that it aids maintaining 
coherency of naming between systems, and also potentially permits a more 
succinct config file and better discovery. But it’s far from a silver bullet, 
as value judgements have to be made about where the grouping lines are. I’m 
sure anything we settle on will be a huge improvement over the status quo, 
however.




From: Ekaterina Dimitrova 
Date: Thursday, 2 September 2021 at 16:32
To: dev@cassandra.apache.org 
Subject: [DISCUSS] CASSANDRA-15234
Hi team,

I would like to bring to the attention of the community CASSANDRA-15234,
standardise config and JVM parameters.

This is work we discussed back in Summer 2020 just before our first 4.0
Beta release. During the discussion we figured out that there is more than
one option to do the job and not enough time to get user feedback and
finish it, so this was delayed until after 4.0. And here I am, bringing it back to
the table.

This work’s goals are:

   - To standardize naming, which we did by agreeing on the form noun_verb
   - To provide values with units while maintaining backward compatibility


Those two parts are more or less already done.
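The second goal, values with units plus backward compatibility, can be sketched as a parser that accepts either a bare number in the old implicit unit (as with a setting like `local_read_size_threshold_kb`) or a number with an explicit unit suffix. This is a hypothetical illustration, not the parser from CASSANDRA-15234: the `DataSizeParser` name, the accepted suffixes and the choice of 1024-based multipliers are all assumptions.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DataSizeParser {
    // A number followed by an alphabetic unit, optionally separated by whitespace
    private static final Pattern SIZE = Pattern.compile("(\\d+)\\s*([a-z]+)");

    /** Parses values like "10kb" or "4 MiB" into bytes; a bare number is read in the legacy unit. */
    static long toBytes(String value, long legacyUnitBytes) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (!v.isEmpty() && v.chars().allMatch(Character::isDigit))
            return Math.multiplyExact(Long.parseLong(v), legacyUnitBytes); // backward compatible form
        Matcher m = SIZE.matcher(v);
        if (!m.matches())
            throw new IllegalArgumentException("Invalid data size: " + value);
        long amount = Long.parseLong(m.group(1));
        long unit = switch (m.group(2)) {
            case "b" -> 1L;
            case "kb", "kib" -> 1024L;
            case "mb", "mib" -> 1024L * 1024;
            case "gb", "gib" -> 1024L * 1024 * 1024;
            default -> throw new IllegalArgumentException("Unknown unit in: " + value);
        };
        return Math.multiplyExact(amount, unit); // fail loudly on overflow
    }
}
```

Under these assumptions, a legacy `10` read with a kilobyte unit and a new-style `10kb` both resolve to 10240 bytes, so old and new config files can coexist.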

More interesting is the third part - reorganizing the cassandra.yaml file.

My personal approach was to split it into sections, done here.

Another proposal is done by Benedict; grouping the config parameters.

To make it clearer, he created a yaml with comments mostly stripped.

In his version, basic settings for network, disk, etc. are all grouped
together, followed by operator tunables, mostly under limits, within which
we now have throughput, concurrency and capacity. This leads to the settings
for some features being kept separate (most notably for caching), but helps
the operator understand what they have to play with for controlling resource
consumption.

I am interested to hear what people think about the two options, or if
anyone has another idea to share. The discussion is open.

Thank you,

Ekaterina


Re: [DISCUSS] CASSANDRA-15234

2021-09-03 Thread bened...@apache.org
> I think as the comments were stripped only for the POC. I guess many of them 
> will get back in the actual doc version unfortunately.

Well, I think the grouped format lends itself to much briefer comments, with 
groups of related parameters getting an overall description. Even as a 
developer who understands most of the toggles I found the old file very hard to 
navigate.

I also don’t see why we cannot have both heavily commented versions and 
uncommented (or lightly commented) versions.

I don’t personally see why multiple different config templates would be 
confusing if they’re in a suitably labelled directory, even if we settle on one 
for the default. It might even be nice to have a pared-down config that has 
only those properties we expect the normal user to need, so it’s particularly 
easy to navigate.


From: Ekaterina Dimitrova 
Date: Friday, 3 September 2021 at 14:40
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CASSANDRA-15234
> >
> It’s worth noting that the two don’t have to be in conflict: we could
> offer two template yaml with the parameters grouped differently, for users
> to decide for themselves.

Sure, my only concern is that three versions of the yaml could cause
confusion (we will keep backward compatibility with the current one for some
time). But it might be only me. I am open to feedback.


> If we can document this, it would be great as stuff like “enabled” are
> inconsistent so not sure if I did it properly =D
>
Well, for now this is only in the ticket, in the first version, but no one
has raised any concern. We will definitely have to update our docs on this and
whatever else we come to agreement on, both for users and contributors.

> though I will agree that it can be hard for some tools (such
> as bash templating), but feel we can always find a common ground
Valid point and I believe it is one of the reasons we delayed the ticket,
in order to get feedback on that. I am really interested to hear what
concerns people might have.


> Opening up a 1500+ line .yaml file is very daunting, even if most of it is
> comments. Can't blame folks for being overwhelmed at the prospect of tuning
> Cassandra w/that as our operator config API. :)
I am all in for simplification and for making our users’ lives easier. But at
this point we shouldn’t be comparing the length of the files, I think, as the
comments were stripped only for the POC. I guess many of them will get back
in the actual doc version, unfortunately.

Thank you all,
Ekaterina

On Thu, 2 Sep 2021 at 20:07, Joshua McKenzie  wrote:

> Reading through the two, the grouping approach seems like it's a lot more
> friendly to newcomers as well as providing context specific cues for
> relationships between params you're editing. Showing and not telling, if
> you will.
>
> Opening up a 1500+ line .yaml file is very daunting, even if most of it is
> comments. Can't blame folks for being overwhelmed at the prospect of tuning
> Cassandra w/that as our operator config API. :)
>
> ~Josh
>
> On Thu, Sep 2, 2021 at 1:48 PM David Capwell 
> wrote:
>
> > Thanks for bringing this back up; Caleb and I were talking about the lack
> > of clarity with regard to CASSANDRA-16896, fleshing this out would make
> > those configs nicer!
> >
> > >   To standardize naming - that we did by agreeing to the form noun_verb
> >
> > If we can document this, it would be great as stuff like “enabled” are
> > inconsistent so not sure if I did it properly =D
> >
> > >
> > >   Provision of values with units while maintaining backward
> > compatibility.
> >
> > +1
> >
> > I really hate local_read_size_threshold_kb; I would love
> > local_read_size_threshold: 10kb.  Once we have the infrastructure in
> place
> > (believe your patch before had these tools) I would love to switch!
> >
> >
> > > Another proposal is done by Benedict; grouping the config parameters.
> >
> > Yep, this is what triggered Caleb and I to talk about this thread!  To
> > group or not to group; that is the question
> >
> > Personally I like grouping from an organization point of view so am in
> > favor of that; though I will agree that it can be hard for some tools
> (such
> > as bash templating), but feel we can always find a common ground
> >
> >
> > > On Sep 2, 2021, at 8:44 AM, bened...@apache.org wrote:
> > >
> > > Thanks for bringing this to the list Ekaterina!
> > >
> > > It’s worth noting that the two don’t have to be in conflict: we could
> > offer two template yaml with the parameters grouped differently, for
> users
> > to decide for themselves.
> > >
> > > The proposals pr

[DISCUSS] CEP-15: General Purpose Transactions

2021-09-05 Thread bened...@apache.org
Wiki: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
Whitepaper: 
https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
Prototype: https://github.com/belliottsmith/accord

Hi everyone, I’d like to propose this CEP for adoption by the community.

Cassandra has benefitted from LWTs for many years, but application developers 
that want to ensure consistency for complex operations must either accept the 
scalability bottleneck of serializing all related state through a single 
partition, or layer a complex state machine on top of the database. These are 
sophisticated and costly activities that our users should not be expected to 
undertake. Since distributed databases are beginning to offer distributed 
transactions with fewer caveats, it is past time for Cassandra to do so as well.

This CEP proposes the use of several novel techniques that build upon research 
(that followed EPaxos) to deliver (non-interactive) general purpose distributed 
transactions. The approach is outlined in the wikipage and in more detail in 
the linked whitepaper. Importantly, by adopting this approach we will be the 
_only_ distributed database to offer global, scalable, strict serializable 
transactions in one wide area round-trip. This would represent a significant 
improvement in the state of the art, both in the academic literature and in 
commercial or open source offerings.

This work has been partially realised in a prototype. This partial prototype 
has been verified against Jepsen.io’s Maelstrom library and dedicated in-tree 
strict serializability verification tools, but much work remains before it is 
production capable and integrated into Cassandra.

I propose including the prototype in the project as a new source repository, to 
be developed as a standalone library for integration into Cassandra. I hope the 
community sees the important value proposition of this proposal, and will adopt 
the CEP after this discussion, so that the library and its integration into 
Cassandra can be developed in parallel and with the involvement of the wider 
community.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-05 Thread bened...@apache.org
Yep, that’s correct. In fact my goal is that we maintain this as a standalone 
library long term. While its primary goal will be integration with Cassandra, I 
think there is value in maintaining a distinct library for the core 
functionality - so long as the burden remains manageable.

From: Nate McCall 
Date: Sunday, 5 September 2021 at 22:30
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Benedict,
If I'm parsing this correctly, you want to include the stand-alone library
in the project as a separate repo to begin with, correct? (I'm +1 on that,
if so).

Otherwise I am very intrigued by the paper and proposal. This looks
excellent. Thanks Benedict, et al. for putting this together!

-Nate



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-06 Thread bened...@apache.org

It's probably the case this is obvious, and it's omitted because it's not
required by ACCORD, but I wanted to add here that if in addition to a
deadline you also impose some upper bound for the maximum allowed
timestamp, you will make all our issues with tombstones from the future go
away. (And since you are now creating an ordered commit log, this will also
avoid having to keep tombstones for 10 days, simplify anti-entropy for
failed nodes, etc...)
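Here is a minimal sketch of the upper bound suggested above. It assumes client-supplied timestamps in microseconds and a hypothetical `maxFuture` setting, which is not an existing Cassandra option. Under those assumptions, a replica could refuse any write stamped too far ahead of its own clock:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical replica-side guard (not an existing Cassandra option): refuse any
// write whose timestamp is further ahead of the local clock than a fixed bound,
// so that no tombstone can be stamped in the future.
final class TimestampBound {
    private final long maxFutureMicros;

    TimestampBound(long maxFuture, TimeUnit unit) {
        this.maxFutureMicros = unit.toMicros(maxFuture);
    }

    /** True if writeMicros is at most nowMicros plus the configured bound. */
    boolean accept(long writeMicros, long nowMicros) {
        return writeMicros <= nowMicros + maxFutureMicros;
    }

    public static void main(String[] args) {
        TimestampBound bound = new TimestampBound(1, TimeUnit.SECONDS);
        long now = 1_000_000_000L; // local clock, in microseconds
        System.out.println(bound.accept(now + 500_000L, now));   // true: 0.5s ahead
        System.out.println(bound.accept(now + 5_000_000L, now)); // false: 5s ahead
    }
}
```

A write rejected here can never become a tombstone from the future. The trade-off is that a client with a badly skewed clock gets an error instead of silently winning every conflict.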

3.2 Consensus

The algorithm is hard to read since you omit the roles of the participants.
It's as if all of it was executed on the Coordinator.

Is this sentence correct? Probably it is and I'm at the limits of my
understanding... *"Note that any transitive dependency of another γ ∈ deps_τ
where Committed_γ may be pruned from deps_τ, as it is durably a transitive
dependency of τ."*
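The quoted pruning rule can be made concrete with a minimal sketch. The sketch is hypothetical and is not the prototype's code. It assumes:

- transactions are identified by strings;
- `committedDeps` maps each committed transaction to its durable dependency set.

Only edges out of committed transactions are followed, presumably because only their dependencies are fixed. Candidates are checked against the set still being kept, so if two committed transactions depend on each other, one of them survives rather than both being pruned.

```java
import java.util.*;

// Hypothetical sketch of pruning deps_τ: drop any dependency that is already a
// transitive dependency of another *committed* γ still kept in deps_τ.
final class DepsPruning {

    /** True if target is reachable from `from` following only committed transactions' deps. */
    static boolean reachable(String from, String target, Map<String, Set<String>> committedDeps) {
        Deque<String> stack = new ArrayDeque<>(committedDeps.getOrDefault(from, Set.of()));
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String next = stack.pop();
            if (next.equals(target)) return true;
            // Uncommitted transactions have no entry, so traversal stops at them.
            if (seen.add(next)) stack.addAll(committedDeps.getOrDefault(next, Set.of()));
        }
        return false;
    }

    static Set<String> prune(Set<String> deps, Map<String, Set<String>> committedDeps) {
        Set<String> kept = new TreeSet<>(deps);
        for (String candidate : new TreeSet<>(deps)) {
            boolean implied = false;
            for (String gamma : kept) {
                // Only a committed γ that is still kept may justify removing a candidate.
                if (!gamma.equals(candidate) && committedDeps.containsKey(gamma)
                        && reachable(gamma, candidate, committedDeps)) {
                    implied = true;
                    break;
                }
            }
            if (implied) kept.remove(candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        // A and B are committed; A depends on B, and B depends on C.
        Map<String, Set<String>> committed = Map.of("A", Set.of("B"), "B", Set.of("C"));
        System.out.println(prune(Set.of("A", "B", "C", "D"), committed)); // [A, D]
    }
}
```

With A → B → C, both B and C are implied by A and are dropped. D, which no committed transaction reaches, must stay.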



3.4 Safety

Proofs of theorems 3.1 and 3.2 appear to be identical?

End:

Ok so reads were discussed very briefly in 3.3, leaving the reader to guess
quite a lot...

* Are interactive transactions possible? It appears they could be, even if
Algorithm 2 only allows for one pass at reads.
* Do I understand correctly that t0 is essentially both the start and end
time of the transaction? ...and that serializability is provided by the
fact that a later transaction gamma will not even start to execute reads
before earlier transaction tau has committed?
* If interactive transactions are possible, it seems a client can
denial-of-service a row by never committing, keeping locks open forever?

So I guess my question is how and when reads happen?

More precisely... how is it possible that the Consensus protocol is
executed first, and it already knows its dependencies, even if the
Execution protocol - aka reads and writes - are only executed after?

Similarly, how do you expect to apply writes before reads are returned to
the client? Even if you were proposing some Calvin-like single-shot
transaction, it still raises the question of what mechanism can consume read
results and use them to determine the writes.


Reading the CEP:

Are the results of the Jepsen testing available too? (Or will be?)


henrik



--

Henrik Ingo


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread bened...@apache.org
> Sorry if a few comments were a bit "editorial" in the first message

Not a problem at all – more than happy to talk about suggestions in that vein! 
Just probably best not to subject everyone else to the discussion.

> What I would like to understand better and without guessing is, what do these 
> transactions look like from a client/user point of view?

This is a fair question, and perhaps something I should pinpoint more directly 
for the reader. The CEP does stipulate non-interactive transactions, i.e. those 
that are one-shot. The only other limitation is that the partition keys must be
known upfront; however, I expect we will follow up soon after with some weaker
semantics that build on top (probably using optimistic concurrency control) to 
support transactions where only some partition keys are known upfront, so that 
we may support global secondary indexes with proper isolation and consistency.
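Henrik asked how the consensus phase can know a transaction's dependencies before any read executes. Because partition keys are declared upfront, conflicts can be derived from key overlap alone. The sketch below uses a simplified, EPaxos-style rule: two transactions conflict if their key sets intersect. It ignores timestamps and everything else the protocol does, and the `Txn` record is illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Accord's API: each transaction declares its partition
// keys upfront, and two transactions conflict if their key sets intersect.
final class KeyConflicts {
    record Txn(String id, Set<String> keys) {}

    /** Earlier transactions that touch at least one of txn's declared keys. */
    static List<String> dependencies(Txn txn, List<Txn> earlier) {
        List<String> deps = new ArrayList<>();
        for (Txn other : earlier) {
            // No reads are needed: the declared keys alone determine the conflict.
            if (!Collections.disjoint(txn.keys(), other.keys())) {
                deps.add(other.id());
            }
        }
        return deps;
    }

    public static void main(String[] args) {
        List<Txn> earlier = List.of(
                new Txn("t1", Set.of("alice")),
                new Txn("t2", Set.of("bob", "carol")),
                new Txn("t3", Set.of("dave")));
        Txn transfer = new Txn("t4", Set.of("alice", "bob"));
        System.out.println(dependencies(transfer, earlier)); // [t1, t2]
    }
}
```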

> whether I should just think of this as “better and more efficient LWT”

So, the LWT concept is a Cassandra one and doesn’t have an agreed-upon 
definition. My understanding of a core feature/limitation of LWTs is that they 
operate over a single partition, and as a result many operations are impossible 
even in multiple rounds without complex distributed state machines. The core 
improvement here, besides improved performance, is that we will be able to 
operate over any set of keys at once.

How this facility is evolved into user-facing capabilities is an open-ended 
question. Initially of course we will at least support the same syntax but 
remove the restriction on operating over a single partition. I haven’t thought 
about this much, as the CEP is primarily for enabling work, but I think we
will want to expand the syntax in two ways:

1) to support more complex conditions (simple AND conditions across all
partitions seem likely to be too restrictive, though they might make sense for
the single partition case);
2) to support inserting data from one row into another, potentially with
transformations being applied (including via UDFs).
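Both extensions fit the one-shot model. The whole transaction is submitted at once, as:

- a set of declared keys;
- a condition over the values read from any of those keys;
- an update computed from the same reads.

This also addresses how read results can determine writes without client interaction. Everything here is hypothetical: this is neither proposed CQL syntax nor the Accord API, and the store is a single-threaded in-memory map.

```java
import java.util.*;
import java.util.function.*;

// Hypothetical one-shot (non-interactive) transaction over an in-memory store.
final class OneShot {
    record Txn(Set<String> keys,
               Predicate<Map<String, Long>> condition,
               Function<Map<String, Long>, Map<String, Long>> update) {}

    /** Reads all declared keys, checks the condition, then applies the derived writes. */
    static boolean execute(Map<String, Long> store, Txn txn) {
        Map<String, Long> reads = new HashMap<>();
        for (String k : txn.keys()) reads.put(k, store.getOrDefault(k, 0L));
        if (!txn.condition().test(reads)) return false;
        Map<String, Long> writes = txn.update().apply(reads);
        // Keys must be known upfront, so writes outside the declared set are rejected.
        if (!txn.keys().containsAll(writes.keySet()))
            throw new IllegalArgumentException("write to undeclared key");
        store.putAll(writes);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>(Map.of("alice", 100L, "bob", 20L));
        // Move 30 from alice to bob only if their combined balance is at least 100:
        // the sum is a condition spanning two partitions, not a per-partition check.
        Txn transfer = new Txn(Set.of("alice", "bob"),
                r -> r.get("alice") + r.get("bob") >= 100 && r.get("alice") >= 30,
                r -> Map.of("alice", r.get("alice") - 30, "bob", r.get("bob") + 30));
        System.out.println(execute(store, transfer) + " " + store.get("alice") + " " + store.get("bob")); // true 70 50
    }
}
```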

These are both relatively manageable improvements that we might want to land in 
the same major release as the transactions themselves. The core facility can be 
expanded quite broadly, though. It would be possible for instance to support 
some interpreted language(s) as part of a query, so that arbitrary work can be 
applied in the transaction.

Or, perhaps the community would rather build atop the feature to support 
interactive transactions at the client. I can’t predict resourcing for this, 
though, and it might be a community effort. I think it would be quite tractable 
once this work lands, however.

> Suppose I wanted to do a long running read-only transaction

So, there are two sides to this: with and without paging. A long running
read-only transaction taking a few seconds is quite likely to be fine, and we
will probably support it with some MVCC within the transaction system itself. This
may or may not be part of v1; it’s hard to predict with certainty, as this is
going to be a large undertaking.

But for paged queries we’d be talking about SNAPSHOT isolation. This is likely 
to be something the community wants to support before long anyway and is 
probably not as hard as you might think. It is probably outside of the scope of 
this work, though the two would dovetail very nicely.
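The SNAPSHOT isolation idea for paged queries can be shown with a minimal multi-version store, assuming every committed write carries a timestamp. A read at snapshot `ts` returns the newest version at or below `ts`. Fetching every page at the same `ts` therefore yields one consistent view, even while writes continue. `MvccStore` and its methods are hypothetical, not a Cassandra API, and garbage collection of old versions is omitted.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical multi-version store: each key keeps all versions by commit timestamp.
final class MvccStore {
    private final Map<String, NavigableMap<Long, String>> versions = new TreeMap<>();

    void write(String key, long commitTs, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(commitTs, value);
    }

    /** Value visible at snapshot ts (newest version at or below ts), or null if none. */
    String readAt(String key, long ts) {
        NavigableMap<Long, String> history = versions.get(key);
        if (history == null) return null;
        Map.Entry<Long, String> e = history.floorEntry(ts);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        MvccStore store = new MvccStore();
        store.write("row1", 10, "v1");
        long snapshot = 15;             // the paged query picks its snapshot here
        store.write("row1", 20, "v2");  // a concurrent write lands mid-scan
        System.out.println(store.readAt("row1", snapshot)); // v1
        System.out.println(store.readAt("row1", 25));       // v2
    }
}
```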


From: Henrik Ingo 
Date: Tuesday, 7 September 2021 at 09:24
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Tue, Sep 7, 2021 at 1:31 AM bened...@apache.org 
wrote:

>
> Of course, but we may have to be selective in our back-and-forth. We can
> always take some discussion off-list to keep it manageable.
>
>
I'll try to converge. Sorry if a few comments were a bit "editorial" in the
first message. I find that sometimes it pays off to also ask the dumb
questions, as long as we don't get stuck on any of them.


> > The algorithm is hard to read since you omit the roles of the
> participants.
>
> Thanks. I will consider how I might make it clearer that the portions of
> the algorithm that execute on receipt of messages that may only be received
> by replicas, are indeed executed by those replicas.
>
>
In fact the same algorithm in the CEP was easier to read exactly because of
this, I now realize.


> > So I guess my question is how and when reads happen?
>
> I think this is reasonably well specified in the protocol and, since it’s
> unclear what you’ve found confusing, I don’t know that it would be productive to
> try to explain it again here on list. You can look at the prototype, if
> Java is easier for you to parse, as it is of course fully specified there
> with no ambiguity. Or we can discuss off list, or perhaps on the community
> slack channel.
>
>
Maybe my question was a bit too open ended, as I didn't w

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread bened...@apache.org
> I was thinking that a path similar to Calvin/FaunaDB is certainly looming on
> the horizon at least.

I’m not sure which aspect of these systems you are referring to. Unless I have 
misunderstood, I consider them to be strictly inferior approaches (particularly 
for Cassandra) as they require a _global_ leader process and as a result have 
scalability limits. Users simply shift the sharding problem to the cluster 
level rather than the node level, but the fundamental problem remains. This may 
be acceptable for many users, but was contrary to the goals of this CEP.

> It seems to me at that point long running queries and interactive 
> transactions are mostly the same problem.

I would estimate long running queries to be easier to deliver by at least an 
order of magnitude. They’re not unrelated, but they’re still quite distinct in 
my opinion.

> good job pulling together ingredients from state of the art work in this area

In case this was lost in the noise: this work is not simply an assembly of 
prior work. It introduces entirely novel approaches that permit the work to 
exceed the capabilities of any prior research or production system. It is worth 
properly highlighting that if we deliver this, Cassandra will have the most 
sophisticated transaction system full stop.

There are to my knowledge no databases offering distributed transactions that 
are both strict serializable and have no scalability bottleneck. Every database 
today clearly aims for this combination, but accepts some trade-off: either 
only guaranteeing serializable isolation, requiring special timekeeping
hardware to guarantee strict serializability, or using a global leader process
(or using two-phase commit, though this is quite niche).



From: Henrik Ingo 
Date: Tuesday, 7 September 2021 at 14:06
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Tue, Sep 7, 2021 at 12:26 PM bened...@apache.org 
wrote:

> > whether I should just think of this as “better and more efficient LWT”
>
> So, the LWT concept is a Cassandra one and doesn’t have an agreed-upon
> definition. My understanding of a core feature/limitation of LWTs is that
> they operate over a single partition, and as a result many operations are
> impossible even in multiple rounds without complex distributed state
> machines. The core improvement here, besides improved performance, is that
> we will be able to operate over any set of keys at once.
>
>
My bad, I have never used LWT and forgot / didn't know they were single
partition. The CEP makes more sense now.



> How this facility is evolved into user-facing capabilities is an
> open-ended question. Initially of course we will at least support the same
> syntax but remove the restriction on operating over a single partition. I
> haven’t thought about this much, as the CEP is primarily for enabling
> work, but I think we will want to expand the syntax in two ways:
>
> 1) to support more complex conditions (simple AND conditions across all
> partitions seem likely to be too restrictive, though they might make sense
> for the single partition case);
> 2) to support inserting data from one row into another, potentially with
> transformations being applied (including via UDFs).
>
> These are both relatively manageable improvements that we might want to
> land in the same major release as the transactions themselves. The core
> facility can be expanded quite broadly, though. It would be possible for
> instance to support some interpreted language(s) as part of a query, so
> that arbitrary work can be applied in the transaction.
>

I was thinking that a path similar to Calvin/FaunaDB is certainly looming
on the horizon at least. I've been following those with interest, because
a) it's refreshingly outside-the-box thinking, and b) they seem to be
able to push the limitations of this approach well beyond what one might
imagine when reading about it the first time. But like you also point out,
it remains to be seen whether users actually want those kinds of
transactions. We are creatures of habit for sure.



> Or, perhaps the community would rather build atop the feature to support
> interactive transactions at the client. I can’t predict resourcing for
> this, though, and it might be a community effort. I think it would be quite
> tractable once this work lands, however.
>
> > Suppose I wanted to do a long running read-only transaction
>
> So, there are two sides to this: with and without paging. A long running
> read-only transaction taking a few seconds is quite likely to be fine, and
> we will probably support it with some MVCC within the transaction system
> itself. This may or may not be part of v1, it’s hard to predict with
> certainty as this is going to be a large undertaking.
>
> But for paged queries we’d be talki

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread bened...@apache.org
Hi Jake,

> What structural changes are planned to support an external dependency project 
> like this

To add to Blake’s answer, in case there’s some confusion over this, the 
proposal is to include this library within the Apache Cassandra project. So I 
wouldn’t think of it as an external dependency. This PMC and community will 
still have the usual oversight over direction and development, and APIs will be 
developed solely with the intention of their integration with Cassandra.

> Will this effort eventually replace consistency levels in C*?

I hope we’ll have some very related discussions around consistency levels in 
the coming months more generally, but I don’t think that is tightly coupled to 
this work. I agree with you both that we won’t want to perpetuate the problems 
you’ve highlighted though.
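The problem Jake raised, users reading at CL.ONE while also using LWTs, comes down to quorum arithmetic. This minimal sketch assumes a single datacenter where the LWT's write is acknowledged by only a quorum of replicas. Under that assumption, a read is guaranteed to observe the write only if the read and write replica sets must intersect, i.e. R + W > RF. The sketch ignores read repair, hints and anything else that might happen to propagate the write, and the helper names are hypothetical.

```java
// Hypothetical helpers illustrating replica-set overlap for reads vs writes.
final class QuorumOverlap {
    /** Majority of rf replicas. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** True if any readReplicas-sized set must intersect any writeReplicas-sized set. */
    static boolean mustIntersect(int rf, int readReplicas, int writeReplicas) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(mustIntersect(rf, quorum(rf), quorum(rf))); // true: 2 + 2 > 3
        System.out.println(mustIntersect(rf, 1, quorum(rf)));          // false: 1 + 2 = 3
    }
}
```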

Henrik:
> I was referring to the property that Calvin transactions also need to be sent 
> to the cluster in a single shot

Ah, yes. In that case I agree, and I tried to point to this direction in an 
earlier email, where I discussed the use of scripting languages (i.e. 
transactionally modifying the database with some subset of arbitrary 
computation). I think the JVM is particularly suited to offering quite powerful 
distributed transactions in this vein, and it will be interesting to see what 
we might develop in this direction in future.


From: Jake Luciani 
Date: Tuesday, 7 September 2021 at 19:27
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Great, thanks for the information.

On Tue, Sep 7, 2021 at 12:44 PM Blake Eggleston
 wrote:

> Hi Jake,
>
> > 1.  Will this effort eventually replace consistency levels in C*?  I ask
> > because one of the shortcomings of our paxos today is
> > it can be easily mixed with non serialized consistencies and therefore
> > users commonly break consistency by for example reading at CL.ONE while
> > also
> > using LWTs.
>
> This will likely require CLs to be specified at the schema level for
> tables using multi partition transactions. I’d expect this to be available
> for other tables, but not required.
>
> > 2. What structural changes are planned to support an external dependency
> > project like this?  Are there some high level interfaces you expect the
> > project to adhere to?
>
> There will be some interfaces that need to be implemented in C* to support
> the library. You can find the current interfaces in the accord.api package,
> but these were written to support some initial testing, and not intended
> for integration into C* as is. Things are pretty fluid right now and will
> be rewritten / refactored multiple times over the next few months.
>
> Thanks,
>
> Blake
>
>

Re: [VOTE] CEP-13: Denylisting partitions

2021-09-08 Thread bened...@apache.org
+1

From: Brandon Williams 
Date: Wednesday, 8 September 2021 at 17:57
To: dev@cassandra.apache.org 
Subject: Re: [VOTE] CEP-13: Denylisting partitions
+1

On Wed, Sep 8, 2021 at 11:31 AM Sumanth Pasupuleti
 wrote:
>
> Hi everyone,
>
> I’m proposing this CEP for approval.
>
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-13%3A+Denylisting+partitions
> Discussion:
> https://lists.apache.org/thread.html/r1547c5f2fb8548e2f7dcbe1a26da8c2a95ebec81adeeb2ea0545924d%40%3Cdev.cassandra.apache.org%3E
>
> The vote will be open for 72 hours.
> Votes by committers are considered binding.
> A vote passes if there are at least three binding +1s and no binding vetoes.
>
> Thanks,
> Sumanth



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-14 Thread bened...@apache.org
Has anyone had a chance to read the drafts, and has any feedback or questions? 
Does anybody still anticipate doing so in the near future? Or shall we move to 
a vote?

From: bened...@apache.org 
Date: Tuesday, 7 September 2021 at 21:27
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Jake,

> What structural changes are planned to support an external dependency project 
> like this

To add to Blake’s answer, in case there’s some confusion over this, the 
proposal is to include this library within the Apache Cassandra project. So I 
wouldn’t think of it as an external dependency. This PMC and community will 
still have the usual oversight over direction and development, and APIs will be 
developed solely with the intention of their integration with Cassandra.

> Will this effort eventually replace consistency levels in C*?

I hope we’ll have some very related discussions around consistency levels in 
the coming months more generally, but I don’t think that is tightly coupled to 
this work. I agree with you both that we won’t want to perpetuate the problems 
you’ve highlighted though.

Henrik:
> I was referring to the property that Calvin transactions also need to be sent 
> to the cluster in a single shot

Ah, yes. In that case I agree, and I tried to point to this direction in an 
earlier email, where I discussed the use of scripting languages (i.e. 
transactionally modifying the database with some subset of arbitrary 
computation). I think the JVM is particularly suited to offering quite powerful 
distributed transactions in this vein, and it will be interesting to see what 
we might develop in this direction in future.


From: Jake Luciani 
Date: Tuesday, 7 September 2021 at 19:27
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Great, thanks for the information.

On Tue, Sep 7, 2021 at 12:44 PM Blake Eggleston
 wrote:

> Hi Jake,
>
> > 1.  Will this effort eventually replace consistency levels in C*?  I ask
> > because one of the shortcomings of our paxos today is
> > it can be easily mixed with non serialized consistencies and therefore
> > users commonly break consistency by for example reading at CL.ONE while
> > also
> > using LWTs.
>
> This will likely require CLs to be specified at the schema level for
> tables using multi partition transactions. I’d expect this to be available
> for other tables, but not required.
>
> > 2. What structural changes are planned to support an external dependency
> > project like this?  Are there some high level interfaces you expect the
> > project to adhere to?
>
> There will be some interfaces that need to be implemented in C* to support
> the library. You can find the current interfaces in the accord.api package,
> but these were written to support some initial testing, and not intended
> for integration into C* as is. Things are pretty fluid right now and will
> be rewritten / refactored multiple times over the next few months.
>
> Thanks,
>
> Blake
>
>

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-14 Thread bened...@apache.org
Hi Paulo,

> First and foremost, I believe this proposal in its current form focuses on 
> the protocol details (HOW?) but lacks the bigger picture on how this is going 
> to be exposed to the user (WHAT)?

In my opinion this CEP embodies a coherent, distinct and complex piece of work
that requires specialist expertise. You have after all just suggested a month 
to read only the existing proposal 😊

UX is a whole other kind of discussion, that can be quite opinionated, and 
requires different expertise. It is in my opinion helpful to break out work 
that is not tightly coupled, as well as work that requires different expertise. 
As you point out, multi-key UX features are largely independent of any 
underlying implementation, likely can be done in parallel, and even with 
different contributors.

> Can we not start using it as an external dependency

I would love to understand your rationale, as this is a surprising suggestion 
to me. This is just like any other subsystem, but we would be managing it as a 
separate library primarily for modularity reasons. The reality is that this 
option should in any case be considered unavailable. This is a proposed contribution
to the Cassandra project, which we can either accept or reject.

> Isn't this a good chance to make the serialization protocol pluggable
> with clearly defined integration points

It has recently been demonstrated to be possible to build a system that can 
safely switch between different consensus protocols. However, this was very 
sophisticated work that would require its own CEP, one that we would be unable 
to resource. Even if we could, this would be insufficient. This goal has never
been achieved for a multi-shard transaction protocol to my knowledge, and 
multi-shard transaction protocols are much more divergent in implementation 
detail than consensus protocols.

> so we could easily switch implementations with different guarantees… (ie. 
> Apache Ratis)

As far as I know, there are no other strict serializable protocols available to 
plug in today. Apache Ratis appears to be a straightforward Raft 
implementation, and therefore it is a linearizable consensus protocol. It is 
not a multi-shard transaction protocol at all, let alone strict serializable. It
could be used in place of Paxos, but not Accord.



From: Paulo Motta 
Date: Tuesday, 14 September 2021 at 22:55
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I can start with some preliminary comments while I get more familiarized
with the proposal:

- First and foremost, I believe this proposal in its current form focuses
on the protocol details (HOW?) but lacks the bigger picture on how this is
going to be exposed to the user (WHAT)? Is exposing linearizable
transactions to the user not a goal of this proposal? If not, I think the
proposal is missing the UX (ie. what CQL commands are going to be added
etc) on how these transactions are going to be exposed.

- Why do we need to bring the library into the project umbrella? Can we not
start using it as an external dependency, and later re-evaluate if it's
necessary to bring it into the project or even incubate it as another
Apache project? I feel we may be importing unnecessary management overhead
into the project while only a small subset of contributors will be involved
with the core protocol.

- Isn't this a good chance to make the serialization protocol pluggable
with clearly defined integration points, so we could easily switch
implementations with different guarantees, trade-offs and performance
considerations while leaving the UX intact? This would also allow us to
easily benchmark the protocol against alternatives (e.g. Apache Ratis) and
validate the performance claims. I think the best way to do that would be
to define what the feature will look like to the end user (UX), define the
integration points necessary to support this feature, and use Accord as the
first implementation of these integration points.
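A minimal, hypothetical Java sketch of the integration-point idea in the last bullet (Java because that is Cassandra's language). `Guarantee`, `TransactionProtocol`, `NamedProtocol` and `ProtocolRegistry` are invented names, not Cassandra or Accord APIs. It illustrates one consequence of the idea: the user-facing layer can only stay intact if it checks what the plugged-in protocol guarantees, so a single-group consensus protocol could back single-partition operations but not multi-partition transactions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical guarantee levels a pluggable protocol might advertise.
enum Guarantee {
    // e.g. a single-group consensus protocol (Paxos, Raft): linearizable per partition
    LINEARIZABLE_SINGLE_PARTITION,
    // what a multi-shard transaction protocol such as Accord aims to provide
    STRICT_SERIALIZABLE_MULTI_PARTITION
}

// Hypothetical integration point; not an existing Cassandra or Accord interface.
interface TransactionProtocol {
    String name();
    Guarantee guarantee();
}

final class NamedProtocol implements TransactionProtocol {
    private final String name;
    private final Guarantee guarantee;

    NamedProtocol(String name, Guarantee guarantee) {
        this.name = name;
        this.guarantee = guarantee;
    }

    @Override public String name() { return name; }
    @Override public Guarantee guarantee() { return guarantee; }
}

final class ProtocolRegistry {
    private final Map<String, TransactionProtocol> byName = new HashMap<>();

    ProtocolRegistry(List<TransactionProtocol> available) {
        for (TransactionProtocol p : available) byName.put(p.name(), p);
    }

    // The user-facing layer is unchanged; it only asks whether the configured
    // protocol can serve a transaction spanning `partitionCount` partitions.
    boolean canServe(String protocolName, int partitionCount) {
        TransactionProtocol p = byName.get(protocolName);
        if (p == null) throw new IllegalArgumentException("unknown protocol: " + protocolName);
        return partitionCount <= 1
            || p.guarantee() == Guarantee.STRICT_SERIALIZABLE_MULTI_PARTITION;
    }
}
```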

On Tue, 14 Sep 2021 at 17:57, Paulo Motta 
wrote:

> Given the extensiveness and complexity of the proposal I'd suggest leaving
> it a little longer (perhaps 4 weeks from the publish date?) for people to
> get a bit more familiarized and have the chance to comment before casting a
> vote. I glanced through the proposal - and it looks outstanding, very
> promising work guys! - but would like a bit more time to take a deeper look
> and digest it before potentially commenting on it.
>
> On Tue, 14 Sep 2021 at 17:30, bened...@apache.org <
> bened...@apache.org> wrote:
>
>> Has anyone had a chance to read the drafts, and has any feedback or
>> questions? Does anybody still anticipate doing so in the near future? Or
>> shall we move to a vote?
>>
>> From: bened...@apache.org 
>> Date: Tuesday, 7 September 2021 at 21:27
>> To: dev@cassandra.apache.org 
>> Subj

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-15 Thread bened...@apache.org
> perhaps we can prepare these as examples

There are grammatically correct CQL queries today that cannot be executed, and 
this work will naturally remove the restrictions on them. I’m certainly happy to 
specify one of these for the CEP if it will help the reader.

I want to exclude “new CQL commands” or any other enhancement to the grammar 
from the scope of the CEP, however. This work will enable a range of 
improvements to the UX, but I think this work is a separate, long-term project 
of evolution that deserves its own CEPs, and will likely involve input from a 
wider range of contributors and users. If nobody else starts such CEPs, I will 
do so in due course (much further down the line).

Assuming there is not significant dissent on this point I will update the CEP 
to reflect this non-goal.



From: C. Scott Andreas 
Date: Wednesday, 15 September 2021 at 00:31
To: dev@cassandra.apache.org 
Cc: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Adding a few notes from my perspective as well –

Re: the UX question, thanks for asking this.

I agree that offering a set of example queries and use cases may help make the 
specific use cases more understandable; perhaps we can prepare these as 
examples to be included in the CEP.

I do think that all potential UX directions begin with the specification of the 
protocol that will underlie them, as what can be expressed by it may be a 
superset of what's immediately exposed by CQL. But at minimum it's great to 
have a sense of the queries one might be able to issue to focus a reading of 
the whitepaper.

Re: "Can we not start using it as an external dependency, and later re-evaluate 
if it's necessary to bring it into the project or even incubate it as another 
Apache project"

I think it would be valuable to the project for the work to be incubated in a 
separate repository as part of the Apache Cassandra project itself, much like 
the in-JVM dtest API and Harry. This pattern worked well for those projects as 
they incubated, as it allowed them to evolve outside the primary codebase while 
remaining subject to the same project governance, set of PMC members, committers, 
and so on. Like those libraries, it also makes sense because the Cassandra project 
is the first (and, at this time, only) known intended consumer of the library, 
though there may be more in the future.

If the proposal is accepted, the time horizon envisioned for this work's 
completion is ~9 months to a standard of production readiness. The contributors 
see value in the work being donated to and governed by the contribution 
practices of the Foundation. Doing so ensures that it is being developed openly 
and with full opportunity for review and contribution of others, while also 
solidifying contribution of the IP to the project.

Spinning up a separate ASF incubation project is an interesting idea, but I 
feel that doing so would introduce a far greater overhead in process and 
governance, and that the most suitable governance and set of committers/PMC 
members are those of the Apache Cassandra project itself.

On Sep 14, 2021, at 3:53 PM, "bened...@apache.org"  wrote:


Hi Paulo,

First and foremost, I believe this proposal in its current form focuses on the 
protocol details (HOW?) but lacks the bigger picture on how this is going to be 
exposed to the user (WHAT)?

In my opinion this CEP embodies a coherent, distinct and complex piece of work 
that requires specialist expertise. You have after all just suggested a month 
to read only the existing proposal 😊

UX is a whole other kind of discussion, one that can be quite opinionated, and 
requires different expertise. It is in my opinion helpful to break out work 
that is not tightly coupled, as well as work that requires different expertise. 
As you point out, multi-key UX features are largely independent of any 
underlying implementation, likely can be done in parallel, and even with 
different contributors.

Can we not start using it as an external dependency

I would love to understand your rationale, as this is a surprising suggestion 
to me. This is just like any other subsystem, but we would be managing it as a 
separate library primarily for modularity reasons. The reality is that this 
option should anyway be considered unavailable. This is a proposed contribution 
to the Cassandra project, which we can either accept or reject.

Isn't this a good chance to make the serialization protocol pluggable
with clearly defined integration points

It has recently been demonstrated to be possible to build a system that can 
safely switch between different consensus protocols. However, this was very 
sophisticated work that would require its own CEP, one that we would be unable 
to resource. Even if we could, this would be insufficient. This goal has never 
been achieved for a multi-shard transaction protocol to my knowledge, and 
multi-shard transaction protocols are much more di

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-15 Thread bened...@apache.org
Ok, so the act of typing out an example was actually a really good reminder of 
just how limited our functionality is today, even for single partition 
operations.

I don’t want to distract from any discussion around the underlying protocol, 
but we could kick off a separate conversation about how to evolve CQL sooner 
rather than later if there is the appetite. There are no concrete proposals to 
discuss, it would be brainstorming.

Do people also generally agree this work warrants a distinct CEP, or would 
people prefer to see this developed under the same umbrella?



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-15 Thread bened...@apache.org
> I would kind of expect this work, if it pans out, to _replace_ the current 
> paxos implementation

That’s a good point. I think the clear direction of travel would be total 
replacement of Paxos, but I anticipate that this will be feature-flagged at 
least initially. So for some period of time we may maintain both options, with 
the advanced CQL functionality disabled if you opt for classic Paxos.

I think this is a necessary corollary of a requirement to support live upgrades 
– something that is non-negotiable IMO, but that I have also neglected to 
discuss in the CEP. I will rectify this. An open question is if we want to 
support live downgrades back to Classic Paxos. I kind of expect that we will, 
though that will no doubt be informed by the difficulty of doing so.

Either way, this means the deprecation cycle for Classic Paxos is probably a 
separate and future decision for the community. We could choose to maintain it 
indefinitely, but I would vote to retire it in the following major version.

A related open question is defaults – I would probably vote for new clusters to 
default to Accord, and for existing clusters to require a migration command to be 
run after fully upgrading the cluster.
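A hypothetical sketch of the defaults and migration rule proposed here. `SerialProtocol` and `ProtocolSelection` are invented for illustration; they are not existing Cassandra configuration options or commands, and downgrade support is deliberately not modelled.

```java
import java.util.Collection;

// Hypothetical: which serialization protocol a cluster uses for serial operations.
enum SerialProtocol { CLASSIC_PAXOS, ACCORD }

final class ProtocolSelection {
    private ProtocolSelection() {}

    // New clusters start on Accord; clusters upgraded in place stay on classic Paxos.
    static SerialProtocol initialProtocol(boolean newCluster) {
        return newCluster ? SerialProtocol.ACCORD : SerialProtocol.CLASSIC_PAXOS;
    }

    // Operator-triggered migration, refused until every node runs an Accord-capable version.
    static SerialProtocol migrate(SerialProtocol current, Collection<Boolean> nodeSupportsAccord) {
        if (current == SerialProtocol.ACCORD) {
            return current; // nothing to do
        }
        for (boolean supported : nodeSupportsAccord) {
            if (!supported) {
                throw new IllegalStateException("cluster is not fully upgraded");
            }
        }
        return SerialProtocol.ACCORD;
    }

    // The advanced CQL functionality is only available when running Accord.
    static boolean advancedCqlEnabled(SerialProtocol protocol) {
        return protocol == SerialProtocol.ACCORD;
    }
}
```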

From: Sylvain Lebresne 
Date: Wednesday, 15 September 2021 at 14:13
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Fwiw, it makes sense to me to talk about CQL syntax evolution separately.

It's pretty clear to me that we _can_ extend CQL to make use of a general
purpose transaction mechanism, so I don't think deciding if we want a
general purpose transaction mechanism has to depend on deciding on the
syntax. Especially since the syntax question can get pretty far on its own
and could be a serious upfront distraction.

And as you said, there are even queries that can be expressed with the
current syntax that we refuse now and would be able to accept with this, so
those could be "ground zero" of what this work would allow.

But outside of pure syntax questions, one thing that I don't see discussed
in the CEP (or did I miss it) is what the relationship of this new
mechanism with the existing paxos implementation would be? I would kind of
expect this work, if it pans out, to _replace_ the current paxos
implementation (because 1) why not and 2) the idea of having 2
serialization mechanisms that serialize separately sounds like a nightmare
from the user POV) but it isn't stated clearly. If replacement is indeed
the intent, then I think there needs to be a plan for the upgrade path. If
that's not the intent, then what?
--
Sylvain


On Wed, Sep 15, 2021 at 12:09 PM bened...@apache.org 
wrote:

> Ok, so the act of typing out an example was actually a really good
> reminder of just how limited our functionality is today, even for single
> partition operations.
>
> I don’t want to distract from any discussion around the underlying
> protocol, but we could kick off a separate conversation about how to evolve
> CQL sooner rather than later if there is the appetite. There are no concrete
> proposals to discuss, it would be brainstorming.
>
> Do people also generally agree this work warrants a distinct CEP, or would
> people prefer to see this developed under the same umbrella?
>
>
>
> From: bened...@apache.org 
> Date: Wednesday, 15 September 2021 at 09:19
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > perhaps we can prepare these as examples
>
> There are grammatically correct CQL queries today that cannot be executed,
> and this work will naturally remove the restrictions on them. I’m certainly
> happy to specify one of these for the CEP if it will help the reader.
>
> I want to exclude “new CQL commands” or any other enhancement to the
> grammar from the scope of the CEP, however. This work will enable a range
> of improvements to the UX, but I think this work is a separate, long-term
> project of evolution that deserves its own CEPs, and will likely involve
> input from a wider range of contributors and users. If nobody else starts
> such CEPs, I will do so in due course (much further down the line).
>
> Assuming there is not significant dissent on this point I will update the
> CEP to reflect this non-goal.
>
>
>
> From: C. Scott Andreas 
> Date: Wednesday, 15 September 2021 at 00:31
> To: dev@cassandra.apache.org 
> Cc: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Adding a few notes from my perspective as well –
>
> Re: the UX question, thanks for asking this.
>
> I agree that offering a set of example queries and use cases may help make
> the specific use cases more understandable; perhaps we can prepare these as
> examples to be included in the CEP.
>
> I do think that all potential UX directions beg

Re: [DISCUSS] CASSANDRA-16922 CEP-10: Major Prerequisites (Phase 1)

2021-09-17 Thread bened...@apache.org
It’s worth clarifying that CEP-10 has been broken up into phases, and this will 
be a roll-up branch for only the first portion.

I think we should be cautious about how we approach the idea of feature 
branches, as there is significant overhead for everyone as branches grow - the 
CEP-10 and CEP-14 work has had significant additional overhead introduced by 
this. There are also additional risks introduced during frequent or long-term 
rebases, as they are hard to review.

I think the idea is good, but ideally if feature development is expected to 
span more than a single quarter it would be best to target phased incorporation 
into mainline, and not defer everything to the final moment. I think it also 
helps focus review, testing, documentation etc. to have manageable chunks of 
work merged long before any perceived deadline.


From: Jeremiah D Jordan 
Date: Friday, 17 September 2021 at 20:50
To: Cassandra DEV 
Subject: Re: [DISCUSS] CASSANDRA-16922 CEP-10: Major Prerequisites (Phase 1)
> As these progress through review, the aim is to roll them up into a single 
> branch and merge that to trunk together, keeping the separate commits for the 
> specific JIRAs.

I think this is a great idea.  Where do you see the “Roll Up Branch” living?  
Does the project want to start keeping long lived feature branches in the 
apache/cassandra repository?  Or should the roll up branch still be kept in a 
fork?

Caleb expressed interest in following this development model for SAI as well, 
and I think it makes sense to develop all of the larger CEPs in longer-lived 
feature branches to be merged into trunk once they are complete.

-Jeremiah

> On Sep 17, 2021, at 1:52 PM, Sam Tunnicliffe  wrote:
>
> This umbrella issue covers the major structural refactorings to enable the 
> higher level pieces of CEP-10. The current proposal is to post separate 
> patches for each JIRA to lessen the review burden as much as possible. 
> However, the patches are incremental, so there is a dependency from one to 
> the next. As these progress through review, the aim is to roll them up into a 
> single branch and merge that to trunk together, keeping the separate commits 
> for the specific JIRAs.
>
> These patches are not intended to introduce any significant new behaviour, 
> they're largely just introducing new abstractions to enable pieces of the 
> system to be swapped out when running simulations. These patches are 
> foundational to the CEP-10 work and so getting them landed is something of a 
> priority. They have been produced collaboratively by several committers, but 
> obviously further review and feedback is strongly encouraged. That said, 
> allocating requisite time and resources to such large and complex changesets 
> can be challenging, so we have a balance to strike.
>
> Whilst the two-committer review requirement can technically be satisfied 
> already, it's reasonable to give fair warning and opportunity to contribute 
> before we start moving this forward. Notwithstanding that, there are some 
> failing tests still to address, mostly due to changes made in trunk since 
> this work was started and subsequently encountered during rebase.
>
> Thanks,
> Sam
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
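To make the idea of swappable pieces concrete, here is a minimal, hypothetical sketch of one common form such an abstraction takes: components read time through an injected `Clock`, so a simulator can substitute a `SimulatedClock` and advance time deterministically. All names are invented for illustration and are not the abstractions introduced by the CASSANDRA-16922 patches.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction: code asks this for the time instead of calling System.nanoTime() directly.
interface Clock {
    long nanoTime();
}

// Production implementation backed by the real clock.
final class SystemClock implements Clock {
    @Override
    public long nanoTime() {
        return System.nanoTime();
    }
}

// Simulation implementation: time only moves when the simulator advances it.
final class SimulatedClock implements Clock {
    private long now;

    @Override
    public long nanoTime() {
        return now;
    }

    void advance(long amount, TimeUnit unit) {
        now += unit.toNanos(amount);
    }
}

// Example component that depends only on the abstraction.
final class TimeoutTracker {
    private final Clock clock;
    private final long deadlineNanos;

    TimeoutTracker(Clock clock, long timeout, TimeUnit unit) {
        this.clock = clock;
        this.deadlineNanos = clock.nanoTime() + unit.toNanos(timeout);
    }

    boolean expired() {
        // Subtraction keeps the comparison correct even if nanoTime values wrap.
        return clock.nanoTime() - deadlineNanos >= 0;
    }
}
```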


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-20 Thread bened...@apache.org
ll prior 
> conflicting transactions, t0 is accepted by a replica. If a fast path quorum 
> of responses accept, the transaction is agreed to execute at t0. Replicas 
> respond with the set of transactions they have witnessed that may execute 
> with a lower timestamp, i.e. those with a lower t0.

What is t0 here? I’m guessing it is the Lamport clock time of the most recent 
mutation to the partition? It may be worth clarifying, because otherwise the 
perception may be that it is the commencement time of the current transaction, 
which may not be the intention.
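For reference, the quoted fast-path rule can be sketched from a replica's side. This is a minimal, hypothetical model, not Accord's implementation: t0 is treated as an opaque `long` supplied by the coordinator (without taking a position on how it is derived), every transaction is assumed to conflict with every other, and the branch that proposes a higher timestamp when t0 is not accepted is an assumption, since the quoted fragment only describes acceptance.

```java
import java.util.ArrayList;
import java.util.List;

// A replica's answer: the timestamp it proposes and the dependencies it reports.
final class PreAcceptReply {
    final long proposedT;
    final List<Long> deps;

    PreAcceptReply(long proposedT, List<Long> deps) {
        this.proposedT = proposedT;
        this.deps = deps;
    }
}

final class Replica {
    // t0 values of conflicting transactions this replica has witnessed.
    private final List<Long> witnessedT0 = new ArrayList<>();
    // Highest timestamp this replica has proposed so far.
    private long maxProposed = Long.MIN_VALUE;

    PreAcceptReply preAccept(long t0) {
        // Dependencies: witnessed transactions with a lower t0.
        List<Long> deps = new ArrayList<>();
        for (long other : witnessedT0) {
            if (other < t0) deps.add(other);
        }
        // Accept t0 if it is later than every prior conflicting transaction;
        // otherwise (assumed rule) propose the next timestamp after them.
        long proposed = t0 > maxProposed ? t0 : maxProposed + 1;
        maxProposed = Math.max(maxProposed, proposed);
        witnessedT0.add(t0);
        return new PreAcceptReply(proposed, deps);
    }
}
```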

Regarding the use of logical clocks in general -

Do we have one clock-per-shard-per-node? Or is there a single clock for all 
transactions on a node?
What happens in network partitions?
In a cross-shard transaction does maintaining simple majorities of replicas 
protect you from potential inconsistencies arising when a transaction W10 
addressing partitions p1, p2 comes from a different majority (potentially 
isolated due to a network partition) from earlier writes W[1,9] to p1 only?
It seems that this may cause a sudden change to the dependency graph for 
partition p2, which may render it vulnerable to strange effects?
Do we consider adversarial cases or any sort of byzantine faults? (That’s a bit 
out of left field, feel free to kick me.)
Why do we prefer Lamport clocks to vector clocks or other types of logical 
clock?

Slow Path

> This value is proposed to at least a simple majority of nodes, along with the 
> union of the dependencies received


Related to the earlier point: when we say `union` here - what set are we 
forming a union over? Is it a union of all dependencies t_n < t as seen by all 
coordinators? I presume that the logic precludes the possibility that these 
dependencies will conflict, since all foregoing transactions which are in 
progress as dependencies must be non-conflicting with earlier transactions in 
the dependency graph?

In any case, further information about how the dependency graph is computed 
would be interesting.
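One plausible reading of the quoted fast and slow paths, from the coordinator's side, is sketched below. It is hypothetical and simplified (no ballots, no recovery, transactions identified by plain `long`s): the fast path is taken when at least `fastPathQuorum` replies propose t0 unchanged; otherwise the coordinator picks the highest timestamp it witnessed and takes the union of the dependencies in the replies it received, which would then be proposed to a simple majority in a round that is not modelled here.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// One replica's reply: the timestamp it proposes and the dependencies it reports.
final class ReplicaVote {
    final long proposedT;
    final Set<Long> deps;

    ReplicaVote(long proposedT, Set<Long> deps) {
        this.proposedT = proposedT;
        this.deps = deps;
    }
}

final class CoordinatorDecision {
    final boolean fastPath;
    final long executeAt;
    final Set<Long> deps;

    CoordinatorDecision(boolean fastPath, long executeAt, Set<Long> deps) {
        this.fastPath = fastPath;
        this.executeAt = executeAt;
        this.deps = deps;
    }
}

final class Coordinator {
    static CoordinatorDecision decide(long t0, List<ReplicaVote> votes, int fastPathQuorum) {
        int acceptedT0 = 0;
        long maxT = t0;
        Set<Long> union = new TreeSet<>();
        for (ReplicaVote v : votes) {
            if (v.proposedT == t0) acceptedT0++;
            maxT = Math.max(maxT, v.proposedT); // highest timestamp witnessed
            union.addAll(v.deps);               // union over the replies received
        }
        if (acceptedT0 >= fastPathQuorum) {
            return new CoordinatorDecision(true, t0, union);
        }
        // Slow path: propose maxT together with the union (proposal round not modelled).
        return new CoordinatorDecision(false, maxT, union);
    }
}
```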

> The inclusion of dependencies in the proposal is solely to facilitate 
> Recovery of other transactions that may be incomplete - these are stored on 
> each replica to facilitate decisions at recovery.


Every replica? Or only those participating in the transaction?

> If C fails to reach fast path consensus it takes the highest t it witnessed 
> from its responses, which constitutes a simple Lamport clock value imposing a 
> valid total order. This value is proposed to at least a simple majority of 
> nodes,


When speaking about the simple majority of nodes to whom the max(t) value 
returned will be proposed to -
It sounds like this need not be the same majority from whom the original sets 
of T_n and dependencies were obtained?
Is there a proof to show that the dependencies created from the union of the 
first set of replicas resolve to an acceptable dependency graph for an 
arbitrary majority of replicas? (Especially given that a majority of replicas 
is not a majority of nodes, given we are in a cross-shard scenario here).
What happens in cases where the replica set has changed due to (a) scaling RF 
in a single DC (b) adding a whole new DC?
Wikipedia <https://en.wikipedia.org/wiki/Lamport_timestamp> tells me that 
Lamport clocks only impose partial, not total order. I’m guessing we’re 
thinking of a different type of logical clock when we speak of Lamport clocks 
here (but my expertise is sketchy on this topic).
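On the partial versus total order point: Lamport's original construction extends the clock's partial order to a total order by breaking ties between equal clock values with a fixed ordering of processes. A minimal sketch, assuming node ids serve as that tie-breaker; this shows the general technique and is not necessarily how Accord constructs its timestamps.

```java
// A timestamp ordered first by logical time, then by node id as a tie-breaker.
final class LamportTimestamp implements Comparable<LamportTimestamp> {
    final long time;
    final int nodeId;

    LamportTimestamp(long time, int nodeId) {
        this.time = time;
        this.nodeId = nodeId;
    }

    @Override
    public int compareTo(LamportTimestamp other) {
        int byTime = Long.compare(time, other.time);
        return byTime != 0 ? byTime : Integer.compare(nodeId, other.nodeId);
    }
}

final class LamportClock {
    private final int nodeId;
    private long time;

    LamportClock(int nodeId) {
        this.nodeId = nodeId;
    }

    // Local event or message send: advance the counter.
    LamportTimestamp tick() {
        time++;
        return new LamportTimestamp(time, nodeId);
    }

    // Message receipt: move past both the local and the sender's time.
    LamportTimestamp receive(LamportTimestamp remote) {
        time = Math.max(time, remote.time) + 1;
        return new LamportTimestamp(time, nodeId);
    }
}
```

Two concurrent events with equal clock values are ordered by node id, so any two distinct timestamps are comparable; the order still only reflects causality where a causal relationship exists.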

Recovery

I would be interested in further exploration of the unhappy path (where ‘a 
newer ballot has been issued by a recovery coordinator to take over the 
transaction’). I understand that this may be partially covered in the 
pseudocode for `Recovery` but I’m struggling to reconcile the ‘new ballot has 
been issued’ language with the ‘any R in responses had X as Applied, Committed, 
or Accepted’ language.

Well done again and thank you for pushing the envelope in this area Benedict.

Miles

> On 15 Sep 2021, at 11:33 pm, bened...@apache.org wrote:
>
>> I would kind of expect this work, if it pans out, to _replace_ the current 
>> paxos implementation
>
> That’s a good point. I think the clear direction of travel would be total 
> replacement of Paxos, but I anticipate that this will be feature-flagged at 
> least initially. So for some period of time we may maintain both options, 
> with the advanced CQL functionality disabled if you opt for classic Paxos.
>
> I think this is a necessary corollary of a requirement to support live 
> upgrades – something that is non-negotiable IMO, but that I have also 
> neglected to discuss in the CEP. I will rectify this. An open question is if 
> we want to support live downgrades back to Classic Paxos. I kind of expect 
> that we will, though that will no doubt be informed by the difficulty of 
> doing so.
>
> Either way, this means the deprecation

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread bened...@apache.org
this is effectively 2M reads and 2M
writes as we normally measure them in C*.

Calvin supports mixed read/write transactions, but because the transaction
execution logic requires knowing all partition keys in advance to ensure
that all replicas can reproduce the same results with no coordination,
reads against non-PK predicates must be done ahead of time (transparently,
by the server) to determine the set of keys, and this must be retried if
the set of rows affected is updated before the actual transaction executes.
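The key-discovery-and-retry pattern described above can be sketched as follows, with an invented in-memory `ToyStore` and a hypothetical `DeterministicTxn.incrementWhere`; none of this is Calvin or FaunaDB code. A reconnaissance read predicts the key set, the transaction runs against that declared set, and it restarts if the set has changed by execution time (the Calvin paper refers to this approach for dependent transactions as OLLP).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.IntPredicate;

final class ToyStore {
    final Map<String, Integer> rows = new TreeMap<>();

    // Reconnaissance: a plain read that discovers which keys match a non-PK predicate.
    Set<String> keysMatching(IntPredicate predicate) {
        Set<String> keys = new TreeSet<>();
        rows.forEach((k, v) -> { if (predicate.test(v)) keys.add(k); });
        return keys;
    }
}

final class DeterministicTxn {
    // Hypothetical transaction: increment every row whose value matches the predicate.
    // `beforeExecute` models writes landing between reconnaissance and execution.
    // Returns the number of attempts taken.
    static int incrementWhere(ToyStore store, IntPredicate predicate,
                              Runnable beforeExecute, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Set<String> declared = store.keysMatching(predicate); // done ahead of time
            beforeExecute.run();
            // Simplification: re-scan the store to check the declared key set is still accurate.
            if (!store.keysMatching(predicate).equals(declared)) {
                continue; // key set changed, so retry with a fresh prediction
            }
            for (String k : declared) store.rows.merge(k, 1, Integer::sum);
            return attempt;
        }
        throw new IllegalStateException("key set kept changing");
    }
}
```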

Batching and global consensus add latency -- 100ms in the Calvin paper and
apparently about 50ms in FaunaDB.  Glass half full: all transactions
(including multi-partition updates) are equally performant in Calvin since
the coordination is handled up front in the sequencing step.  Glass half
empty: even single-row reads and writes have to pay the full coordination
cost.  Fauna has optimized this away for reads but I am not aware of a
description of how they changed the design to allow this.

Functionality and limitations: since the entire transaction must be known
in advance to allow coordination-less execution at the replicas, Calvin
cannot support interactive transactions at all.  FaunaDB mitigates this by
allowing server-side logic to be included, but a Calvin approach will never
be able to offer SQL compatibility.

Guarantees: Calvin transactions are strictly serializable.  There is no
additional complexity or performance hit to generalizing to multiple
regions, apart from the speed of light.  And since Calvin is already paying
a batching latency penalty, this is less painful than for other systems.

Application to Cassandra: B-.  Distributed transactions are handled by the
sequencing and scheduling layers, which are leaderless, and Calvin’s
requirements for the storage layer are easily met by C*.  But Calvin also
requires a global consensus protocol and LWT is almost certainly not
sufficiently performant, so this would require ZK or etcd (reasonable for a
library approach but not for replacing LWT in C* itself), or an
implementation of Accord.  I don’t believe Calvin would require additional
table-level metadata in Cassandra.

On Sun, Sep 5, 2021 at 9:33 AM bened...@apache.org 
wrote:

> Wiki:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> Whitepaper:
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> Prototype: https://github.com/belliottsmith/accord
>
> Hi everyone, I’d like to propose this CEP for adoption by the community.
>
> Cassandra has benefitted from LWTs for many years, but application
> developers that want to ensure consistency for complex operations must
> either accept the scalability bottleneck of serializing all related state
> through a single partition, or layer a complex state machine on top of the
> database. These are sophisticated and costly activities that our users
> should not be expected to undertake. Since distributed databases are
> beginning to offer distributed transactions with fewer caveats, it is past
> time for Cassandra to do so as well.
>
> This CEP proposes the use of several novel techniques that build upon
> research (that followed EPaxos) to deliver (non-interactive) general
> purpose distributed transactions. The approach is outlined in the wikipage
> and in more detail in the linked whitepaper. Importantly, by adopting this
> approach we will be the _only_ distributed database to offer global,
> scalable, strict serializable transactions in one wide area round-trip.
> This would represent a significant improvement in the state of the art,
> both in the academic literature and in commercial or open source offerings.
>
> This work has been partially realised in a prototype. This partial
> prototype has been verified against Jepsen.io’s Maelstrom library and
> dedicated in-tree strict serializability verification tools, but much work
> remains before it is production capable and integrated into Cassandra.
>
> I propose including the prototype in the project as a new source
> repository, to be developed as a standalone library for integration into
> Cassandra. I hope the community sees the important value proposition of
> this proposal, and will adopt the CEP after this discussion, so that the
> library and its integration into Cassandra can be developed in parallel and
> with the involvement of the wider community.
>


--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread bened...@apache.org
Demonstrating how subtle, complex and difficult to pin down this topic is,
Fauna’s recent blog post implies they may have migrated to a leaderless 
sequencing protocol (an earlier blog post made clear they used a leader 
process). However, Calvin still assumes a global sequencing shard, so this only 
modifies latency for clients, i.e. goal (3). Whether they have also removed 
Calvin’s single-shard linearization of transactions is unclear; there is no 
public information to suggest that they have met goal (1). With this the 
protocol would in essence begin to look a lot like Accord, and perhaps they are 
moving towards a similar approach.


From: bened...@apache.org 
Date: Wednesday, 22 September 2021 at 03:52
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Jonathan,

These other systems are incompatible with the goals of the CEP. I do discuss 
them (besides 2PC) in both the whitepaper and the CEP, and will summarise that 
discussion below. A true and accurate comparison of these other systems is 
essentially intractable, as there are complex subtleties to each flavour, and 
those who are interested would be better served by performing their own 
research.

I think it is more productive to focus on what we want to achieve as a 
community. If you believe the goals of this CEP are wrong for the project, 
let’s focus on that. If you want to compare and contrast specific facets of 
alternative systems that you consider to be preferable in some dimension, let’s 
do that here or in a Q&A as proposed by Joey.

The relevant goals are that we:


  1.  Guarantee strict serializable isolation on commodity hardware
  2.  Scale to any cluster size
  3.  Achieve optimal latency

The approach taken by Spanner derivatives is rejected by (1) because they 
guarantee only Serializable isolation (they additionally fail (3)). From 
watching talks by YugaByte, and inferring from Cockroach’s panic-cluster-death 
under clock skew, this is clearly considered by everyone to be undesirable but 
necessary to achieve scalability.

The approach taken by FaunaDB (Calvin) is rejected by (2) because its 
sequencing layer requires a global leader process for the cluster, which is 
incompatible with Cassandra’s scalability requirements. It additionally fails 
(3) for global clients.

Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a Spanner 
clone for its multi-key transaction functionality, not 2PC.

Systems such as RAMP with even weaker isolation are not considered for the 
simple reason that they do not even claim to meet (1).

If we want to additionally offer weaker isolation levels than Serializable, 
such as that provided by the recent RAMP-TAO paper, Cassandra is likely able to 
support multiple distinct transaction layers that operate independently. I 
would encourage you to file a CEP to explore how we can meet these distinct use 
cases, but I consider them to be niche. I expect that a majority of our user 
base desire strict serializable isolation, and certainly no less than 
serializable isolation, to augment the existing weaker isolation offered by 
quorum reads and writes.

I would tangentially note that we are not an AP database under normal 
recommended operation. A minority in any network partition cannot reach QUORUM, 
so under recommended usage we are a high-availability leaderless CP database.
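The claim that a minority cannot reach QUORUM follows from the quorum size. For QUORUM, Cassandra sizes the quorum as floor(RF / 2) + 1 over the total replica count, so two disjoint sides of a partition can never both hold a quorum. The helper `sides_with_quorum` is an illustrative two-way split, not a Cassandra API.

```python
from typing import List


def quorum(rf: int) -> int:
    """QUORUM size for a total replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1


def sides_with_quorum(rf: int, side_a: int) -> List[str]:
    """Split rf replicas into two partitioned sides; list those that can still reach QUORUM."""
    sides = {"A": side_a, "B": rf - side_a}
    return [name for name, n in sides.items() if n >= quorum(rf)]
```

For RF=5 split 3/2, only the majority side proceeds. For RF=6 split 3/3 (for example, two DCs at RF=3 each), neither side can proceed, which is the CP behaviour described above.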


From: Jonathan Ellis 
Date: Tuesday, 21 September 2021 at 23:45
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Benedict, thanks for taking the lead in putting this together. Since
Cassandra is the only relevant database today designed around a leaderless
architecture, it's quite likely that we'll be better served with a custom
transaction design instead of trying to retrofit one from CP systems.

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can be supported and at least a
general idea of what it would look like applied to Cassandra. This will
allow the PMC to make a more informed decision about what tradeoffs are
best for the entire long-term project of first supplementing and ultimately
replacing LWT.

(Allowing users to mix LWT and AP Cassandra operations against the same
rows was probably a mistake, so in contrast with LWT we’re not looking for
something fast enough for occasional use but rather something within a
reasonable factor of AP operations, appropriate to being the only way to
interact with tables declared as such.)

Besides Accord, this should cover

- Calvin and FaunaDB
- A Spa

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread bened...@apache.org
Oh, finally, to address your question about how Fauna achieves low-cost reads: 
they default to serializable isolation only. They no doubt ensure the 
transaction log is replicated in order, so that any read from the DC-local 
transaction log is serializable. Accord will similarly be able to offer cheap 
serializable reads, and additionally is able to offer strict serializable reads 
without performing any write during consensus (nod to Alex Miller for pointing 
out this advantage over Calvin)
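The point about reads from an in-order, DC-local log being serializable but not strict serializable can be modelled simply. This is a toy model of that argument, not a description of Fauna's implementation. A local replica that has applied only a prefix of the log exposes a state that really existed in the serial history, so it is serializable. That state may still lag the latest commit, so it is not strict serializable.

```python
from typing import Dict, List, Tuple

Txn = List[Tuple[str, int]]  # the writes of one committed transaction


def state_at(log: List[Txn], prefix_len: int) -> Dict[str, int]:
    """State after applying the first `prefix_len` transactions of an in-order log."""
    state: Dict[str, int] = {}
    for txn in log[:prefix_len]:
        for key, value in txn:
            state[key] = value
    return state


def is_prefix_snapshot(log: List[Txn], snapshot: Dict[str, int]) -> bool:
    """True if `snapshot` equals the state after some prefix of the log (a serial history)."""
    return any(state_at(log, i) == snapshot for i in range(len(log) + 1))
```

A stale prefix passes the check. A torn read mixing two transactions' writes does not.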


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread bened...@apache.org
Could you explain why you believe this trade-off is necessary? We can support 
full SQL just fine with Accord, and I hope that we eventually do so.

This domain is incredibly complex, so it is easy to reach wrong conclusions. I 
would invite you again to propose a system for discussion that you think offers 
something Accord is unable to, and that you consider desirable, and we can work 
from there.

To pre-empt some possible discussions, I am not aware of anything we cannot do 
with Accord that we could do with either Calvin or Spanner. Interactive 
transactions are possible on top of Accord, as are transactions with an unknown 
read/write set. In each case the only cost is that they would use optimistic 
concurrency control, which is no worse than Spanner derivatives anyway (which I 
have to assume is your benchmark in this regard). I do not expect to deliver 
either functionality initially, but Accord takes us most of the way there for 
both.


From: Jonathan Ellis 
Date: Wednesday, 22 September 2021 at 05:36
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Right, I'm looking for exactly a discussion on the high level goals.
Instead of saying "here's the goals and we ruled out X because Y" we should
start with a discussion around, "Approach A allows X and W, approach B
allows Y and Z" and decide together what the goals should be and what
we are willing to trade to get those goals, e.g., are we willing to give up
global strict serializability to get the ability to support full SQL.  Both
of these are nice to have!


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread bened...@apache.org
Sure, that works for me.

From: Patrick McFadin 
Date: Wednesday, 22 September 2021 at 04:47
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I would be happy to host a Zoom as I've done in the past. I can post a
transcript and the recording after the call.

Instead of right after your talk, Benedict, maybe we can set a time for next
week and let everyone know the time?

Patrick

On Mon, Sep 20, 2021 at 11:05 AM bened...@apache.org 
wrote:

> Hi Joey,
>
> Thanks for the feedback and suggestions.
>
> > I was wondering what do you think about having some extended Q&A after
> your ApacheCon talk Wednesday
>
> I would love to do this. I’ll have to figure out how though – my
> understanding is that I have a hard 40m for my talk and any Q&A, and I
> expect the talk to occupy most of those 40m as I try to cover both
> CEP-14 and CEP-15. I’m not sure what facilities are made available by
> Hopin, but if necessary we can perhaps post some external video chat link?
>
> The time of day is also a question, as I think the last talk ends at
> 9:20pm local time. But we can make that work if necessary.
>
> > It might help to have a diagram (perhaps I can collaborate with you
> on this?)
>
> I absolutely agree. This is something I had planned to produce but it’s
> been a question of time. In part I wanted to ensure we published long in
> advance of ApacheCon, but now also with CEP-10, CEP-14 and CEP-15 in flight
> it’s hard to get back to improving the draft. If you’d be interested in
> collaborating on this that would be super appreciated, as this would
> certainly help the reader.
>
> >I think that WAN is always paid during the Consensus Protocol, and then
> in most cases execution can remain LAN except in 3+ datacenters where I
> think you'd have to include at least one replica in a neighboring
> datacenter…
>
> As designed the only WAN cost is consensus as Accord ensures every replica
> receives a complete copy of every transaction, and is aware of any gaps. If
> there are gaps there may be WAN delays as those are filled in. This might
> occur because of network outages, but is most likely to occur when
> transactions are being actively executed by multiple DCs at once – in which
> case there’ll be one further unidirectional WAN latency during execution
> while the earlier transaction disseminates its result to the later
> transaction(s). There are other similar scenarios we can discuss, e.g. if a
> transaction takes the slow path and will execute after a transaction being
> executed in another DC, that remote transaction needs to receive this
> notification before executing.
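The "aware of any gaps" behaviour can be sketched as a replica that holds back each transaction until every dependency it was told about has executed locally. This simplification orders only by dependencies and is not Accord's execution algorithm, which also orders by execution timestamp. The `Replica` class is hypothetical. A dependency the replica has never seen is simply a gap to wait on, and in practice filling it could cost the WAN delay described above.

```python
from typing import Dict, Iterable, List, Set


class Replica:
    """Executes a transaction only after every dependency it was told about has executed."""

    def __init__(self) -> None:
        self.applied: List[str] = []
        self._applied_set: Set[str] = set()
        self._waiting: Dict[str, Set[str]] = {}  # txn id -> unapplied dependencies

    def receive(self, txn_id: str, deps: Iterable[str]) -> None:
        if txn_id in self._applied_set or txn_id in self._waiting:
            return  # duplicate delivery, e.g. via repair
        # Dependencies not yet applied (possibly never even seen) are gaps to wait on.
        self._waiting[txn_id] = set(deps) - self._applied_set
        self._drain()

    def _drain(self) -> None:
        progressed = True
        while progressed:
            progressed = False
            for txn_id in sorted(self._waiting):
                if not self._waiting[txn_id]:
                    # All dependencies applied: execute and unblock dependants.
                    del self._waiting[txn_id]
                    self.applied.append(txn_id)
                    self._applied_set.add(txn_id)
                    for deps in self._waiting.values():
                        deps.discard(txn_id)
                    progressed = True
                    break
```

Here `t2` and `t3` arrive before the `t1` they depend on, and both are held until `t1` fills the gap.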
>
> There might potentially be some interesting optimisations to make in
> future, where with many queued transactions a single DC may nominate itself
> to execute all outstanding queries and respond to the remote DCs that
> issued them so as to eliminate the WAN latency for disseminating the result
> of each transaction. But we’re getting way ahead of ourselves there 😊
>
> There’s also no LAN cost on write, at least for responding to the client.
> If there is a dependent transaction within the same DC then (as in the
> above case) there will be a LAN penalty for the second transaction to
> execute.
>
> > Relatedly I'm curious if there is any way that the client can
> acquire the timestamp used by the transaction before sending the data
> so we can make the operations idempotent and unrelated to the
> coordinator that was executing them as the storage nodes are
> vulnerable to disk and heap failure modes which makes them much more
> likely to enter grey failure (slow). Alternatively, perhaps it would
> make sense to introduce a set of optional dedicated C* nodes for
> reaching consensus that do not act as storage nodes so we don't have
> to worry about hanging coordinators (join_ring=false?)?
>
> So, in principle coordination can be performed by any node on the network
> including a client – though we’d need to issue the client a unique id, which
> can be done cheaply on joining. This might be something to explore in
> future, though there are downsides to having more coordinators too (more
> likely to fail, and stall further transactions that depend on transactions
> it is coordinating).
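The idea of issuing a unique id to any coordinator on joining, clients included, can be sketched as follows. Pairing a logical clock with that node id yields transaction ids that are unique and totally ordered whichever coordinator mints them. `Membership`, `Coordinator` and `TxnId` are hypothetical names. The Lamport-style clock is an assumption, not Accord's actual id format.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TxnId:
    logical: int  # compared first
    node_id: int  # tie-break; unique per coordinator, so ids never collide


class Membership:
    """Hypothetical registry that issues a unique node id to each joining coordinator."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def join(self) -> int:
        return next(self._ids)


class Coordinator:
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self._clock = 0

    def next_id(self, observed: int = 0) -> TxnId:
        # Lamport-style: move past anything already observed, then stamp our node id.
        self._clock = max(self._clock, observed) + 1
        return TxnId(self._clock, self.node_id)
```

A server node and a client that both join can mint ids concurrently without collisions, because the node id breaks ties.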
>
> However, with respect to idempotency, I expect Accord not to perpetuate
> the problems of LWTs where the result of an earlier query is unknown. At
> least success/fail will be maintained in a distributed fashion for some
> reasonable time horizon, and there will also be protection against zombie
> transactions (those proposed to a node that went into a failure spiral
> before reaching healthy nodes, that somehow regurgitates it hours or days
&g

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread bened...@apache.org
FWIW I retract this – looking again at the blog post I don’t see adequate 
reason to infer they are using a leaderless approach. On balance I expect Fauna 
is still using a stable leader. Do you have reason to believe they are now 
leaderless?

From: bened...@apache.org 
Date: Wednesday, 22 September 2021 at 04:19
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Demonstrating how subtle, complex and difficult to pin down this topic is,
Fauna’s recent blog post implies they may have migrated to a leaderless 
sequencing protocol (an earlier blog post made clear they used a leader 
process). However, Calvin still assumes a global sequencing shard, so this only 
modifies latency for clients, i.e. goal (3). Whether they have also removed 
Calvin’s single-shard linearization of transactions is unclear; there is no 
public information to suggest that they have met goal (1). With this the 
protocol would in essence begin to look a lot like Accord, and perhaps they are 
moving towards a similar approach.



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread bened...@apache.org
No, I would expect to deliver strict serializable interactive transactions 
using Accord. These would simply corroborate that the participating keys had 
not modified their write timestamps during the final transaction. These could 
even be undertaken with still only a single wide area round-trip, using local 
copies of the data to assemble the transaction (though this would marginally 
increase the chance of aborts)
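The validation step described above, confirming that the participating keys had not changed their write timestamps, is classic optimistic concurrency control. Here is a minimal single-process sketch of it. The `Store` and `InteractiveTxn` classes are hypothetical. The assumption is that `commit` would run as one atomic Accord transaction, so validation and writes cannot interleave with other commits.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Versioned:
    value: int
    write_ts: int


class Store:
    """Toy key-value store that stamps each write with a monotonically increasing timestamp."""

    def __init__(self) -> None:
        self.data: Dict[str, Versioned] = {}
        self._clock = 0

    def read(self, key: str) -> Versioned:
        return self.data.get(key, Versioned(0, 0))

    def write(self, key: str, value: int) -> None:
        self._clock += 1
        self.data[key] = Versioned(value, self._clock)


class InteractiveTxn:
    """Client-driven transaction: reads happen interactively, writes are buffered."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.read_ts: Dict[str, int] = {}  # write timestamp observed per key
        self.writes: Dict[str, int] = {}

    def read(self, key: str) -> int:
        if key in self.writes:
            return self.writes[key]  # read-your-own-writes
        v = self.store.read(key)
        self.read_ts.setdefault(key, v.write_ts)
        return v.value

    def write(self, key: str, value: int) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        # Assumed to run as one atomic (e.g. Accord) transaction: validate, then apply.
        for key, ts in self.read_ts.items():
            if self.store.read(key).write_ts != ts:
                return False  # a participating key changed: abort
        for key, value in self.writes.items():
            self.store.write(key, value)
        return True
```

A concurrent write to a key the transaction read causes an abort. A retry against the fresh value then commits.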

My goal for MVCC is parallelism, not additional isolation levels (though 
snapshot isolation is useful and we’ll probably also want to offer that 
eventually).

From: Henrik Ingo 
Date: Wednesday, 22 September 2021 at 15:15
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Wed, Sep 22, 2021 at 7:56 AM bened...@apache.org 
wrote:

> Could you explain why you believe this trade-off is necessary? We can
> support full SQL just fine with Accord, and I hope that we eventually do so.
>

I assume this is really referring to interactive transactions = multiple
round trips to the client within a transaction.

You mentioned previously we could later build a more MVCC-like transaction
semantic on top of Accord. (Independent reads from a single snapshot,
followed by a commit using Accord.) In this case I think the relevant
discussion is whether Accord is still the optimal building block
performance wise to do so, or whether users would then have lower
consistency level but still pay the performance cost of a stricter
consistency level.

henrik
--

Henrik Ingo

+358 40 569 7354



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread bened...@apache.org
Hi everyone,

Joey has helpfully arranged a call for tomorrow at 8am PDT / 10am CDT / 4pm BST 
to discuss Accord and other things in the community. There are no plans to make 
any kind of project decisions. Everyone is welcome to drop in to discuss Accord 
or whatever else might be on your mind.

https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social




Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-24 Thread bened...@apache.org
I’m not aware of anybody having taken any notes, but somebody please chime in 
if I’m wrong.

From my recollection, re Accord:


  *   Q: Will batches now support rollbacks?
      *   Batches would apply atomically or not at all, but are unlikely to
          have a concept of rollback. The outcome of a timed-out transaction
          remains unknown, but we hope to have some mechanism to provide
          clients a definitive answer about such transactions after the fact.
  *   Q: Can stale replicas participate in transactions?
      *   Accord applies conflicting transactions in order at every replica,
          so only nodes that are up to date may participate in the execution
          of a transaction, but any replica may participate in agreeing a
          transaction. To ensure replicas remain up to date, I anticipate
          introducing a real-time repair facility at the transactional message
          level, with peers reconciling recently processed messages and
          cross-delivering any that are missing.
  *   Possible UX directions, in very vague terms: CQL atomic and conditional
      batches initially; going forwards, interactive transactions? Complex
      user-defined functions? SQL?
  *   Discussed the possibility of LOCAL_QUORUM reads for globally replicated
      transactional tables, as this is an important use case:
      *   Simple stale reads to transactional tables
      *   Brainstormed a bit about serializable reads to a single DC without
          (normally) crossing the WAN
      *   Discussed the possibility of multiple ACKs providing separate LAN
          and WAN persistence notifications to clients
  *   Discussed the size of fast-path quorums in Accord, how this might
      affect global latency in high-RF clusters (i.e. not optimal, and in
      some cases may need every DC to participate), and how this can be
      modified by biasing the fast-path electorate so that 2 of the 3 DCs may
      reach fast-path decisions with each other (the remaining DC having to
      reach both of those DCs to reach the fast path). Also discussed
      Calvin-like modes of operation that would offer optimal global latency
      for sufficiently small clusters at RF=3 or RF=5.
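To make the biased-electorate idea concrete, here is a back-of-the-envelope model, not Accord's actual quorum rules. It assumes three DCs named A, B and C with made-up round-trip times, and simplifies a fast-path decision to "the coordinator must hear from every DC in the set it depends on", so latency is the round trip to the farthest such DC. `RTT_MS`, `fast_path_latency` and both electorates are illustrative assumptions.

```python
# Hypothetical WAN round-trip times in milliseconds; not measurements.
RTT_MS = {("A", "B"): 70, ("A", "C"): 140, ("B", "C"): 150}


def rtt(x, y):
    """Symmetric round-trip time between two DCs (0 within a DC)."""
    if x == y:
        return 0
    return RTT_MS[(x, y)] if (x, y) in RTT_MS else RTT_MS[(y, x)]


def fast_path_latency(coordinator, required_dcs):
    """Simplified model: the fast path completes once the farthest required DC replies."""
    return max(rtt(coordinator, dc) for dc in required_dcs)


EVERY_DC = {"A", "B", "C"}  # worst case from the notes: every DC must participate
BIASED = {"A", "B"}         # electorate biased so that A and B suffice

for dc in sorted(EVERY_DC):
    print(dc, fast_path_latency(dc, EVERY_DC), fast_path_latency(dc, BIASED))
```

With these numbers, coordinators in A and B drop from 140 ms and 150 ms to 70 ms, while C stays at 150 ms because it must still reach both A and B, matching the trade-off described in the notes.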

I’m sure there were other discussions I can’t remember, perhaps others can fill 
in the blanks.
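The real-time repair facility mentioned in the notes is only anticipated, not designed, so the following is a guess at its general shape. It assumes each node keeps recently processed transactional messages keyed by an increasing id, and reconciles only the window of ids at or after `since`, handing each node whatever its peers hold that it lacks. For brevity it computes the union centrally rather than through pairwise peer exchanges; `reconcile` and the message layout are hypothetical.

```python
def reconcile(peers, since):
    """peers: node name -> {txn_id: message}. Cross-deliver every message with
    txn_id >= since that some peer has and another lacks. Returns, per node,
    the sorted ids it was sent."""
    recent = {}
    for msgs in peers.values():
        for txn_id, msg in msgs.items():
            if txn_id >= since:  # only the recent window is reconciled
                recent[txn_id] = msg
    delivered = {}
    for node, msgs in peers.items():
        gap = sorted(set(recent) - set(msgs))
        for txn_id in gap:
            msgs[txn_id] = recent[txn_id]
        delivered[node] = gap
    return delivered
```

Messages older than the window are deliberately left alone here, on the assumption that some slower, conventional repair would cover them.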


From: Jonathan Ellis 
Date: Friday, 24 September 2021 at 20:28
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Does anyone have notes for those of us who couldn't make the call?

On Wed, Sep 22, 2021 at 1:35 PM bened...@apache.org 
wrote:

> Hi everyone,
>
> Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST /
> 4pm BST to discuss Accord and other things in the community. There are no
> plans to make any kind of project decisions. Everyone is welcome to drop in
> to discuss Accord or whatever else might be on your mind.
>
> https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social
>
>
> From: bened...@apache.org 
> Date: Wednesday, 22 September 2021 at 16:22
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> No, I would expect to deliver strict serializable interactive transactions
> using Accord. These would simply corroborate that the participating keys
> had not modified their write timestamps during the final transaction. These
> could even be undertaken with still only a single wide area round-trip,
> using local copies of the data to assemble the transaction (though this
> would marginally increase the chance of aborts).
>
> My goal for MVCC is parallelism, not additional isolation levels (though
> snapshot isolation is useful and we’ll probably also want to offer that
> eventually).

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-27 Thread bened...@apache.org
Ok, it’s time for the weekly poking of the hornet’s nest.

Any more thoughts, questions or criticisms, anyone?


Re: [DISCUSS] CASSANDRA-16922 CEP-10: Major Prerequisites (Phase 1)

2021-09-28 Thread bened...@apache.org
Hi everyone,

Just wanted to send out a final call before I start merging Phase 1 of CEP-10. 
If somebody is keen to get involved, pipe up here; I am more than happy to defer 
committing in order to collaborate or discuss the improvements further. Otherwise I 
plan to start committing Phase 1 of CEP-10 this week, starting with 16923 and 
16924, moving on to 16925 and 16926 next week.

Note that patches for the later phases are being posted now as well, with the 
Simulator itself to follow this week.


From: Joshua McKenzie 
Date: Friday, 17 September 2021 at 22:08
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CASSANDRA-16922 CEP-10: Major Prerequisites (Phase 1)
>
> ideally if feature development is expected to span more than a single
> quarter it would be best to target phased incorporation into mainline

Strong +1 here. Ariel was right. :)



On Fri, Sep 17, 2021 at 4:47 PM Ekaterina Dimitrova 
wrote:

> Gmail cut what I wrote.
>
> The way I read the excerpt from Benedict’s email below - feature branch
> merged on per-phase basis to keep it incremental and easier for
> maintenance. Sounds reasonable to me.
>
> On Fri, 17 Sep 2021 at 16:44, Ekaterina Dimitrova 
> wrote:
>
> > I think the idea is good, but ideally if feature development is expected
> > to span more than a single quarter it would be best to target phased
> > incorporation into mainline, and not defer everything to the final moment.
> > I think it also helps focus review, testing, documentation etc. to have
> > manageable chunks of work merged long before any perceived deadline.
> >
> > The way I read this: the feature is split into a few phases, with a feature
> > branch that is merged at the end of each phase.
> > Sounds reasonable to me. Incremental work is always preferable, easier to
> > maintain.
> >
> >
> > On Fri, 17 Sep 2021 at 16:40, bened...@apache.org 
> > wrote:
> >
> >> It’s worth clarifying that CEP-10 has been broken up into phases, and
> >> this will be a roll-up branch for only the first portion.
> >>
> >> I think we should be cautious about how we approach the idea of feature
> >> branches, as there is significant overhead for everyone as branches
> grow -
> >> the CEP-10 and CEP-14 work has had significant additional overhead
> >> introduced by this. There are also additional risks introduced during
> >> frequent or long term rebases, as they are hard to review.
> >>
> >> I think the idea is good, but ideally if feature development is expected
> >> to span more than a single quarter it would be best to target phased
> >> incorporation into mainline, and not defer everything to the final moment.
> >> I think it also helps focus review, testing, documentation etc. to have
> >> manageable chunks of work merged long before any perceived deadline.
> >>
> >>
> >> From: Jeremiah D Jordan 
> >> Date: Friday, 17 September 2021 at 20:50
> >> To: Cassandra DEV 
> >> Subject: Re: [DISCUSS] CASSANDRA-16922 CEP-10: Major Prerequisites (Phase 1)
> >> > As these progress through review, the aim is to roll them up into a
> >> > single branch and merge that to trunk together, keeping the separate
> >> > commits for the specific JIRAs.
> >>
> >> I think this is a great idea.  Where do you see the “Roll Up Branch”
> >> living?  Does the project want to start keeping long-lived feature
> >> branches in the apache/cassandra repository?  Or should the roll-up
> >> branch still be kept in a fork?
> >>
> >> Caleb expressed interest in following this development model for SAI as
> >> well, and I think it makes sense for all of the larger CEPs to develop
> >> them in longer-lived feature branches to be merged into trunk once they
> >> are complete.
> >>
> >> -Jeremiah
> >>
> >> > On Sep 17, 2021, at 1:52 PM, Sam Tunnicliffe  wrote:
> >> >
> >> > This umbrella issue covers the major structural refactorings to enable
> >> > the higher level pieces of CEP-10. The current proposal is to post
> >> > separate patches for each JIRA to lessen the review burden as much as
> >> > possible. However, the patches are incremental, so there is a
> >> > dependency from one to the next. As these progress through review, the
> >> > aim is to roll them up into a single branch and merge that to trunk
> >> > together, keeping the separate commits for the specific JIRAs.
> >> >
> >> > These patches are not intended to introduce any significant new
> >> > behaviour, they

Re: [DISCUSS] CASSANDRA-16922 CEP-10: Major Prerequisites (Phase 1)

2021-09-28 Thread bened...@apache.org
Hi Stefan,

Thanks for asking, that’s very considerate. I’ll cope with rebasing these 
patches as necessary, that’s just one of the joys of being an OSS maintainer. 
Feel free to commit CEP-9 and any other work as and when it’s ready.

But yes, the pain of wrangling a dozen patches is why I would prefer to merge 
sooner rather than later, if there’s no particular reason to delay.


From: Stefan Miklosovic 
Date: Tuesday, 28 September 2021 at 13:52
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CASSANDRA-16922 CEP-10: Major Prerequisites (Phase 1)
Hi Benedict,

do I need to accommodate this initiative somehow, for example by
postponing what I have (CEP-9 is pretty close to merging) so that it is
easier for you to merge yours? These patches seem quite big, and I am
just thinking about how to make things easier for you, so that you do not
have to constantly check whether something has changed in the meantime.

Regards


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-30 Thread bened...@apache.org
Essentially this, although I think in practice we will need to track each 
partition’s timestamp separately (or optionally for reduced conflicts, each row 
or datum’s), and make them all part of the conditional application of the 
transaction - at least for strict serializability.

The alternative is to insert read/write intents for the transaction during each 
step, and to confirm they are still valid on commit, but this approach would 
require a WAN round-trip for each step in the interactive transaction, whereas 
the timestamp-validating approach can use a LAN round-trip for each step 
besides the final one, and is also much simpler to implement.
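The choice of tracking granularity matters because a partition-level timestamp changes whenever any row in that partition is written. A minimal sketch, assuming a toy versioned table: a transaction reads row `r1` of partition `p1`, a concurrent writer touches row `r2` of the same partition, and commit-time validation then compares the timestamp the reader saw. `VersionedTable` and `would_commit` are illustrative names only.

```python
class VersionedTable:
    """Records a last-write timestamp either per partition or per (partition, row)."""

    def __init__(self, granularity):
        if granularity not in ("partition", "row"):
            raise ValueError("granularity must be 'partition' or 'row'")
        self.granularity = granularity
        self.ts = {}
        self.clock = 0

    def _key(self, partition, row):
        return partition if self.granularity == "partition" else (partition, row)

    def write(self, partition, row):
        self.clock += 1
        self.ts[self._key(partition, row)] = self.clock

    def version(self, partition, row):
        return self.ts.get(self._key(partition, row), 0)


def would_commit(granularity):
    table = VersionedTable(granularity)
    seen = table.version("p1", "r1")          # transaction reads p1/r1
    table.write("p1", "r2")                   # concurrent write to another row of p1
    return table.version("p1", "r1") == seen  # commit-time validation
```

At partition granularity the unrelated write to `r2` forces an abort; at row granularity the same transaction commits, which is the reduction in conflicts the optional finer tracking is meant to buy.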


From: Blake Eggleston 
Date: Thursday, 30 September 2021 at 05:47
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
You could establish a lower timestamp bound and buffer transaction state on the 
coordinator, then make the commit an operation that only applies if all 
partitions involved haven’t been changed by a more recent timestamp. You could 
also implement MVCC either in the storage layer or for some period of time by 
buffering commits on each replica before applying.
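One possible reading of the replica-side option, sketched under assumptions: each replica retains committed versions of a key for a retention window, so a transaction can read every key at a single snapshot timestamp even while newer commits arrive. `MultiVersionReplica`, `read_at` and `prune` are hypothetical; this is not how Cassandra's storage engine stores data.

```python
import bisect


class MultiVersionReplica:
    """Keeps committed versions of each key for `retention` time units so reads
    can be served as of an earlier snapshot timestamp."""

    def __init__(self, retention):
        self.retention = retention
        self.versions = {}  # key -> (sorted commit timestamps, values in the same order)

    def apply(self, key, commit_ts, value):
        ts_list, vals = self.versions.setdefault(key, ([], []))
        i = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(i, commit_ts)
        vals.insert(i, value)

    def read_at(self, key, snapshot_ts):
        """Latest value committed at or before snapshot_ts, or None."""
        ts_list, vals = self.versions.get(key, ([], []))
        i = bisect.bisect_right(ts_list, snapshot_ts)
        return vals[i - 1] if i else None

    def prune(self, now):
        """Discard versions that fell out of the retention window, keeping the newest
        version at or before the horizon so snapshot reads at the horizon still work."""
        horizon = now - self.retention
        for ts_list, vals in self.versions.values():
            i = bisect.bisect_right(ts_list, horizon)
            if i > 1:
                del ts_list[: i - 1]
                del vals[: i - 1]
```

The retention window bounds how long an interactive transaction may run before its snapshot can no longer be served.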

> On Sep 29, 2021, at 6:18 PM, Jonathan Ellis  wrote:
>
> How are interactive transactions possible with Accord?
>
>
>
> On Tue, Sep 21, 2021 at 11:56 PM bened...@apache.org 
> wrote:
>
>> Could you explain why you believe this trade-off is necessary? We can
>> support full SQL just fine with Accord, and I hope that we eventually do so.
>>
>> This domain is incredibly complex, so it is easy to reach wrong
>> conclusions. I would invite you again to propose a system for discussion
>> that you think offers something Accord is unable to, and that you consider
>> desirable, and we can work from there.
>>
>> To pre-empt some possible discussions, I am not aware of anything we
>> cannot do with Accord that we could do with either Calvin or Spanner.
>> Interactive transactions are possible on top of Accord, as are transactions
>> with an unknown read/write set. In each case the only cost is that they
>> would use optimistic concurrency control, which is no worse than Spanner
>> derivatives anyway (which I have to assume is your benchmark in this
>> regard). I do not expect to deliver either functionality initially, but
>> Accord takes us most of the way there for both.
>>
>>
>> From: Jonathan Ellis 
>> Date: Wednesday, 22 September 2021 at 05:36
>> To: dev 
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Right, I'm looking for exactly a discussion on the high level goals.
>> Instead of saying "here's the goals and we ruled out X because Y" we should
>> start with a discussion around, "Approach A allows X and W, approach B
>> allows Y and Z" and decide together what the goals should be and and what
>> we are willing to trade to get those goals, e.g., are we willing to give up
>> global strict serializability to get the ability to support full SQL.  Both
>> of these are nice to have!
>>
>> On Tue, Sep 21, 2021 at 9:52 PM bened...@apache.org 
>> wrote:
>>
>>> Hi Jonathan,
>>>
>>> These other systems are incompatible with the goals of the CEP. I do
>>> discuss them (besides 2PC) in both the whitepaper and the CEP, and will
>>> summarise that discussion below. A true and accurate comparison of these
>>> other systems is essentially intractable, as there are complex subtleties
>>> to each flavour, and those who are interested would be better served by
>>> performing their own research.
>>>
>>> I think it is more productive to focus on what we want to achieve as a
>>> community. If you believe the goals of this CEP are wrong for the project,
>>> let’s focus on that. If you want to compare and contrast specific facets of
>>> alternative systems that you consider to be preferable in some dimension,
>>> let’s do that here or in a Q&A as proposed by Joey.
>>>
>>> The relevant goals are that we:
>>>
>>>
>>>  1.  Guarantee strict serializable isolation on commodity hardware
>>>  2.  Scale to any cluster size
>>>  3.  Achieve optimal latency
>>>
>>> The approach taken by Spanner derivatives is rejected by (1) because they
>>> guarantee only Serializable isolation (they additionally fail (3)). From
>>> watching talks by YugaByte, and inferring from Cockroach’s
>>> panic-cluster-death under clock skew, this is clearly considered by
>>> everyone to be undesirable but necessary to achieve scalability.
>>>
>>> The approach taken

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
Hi Jonathan,

It would be great if we could achieve a bandwidth higher than 1-2 short emails 
per week. It remains unclear to me what your goal is, and it would help if you 
could make a statement like “I want Cassandra to be able to do X” so that we 
can respond directly to it. I am also available to have another call, in which 
we can have a back and forth, please feel free to propose a London-compatible 
time within the next week that is suitable for you.

In my opinion we are at risk of veering off-topic, though. This CEP is not to 
deliver interactive transactions, and to my knowledge nobody is proposing a CEP 
for interactive transactions. So, for the CEP at hand the salient question 
seems: does this CEP prevent us from implementing interactive transactions with 
properties X, Y, Z in future? To which the answer is almost certainly no.

However, to continue the discussion and respond directly to your queries, I 
believe we agree on the definition of an interactive transaction.

Two protocols were loosely outlined. The first, using timestamps for optimistic 
concurrency control, would indeed involve the possibility of aborts. It would 
not, however, inherently share the LWT problem of no transaction being able to 
make progress. Whether or not progress is guaranteed (in a livelock-free sense) 
would depend on the structure of the transactions that were interfering.

This approach has the advantage of being very simple to implement, so that we 
could realistically support interactive transactions quite quickly. It has the 
additional advantage that transactions would execute very quickly by avoiding 
the WAN during construction, and as a result may in practice experience fewer 
aborts than protocols that guarantee livelock-freedom.

The second protocol proposed using read/write intents and would be able to 
support almost any behaviour you want. We could even utilise pessimistic 
concurrency control, or anything in-between. This is its own huge design space, 
and discussion of this approach and the trade-offs that could be made is (in my 
opinion) entirely out of scope for this CEP.
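The intent-based alternative spans a large design space, so this is just one point in it, chosen for brevity. A transaction records read or write intents as it goes (in a real deployment each step would cost a WAN round trip). A conflicting intent from another transaction, where at least one of the two is a write, invalidates the earlier holder, and commit succeeds only if the transaction's intents were never invalidated. The `IntentTable` class and this invalidation policy are assumptions; a pessimistic variant would instead block or refuse the second request.

```python
class IntentTable:
    """Tracks read ("r") and write ("w") intents per key. Hypothetical policy:
    a newer conflicting intent invalidates the older holder."""

    def __init__(self):
        self.holders = {}         # key -> {txn_id: "r" or "w"}
        self.invalidated = set()  # transactions that must fail at commit

    def add_intent(self, txn, key, mode):
        holders = self.holders.setdefault(key, {})
        for other, other_mode in holders.items():
            # Intents conflict if they belong to different txns and either is a write.
            if other != txn and "w" in (mode, other_mode):
                self.invalidated.add(other)
        if holders.get(txn) != "w":  # never downgrade an existing write intent
            holders[txn] = mode

    def commit(self, txn):
        """Succeeds only if none of txn's intents was invalidated; releases them either way."""
        ok = txn not in self.invalidated
        for holders in self.holders.values():
            holders.pop(txn, None)
        self.invalidated.discard(txn)
        return ok
```

Concurrent readers of the same key never invalidate each other under this policy; only a write intent does.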


From: Jonathan Ellis 
Date: Friday, 1 October 2021 at 05:00
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
The obstacle for me is you've provided a protocol but not a fully fleshed
out architecture, so it's hard to fill in some of the blanks.  But it looks
to me like optimistic concurrency control for interactive transactions
applied to Accord would leave you in an LWT-like situation under fairly
light contention where nobody actually makes progress due to retries.

To make sure we're talking about the same thing, as Henrik pointed out,
interactive transactions mean multiple round trips from the client within a
transaction.  For example, here
<https://github.com/apavlo/py-tpcc/blob/master/pytpcc/drivers/sqlitedriver.py#L213>
is a simple implementation of the TPC-C New Order transaction.  The high
level logic (via
<https://courses.cs.washington.edu/courses/csep545/01wi/lectures/class1/tsld039.htm>)
is,

   1. Get records describing a warehouse, customer, & district
   2. Update the district
   3. Increment next available order number
   4. Insert record into Order and New-Order tables
   5. For 5-15 items, get Item record, get/update Stock record
   6. Insert Order-Line Record

As you can see, this requires a lot of client-side logic mixed in with the
actual SQL commands.
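To connect the New Order steps to the timestamp-validating protocol discussed in this thread, here is a heavily simplified sketch. Warehouse and customer reads, taxes and prices are omitted, and the store is a toy in-memory map where each key carries its last write timestamp. The client-side logic runs between reads, all writes are buffered, and a single validating commit either applies everything or aborts so the client retries. The key layout, the `new_order` signature and the retry limit are hypothetical; the stock update is modelled on TPC-C's replenishment rule.

```python
class Store:
    """Toy store: key -> (value, write_ts)."""

    def __init__(self):
        self.data, self.clock = {}, 0

    def read(self, key):
        return self.data.get(key, (None, 0))

    def commit(self, observed, writes):
        # Abort if any key read by the transaction was rewritten since.
        if any(self.read(k)[1] != ts for k, ts in observed.items()):
            return False
        self.clock += 1
        for k, v in writes.items():
            self.data[k] = (v, self.clock)
        return True


class Txn:
    def __init__(self, store):
        self.store, self.observed, self.writes = store, {}, {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        value, ts = self.store.read(key)
        self.observed.setdefault(key, ts)
        return value

    def put(self, key, value):
        self.writes[key] = value


def new_order(store, w_id, d_id, c_id, items, max_retries=5):
    """items: list of (item_id, quantity). Returns the new order id."""
    for _ in range(max_retries):
        txn = Txn(store)
        # Steps 1-3 (district only): read it, take and increment its next order id.
        district = txn.get(("district", w_id, d_id))
        o_id = district["next_o_id"]
        txn.put(("district", w_id, d_id), {**district, "next_o_id": o_id + 1})
        # Step 4: insert the Order and New-Order rows.
        txn.put(("order", w_id, d_id, o_id), {"c_id": c_id, "ol_cnt": len(items)})
        txn.put(("new_order", w_id, d_id, o_id), True)
        # Steps 5-6: per item, read/update stock and insert an order line.
        for line_no, (i_id, qty) in enumerate(items, start=1):
            stock = txn.get(("stock", w_id, i_id))
            new_stock = stock - qty if stock - qty >= 10 else stock - qty + 91
            txn.put(("stock", w_id, i_id), new_stock)
            txn.put(("order_line", w_id, d_id, o_id, line_no), (i_id, qty))
        if store.commit(txn.observed, txn.writes):
            return o_id
    raise RuntimeError("New Order aborted too many times")
```

Every read happens near the client and only the final commit needs agreement, which is where the single WAN round trip of the optimistic approach would go.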



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
Actually, thinking about it again, the simple optimistic protocol would in fact 
guarantee system forward progress (i.e. independent of transaction formulation).
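A toy simulation of why validation at commit gives system-wide progress: every pending client reads the same counter, then commits are validated one at a time in an arbitrary order. Whichever commit is validated first always succeeds, because nothing has changed since its read, so each round at least one transaction completes. This models a single hot key and a sequential validator, and says nothing about fairness to an individual client; `simulate` is a hypothetical helper, not anything in Accord.

```python
import random


def simulate(n_clients, seed=0):
    """Each client increments one shared counter, retrying on abort.
    Returns (final counter value, number of rounds needed)."""
    rng = random.Random(seed)
    value, version = 0, 0
    pending = set(range(n_clients))
    rounds = 0
    while pending:
        rounds += 1
        # Every pending client reads the same (value, version) concurrently ...
        snapshots = {c: (value, version) for c in pending}
        committed = []
        # ... then their commits are validated one at a time in an arbitrary order.
        for c in rng.sample(sorted(pending), len(pending)):
            seen_value, seen_version = snapshots[c]
            if seen_version == version:  # unchanged since the read, so commit
                value, version = seen_value + 1, version + 1
                committed.append(c)
            # otherwise abort; the client retries next round
        assert committed, "no progress in this round"
        pending -= set(committed)
    return value, rounds
```

With fully overlapping transactions exactly one commits per round, so `n` clients finish in `n` rounds regardless of the validation order.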


From: bened...@apache.org 
Date: Friday, 1 October 2021 at 09:14
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Jonathan,

It would be great if we could achieve a bandwidth higher than 1-2 short emails 
per week. It remains unclear to me what your goal is, and it would help if you 
could make a statement like “I want Cassandra to be able to do X” so that we 
can respond directly to it. I am also available to have another call, in which 
we can have a back and forth, please feel free to propose a London-compatible 
time within the next week that is suitable for you.

In my opinion we are at risk of veering off-topic, though. This CEP is not to 
deliver interactive transactions, and to my knowledge nobody is proposing a CEP 
for interactive transactions. So, for the CEP at hand the salient question 
seems: does this CEP prevent us from implementing interactive transactions with 
properties X, Y, Z in future? To which the answer is almost certainly no.

However, to continue the discussion and respond directly to your queries, I 
believe we agree on the definition of an interactive transaction.

Two protocols were loosely outlined. The first, using timestamps for optimistic 
concurrency control, would indeed involve the possibility of aborts. It would 
not however inherently adopt the issue of LWTs where no transaction is able to 
make progress. Whether or not progress is guaranteed (in a livelock-free sense) 
would depend on the structure of the transactions that were interfering.

This approach has the advantage of being very simple to implement, so that we 
could realistically support interactive transactions quite quickly. It has the 
additional advantage that transactions would execute very quickly by avoiding 
the WAN during construction, and as a result may in practice experience fewer 
aborts than protocols that guarantee livelock-freedom.

The second protocol proposed using read/write intents and would be able to 
support almost any behaviour you want. We could even utilise pessimistic 
concurrency control, or anything in-between. This is its own huge design space, 
and discussion of this approach and the trade-offs that could be made is (in my 
opinion) entirely out of scope for this CEP.


From: Jonathan Ellis 
Date: Friday, 1 October 2021 at 05:00
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
The obstacle for me is you've provided a protocol but not a fully fleshed
out architecture, so it's hard to fill in some of the blanks.  But it looks
to me like optimistic concurrency control for interactive transactions
applied to Accord would leave you in a LWT-like situation under fairly
light contention where nobody actually makes progress due to retries.

To make sure we're talking about the same thing, as Henrik pointed out,
interactive transactions mean multiple round trips from the client within a
transaction.  For example, here
<https://github.com/apavlo/py-tpcc/blob/master/pytpcc/drivers/sqlitedriver.py#L213>
is a simple implementation of the TPC-C New Order transaction.  The high
level logic (via
<https://courses.cs.washington.edu/courses/csep545/01wi/lectures/class1/tsld039.htm>)
is,

   1. Get records describing a warehouse, customer, & district
   2. Update the district
   3. Increment next available order number
   4. Insert record into Order and New-Order tables
   5. For 5-15 items, get Item record, get/update Stock record
   6. Insert Order-Line Record

As you can see, this requires a lot of client-side logic mixed in with the
actual SQL commands.


On Thu, Sep 30, 2021 at 2:30 AM bened...@apache.org 
wrote:

> Essentially this, although I think in practice we will need to track each
> partition’s timestamp separately (or optionally for reduced conflicts, each
> row or datum’s), and make them all part of the conditional application of
> the transaction - at least for strict-serializability.
>
> The alternative is to insert read/write intents for the transaction during
> each step, and to confirm they are still valid on commit, but this approach
> would require a WAN round-trip for each step in the interactive
> transaction, whereas the timestamp-validating approach can use a LAN
> round-trip for each step besides the final one, and is also much simpler to
> implement.
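The latency difference described above can be made concrete with a crude model that counts only round trips. It assumes fixed LAN and WAN round-trip times and ignores processing, queuing and retries. The 1 ms and 100 ms figures in the usage are illustrative, not measurements.

```python
def interactive_latency_ms(steps, lan_rtt_ms, wan_rtt_ms, use_intents):
    """Rough latency of an interactive transaction with `steps` round trips.

    use_intents=True: every step installs read/write intents, costing one
    WAN round trip each.
    use_intents=False: timestamp validation, with LAN round trips for every
    step except the final, validating commit, which needs a WAN round trip.
    """
    if steps < 1:
        raise ValueError("a transaction needs at least one step")
    if use_intents:
        return steps * wan_rtt_ms
    return (steps - 1) * lan_rtt_ms + wan_rtt_ms


for steps in (1, 5, 20):
    print(steps,
          interactive_latency_ms(steps, 1, 100, use_intents=True),
          interactive_latency_ms(steps, 1, 100, use_intents=False))
```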
>
>
> From: Blake Eggleston 
> Date: Thursday, 30 September 2021 at 05:47
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> You could establish a lower timestamp bound and buffer transaction state
> on the coordinator, then make the commit an operation that only applies if
> all partitions involved haven’t been changed by a more recent timestamp.
> You could als

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
I think this is getting circular and unproductive. Basic disagreements about 
whether the CEP specifies a feature I am inclined to leave for a vote. In my 
view the CEP specifies several features, both immediate ones for the user (ACID 
batches and multi-key LWTs) and developer-focused ones around ground-breaking
semantics that will be enabled.

The proposal as it stands today is exceptionally thorough, more so than any 
other CEP to date, or any CEP is likely to be in the near future.

This is a Cassandra Enhancement *Proposal*, and at some point we have to engage 
with what is proposed, not what you might like to be proposed. Since it remains 
unclear to me what either yourself or Jonathan want to see as an alternative, 
at this point it would seem more productive to produce your own proposals for 
the community to consider. It is possible for multiple transaction systems to 
co-exist, if you feel this is necessary.



From: Paulo Motta 
Date: Friday, 1 October 2021 at 13:58
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I share jbellis's feeling that this proposal seems to focus on the
protocol itself while lacking the actual feature that will use the
protocol, which IMO is a key element to discuss in a CEP.

It's similar to saying: hey, I want to add this Trie Serialization Protocol
to Cassandra, but not providing specific details of how this protocol is
going to be used.

I think the right route for a CEP is to describe the feature that will be
added to the database, with the protocol as a mere requirement of the
high-level feature, for example:

CEP: Add Trie-backed memtable
- Trie Serialization Protocol: implementation detail of the above CEP

What is the difficulty of taking this approach, picking one of the myriad
of features that will be enabled by Accord and using that as the initial
CEP to introduce the protocol to the database?

On Fri, 1 Oct 2021 at 08:37, bened...@apache.org <
bened...@apache.org> wrote:

> Actually, thinking about it again, the simple optimistic protocol would in
> fact guarantee system forward progress (i.e. independent of transaction
> formulation).
>
>
> From: bened...@apache.org 
> Date: Friday, 1 October 2021 at 09:14
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Hi Jonathan,
>
> It would be great if we could achieve a bandwidth higher than 1-2 short
> emails per week. It remains unclear to me what your goal is, and it would
> help if you could make a statement like “I want Cassandra to be able to do
> X” so that we can respond directly to it. I am also available to have
> another call, in which we can have a back and forth, please feel free to
> propose a London-compatible time within the next week that is suitable for
> you.
>
> In my opinion we are at risk of veering off-topic, though. This CEP is not
> to deliver interactive transactions, and to my knowledge nobody is
> proposing a CEP for interactive transactions. So, for the CEP at hand the
> salient question seems: does this CEP prevent us from implementing
> interactive transactions with properties X, Y, Z in future? To which the
> answer is almost certainly no.
>
> However, to continue the discussion and respond directly to your queries,
> I believe we agree on the definition of an interactive transaction.
>
> Two protocols were loosely outlined. The first, using timestamps for
> optimistic concurrency control, would indeed involve the possibility of
> aborts. It would not however inherently adopt the issue of LWTs where no
> transaction is able to make progress. Whether or not progress is guaranteed
> (in a livelock-free sense) would depend on the structure of the
> transactions that were interfering.
>
> This approach has the advantage of being very simple to implement, so that
> we could realistically support interactive transactions quite quickly. It
> has the additional advantage that transactions would execute very quickly
> by avoiding the WAN during construction, and as a result may in practice
> experience fewer aborts than protocols that guarantee livelock-freedom.
>
> The second protocol proposed using read/write intents and would be able to
> support almost any behaviour you want. We could even utilise pessimistic
> concurrency control, or anything in-between. This is its own huge design
> space, and discussion of this approach and the trade-offs that could be
> made is (in my opinion) entirely out of scope for this CEP.

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
I am of course more than happy to continue discussing CEP-15 with respect to 
the proposed goals, and queries about the proposed protocol. I hope people feel 
free to continue raising queries. If anybody disagrees with the goals or any 
specific part of the proposal on substantive (rather than aesthetic/structural) 
grounds I also remain very open to further discussion.

However, I think at this point it is reasonable to request that we engage with 
the proposal as defined, and in particular the goals that have been proposed. 
Those who wish for a different proposal can produce one so that it may be 
engaged with on the same terms.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
I disagree with you. However, this is the wrong forum to have a meta discussion 
about how CEP should be structured.

If you want to impose your views on CEP structure on others, please file a CEP 
with the additional restrictions and guidance you want to impose and start a 
discussion thread. I can then respond in detail to why I perceive this approach 
to be flawed, in a dedicated context.


From: Paulo Motta 
Date: Friday, 1 October 2021 at 14:48
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> The proposal as it stands today is exceptionally thorough, more so than
> any other CEP to date, or any CEP is likely to be in the near future.

The protocol is thoroughly described, but in my view a CEP is a forum to
discuss the high-level architecture and plan for adding a full end-to-end
enhancement to the database, breaking it into sub-CEPs if needed, as long
as the full plan is known in advance; otherwise the community will not have
the context to judge the full extent and impact of the proposed enhancement.

> Since it remains unclear to me what either yourself or Jonathan want to
> see as an alternative

I would personally like to see something along these lines:

CEP1: Add ACID-compliant atomic batches
- UX changes needed: none, CQL provides the grammar we need.
- Distributed transaction protocol needed: Accord (link to white paper if
you want specific details about the protocol)
- High-level architecture: what new components will be added, how existing
components will be modified, what new messages will be added, what new
configuration knobs will be introduced, what are the milestones of the
project, etc.

CEP2: Make LWT faster and more reliable
- UX changes needed: none
- Distributed transaction protocol needed: Accord, already added by
previous CEP.
- High-level architecture: blablabla... and so on.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
I’m not, though it might seem that way. I disagree with your views about how 
CEP should be structured. Since the CEP process was itself codified via the CEP 
process, if you want to recodify how CEPs work, the correct way is via the CEP
process itself.

The discussion is being drawn in multiple directions away from the CEP itself, 
and I am trying to keep this particular thread focused on the business at hand, 
not meta discussions around CEP structure that will no doubt be unproductive 
given likely irreconcilable views about the topic, nor discussions about other 
CEPs that could have been.

If you want to start a separate exploratory discussion thread about CEP 
structure without filing a CEP feel free to do so.


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:04
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> If you want to impose your views on CEP structure on others, please file
> a CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.

This sounds very Kafkaesque. You know I won't file a meta-CEP to change the
structure of CEPs, so you're just using this as an excuse to shut down the
discussion on the lack of clarity about what actual palpable feature will be
available once the CEP lands. :-)

I'm just providing my humble feedback on how a CEP could be more digestible
and easier to consume from an external point of view, and this seems like
an appropriate and contextualized place to voice this opinion which is
perhaps shared by others.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
and the
implications of using a Spanner-inspired approach where the clock skew
between cluster nodes is a necessary part of the commit latency:

$$\mathrm{Deadline}(t_0, C, P) = t_0 + \mathrm{SkewMax} + \max\left(\mathrm{Latency}(C', P) \mid C' \in C\right) - \mathrm{Latency}(C, P)$$

In the white paper you even explicitly mention the trade off you have
chosen: *"This technique trades wide area round-trips for an additional
latency penalty equal to the bounds on clock synchrony."*
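The quoted deadline formula can be evaluated term by term. We read `C` in `Latency(C, P)` as the sending coordinator and `C′ ∈ C` as ranging over the candidate coordinators; this reading of the overloaded symbol is an assumption. The meaning of the deadline itself is as defined in the white paper, and the latency values below are made up.

```python
def deadline(t0, skew_max, latency, coordinators, c, p):
    """Deadline(t0, C, P) = t0 + SkewMax + max(Latency(C', P) | C' in C) - Latency(C, P).

    latency maps (coordinator, replica) -> latency in ms (assumed inputs).
    """
    # max(Latency(C', P) | C' in C): the slowest coordinator-to-P latency
    slowest = max(latency[(c_prime, p)] for c_prime in coordinators)
    return t0 + skew_max + slowest - latency[(c, p)]


lat = {("A", "P"): 30, ("B", "P"): 80}
print(deadline(1000, 10, lat, {"A", "B"}, "A", "P"))  # 1060
print(deadline(1000, 10, lat, {"A", "B"}, "B", "P"))  # 1010
```

When every link has the same latency the last two terms cancel and the deadline is t0 + SkewMax. That is the skew penalty quantified below.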

If we try to quantify what this trade off means in practice, I get:

Typical value for SkewMax in e.g. the Spanner paper, some CockroachDB
discussions = 7 ms. Maybe 10 - 20 ms if you don't have Google-level
hardware.
Common network latencies in a globally distributed cluster:
US West - East = 60 ms
US East - EU Central = 100 ms
US/EU to APAC, Africa, LATAM = 100-200 ms
Source: https://www.cloudping.co/grid

The conclusion is that this tradeoff definitely makes sense for globally
distributed transactions. This resembles QUORUM writes in current Cassandra.

However, users commonly prefer LOCAL_QUORUM in current Cassandra. I read
that this was discussed in the phone call, but haven't read about a
specific proposal. Just for the sake of completing my math, let's assume
that some LOCAL_QUORUM style Accord commit is invented. A naive example
could be to simply deploy a Cassandra cluster *with Accord transactions* in
a single geographical region, and other geographical regions would be
served by some external replication mechanism and would have to be
read-only.

Whatever the (hypothetical) solution, for LOCAL_QUORUM style or just single
region commits we end up with:

Typical SkewMax = 7 - 20 ms
Network latency < 1 ms.


It seems the SkewMax is quite high for a cluster deployed in a single
region, and what's worse there's no way to avoid it or make it much smaller
than 7 ms?
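Henrik's comparison can be tabulated from his own figures. In the global case the skew penalty is a fraction of the network latency; within one region it is several times larger. This is arithmetic only. Because "< 1 ms" is an upper bound, the single-region ratios are lower bounds.

```python
# Figures from the email above, in milliseconds.
SKEW_MAX_MS = (7, 20)
NETWORK_LATENCY_MS = {
    "US West - East": 60,
    "US East - EU Central": 100,
    "US/EU to APAC, Africa, LATAM (upper)": 200,
    "single region (upper bound)": 1,
}


def skew_ratio(skew_ms, latency_ms):
    """Clock-skew penalty expressed as a multiple of the network latency."""
    return skew_ms / latency_ms


for name, latency in NETWORK_LATENCY_MS.items():
    low, high = (skew_ratio(s, latency) for s in SKEW_MAX_MS)
    print(f"{name}: SkewMax is {low:.2f}x to {high:.2f}x the network latency")
```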

The only solution that comes to mind while writing this is to design Accord
to be pluggable such that the consensus part could be switched to something
that uses a logical clock for the transaction id. The user would choose one
or the other depending on what they optimize for.
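One way to read the pluggability suggestion is as an interface for the transaction-id source, with a wall-clock implementation and a Lamport-style logical one behind it. The sketch is hypothetical: `TimestampSource`, `LamportClock`, `WallClock` and `make_source` are invented names, not Accord or Cassandra APIs. It shows only the plug-in point. Whether Accord's guarantees would survive a purely logical clock is the open question raised in the email.

```python
import time
from abc import ABC, abstractmethod


class TimestampSource(ABC):
    """Hypothetical plug-in point for generating transaction ids."""

    @abstractmethod
    def next(self) -> int:
        """Return a new id, larger than any this source has issued or observed."""

    @abstractmethod
    def observe(self, remote: int) -> None:
        """Fold in an id seen in a message from another node."""


class LamportClock(TimestampSource):
    """Purely logical ids: no dependence on clock synchrony."""

    def __init__(self):
        self.counter = 0

    def next(self):
        self.counter += 1
        return self.counter

    def observe(self, remote):
        self.counter = max(self.counter, remote)


class WallClock(TimestampSource):
    """Physical time in microseconds, forced to be strictly increasing."""

    def __init__(self, now_ns=time.time_ns):
        self.now_ns = now_ns
        self.last = 0

    def next(self):
        self.last = max(self.last + 1, self.now_ns() // 1000)
        return self.last

    def observe(self, remote):
        self.last = max(self.last, remote)


def make_source(kind: str) -> TimestampSource:
    # Chosen by the operator, e.g. from configuration (hypothetical knob).
    return {"logical": LamportClock, "wall": WallClock}[kind]()
```

Lamport ids respect causality but carry no real-time meaning, so a logical-clock mode would likely give up some real-time ordering guarantees. Evaluating that trade-off is outside this sketch.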


I'll finish with a few notes:

Commit latency in itself isn't categorically bad for performance. I've
worked with several implementations of distributed databases that provide
good throughput even when a single write has high latency due to
geography/speed of light.

However, the duration of a commit is the window during which other
transactions may conflict with the committing transaction. Thus commit
latency will either increase the likelihood of aborted transactions, or in
other concurrency mechanisms block and impose a max throughput for hot rows.
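The two effects above can be put in rough numbers:

- Under blocking concurrency control, a single hot row admits at most one commit per commit latency.
- Under optimistic control, the chance of a conflict grows with the length of the commit window.

This is a back-of-the-envelope model that assumes conflicting transactions arrive as a Poisson process at a fixed rate. The rates and latencies are illustrative, not measurements.

```python
import math


def max_serial_commits_per_sec(commit_latency_ms):
    """Upper bound for a hot row when each commit blocks the next one."""
    return 1000.0 / commit_latency_ms


def conflict_probability(rate_per_sec, window_ms):
    """Chance that at least one other transaction touches the row during the
    commit window, assuming Poisson arrivals at rate_per_sec."""
    return 1.0 - math.exp(-rate_per_sec * window_ms / 1000.0)


for latency in (1, 7, 20, 100):
    print(f"{latency} ms: at most {max_serial_commits_per_sec(latency):.1f} commits/s, "
          f"{conflict_probability(10, latency):.1%} conflict chance at 10 tx/s")
```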

A known optimization for the hot rows problem is to "hint" or manually
force clients to direct all updates to the hot row to the same node,
essentially making the system leader based. This allows the database to
start processing new updates even while the first one is still committing.
(See Galera for an example implementing this
<https://galeracluster.com/library/documentation/using-sr.html#usr-hot-records>.)
This makes me wonder whether there is a similar optimization for Accord
where transactions from the same coordinator can be allowed to commit
within the SkewMax window, because we can assume that the trx timestamps
originating at the same coordinator cannot arrive out of order when using
TCP?



henrik







On Mon, Sep 27, 2021 at 11:59 PM bened...@apache.org 
wrote:

> Ok, it’s time for the weekly poking of the hornet’s nest.
>
> Any more thoughts, questions or criticisms, anyone?
>
> From: bened...@apache.org 
> Date: Friday, 24 September 2021 at 22:41
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I’m not aware of anybody having taken any notes, but somebody please chime
> in if I’m wrong.
>
> From my recollection, re Accord:
>
>
>   *   Q: Will batches now support rollbacks?
>       *   Batches would apply atomically or not, but unlikely to have a
>           concept of rollback. Timeouts remain unknown, but hope to have
>           some mechanism to provide clients a definitive answer about such
>           transactions after the fact.
>   *   Q: Can stale replicas participate in transactions?
>       *   Accord applies conflicting transactions in-order at every
>           replica, so only nodes that are up-to-date may participate in the
>           execution of a transaction, but any replica may participate in
>           agreeing a transaction. To ensure replicas remain up-to-date I
>           anticipate introducing a real-time repair facility at the
>           transactional message level, with peers reconciling recently
>           processed messages and cross-delivering any that are missing.
>   *   Possible UX directions in very vague terms: CQL atomic and
>       conditional batches initially; going forwards interactive
>       transactions? Complex user defined functions? SQL?
>   * 
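The real-time repair idea in the notes above can be sketched as set reconciliation. Peers compare the ids of recently processed transactional messages and send each other whatever is missing. This is a hypothetical illustration. The message-id scheme, the definition of "recent" and how exchanges are bounded are all assumptions. A real facility would run over the network and respect ordering, since Accord applies conflicting transactions in order.

```python
def reconcile(peer_a, peer_b):
    """Compare two peers' recently processed messages (id -> payload) and
    return (missing_at_a, missing_at_b)."""
    missing_at_a = {mid: peer_b[mid] for mid in peer_b.keys() - peer_a.keys()}
    missing_at_b = {mid: peer_a[mid] for mid in peer_a.keys() - peer_b.keys()}
    return missing_at_a, missing_at_b


def cross_deliver(peer_a, peer_b):
    """Deliver each peer's missing messages; return how many were sent."""
    missing_at_a, missing_at_b = reconcile(peer_a, peer_b)
    peer_a.update(missing_at_a)
    peer_b.update(missing_at_b)
    return len(missing_at_a) + len(missing_at_b)


replica_1 = {"txn-1": "payload-1", "txn-2": "payload-2"}
replica_2 = {"txn-2": "payload-2", "txn-3": "payload-3"}
print(cross_deliver(replica_1, replica_2))  # 2
print(replica_1 == replica_2)               # True
```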

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
From the CEP:

Batches (including unconditional batches) on transactional tables will receive 
ACID properties, and grammatically correct conditional batch operations that 
would be rejected for operating over multiple CQL partitions will now be 
supported
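The intended outcome of the CEP excerpt above can be illustrated in a single process: a conditional batch spanning several partitions applies all of its updates or none. The function and data layout are hypothetical. None of the distributed machinery Accord would provide to make this hold across replicas is modelled.

```python
def apply_conditional_batch(store, conditions, updates):
    """All-or-nothing: apply every update only if every condition holds.

    store:      partition key -> {column: value}
    conditions: [(partition, column, expected_value), ...]
    updates:    [(partition, column, new_value), ...]
    """
    if not all(store.get(p, {}).get(col) == expected
               for p, col, expected in conditions):
        return False  # nothing is applied
    for p, col, value in updates:
        store.setdefault(p, {})[col] = value
    return True


accounts = {"alice": {"balance": 100}, "bob": {"balance": 0}}
# Transfer 30 between two partitions, conditional on both current balances.
ok = apply_conditional_batch(
    accounts,
    conditions=[("alice", "balance", 100), ("bob", "balance", 0)],
    updates=[("alice", "balance", 70), ("bob", "balance", 30)],
)
print(ok, accounts)  # True {'alice': {'balance': 70}, 'bob': {'balance': 30}}
```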


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:30
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Can you just answer: what palpable feature will be available once this CEP
lands? This is still not clear to me (and perhaps to others) from the
current CEP structure. The current document details thoroughly the
protocol but in my view lacks to illustrate what specific API, methods,
modules will become available to developers, how it fits into the larger
picture and interacts with existing modules (if at all), and perhaps a few
examples of how it can be used to build features on top.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
> The current document details thoroughly the protocol but in my view lacks to 
> illustrate what specific API, methods, modules will become available to 
> developers

With respect to this, in my view this kind of detail is not warranted within a 
CEP. Software development is an exploratory process with respect to structure, 
and these decisions will be made as the CEP progresses. If these need to be 
specified upfront, then the purpose of a CEP – seeking buy-in – is invalidated,
because the work must be complete before you know the answers.


From: bened...@apache.org 
Date: Friday, 1 October 2021 at 15:31
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>From the CEP:

Batches (including unconditional batches) on transactional tables will receive 
ACID properties, and grammatically correct conditional batch operations that 
would be rejected for operating over multiple CQL partitions will now be 
supported


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:30
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Can you just answer what palpable feature will be available once this CEP
lands because this is still not clear to me (and perhaps to others) from
the current CEP structure. The current document details thoroughly the
protocol but in my view lacks to illustrate what specific API, methods,
modules will become available to developers, how it fits into the larger
picture and interacts with existing modules if at all and perhaps a few
examples of how it can be used to build features on top.

Em sex., 1 de out. de 2021 às 11:10, bened...@apache.org <
bened...@apache.org> escreveu:

> I’m not, though it might seem that way. I disagree with your views about
> how CEPs should be structured. Since the CEP process was itself codified via
> the CEP process, if you want to recodify how CEPs work, the correct way is
> via the CEP process itself.
>
> The discussion is being drawn in multiple directions away from the CEP
> itself, and I am trying to keep this particular thread focused on the
> business at hand, not meta discussions around CEP structure that will no
> doubt be unproductive given likely irreconcilable views about the topic,
> nor discussions about other CEPs that could have been.
>
> If you want to start a separate exploratory discussion thread about CEP
> structure without filing a CEP feel free to do so.
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:04
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > If you want to impose your views on CEP structure on others, please file
> a CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.
>
> This sounds very Kafkaesque. You know I won't file a meta-CEP to change the
> structure of CEPs, so you're just using this as an excuse to shut down the
> discussion on the lack of clarity about what actual palpable feature will be
> available once the CEP lands. :-)
>
> I'm just providing my humble feedback on how a CEP could be more digestible
> and easier to consume from an external point of view, and this seems like
> an appropriate and contextualized place to voice this opinion which is
> perhaps shared by others.
>
> On Fri, 1 Oct 2021 at 10:55, bened...@apache.org <bened...@apache.org> wrote:
>
> > I disagree with you. However, this is the wrong forum to have a meta
> > discussion about how CEPs should be structured.
> >
> > If you want to impose your views on CEP structure on others, please file a
> > CEP with the additional restrictions and guidance you want to impose and
> > start a discussion thread. I can then respond in detail to why I perceive
> > this approach to be flawed, in a dedicated context.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 14:48
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > The proposal as it stands today is exceptionally thorough, more so than
> > > any other CEP to date, or any CEP is likely to be in the near future.
> >
> > The protocol is thoroughly described, but in my view a CEP is a forum to
> > discuss the high-level architecture and plan for adding a full end-to-end
> > enhancement to the database, breaking it into sub-CEPs if needed, as long
> > as the full plan is known in advance; otherwise the community will not
> > have the context to judge the full extent and impact of the proposed
> > enhancement.
> >
> > > Since it remains unclear to me what either yourself or Jo

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
You can take a look at the Accord library, as linked in the CEP: 
https://github.com/belliottsmith/accord

It will of course be modified extensively over time, but this is the basic
shape of the API that is envisaged. You can take a look at the Maelstrom
implementation for how this will be integrated with Cassandra (though the
Cassandra integration will of course be much more involved).

There will be a function for describing atomic transactions involving some 
combination of reads and writes, and it will be possible to submit these 
operations and receive an answer back. The relevant point of integration for 
this is accord.local.Node#coordinate.

There will likely be separate APIs for providing the system with topology 
changes, which it will ensure are linearized correctly with respect to ongoing 
transactions.

But when it boils down to it, we are providing a single point of entry for 
one-shot transactions. So the API from the perspective of a developer building 
features on top is pretty simple.
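To make the "single point of entry for one-shot transactions" concrete, here is a minimal Java sketch of what such an entry point could look like: a transaction names the keys it reads plus a function that computes its writes from those reads, and is submitted as one unit. All names here (`OneShotTxn`, `TxnResult`, `InMemoryCoordinator`, `perform`) are hypothetical illustrations, not the Accord API; the real integration point mentioned above is `accord.local.Node#coordinate`. The in-memory coordinator simply serialises transactions on one node under a lock instead of running any consensus protocol.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical one-shot transaction: the keys it reads, and a function that
// turns the values read into the writes to apply.
record OneShotTxn(List<String> readKeys,
                  Function<Map<String, Long>, Map<String, Long>> update) {}

// Hypothetical result: the values observed by the read phase.
record TxnResult(Map<String, Long> reads) {}

// Single-node stand-in for a coordinator. Atomicity comes from one lock here;
// a distributed system would instead order the transaction via consensus.
final class InMemoryCoordinator {
    private final Map<String, Long> store = new HashMap<>();

    synchronized TxnResult perform(OneShotTxn txn) {
        Map<String, Long> reads = new HashMap<>();
        for (String key : txn.readKeys()) {
            reads.put(key, store.getOrDefault(key, 0L)); // missing keys read as 0
        }
        store.putAll(txn.update().apply(reads)); // all writes applied together
        return new TxnResult(Map.copyOf(reads));
    }

    synchronized long get(String key) {
        return store.getOrDefault(key, 0L);
    }
}
```

For example, a transfer between `alice` and `bob` is one transaction that reads both balances and writes both new ones, so no other transaction submitted through `perform` can observe the intermediate state.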


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:40
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> With respect to this, in my view this kind of detail is not warranted
within a CEP. Software development is an exploratory process with respect
to structure, and these decisions will be made as the CEP progresses. If
these need to be specified upfront, then the purpose of a CEP – seeking buy
in – is invalidated, because the work must be complete before you know the
answers.

These need not be set in stone; they're just a rough sketch of what the
end product will look like, to make it easier to build a mental model of the
project, especially for those not directly involved with it, as well as to
guide its development for those involved. At least for me, it's much easier
to visualize a project top-down (from how it's going to be used to its
particular implementation details) than the other way around.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread bened...@apache.org
> If I'm reading you correctly, then Accord does / could do exactly what I was 
> asking for: two round trips in a single DC cluster, and one roundtrip + 
> SkewMax when network roundtrips are >> SkewMax.

Yes, in fact it’s even better than that. Even in this setup *most* transactions 
will still take only one round-trip, and at worst case (under conflicts) two 
round-trips.

> assuming I got it correct...

As far as I can tell your understanding is correct, yes - though worth noting 
of course that the WAN round-trip on write is asynchronous.

I haven’t encountered Galera – do you have any technical papers to hand?

From: Henrik Ingo 
Date: Friday, 1 October 2021 at 16:24
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Fri, Oct 1, 2021 at 5:30 PM bened...@apache.org 
wrote:

> > Typical value for SkewMax in e.g. the Spanner paper, some CockroachDB
> discussions = 7 ms
>
> I think SkewMax is likely to be much lower than this, even on commodity
> hardware. Bear in mind that, unlike Cockroach and Spanner, correctness does
> not depend on this value, only performance. So we can pick the real number,
> not some p100 outlier value.
>
> Also bear in mind that this is an optimisation. In clusters where it makes
> no sense, we can simply use the raw protocol and accept that transactions
> will very infrequently take two round-trips (which is fine, because in this
> scenario round-trips are cheap).
>
>
Oh, this was not at all obvious :-D

If I'm reading you correctly, then Accord does / could do exactly what I
was asking for: two round trips in a single DC cluster, and one roundtrip +
SkewMax when network roundtrips are >> SkewMax.
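As back-of-envelope arithmetic for the two worst-case bounds described here, the Java sketch below compares two network round trips with one round trip plus a SkewMax wait. It is only a calculation on assumed inputs, not a model of Accord's actual fast path; as noted in the reply to this message, most transactions are expected to take only one round trip, so these are worst-case figures. The 7 ms SkewMax is the figure cited earlier in the thread, which the reply expects to be much lower in practice.

```java
// Worst-case ordering latencies from the thread, in milliseconds.
// All inputs are assumed figures, not measurements.
final class OrderingLatency {
    // Bound without the clock-skew optimisation: two network round trips.
    static double twoRoundTrips(double rttMs) {
        return 2 * rttMs;
    }

    // Bound with the optimisation: one round trip plus waiting out SkewMax.
    static double oneRoundTripPlusSkew(double rttMs, double skewMaxMs) {
        return rttMs + skewMaxMs;
    }

    // The optimisation only helps when SkewMax is smaller than a round trip,
    // i.e. when round trips are long relative to clock skew (e.g. WAN links).
    static boolean skewWaitWorthIt(double rttMs, double skewMaxMs) {
        return oneRoundTripPlusSkew(rttMs, skewMaxMs) < twoRoundTrips(rttMs);
    }
}
```

With SkewMax = 7 ms, an assumed single-DC RTT of 0.5 ms gives 1.0 ms for two round trips against 7.5 ms with the skew wait, so the raw protocol wins; an assumed WAN RTT of 100 ms gives 200 ms against 107 ms, so the skew wait wins.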



> > A known optimization for the hot rows problem is to "hint" or manually
> force clients to direct all updates to the hot row to the same node
>
> So, with a leaderless protocol like Accord the ordering decisions are
> never really bottlenecked - no matter how many are in-flight, a new
> transaction will experience no additional latency determining its execution
> order. The only bottleneck will be execution. For this it is absolutely
> possible to funnel everything to a single coordinator, but I don’t know
> that this would in practice achieve much – the important bottleneck would
> be that the coordinators are all within the same DC, so that the _replicas_
> may all respond to them with their data dependencies with minimal delay.
> This is something we discussed in the
> ApacheCon call as it happens. If a significant number of transactions are
> pending, and they are in different DCs, it would be quite straightforward
> to nominate a coordinator within the DC serving the majority of operations
> to serve the remainder, and to forward the results to the original
> coordinators.
>
>
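The coordinator-nomination idea quoted above can be sketched as: count pending transactions on the contended state by the DC that issued them, pick the DC with the most, and have a coordinator there serve the remainder and forward results. This is a hypothetical illustration only (`CoordinatorNomination`, `nominateDc` and the tie-breaking rule are assumptions, not part of the CEP or the Accord codebase), and it ignores how pending counts would actually be gathered.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: choose the DC whose coordinator should execute all
// pending transactions on some hot state, given per-DC pending counts.
final class CoordinatorNomination {
    static Optional<String> nominateDc(Map<String, Integer> pendingByDc) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : pendingByDc.entrySet()) {
            String dc = e.getKey();
            int count = e.getValue();
            // Prefer the DC with more pending work; on a tie, prefer the smaller
            // DC name so the choice does not depend on map iteration order.
            if (count > bestCount
                    || (count == bestCount && count > 0 && dc.compareTo(best) < 0)) {
                best = dc;
                bestCount = count;
            }
        }
        return Optional.ofNullable(best); // empty if nothing is pending
    }
}
```

Breaking ties deterministically matters in such a scheme because every node evaluating the same counts should nominate the same DC.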
Thanks for explaining. This is really interesting. I have now reread section 2.2
of the paper and realize it says exactly this.

So in Accord:

Step 1: One network round trip + SkewMax to establish a global ordering.

Step 2:
  a) One (local) network round trip for the read phase, and one (WAN) round
     trip for writes.
  b) In addition, before either reading or writing, the node must first
     commit and apply all previous transactions that are in the "deps" set
     of this transaction.
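Step 2 b) amounts to dependency-ordered execution: a transaction may only execute once every transaction in its deps set has been applied, which in turn requires their own deps to be applied first. A minimal single-node Java sketch, assuming a hypothetical map from transaction id to deps and an acyclic dependency graph, and ignoring the commit step, replication and failures, might look like:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: apply every transaction in a txn's deps set (and,
// recursively, their deps) before applying the txn itself.
final class DepsExecutor {
    private final Map<String, Set<String>> depsById; // txn id -> ids it depends on
    private final Set<String> applied = new HashSet<>();
    private final List<String> applyOrder = new ArrayList<>();

    DepsExecutor(Map<String, Set<String>> depsById) {
        this.depsById = depsById;
    }

    void execute(String txnId) {
        executeInternal(txnId, new HashSet<>());
    }

    private void executeInternal(String txnId, Set<String> inProgress) {
        if (applied.contains(txnId)) return; // already applied earlier
        if (!inProgress.add(txnId)) {
            throw new IllegalStateException("dependency cycle at " + txnId);
        }
        for (String dep : depsById.getOrDefault(txnId, Set.of())) {
            executeInternal(dep, inProgress); // deps must be applied first
        }
        inProgress.remove(txnId);
        applied.add(txnId);
        applyOrder.add(txnId); // stand-in for the actual reads and writes
    }

    List<String> applyOrder() {
        return List.copyOf(applyOrder);
    }
}
```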

In addition, if we implement interactive transactions, or support for
secondary indexes, or other "complex" transactions, then that work would
happen before Step 1.

Ok, now that I've spelled this out... assuming I got it correct... this
actually resembles Galera more than Spanner. The wall clock time is not
actually the transaction id; it's just a step in the consensus dialogue
where nodes agree on a global ordering.



> I don’t anticipate this optimisation being a high priority until we have
> user reports of this bottleneck in the wild, however. Since clients for
> many workloads will naturally be geo-partitioned so that related state is
> being updated from the same region, it might simply not be needed – at
> least any time soon.
>
>
For sure. I think we're all just trying to understand the landscape of what
we are talking about here, not trying to say everything should be
implemented in v1.


henrik

--

Henrik Ingo



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-03 Thread bened...@apache.org
Hi everyone,

It’s been a month since I brought this proposal forward. I think we’re ready 
for a vote, and I’d like to get a show of hands to see if others agree.

I don’t intend for this to curtail any further questions or suggestions. I’m 
grateful for the continued healthy discussion, but from my point of view the 
topics we are now covering are not core to the proposal’s adoption.

If anyone thinks this proposal is not ready for a vote, I would really
appreciate it if that sentiment could be accompanied by a brief statement of
what is wrong with the substance of the proposal, so that we can address these
issues directly to move things forward.

Thanks!



Re: [VOTE] Release dtest-api 0.0.10

2021-10-05 Thread bened...@apache.org
+1

From: Oleksandr Petrov 
Date: Tuesday, 5 October 2021 at 17:47
To: dev 
Subject: [VOTE] Release dtest-api 0.0.10
Proposing the test build of in-jvm dtest API 0.0.10 for release.

Repository:
https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git;a=shortlog;h=refs/tags/0.0.10

Candidate SHA:
https://github.com/apache/cassandra-in-jvm-dtest-api/commit/2139b4c85e319b17afbdea2f653152d1e1895fc6
tagged with 0.0.10

Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1249/org/apache/cassandra/dtest-api/0.0.10/

Key signature: A4C465FEA0C552561A392A61E91335D77E3E87CB

Changes since last release:
  * CASSANDRA-17013: CEP-10 Simulator Improvements


The vote will be open for 24 hours. Everyone who has tested the build
is invited to vote. Votes by PMC members are considered binding. A
vote passes if there are at least three binding +1s.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread bened...@apache.org
We have discussed the API at length in this thread. The API primarily involves 
the semantics of the transactions, as besides this the API of a transaction is 
simply:

Result perform(Transaction transaction)

As discussed in follow-up to that email, a prototype API is specified alongside 
the prototype protocol. I am unsure what more you want than this, or the above, 
or the prior semantic discussions.

It seems clear that you’re unhappy with the proposal, but it remains ambiguous
as to why. Your emails are terse, infrequent and unclear. My responses receive
no follow-up from you, even to clarify whether I have answered your query. Some
time later, I can expect a new, unrelated problem that you are unhappy about.
You have not yet responded to even one of my repeated offers to hop on a call
to hash out any of your concerns, even if only to decline.

This does not feel like constructive and respectful engagement to me, and I am 
losing interest.



From: Jonathan Ellis 
Date: Wednesday, 6 October 2021 at 00:02
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I honestly can't understand the perspective that on the one hand, you're
asking for approval of a specific protocol as part of the CEP, but on the
other, you think discussion of the APIs this will enable is not warranted.
Surely we need agreement on what APIs we're trying to build, before we
discuss the protocols and architectures with which to build them.


[DISCUSS] CASSANDRA-17024: Artificial Latency Injection

2021-10-06 Thread bened...@apache.org
Hi Everyone,

This is a modest user-facing feature that I want to highlight in case anyone 
has any input. To validate whether a real cluster can change its topology or 
consistency level (e.g. from local to global), this ticket introduces a 
facility for injecting latency into internode messages. This is particularly 
helpful for high-availability topologies, and especially for LWTs (where 
performance may be unpredictable due to contention): real traffic can be made 
to experience gradually increasing latency, validating a topology (or the 
impact of a global consistency level) before any transition is undertaken.

The user-visible changes include new config parameters, new JMX endpoints for 
modifying these parameters, and new consistency levels that may be supplied to 
mark queries as suitable for latency injection (so that applications may 
nominate queries for this mechanism).
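The "gradually increasing latency" idea can be sketched as a delay that ramps linearly from zero to a configured maximum over a configured period, applied only to messages for nominated queries, however those are marked (new consistency levels as proposed here, or another mechanism). The class and parameter names below (`LatencyRamp`, `maxDelay`, `rampPeriod`, `setMaxDelay`) are hypothetical stand-ins for the ticket's config parameters and JMX-settable knobs, not the CASSANDRA-17024 implementation, and the linear ramp is an assumed shape.

```java
import java.time.Duration;

// Hypothetical ramp for artificial internode latency: the injected delay grows
// linearly from zero to maxDelay over rampPeriod, then stays at maxDelay.
final class LatencyRamp {
    private volatile long maxDelayMillis; // adjustable at runtime, e.g. via a JMX setter
    private final long rampMillis;
    private final long startMillis;

    LatencyRamp(Duration maxDelay, Duration rampPeriod, long startMillis) {
        this.maxDelayMillis = maxDelay.toMillis();
        this.rampMillis = rampPeriod.toMillis();
        this.startMillis = startMillis;
    }

    void setMaxDelay(Duration maxDelay) {
        this.maxDelayMillis = maxDelay.toMillis();
    }

    // Extra delay for a message sent at nowMillis. Messages belonging to queries
    // that were not nominated for latency injection are never delayed.
    long delayMillis(long nowMillis, boolean nominated) {
        if (!nominated) return 0;
        long max = maxDelayMillis; // read the volatile field once
        long elapsed = Math.max(0, nowMillis - startMillis);
        if (rampMillis <= 0 || elapsed >= rampMillis) return max;
        return max * elapsed / rampMillis;
    }
}
```

A ramp lets an operator watch application behaviour degrade gradually and stop the experiment before reaching the full target latency.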




Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection

2021-10-06 Thread bened...@apache.org
This is a very good point. I forget the reason we settled on consistency 
levels; I assume it was due to the simplicity of the solution, as deploying 
support for a new protocol-level change is more involved.

That’s probably not a good reason here, and I agree that overloading 
consistency level feels wrong. I hope we will retire user-provided consistency 
levels over the coming year or two, which is another good reason not to begin 
enhancing it with new meanings.

I will rework the ticket and patches.

From: Paulo Motta 
Date: Wednesday, 6 October 2021 at 14:37
To: Cassandra DEV 
Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
This sounds like a great feature!

I wonder if ConsistencyLevel is the best way to expose this to users,
though. Can't we implement this via another driver/protocol option, e.g. a
"delay_enabled" flag that would be a modifier to an existing CL?

If we decide to go the CL route, I wonder if this isn't a good opportunity
to introduce pluggable consistency levels (CASSANDRA-8119 <
https://issues.apache.org/jira/browse/CASSANDRA-8119>) so these would only
become available when the feature is enabled.

My concern here is adding niche consistency levels to the default CL table,
which may create confusion for non-power users.
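To illustrate the alternative suggested here: rather than adding new values to the consistency level enum, a per-query flag could modify any existing level. In this minimal Java sketch the enum values are a subset of Cassandra's real consistency levels, while `QueryOptions`, `delayEnabled` and `withDelay` are hypothetical names modelled on the suggested "delay_enabled" flag, not an existing driver or protocol API.

```java
// A subset of Cassandra's existing consistency levels, left unchanged.
enum ConsistencyLevel { ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL }

// Hypothetical per-query options: latency injection is an orthogonal modifier,
// so any existing level can opt in without enlarging the enum above.
record QueryOptions(ConsistencyLevel consistency, boolean delayEnabled) {
    static QueryOptions of(ConsistencyLevel consistency) {
        return new QueryOptions(consistency, false); // off unless explicitly requested
    }

    QueryOptions withDelay() {
        return new QueryOptions(consistency, true);
    }
}
```

Keeping the flag separate means users who never touch the feature see the same set of consistency levels as before.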



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread bened...@apache.org
The goals of the CEP are stated clearly, and these were the goals we had going 
into the (multi-month) research project we undertook before proposing this CEP. 
These goals are necessarily value judgements, so we cannot expect that everyone 
will agree that they are optimal.

So far you have not engaged with these goals to state any specific 
disagreement. I have engaged with all of the trade-offs you imagined, and every 
specific concern you have raised. Despite a month having elapsed and a great 
deal of time spent answering your emails, this is the first confirmation I have 
that you are dissatisfied with my responses to you.

The role of the CEP is to advertise a project, allowing people to register 
their interest in collaborating, and for technical concerns to be stated in 
advance. So far you have expressed no specific technical concerns that I have 
not engaged with, and yet I have received no response to my engagements.

The role of the CEP is *not* to permit members of the community to dictate 
their preferences on the proposers, or to declare that the CEP is inadequate 
because it doesn’t meet their goals, or to demand additional work to explore 
others’ preferred research avenues on the topic.

You have to do some of the work here, Jonathan.

If you have an alternative approach, I continue to ask you to propose it so we 
may compare and contrast in a specific and technical manner. If you have any 
specific technical concerns, I exhort you to raise them so we may discuss them. 
If you dispute the goals, please make an argument as to why. If our goals are 
irreconcilable, file another CEP.



From: Jonathan Ellis 
Date: Wednesday, 6 October 2021 at 14:41
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I've repeatedly explained why I'm unhappy: instead of starting with a
discussion of what API and tradeoffs we should make to get that, this CEP
starts with a protocol and asks us to figure out what API we can build with
it.

Of course by API I mean, what kinds of CQL and SQL operations we can
perform, with what kinds of ACID semantics and what kinds of performance,
not "Result perform(Transaction transaction)".  And it's not simply SQL
syntax, either.  I realize that this could sound a little vague, but that's
why I gave an example of the kind of analysis I'm talking about in my first
reply.  Your responses have been to attempt to avoid the discussion
entirely ("the relevant goals are [mine]") or to declare it to be out of
scope.

The CEP process is intended to help get to alignment across the community
of PMC members, committers, and contributors on goals and outcomes before
starting to write code, not simply to bless a completed design. That's
why we're going in circles here.



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread bened...@apache.org
The problem with dropping a patch on Jira is that there is no opportunity to 
point out problems, either with the fundamental approach or with the specific 
implementation. So please point out some problems I can engage with!


From: Jonathan Ellis 
Date: Wednesday, 6 October 2021 at 15:48
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Wed, Oct 6, 2021 at 9:21 AM bened...@apache.org 
wrote:

> The goals of the CEP are stated clearly, and these were the goals we had
> going into the (multi-month) research project we undertook before proposing
> this CEP. These goals are necessarily value judgements, so we cannot expect
> that everyone will agree that they are optimal.
>

Right, so I'm saying that this is exactly the most important thing to get
consensus on, and creating a CEP for a protocol to achieve goals that you
have not discussed with the community is the CEP equivalent of dropping a
patch on Jira without discussing its goals either.

That's why our conversations haven't gone anywhere: I keep saying
"we need to discuss the goals and tradeoffs", and I'll give an example of what
I mean, and you keep addressing the examples (sometimes very shallowly, "it
would be possible to X" or "Y could be done as an optimization") while
ignoring the request to open a discussion around the big picture.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread bened...@apache.org
ity or performance hit to generalizing to multiple
regions, apart from the speed of light.  And since Calvin is already paying
a batching latency penalty, this is less painful than for other systems.

Application to Cassandra: B-.  Distributed transactions are handled by the
sequencing and scheduling layers, which are leaderless, and Calvin’s
requirements for the storage layer are easily met by C*.  But Calvin also
requires a global consensus protocol and LWT is almost certainly not
sufficiently performant, so this would require ZK or etcd (reasonable for a
library approach but not for replacing LWT in C* itself), or an
implementation of Accord.  I don’t believe Calvin would require additional
table-level metadata in Cassandra.
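For readers unfamiliar with Calvin, its sequencing layer batches incoming transactions into short epochs and fixes a deterministic global order over each epoch's batches, so every replica can execute the same order without further coordination; this batching is the latency penalty referred to above. The simplified Java sketch below assumes each sequencer contributes one batch per epoch and that batches are interleaved in sequencer-id order; `EpochSequencer` and its methods are hypothetical names, and batch replication, the scheduler's locking and the consensus needed to agree on batch contents are all omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified Calvin-style sequencing: each sequencer submits a batch of
// transaction ids per epoch; the global order for an epoch is the batches
// concatenated in sequencer-id order, which every replica can compute alone.
final class EpochSequencer {
    // epoch -> (sequencer id -> batch); TreeMaps give deterministic iteration.
    private final TreeMap<Long, TreeMap<Integer, List<String>>> batches = new TreeMap<>();

    void submitBatch(long epoch, int sequencerId, List<String> txnIds) {
        batches.computeIfAbsent(epoch, e -> new TreeMap<>())
               .put(sequencerId, List.copyOf(txnIds));
    }

    // Global execution order for one epoch; assumes all batches have arrived.
    List<String> globalOrder(long epoch) {
        List<String> order = new ArrayList<>();
        for (List<String> batch : batches.getOrDefault(epoch, new TreeMap<>()).values()) {
            order.addAll(batch);
        }
        return order;
    }
}
```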



--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Tradeoffs for Cassandra transaction management

2021-10-10 Thread bened...@apache.org
Hi Jonathan,

I will summarise my position below, that I have outlined at various points in 
the other thread, and then I would be interested to hear how you propose we 
move forwards. I will commit to responding the same day to any email I receive 
before 7pm GMT, and to engaging with each of your points. I would appreciate it 
if you could make similar commitments so that we may conclude this discussion 
in a reasonable time frame and conduct a vote on CEP-15.

I also reiterate my standing invitation to an open video chat, to discuss 
anything you like, for as long as you like. Please nominate a suitable time and 
day.

==TL;DR==
CEP-15 does not narrow our future options; it only broadens them. Accord is a 
distributed consensus protocol, so the alternative techniques under discussion 
may build upon it without penalty. Alternatively, these approaches may simply 
live alongside Accord.

Since these alternative approaches do not achieve the goals of the CEP, and 
this CEP only enhances your ability to pursue them, it seems hard to conclude 
it should not proceed.

==Goals==
Our goals are first-order principles: we want strict serializable cross-shard 
isolation that is highly available and can be scaled while maintaining optimal 
and predictable latency. Anything less, and the goals of the CEP are not met.

As outlined already (except SLOG, which I address below), these alternative 
approaches do not achieve these goals.

==Compatibility with other approaches==
0. In general, research systems are not irreducible: they are an assembly of 
ideas that can be mixed together. Accord is a distributed consensus protocol. 
These other protocols may utilise it without penalty for consensus, in many 
cases obtaining improved characteristics. Conversely, Accord may itself 
directly integrate some of these ideas.

1. Cockroach, YugaByte, Dynamo et al. utilize read and write intents, the same 
technique outlined for interactive transactions with Accord. They 
manage these in a distributed state machine with per-shard consensus, 
permitting them to achieve serializable isolation. This same technique can be 
used with Accord, with the advantage that strict serializable isolation would 
be achievable. For simple transactions we would be able to execute with “pure” 
Accord and retain its execution advantage. Accord does not disadvantage this 
approach, it is only enhanced and made easier.

2. Calvin: Accord is broadly functionally equivalent, but leaderless, thereby 
achieving better global latency properties.

3. SLOG: This is essentially Calvin. The main modification is that we may 
assign data a home region, so that transactions may be faster if they 
participate in just one region, and slower if they involve multiple regions. 
Note that this protocol cannot achieve global serializability without either 
sacrificing consistency or availability under network partition, or paying a 
WAN cost.

In its consistent mode SLOG therefore remains slower than Accord for both 
single-home and multi-home transactions. Accord requires one WAN penalty for 
linearizing a transaction (competing transactions pay this cost simultaneously, 
as with SLOG); however, Accord achieves this for clients in any region, whereas 
SLOG must cross the WAN multiple times for transactions initiated from outside 
their home region, and for all multi-home transactions.

As discussed elsewhere, a future optimisation with Accord is to temporarily 
“home” competing transactions for execution only, so that there is no additional 
WAN penalty when executing competing transactions. This would confer the same 
performance advantages as SLOG without its penalties for multi-home 
transactions or its heterogeneous latency characteristics, and without the 
complexity of re-homing data, thus avoiding SLOG's unpredictable performance 
characteristics.

For those use cases that do not require high availability, it would be possible 
to implement a “home” region setup with Accord, as with SLOG. This is not an 
idea that is exclusive to this particular system. We even discussed this 
briefly in the call, as some use cases do indeed prefer this trade-off.

SLOG additionally offers a kind of “home group” multi-home optimisation for 
clusters with many regions that accept availability loss if fewer than half of 
their regions fail (e.g. in the paper, 6 regions grouped in pairs of 2 for 
availability). This is also exploitable by Accord, and is something we can 
pursue as a future optimisation as users explore such topologies in the real 
world.

==Responding to specific points==

>because it was asserted in the CEP-15 thread that Accord could support SQL by 
>applying known techniques on top. This is mistaken. Deterministic systems like 
>Calvin or SLOG or Accord can support queries where the rows affected are not 
>known in advance using a technique that Abadi calls OLLP

Language is hard and it is easy to conflate things. Here you seem to be 
discussing abort-free interactive transactions, not SQL. SQL does not 
necessitate suppor
