from:"Josh McKenzie"

Re: [DISCUSSION] New dependencies for SAI CEP-7

2022-12-13 Thread Josh McKenzie

Whatever we decide on, let's make sure we document it so newcomers on the 
project (or really anyone new to property based testing) can better discover 
those things.

https://cassandra.apache.org/_/development/testing.html

On Tue, Dec 13, 2022, at 1:08 PM, David Capwell wrote:
> Speaking to Caleb in Slack, so putting the main comments I have there here…
> 
> I am not -1 on this new dependency, but more asking what we should use for 
> random testing moving forward…. ATM we have the following:
> 
> 1) QuickTheories - I feel like I am the only user at this point…
> 2) 1-off - many reinvent random testing for a specific class; using Random, 
> ThreadLocalRandom, UUID.randomUUID(), and lang3 classes (such as 
> org.apache.commons.lang3.RandomUtils)
> 3) Harry - even though the main API is for cluster testing, this is built 
> on-top of random generation so could be used for low level random testing 
> (just less fleshed out for this use-case)
> 4) Simulator - same as Harry, built on top of a random generator and not 
> fleshed out for low level random testing
> 
> Another reason I ask this is I have a fuzz test that I have developed for
> Accord testing that generates random valid CQL statements to make sure we “do
> the right thing” and have been struggling with the question “where do I put
> this” and “what random do I use?”.  I built this off QuickTheories as I have
> a lot of utilities for building all supported Tables and Types so really
> quick to bootstrap, and every other random testing thing we have is less
> fleshed out… so if we add yet another random testing library what “should” we
> be using?  Do we build on-top of it to get to the same level QuickTheories is
> (see org.apache.cassandra.utils.Generators,
> org.apache.cassandra.utils.CassandraGenerators, and
> org.apache.cassandra.utils.AbstractTypeGenerators)?
> 
>> On Dec 13, 2022, at 9:21 AM, Caleb Rackliffe wrote:
>> 
>> We need random generators no matter what for these tests, so I think what we 
>> need to decide is whether to continue to use Carrot or migrate those to 
>> QuickTheories, along the lines of what we have now in 
>> org.apache.cassandra.utils.Generators.
>> 
>> When it comes to a library like this, the thing I would optimize for is how 
>> much it already provides (and therefore how much we need to write and 
>> maintain ourselves). If you look at something like NumericTypeSortingTest in 
>> the 18058 branch, it's pretty
>> compact w/ Carrot's RandomizedTest in use, but I suppose it could also use 
>> IntegersDSL from QT...
>> 
>> (Not that it matters, but just for reference, we do use 
>> com.carrotsearch.hppc already.)
>> 
>> On Tue, Dec 13, 2022 at 10:14 AM Mike Adamson  wrote:
>>>> Can you talk more about why?  There are several ways to do random testing
>>>> in-tree ATM, so wondering why we need another one
>>> 
>>> I can see one mechanism for random testing in-tree. That is the Simulator 
>>> but that seems primarily involved in the random orchestration of 
>>> operations. My apologies if I have simplified its significance. Apart from 
>>> that, I can only see different usages of Random in unit tests. I admit I 
>>> have not looked beyond this at dtests.
>>> 
>>> The random testing in SAI is more focussed on the behaviour of the 
>>> low-level index structures and flow of data to / from these. Using randomly 
>>> generated values in tests has proved invaluable in highlighting edge 
>>> conditions in the code. This above library was only added to provide us 
>>> with a rich set of random generators. I am happy to look at removing this 
>>> library if its inclusion is contentious.
>>> 
>>> 
>>> On Mon, 12 Dec 2022 at 19:41, David Capwell  wrote:
>>>>> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test
>>>>> dependency
>>>>
>>>> Can you talk more about why?  There are several ways to do random testing
>>>> in-tree ATM, so wondering why we need another one
>>>>
>>>>
>>>>> On Dec 8, 2022, at 6:51 AM, Mike Adamson wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I wanted to discuss the addition of the following dependencies for CEP-7.
>>>>> The dependencies are:
>>>>>
>>>>> org.apache.lucene.lucene-core 7.5.0
>>>>> org.apache.lucene.lucene-analyzers-common 7.5.0
>>>>> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test
>>>>> dependency
>>>>>
>>>>> Lucene is an apache project so is licensed APL2. Carrotsearch is not an
>>>>> apache project but is licensed APL2
>>>>>
>>>>> We are also removing the dependency on
>>>>> com.github.rholder.snowball-stemmer. This library is used by SASI
>>>>> stemming filters but a later version of the same library is available in
>>>>> the lucene libraries.
>>>>>
>>>>> Does anyone have any concerns about these changes?
>>>>>
>>>>> Mike Adamson
>>>>>
>>>>
>>> 
>>> 
>>> -- 
>>> *Mike Adamson*
>>> Engineering
>>> +1 650 389 6000  | dat

Re: [VOTE] CEP-25: Trie-indexed SSTable format

2022-12-19 Thread Josh McKenzie

+1

On Mon, Dec 19, 2022, at 11:54 AM, SAURABH VERMA wrote:
> +1
> 
> On Mon, Dec 19, 2022 at 9:36 PM Benjamin Lerer  wrote:
>> +1
>> 
>> On Mon, 19 Dec 2022 at 16:31, Andrés de la Peña wrote:
>>> +1
>>> 
>>> On Mon, 19 Dec 2022 at 15:11, Aleksey Yeshchenko  wrote:
>>>> +1
>>>>
>>>>> On 19 Dec 2022, at 13:42, Ekaterina Dimitrova wrote:
>>>>>
>>>>> +1
>>>>>
>>>>> On Mon, 19 Dec 2022 at 8:30, J. D. Jordan wrote:
>>>>>> +1 nb
>>>>>>
>>>>>> > On Dec 19, 2022, at 7:07 AM, Brandon Williams wrote:
>>>>>> >
>>>>>> > +1
>>>>>> >
>>>>>> > Kind Regards,
>>>>>> > Brandon
>>>>>> >
>>>>>> >> On Mon, Dec 19, 2022 at 6:59 AM Branimir Lambov wrote:
>>>>>> >>
>>>>>> >> Hi everyone,
>>>>>> >>
>>>>>> >> I'd like to propose CEP-25 for approval.
>>>>>> >>
>>>>>> >> Proposal:
>>>>>> >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format
>>>>>> >> Discussion:
>>>>>> >> https://lists.apache.org/thread/3dpdg6dgm3rqxj96cyhn58b50g415dyh
>>>>>> >>
>>>>>> >> The vote will be open for 72 hours.
>>>>>> >> Votes by committers are considered binding.
>>>>>> >> A vote passes if there are at least three binding +1s and no binding
>>>>>> >> vetoes.
>>>>>> >>
>>>>>> >> Thank you,
>>>>>> >> Branimir
>>>>
> 
> 
> -- 
> Thanks & Regards,
> Saurabh Verma, 
> India
>

Cassandra project status, Year in Review Holiday Edition

2022-12-19 Thread Josh McKenzie

As you may have heard, Cassandra-4.1 is GA! Congrats to everyone that worked
hard to get this release out the door. I'm certain users of Cassandra are going
to appreciate the new functionality in the release combined with the robust
testing and validation we've all done on this to keep the high quality bar we
set with 4.0.

Downloads of the build can be found here: 
https://cassandra.apache.org/_/download.html

So year in review - I figured I'd snapshot a little data to see how we did in 
2022 in aggregate vs. 2021; some of this (plus quite a bit more) I hope to 
cover at the Cassandra Summit next spring in a talk (review board willing), but 
worth taking a moment to look back at the calendar year and see if any trends 
emerge.

Some stats from git:
2021:
35 unique contributors

https://github.com/apache/cassandra/graphs/contributors?from=2021-01-01&to=2021-12-31&type=c
Commits in the year sans Merge commits:
git log --after="2021-01-01" --before="2022-01-01" --oneline | grep -v "Merge branch" | wc -l
579
Files changed, additions, and deletions calendar year 2021:
git log --after="2021-01-01" --before="2021-12-31" --shortstat trunk | \
awk '/^ [0-9]/ { f += $1; i += $4; d += $6 } \
END { printf("%d files changed, %d insertions(+), %d deletions(-)", f, i, d) }'
- 5125 files changed, 141894 insertions(+), 73409 deletions(-)

2022:
31 unique contributors

https://github.com/apache/cassandra/graphs/contributors?from=2022-01-01&to=2022-12-19&type=c
Commits in the year sans Merge commits:
git log --after="2022-01-01" --before="2022-12-19" --oneline | grep -v "Merge branch" | wc -l
564
Files changed, additions, and deletions calendar year 2022:
git log --after="2022-01-01" --before="2022-12-19" --shortstat trunk | \
awk '/^ [0-9]/ { f += $1; i += $4; d += $6 } \
END { printf("%d files changed, %d insertions(+), %d deletions(-)", f, i, d) }'
- 5974 files changed, 321989 insertions(+), 63759 deletions(-)
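
One caveat on the awk one-liners above: they read insertions and deletions by
field position ($4 and $6), but git leaves the insertions or deletions clause
out of a --shortstat line when its count is zero, so a deletions-only commit
gets added to the insertions total. Below is a minimal sketch of a variant that
matches each number by the word after it. The function name sum_shortstat and
the three sample lines are made up for illustration and are not taken from the
repository.

```shell
#!/bin/sh
# Sum `git log --shortstat` summary lines read from stdin.
# Each count is matched by the word that follows it, so a commit with only
# insertions or only deletions is still attributed to the right total.
sum_shortstat() {
    awk '
        / files? changed/ {
            for (n = 1; n < NF; n++) {
                if ($(n + 1) ~ /^file/)           files += $n
                else if ($(n + 1) ~ /^insertion/) ins += $n
                else if ($(n + 1) ~ /^deletion/)  del += $n
            }
        }
        END {
            printf("%d files changed, %d insertions(+), %d deletions(-)\n", files, ins, del)
        }'
}

# Canned sample standing in for real `git log --shortstat` output.
sum_shortstat <<'EOF'
 2 files changed, 10 insertions(+), 4 deletions(-)
 1 file changed, 3 deletions(-)
 1 file changed, 5 insertions(+)
EOF
```

Against a real checkout the same function could be fed with
git log --after=... --before=... --shortstat trunk | sum_shortstat, in which
case the totals may differ somewhat from the figures quoted above.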

All told, some interesting vanity stats. Looks like things are continuing at a 
healthy pace on the project with a marked uptick in additions over the past 
year compared to the year prior. Marked as in "double". ;)


[To Pay Attention To]
We have three tickets that need committer attention:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20%3D%20unresolved%20and%20status%20%3D%20%22Needs%20Committer%22
The first 2 are a repeat from the last status email and the third is new:

CASSANDRA-17997: Improve git branch handling for CircleCI generate.sh
CASSANDRA-17861: Update Python test framework from nose to pytest in CCM
CASSANDRA-17797: All system properties and environment variables should be 
accessed via the new CassandraRelevantProperties and CassandraRelevantEnv 
classes

If you're a committer with some spare time please take a look at one of the 
above and see if you can help unstick them.

We have 20 tickets marked 4.0.x that could use a reviewer and 35 on 4.x: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&selectedIssue=CASSANDRA-17251&quickFilter=2259


[New Contributors Getting Started]
We currently have a curated list of 12 starter tickets that are unassigned on 
our current patch release version and the link can be found here: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2162&quickFilter=2454&quickFilter=2160

Another good option if you're looking to engage with the ecosystem is the
official Cassandra Sidecar JIRA; open issues can be found here:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRASC%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20assignee%20DESC%2C%20priority%20DESC%2C%20updated%20DESC

The project can be cloned from the github repo here: 
https://github.com/apache/cassandra-sidecar

We hang out in #cassandra-dev on https://the-asf.slack.com and there's a 
@cassandra_mentors alias you can use to reach a bunch of us that have 
volunteered to help newcomers get situated. If you need an invite to the slack 
channel feel free to reply to just me on this email and I'll get you set up.

Here's a reference explaining the various types of contribution:
https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: 
https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
The getting started contributing guide: 
https://cassandra.apache.org/_/development/index.html


[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=23d:
We had a fairly thorough discussion about what constitutes an API change and 
when to bring things to the dev list proactively vs. lazy consensus that can be 
found here: https://lists.apache.org/thread/3n8lb1syrsmx89d10kyqz94zzdqhz3o5. I 
plan to poke th

[DISCUSS] Taking another(other(other)) stab at performance testing

2022-12-30 Thread Josh McKenzie

There was a really interesting presentation from the Lucene folks at ApacheCon 
about how they're doing perf regression testing. That combined with some recent 
contributors wanting to get involved on some performance work and not having 
much direction or clarity on how to get involved led some of us to come 
together and riff on what we might be able to take away from that presentation 
and context.

Lucene presentation: "Learning from 11+ years of Apache Lucene benchmarks": 
https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p

Their nightly indexing benchmark site: 
https://home.apache.org/~mikemccand/lucenebench/indexing.html

I checked in with a handful of performance minded contributors in early
December and we came up with a first draft, then some others of us met on an
ad hoc call on 12/9 (which was recorded; ping on this thread if you'd like
that linked - I believe Joey Lynch has that).

Here's where we landed after the discussions earlier this month (1st page, 
estimated reading time 5 minutes): 
https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#

Curious to hear what other perspectives there are out there on the topic.

Early Happy New Years everyone!

~Josh

Re: [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at performance testing

2023-01-03 Thread Josh McKenzie

> more things in reference test suite... increasing the load until latency 
> hits... operations and measures... test matrix... checking in complete 
> cassandra.yaml... different hardware... different tests...
All great things. For v2+. :)

Perf testing is a deep, deep rabbit hole. What's tripped us up in the past has 
(IMO) predominantly been due to us biting off more than we could chew to 
consensus. I immediately agree at face value with most of the things you've 
asked about in your reply but I think we'll need to build up to that and/or 
include some of that in the "community benchmarks" rather than "reference 
benchmarks" as outlined in the doc.

~Josh

On Tue, Jan 3, 2023, at 12:57 PM, German Eichberger via dev wrote:
> All,
> 
> This is a great idea and I am looking forward to it.
> 
> Having dedicated consistent hardware is a good way to find regressions in
> the code but orthogonal to that is "certifying" new hardware to run with 
> Cassandra, e.g. is there a performance regression when running on AMD? ARM64? 
> What about more RAM? faster SSD?
> 
> What has limited us in perf testing in the past was the lack of a "representative"
> benchmark with clear recommendations so I am hoping that this work will
> produce a reference test suite with at least some hardware recommendation for
> the machine running the tests to make things more comparable. Additionally,
> some perf tests keep increasing the load until latency hits a certain
> threshold and others do some operations and measure how long it took. What
> types of tests were you aiming for?
> 
> The proposal also doesn't talk much about the test matrix. Will all supported 
> Cassandra versions be tested with the same tests or will there be version 
> specific tests? 
> 
> I understand that we need to account for variances in configuration and hardware
> but I am wondering if we can have more than just the sha. For example the
> complete cassandra.yaml for a test should be checked in as well - also we
> should encourage people not to change too much from the reference test.
> Different hardware, different cassandra.yaml, and different tests will just 
> create numbers which are hard to make sense of.
> 
> Really excited about this - thanks for the great work,
> German
> 
> 
> 
> *From:* Josh McKenzie 
> *Sent:* Friday, December 30, 2022 7:41 AM
> *To:* dev 
> *Subject:* [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at 
> performance testing 
>  
> There was a really interesting presentation from the Lucene folks at 
> ApacheCon about how they're doing perf regression testing. That combined with 
> some recent contributors wanting to get involved on some performance work and 
> not having much direction or clarity on how to get involved led some of us to 
> come together and riff on what we might be able to take away from that 
> presentation and context.
> 
> Lucene presentation: "Learning from 11+ years of Apache Lucene benchmarks": 
> https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p
> 
> Their nightly indexing benchmark site: 
> https://home.apache.org/~mikemccand/lucenebench/indexing.html 
> 
> I've checked in with a handful of performance minded contributors in early 
> December and we came up with a first draft, then some others of us met on an 
> adhoc call on the 12/9 (which was recorded; ping on this thread if you'd like 
> that linked - I believe Joey Lynch has that).
> 
> Here's where we landed after the discussions earlier this month (1st page, 
> estimated reading time 5 minutes): 
> https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/

Re: Cassandra CI Status 2023-01-07

2023-01-10 Thread Josh McKenzie

> I don't believe it warrants a CEP, speak up if you disagree. 
I agree with this but I'm also biased having been working w/you on this for a 
bit.

My instinct is that most folks on the project want CI that works consistently, 
quickly, and is minimally complex to modify. So the less disruptive and more 
well documented and streamlined we can make interacting with this process the 
better. 

On Mon, Jan 9, 2023, at 2:06 PM, Mick Semb Wever wrote:
> Happy 2023 everyone!
> 
> With only four months in front of us before the first 5.0 release I'm
> hoping we can re-energize our focus on CI and Stable Trunk.
> 
> This post covers the following
> * Recap of CI improvements
> * State of Affairs
> * The Butler (Build Lead)
> * Proposal for a Repeatable Containerised CI
> 
> and it calls for the following actions
> ** we need you to sign up for a week's rotation as Build Lead !
> ** please reply in-thread any CI issues I've forgotten,
> ** does CASSANDRA-18137 warrant a CEP?
> 
> 
> *** Recap of CI improvements
> 
> It's been over two years since my last CI Status post, with Adam and
> Josh covering much of it in their general Status emails (which are
> deeply appreciated).  I'm hoping we can continue with both, given
> their importance to a successful 5.0 release and the debt cost we face
> otherwise going from the initial alpha release to the eventual GA.
> 
> 
> We have made good efforts on moving towards a Stable Trunk.
> Special mentions to
> - improving parity between CircleCI and ci-cassandra.a.o (CASSANDRA-17930)
> - introducing Butler and the Build Lead role
> - pre-commit workflow, and automated multiplexing, in CircleCI
> (CASSANDRA-16625)
> - single digit flaky failures per build on 4.0, 4.1 and trunk
> ci-cassandra.a.o !!
> - CircleCI is as stable on Large as XLarge containers (CASSANDRA-18127)
> 
> 
> *** State of Affairs
> 
> None of our CI systems are consistently green yet.  Flakies occur in
> both CircleCI and ci-cassandra.a.o. We had to lower the 4.1 release
> CI criteria to accept three consecutive green runs on CircleCI, as
> it would have been unlikely to achieve the same on ci-cassandra.a.o.
> While the flaky rate is lower than 4.0, the higher number of tests we
> run is making it harder to get those green runs.
> 
> Despite the overhead we continue to face with flakies and getting
> major releases out, 4.1 saw fewer releases to GA than 4.0, and I think all
> will agree things are improving.  But the challenge in front of us up
> to the 5.0 release is huge with nine CEPs slated to land.  Pre-commit
> and post-commit CI need investment if we want our stable trunk
> efforts to continue to improve.
> 
> 
> *** The Butler (Build Lead)
> 
> The introduction of Butler and the Build Lead was a wonderful
> improvement to our CI efforts.  It has brought a lot of hygiene in
> listing out flakies as they happened.  Note that this has in turn
> increased the burden in getting our major releases out, but that's to
> be seen as a one-off cost.  This initiative lost traction and
> volunteers mid last year.
> 
> We really need you to take part in the Build Lead weekly rotation.
> 
> I've signed myself up for this week, please jump in and sign yourself
> up for the weeks ahead.  If you are a coach/manager for a team, please
> permit and encourage your engineers to be involved in this activity,
> it shouldn't be more than an hour over the week.  Further instructions
> found at https://cwiki.apache.org/confluence/display/CASSANDRA/Build+Lead
> 
> If it's your first time being a Build Lead the community is here to
> help you, just reach out.  It's also a great way into our community
> for newcomers!
> 
> When it comes to Butler its UX for history is a bit clumsy.  TIL that
> you can indeed list the full history of failures per test, see 'Full
> History' under a test page*.  Please use this information to help
> create jira tickets on flakies, specifically the versions it applies
> to and the rough rate of failure so far observed.
> 
> *) e.g. 
> https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/snapshot_test/TestArchiveCommitlog/test_archive_commitlog_point_in_time_ln
> 
> 
> *** Proposal for a Repeatable Containerised CI
> 
> Building on what Josh writes in his "Cassandra project status, Year in
> Review Holiday Edition" post, and many discussions offline with many
> folk, I've written up the ticket epic for creating a reproducible
> containerised ci-cassandra.a.o
> 
> Please read https://issues.apache.org/jira/browse/CASSANDRA-18137
> 
> The tl;dr of it is to create a script that, using the jenkins k8s
> operator, can set up a ci-cassandra.a.o clone in your k8s context.
> 
> The ticket is lengthy, despite being in bullet form.  I don't believe
> it warrants a CEP, speak up if you disagree.  The idea is to provide
> us a turnkey solution: the jenkins k8s operator based script (create
> ci-cassandra.a.o clone, run pipeline, save results, tear down clone);
> to bring our ex

Re: Should we change 4.1 to G1 and offheap_objects ?

2023-01-12 Thread Josh McKenzie

Potential compromise: We change it in trunk, and we NEWS.txt in the minor about 
that change in trunk, why, and recommend users consider qualifying the same 
change on their 4.1 release.

In case it's not clear from me:
+1 to changing on trunk for 5.0 here
-1 to changing on minor release given how little (i.e. nonexistent) perf 
testing we have on the OSS project right now.

On Thu, Jan 12, 2023, at 11:47 AM, Paulo Motta wrote:
> I tend to agree with Aleksey's sentiment. Why do we need to change the 
> default in a minor release if we already provide G1 options for users that 
> want to opt-in?
> 
> On Thu, Jan 12, 2023 at 9:46 AM Aleksey Yeshchenko  wrote:
>> Switching a major default in a minor release is way worse than doing it in a 
>> GA - less notice and visibility, many folks don’t even read minor version 
>> NEWS.txt before upgrading.
>> 
>> Trunk is fine by me though.
>> 
>> > On 12 Jan 2023, at 13:14, Mick Semb Wever  wrote:
>> > 
>> >> Ok, wrt G1 default, this won't go ahead for 4.1-rc1
>> >> 
>> >> We can revisit it for 4.1.x
>> >> 
>> >> We have a lot of voices here adamantly positive for it, and those of us 
>> >> that have done the performance testing over the years know why. But being 
>> >> called to prove it is totally valid, if you have data from any such tests 
>> >> please add it to the ticket 18027
>> > 
>> > 
>> > Revisiting. Are there any vetoes to making G1 the default (and
>> > updating the G1 settings, see the patch on
>> > https://issues.apache.org/jira/browse/CASSANDRA-18027 ) for 4.1.1 ?
>> > 
>> > IIUC , the summary of this thread till now was: there were no vetoes
>> > to the change in trunk, but there were vetoes to 4.1.0 (because we
>> > were inside the beta to GA window), and there was a desire to have
>> > benchmarking data presented.
>> > 
>> > WRT benchmarking, we have a separate thread for performance testing in
>> > the project.  The ticket admittedly does not do its due diligence on
>> > data presentation and analysis of smaller heaps: a precedent we should
>> > be creating; but instead relies upon experience from many. Are we ok
>> > with this this time around, or shall the patch only be applied to
>> > trunk (where we have no choice w/ jdk17 landing)?
>>

Re: Intra-project dependencies

2023-01-16 Thread Josh McKenzie

>  - permanence from a git SHA no longer exists
With the caveat that I haven't worked w/submodules before and only know about 
them from a cursory search, it looks like git submodule status would show us 
the sha for submodules and we could have parent projects reference specific 
shas to pull for submodules to build? 
https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203

It seems like our use case is one of the primary ones git submodules are 
designed to address.
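
A rough sketch of what reading those SHAs could look like, assuming the
libraries were added as submodules under hypothetical paths such as
modules/accord (no such layout is proposed in this thread). The helper name
submodule_report and the SHAs are invented for illustration; only the status
line format comes from the git-submodule documentation linked above.

```shell
#!/bin/sh
# Summarise `git submodule status` output read from stdin as
# "<path> <sha> <state>". The first character of each status line is a flag:
#   ' '  checked-out commit matches the SHA recorded in the parent repo
#   '+'  checked-out commit differs from the recorded SHA
#   '-'  submodule not initialized
#   'U'  submodule has merge conflicts
submodule_report() {
    awk '{
        flag = substr($0, 1, 1)
        # Drop the flag column, then split "<sha> <path> (<describe>)".
        split(substr($0, 2), parts, " ")
        if (flag == " ")      state = "matches-parent"
        else if (flag == "+") state = "differs-from-parent"
        else if (flag == "-") state = "not-initialized"
        else if (flag == "U") state = "merge-conflict"
        else                  state = "unknown"
        print parts[2], parts[1], state
    }'
}

# Hypothetical paths and made-up SHAs standing in for real output.
submodule_report <<'EOF'
 1111111111111111111111111111111111111111 modules/accord (heads/trunk)
+2222222222222222222222222222222222222222 modules/dtest-api (heads/trunk)
EOF
```

In a checkout this would be fed with
git submodule status --recursive | submodule_report. Adding --cached makes git
print the SHA recorded in the parent repo rather than the one currently
checked out, which is the value a build would want to pin.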

On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote:
> 
> I guess option 5 is what we have today in cep-15, have the build file grab 
> the relevant SHA for the library. This way you maintain a precise SHA for 
> builds and scripts don’t have to be modified.
> 
> I believe this is also possible with git submodules, but I’m happy to bake 
> this into our build file instead with a script.
> 
> > As the library itself no longer has an explicit version, what I presume you 
> > meant by logical version.
> 
> I mean that we don’t want to duplicate work and risk diverging functionality 
> maintaining what is logically (meant to be) the same code. As a developer, 
> managing all of the branches is already a pain. Libraries naturally have a 
> different development cadence to the main project, and tying the development 
> to C* versions is just an unnecessary ongoing burden (and risk) that we can 
> avoid.
> 
> There’s also an additional penalty: we reduce the likelihood of outside 
> contributions to the libraries only. Accord in particular I hope will attract 
> outside interest if it is maintained as a separate library, as it has broad 
> applicability, and is likely of academic interest. Tying it to C* version and 
> more tightly coupling with C* codebase makes that less likely. We might also 
> see folk interested in our utilities, or our simulator framework, if they 
> were to be maintained separately, which could be valuable.
> 
> 
> 
> 
>> On 16 Jan 2023, at 10:49, Mick Semb Wever  wrote:
>> 
>>> I think (4) is the only sensible option. It permits different development 
>>> branches to easily reference different versions of a library and also to 
>>> easily co-develop them - from within the same IDE project, even.
>>> 
>> 
>> 
>> I've only heard horror stories about submodules. The challenges they bring 
>> should be listed and checked.
>> 
>> Some examples
>>  - you can no longer just `git clone …`  (and we clone automatically in a 
>> number of places)
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>>  - permanence from a git SHA no longer exists
>>  - our releases get more complicated (our source tarballs are the asf 
>> releases)
>>  - handling patches that cover submodules
>>  - switching branches, and using git worktrees, during dev
>> 
>> I see (4) as a valid option, but concerned with the amount of work required 
>> to adapt to it, and whether it will only make it more complicated for the 
>> new contributor to the project. For example the first two points are 
>> addressed by remembering to do `git clone --recurse-submodules …` . And who 
>> would be fixing our build/test/release scripts to accommodate?
>> 
>> Not blockers, just concerns we need to raise and address.
>> 
>>  
>>> We might even be able to avoid additional release votes as a matter of 
>>> course, by compiling the library source as part of the C* release, so that 
>>> they adopt the C* release vote (or else we may periodically release the 
>>> library as we do other releases)
>>> 
>> 
>> 
>> Yes. Today we do a combination of first (3) and then (1). Having to make a 
>> release of these libraries every time a patch (/feature branch) is 
>> completing is a horror story in itself.
>> 
>> 
>>> I might be missing something, does anyone have any other bright ideas for 
>>> approaching this problem? I’m sure there are plenty of opinions out there.
>>> 
>> 
>> 
>> Looking at the problem with these libraries, 
>>  - we don't need releases
>>  - we don't have a clean version/branch parity to in-tree
>>  - codebase parity between branches is important for upgrade tests (shared 
>> classloaders)
>> 
>>  For (2) you mention drift of the "same" version, isn't this only a problem 
>> for dtest-api in the way it requires the "same version" of a codebase for 
>> compatibility when running upgrade tests? As the library itself no longer 
>> has an explicit version, what I presume you meant by logical version.
>> 
>> To begin with, I'm leaning towards (2) because it is a cognitive re-use of 
>> our release branches, and the problems around classpath compatibility can be 
>> solved with tests. I'm sure I'm not seeing the whole picture though…
>>

Re: Merging CEP-15 to trunk

2023-01-16 Thread Josh McKenzie

Did we document this or is it in an email thread somewhere?

I don't see it on the confluence wiki nor does a cursory search of ponymail 
turn it up.

What was it for something flagged experimental?
1. Same tests pass on the branch as to the root it's merging back to
2. 2 committers' eyes on (author + reviewer or 2 reviewers, etc)
3. Disabled by default w/flag to enable

So really only the 3rd thing is different right? Probably ought to add an 
informal step 4 which Benedict is doing here which is "hit the dev ML w/a 
DISCUSS thread about the upcoming merge so it's on people's radar and they can 
coordinate".

On Mon, Jan 16, 2023, at 11:08 AM, Benedict wrote:
> My goal isn’t to ask if others believe we have the right to merge, only to 
> invite feedback if there are any specific concerns. Large pieces of work like 
> this cause headaches and concerns for other contributors, and so it’s only 
> polite to provide notice of our intention, since probably many haven’t even 
> noticed the feature branch developing.
> 
> The relevant standard for merging a feature branch, if we want to rehash 
> that, is that it is feature- and bug-neutral by default, ie that a release 
> could be cut afterwards while maintaining our usual quality standards, and 
> that the feature is disabled by default, yes. It is not however 
> feature-complete or production ready as a feature; that would prevent any 
> incremental merging of feature development.
> 
> > On 16 Jan 2023, at 15:57, J. D. Jordan  wrote:
> > 
> > I haven’t been following the progress of the feature branch, but I would 
> > think the requirements for merging it into master would be the same as any 
> > other merge.
> > 
> > A subset of those requirements being:
> > Is the code to be merged in releasable quality? Is it disabled by a feature 
> > flag by default if not?
> > Do all the tests pass?
> > Has there been review and +1 by two committers?
> > 
> > If the code in the feature branch meets all of the merging criteria of the 
> > project then I see no reason to keep it in a feature branch forever.
> > 
> > -Jeremiah
> > 
> > 
> >> On Jan 16, 2023, at 3:21 AM, Benedict  wrote:
> >> 
> >> Hi Everyone, I hope you all had a lovely holiday period. 
> >> 
> >> Those who have been following along will have seen a steady drip of 
> >> progress into the cep-15-accord feature branch over the past year. We 
> >> originally discussed that feature branches would merge periodically into 
> >> trunk, and we are long overdue. With the release of 4.1, it’s time to 
> >> rectify that. 
> >> 
> >> Barring complaints, I hope to merge the current state to trunk within a 
> >> couple of weeks. This remains a work in progress, but will permit users to 
> >> experiment with the alpha version of Accord and provide feedback, as well 
> >> as phase the changes to trunk.
> 
>

Re: Intra-project dependencies

2023-01-17 Thread Josh McKenzie

Is there any reason we couldn't "bundle" a release vote to include both an 
Accord release and ASF C* in one voting round as a combined release? My reading 
of the release process w/the ASF doesn't speak to that (if anything it implies 
this might be a valid approach):

https://www.apache.org/legal/release-policy.html#release-approval

> Every ASF release MUST contain one or more source packages,

On Tue, Jan 17, 2023, at 4:03 PM, Henrik Ingo wrote:
> Hi Derek
> 
> Somewhat of a newcomer myself, it seems the answers to your excellent 
> questions are:
> 
>  * We don't all agree with the premise that Accord will attract substantial 
> outside interest. Even so, my personal opinion is that whether that happens 
> or not, it's not wrong to aspire toward or plan for such a future.
> 
>  * Yes, just using Accord as a library dependency would be the normal thing 
> to do, but that introduces a need to create Accord releases to match 
> Cassandra releases. Since ASF mandates a 3 day voting process to release 
> software artifacts, this creates a lot of bureaucratic overhead, which is why 
> this otherwise sane alternative is nobody's favorite. (Cassandra releases 
> cannot or should not depend on snapshot releases of libraries.)
> 
>  * So we are discussing various alternatives that keep Accord separate, while 
> at the same time recording some link about which exact version of Accord was 
> checked out.
> 
> henrik
> 
> On Tue, Jan 17, 2023 at 7:23 PM Derek Chen-Becker  
> wrote:
>> Actually, re-reading the thread, I think I missed the initial point
>> brought up and got lost in the discussion specific to submodules. What
>> is the technical reason for bringing Accord in-tree? While I think
>> submodules are the best way to include source in-tree, I'm not sure
>> this is actually the correct thing to do in this case. Don't we
>> already have mechanisms to deal with snapshot versions of library
>> dependencies in the build? Do we need release votes for snapshots?
>> 
>> Thanks,
>> 
>> Derek
>> 
>> On Tue, Jan 17, 2023 at 9:25 AM Derek Chen-Becker  
>> wrote:
>> >
>> > I'd like to go back to Benedict's initial point: if we have a new
>> > consensus protocol that other projects would potentially be interested
>> > in, then by all means it should be its own project. Let's start with
>> > that as a basis for discussion, because from my reading it seems like
>> > people might be disagreeing with that initial premise.
>> >
>> > If we agree that Accord should be independent, I'm +1 for git
>> > submodules primarily because that's a standard way of doing things and
>> > I don't think we need yet another bespoke solution to a problem that
>> > hundreds, if not thousands of other software projects encounter. I've
>> > worked with lots of projects using submodules and while they're not a
>> > panacea, they've never been a significant problem to work with.
>> >
>> > It's also a little confusing to see people argue about HEAD in
>> > relation to any of this, since that's just an alias to the latest
>> > commit for a given branch. In every project I've worked with that uses
>> > submodules you would never use HEAD, because the submodule itself
>> > already records the *exact* commit associated with the parent.
>> >
>> > Cheers,
>> >
>> > Derek
>> >
>> > On Tue, Jan 17, 2023 at 2:28 AM Benedict  wrote:
>> > >
>> > > The answer to all your questions is “like any other library” - this is a 
>> > > procedural hack to ease development. There are alternative isomorphic 
>> > > hacks, like compiling source jars from Accord and including them in the 
>> > > C* tree, if it helps your mental model.
>> > >
>> > > > you stated that a goal was to avoid maintaining multiple branches.
>> > >
>> > > No, I stated that a goal was to *decouple* development of Accord from 
>> > > C*. I don’t see why you would take that to mean there are no branches of 
>> > > Accord, as that would quite clearly be incompatible with the C* release 
>> > > strategy.
>> > >
>> > >
>> > >
>> > > On 17 Jan 2023, at 07:36, Mick Semb Wever  wrote:
>> > >
>> > > 
>> > >>
>> > >> … extrapolating this experience to multiple C* versions
>> > >
>> > >
>> > > To include forward-merging, bisecting old history, etc etc. that's a 
>> > > leap of faith that I believe deserves the discussion.
>> > >
>> > >> - patches are off submodule SHAs, not the submodule's HEAD,
>> > >>
>> > >>
>> > >> A SHA would point to the HEAD of a given branch, at the time of merge, 
>> > >> just by SHA? I’ve no idea what you imagine here, but this just ensures 
>> > >> that a given SHA of the importing project continues to compile 
>> > >> correctly when it is no longer HEAD. It does not mean there’s no HEAD 
>> > >> that corresponds directly to the SHA of the importing project’s HEAD.
>> > >
>> > >
>> > >
>> > > That wasn't my concern. Rather that you need to know in advance when the 
>> > > SHA is not HEAD. You can't commit off a past SHA. Once you find out (and 
>> > > how does this happen?) that the

Re: Intra-project dependencies

2023-01-17 Thread Josh McKenzie

> Josh, bundling releases gets tricky in that you need to include the library 
> sources, because the cassandra release is essentially being voted on (because 
> it has been built) with non-released dependencies.
Arguably, one shouldn't vote on a release of Accord unless there's something 
that's integrated it and shown it's working. Through that lens it doesn't make 
sense to release those dependencies w/out the parent, nor the parent without 
the dependency.

Not a hill I'm willing to die on but at least out of the gate, seems like a way 
we could streamline the process of cutting releases until someone / something 
external starts exerting influence on Accord.

On Tue, Jan 17, 2023, at 4:39 PM, Mick Semb Wever wrote:
>> Regarding the use of snapshots, this isn’t impossible Henrik - I floated 
>> this as an option. But besides the additional overhead during development, 
>> this does not maintain reproducible builds, as the snapshot can change. 
> 
> 
> You would reference the snapshot dependency by the timestamped snapshot. This 
> makes it a reproducible build.
> 
> We have done this with dtest-api already, and there's already a comment 
> explaining it:
> https://github.com/apache/cassandra/blob/trunk/.build/build-resolver.xml#L59-L60
>  
> 
> It introduces some overhead when bisecting to go from the snapshot's 
> timestamp to the other repo's SHA (this is easily solvable by putting the SHA 
> inside the jarfile).
> 
> I don't see the problem of letting trunk use snapshots during the annual 
> development cycle, if we accept the overhead of cutting all library releases 
> before we cut the first alpha/beta.
> 
> FTR, I'm sitting on the fence between this and submodules. There are many dev 
> tasks we do, and different approaches have different pain points. The amount 
> of dev happening in the library also matters. I also agree with Derek that 
> linking the source code in-tree is a significant thing to do, just to 
> avoid the rigmarole of dependency management.
> 
> Josh, bundling releases gets tricky in that you need to include the library 
> sources, because the cassandra release is essentially being voted on (because 
> it has been built) with non-released dependencies.
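
Mick's suggestion above of putting the library's SHA inside the jarfile could look roughly like the sketch below. This is an illustration only: the manifest attribute name Git-Commit is a hypothetical choice, not something the Cassandra or Accord builds are known to write, and the jar is built in memory purely so the example is self-contained.

```python
import io
import zipfile
from typing import Optional

MANIFEST_PATH = "META-INF/MANIFEST.MF"
SHA_ATTRIBUTE = "Git-Commit"  # hypothetical attribute name, chosen for illustration


def write_jar_with_sha(sha: str) -> bytes:
    """Build a minimal in-memory jar whose manifest records the source commit."""
    manifest = f"Manifest-Version: 1.0\r\n{SHA_ATTRIBUTE}: {sha}\r\n\r\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr(MANIFEST_PATH, manifest)
    return buffer.getvalue()


def read_sha_from_jar(jar_bytes: bytes) -> Optional[str]:
    """Return the recorded commit SHA, or None if the attribute is absent.

    Manifest continuation lines are not handled; a 40-character SHA fits
    on a single manifest line, so this sketch does not need them.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read(MANIFEST_PATH).decode("utf-8")
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name == SHA_ATTRIBUTE:
            return value.strip()
    return None
```

With something like this in place, going from a timestamped snapshot to the other repo's commit while bisecting becomes a lookup in the jar rather than a search through build logs.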

Re: Merging CEP-15 to trunk

2023-01-24 Thread Josh McKenzie

Zooming out a bit, I think Accord is the first large body of work we've done 
post introduction of the CEP system with multiple people collaborating on a 
feature branch like this. This discussion seems to have surfaced a few 
sentiments:

1. Some contributors seem to feel that work on a feature branch doesn't have 
the same inherent visibility as work on trunk
2. There's a lack of clarity on our review process when it comes to significant 
(in either time or size) rebases
3. We might be treating Ninja commits a bit differently on a feature branch 
than trunk, which intersects with 1 and 2

My personal opinions are:
I disagree with 1 - it simply takes the added effort of actively following that 
branch and respective JIRAs if you're interested. I think having a feature 
branch in the ASF git repo w/commits and JIRAs tracking that work is perfectly 
fine, and the existing bar (2 committers +1, green tests before merge to trunk) 
when applied to a feature branch is also not just well within the "letter of 
the law" on the project but also logically sufficient to retain our bar of 
quality and stability.

For 2 (reviews required after rebase) I don't think we should over-prescribe 
process on this. If all tests are green pre-rebase, and all tests are green 
post-rebase, and a committer is confident they didn't materially modify the 
functioning of the logical flow or data structures of their code during a 
rebase, I don't see there being any value added by adding another review based 
on those grounds.

If the subtext is actually that some folks feel we need a discussion about 
whether we should have a different bar for review on CEP feature branches (3 
committers? 1+ pmc members? more diversity in reviewers or committers as 
measured by some as yet unspoken metric), perhaps we could have that 
discussion. FWIW I'm against changes there as well; we all wear our Apache Hats 
here, and if the debate is between work like this happening in a feature branch 
affording contributors increased efficiency and locality vs. all that happening 
on trunk and repeatedly colliding with everyone everywhere, feature branches 
are a clear win IMO.

And for 3 - I think we've all broadly agreed we shouldn't ninja commit unless 
it's a comment fix, typo, forgotten git add, or something along those lines. 
For any commit that doesn't qualify it should go through the review process.

And a final note - Ekaterina alluded to something valuable in her email earlier 
in this thread. I think having a "confirm green on all the test suites that are 
green on merge target" bar for large feature branches (rather than strictly the 
"pre-commit subset") before merge makes a lot of sense.

On Tue, Jan 24, 2023, at 1:41 PM, Caleb Rackliffe wrote:
> Just FYI, I'm going to be posting a Jira (which will have some dependencies 
> as well) to track this merge, hopefully some time today...
> 
> On Tue, Jan 24, 2023 at 12:26 PM Ekaterina Dimitrova  
> wrote:
>> I actually see people all the time making a final check before merge as part 
>> of the review. And I personally see it only as a benefit when it comes to 
>> serious things like Accord, as an example. Even as a help for the author who 
>> is overwhelmed with the big amount of work already done - someone to do 
>> quick last round of review. Team work after all.
>> 
>> Easy rebase - that is great news. I guess any merge conflicts that were 
>> solved will be documented and confirmed with reviewers before merge on the 
>> ticket where the final CI push will be posted. I also assumed that even 
>> without direct conflicts a check that there is no contradiction with any 
>> post-September commits is done as part of the rebase. (Just adding here for 
>> completeness)
>> 
>> One thing that I wanted to ask for is when you push to CI, you or whoever 
>> does it, to approve all jobs. Currently we have pre-approved the minimum 
>> required jobs in the pre-commit workflow. I think in this case with a big 
>> work approving all jobs in CircleCI will be of benefit. (I also do it for 
>> bigger bodies of work to be on the safe side) Just pointing in case you use 
>> a script or something to push only the pre-approved ones. Please ping me in 
>> Slack if it’s not clear what I mean, happy to help with that.
>> 
>> On Tue, 24 Jan 2023 at 11:52, Benedict  wrote:
>>> 
>>> Perhaps the disconnect is that folk assume a rebase will be difficult and 
>>> have many conflicts? 
>>> 
>>> We have introduced primarily new code with minimal integration points, so I 
>>> decided to test this. I managed to rebase locally in around five minutes; 
>>> mostly imports. This is less work than for a rebase of a fairly typical 
>>> ticket of average complexity.
>>> 
>>> Green CI is of course a requirement. There is, however, no good procedural 
>>> or technical justification for a special review of the rebase.
>>> 
>>> Mick is encouraged to take a look at the code before and after rebase, and 
>>> will be afforded plenty of time to

Re: Merging CEP-15 to trunk

2023-01-24 Thread Josh McKenzie

t's the 
backbone of us scaling stably as a project community.


On Tue, Jan 24, 2023, at 4:41 PM, Henrik Ingo wrote:
> Thanks Josh
> 
> Since you mentioned the CEP process, I should also mention one sentiment you 
> omitted, but worth stating explicitly:
> 
> 4. The CEP itself should not be renegotiated at this point. However, the 
> reviewers should rather focus on validating that the implementation matches 
> the CEP. (Or if not, that the deviation is for a good reason and the reviewer 
> agrees to approve it.)
> 
> 
> Although I'm not personally full time working on either producing Cassandra 
> code or reviewing it, I'm going to spend one more email defending your point 
> #1, because I think your proposal would lead to a lot of inefficiencies in 
> the project, and that does happen to be my job to care about: 
> 
>  - Even if you could be right, from some point of view, it's nevertheless the 
> case that those contributors who didn't actively work on Accord, have assumed 
> that they will be invited to review now, when the code is about to land in 
> trunk. Not allowing that to happen would make them feel like they weren't 
> given the opportunity and that the process in Cassandra Project Governance 
> was bypassed. We can agree to work differently in the future, but this is the 
> reality now.
> 
>  - Although those who have collaborated on Accord testify that the code is of 
> the highest quality and ready to be merged to trunk, I don't think that can 
> be expected of every feature branch all the time. In fact, I'm pretty sure 
> the opposite must have been the case also for the Accord branch at some 
> point. After all, if it had been ready to merge to trunk already a year ago, 
> why wasn't it? It's kind of the point of using a feature branch that the code 
> in it is NOT ready to be merged yet. (For example, the existing code might be 
> of high quality, but the work is incomplete, so it shouldn't be merged to 
> trunk.)
> 
>  - Uncertainty: It's completely ok that some feature branches may be 
> abandoned without ever merging to trunk. Requiring the community (anyone 
> potentially interested, anyways) to review such code would obviously be a 
> waste of precious talent.
> 
>  - Time uncertainty: Also - and this is also true for Accord - it is unknown 
> when the merge will happen if it does. In the case of Accord it is now over a 
> year since the CEP was adopted. If I remember correctly an initial target 
> date for some kind of milestone may have been Summer of 2022? Let's say 
> someone in October 2021 was invested in the quality of Cassandra 4.1 release. 
> Should this person now invest in reviewing Accord or not? It's impossible to 
> know. Again, in hindsight we know that the answer is no, but your suggestion 
> again would require the person to review all active feature branches just in 
> case.
> 
> 
> As for 2 and 3, I certainly observe an assumption that contributors have 
> expected to review after a rebase. But I don't see this as a significant 
> topic to argue about. If indeed the rebase is as easy as Benedict advertised, 
> then we should just do the rebase because apparently it can be done faster 
> than it took me to write this email :-) (But yes, conversely, it seems then 
> that the rebase is not a big reason to hold off from reviewing either.)
> 
> henrik
> 
> 
> On Tue, Jan 24, 2023 at 9:29 PM Josh McKenzie  wrote:
>> Zooming out a bit, I think Accord is the first large body of work we've done 
>> post introduction of the CEP system with multiple people collaborating on a 
>> feature branch like this. This discussion seems to have surfaced a few 
>> sentiments:
>> 
>> 1. Some contributors seem to feel that work on a feature branch doesn't have 
>> the same inherent visibility as work on trunk
>> 2. There's a lack of clarity on our review process when it comes to 
>> significant (in either time or size) rebases
>> 3. We might be treating Ninja commits a bit differently on a feature branch 
>> than trunk, which intersects with 1 and 2
>> 
>> My personal opinions are:
>> I disagree with 1 - it simply takes the added effort of actively following 
>> that branch and respective JIRAs if you're interested. I think having a 
>> feature branch in the ASF git repo w/commits and JIRAs tracking that work is 
>> perfectly fine, and the existing bar (2 committers +1, green tests before 
>> merge to trunk) when applied to a feature branch is also not just well 
>> within the "letter of the law" on the project but also logically sufficient 
>> to retain our bar of quality and stabil

[ANNOUNCE] Evolving governance in the Cassandra Ecosystem

2023-01-26 Thread Josh McKenzie

The Cassandra PMC is pleased to announce that we're evolving our governance
procedures to better foster subprojects under the Cassandra Ecosystem's
umbrella. Astute observers among you may have noticed that the Cassandra
Sidecar is already a subproject of Apache Cassandra as of CEP-1
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224) and
Cassandra-14395 (https://issues.apache.org/jira/browse/CASSANDRASC-24), however
up until now we haven't had any structure to accommodate raising committers on
specific subprojects or clarity on the addition or governance of future
subprojects.

Further, with the CEP for the driver donation in motion
(https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#heading=h.xhizycgqxoyo),
the need for a structured and sustainable way to expand the Cassandra
Ecosystem is pressing.

We'll document these changes in the confluence wiki as well as the sidecar as
our first formal subproject after any discussion on this email thread. The new
governance process is as follows:
-

Subproject Governance
1. The Apache Cassandra PMC is responsible for governing the broad Cassandra
Ecosystem.
2. The PMC will vote on inclusion of new interested subprojects using the
existing procedural change vote process documented in the confluence wiki
(Super majority voting: 66% of votes must be in favor to pass. Requires 50%
participation of roll call).
3. New committers for these subprojects will be nominated and raised, both at
inclusion as a subproject and over time. Nominations can be brought to
priv...@cassandra.apache.org. Typically we're looking for a mix of commitment
and contribution to the community and project, be it through code,
documentation, presentations, or other significant engagement with the project.
4. While the commit-bit is ecosystem wide, code modification rights and voting
rights (technical contribution, binding -1, CEP's) are granted per subproject
4a. Individuals are trusted to exercise prudence and only commit or claim
binding votes on approved subprojects. Repeated violations of this social
contract will result in losing committer status.
4b. Members of the PMC have commit and voting rights on all subprojects.
5. For each subproject, the PMC will determine a trio of PMC members that will
be responsible for all PMC specific functions (release votes, driving CVE
response, marketing, branding, policing marks, etc) on the subproject.
-

Curious to see what thoughts we have as a community!

Thanks!

~Josh
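
For anyone wanting to sanity-check the arithmetic in item 2 above, here is a minimal sketch of the pass/fail rule. It rests on two assumptions that the announcement does not spell out: that "66% of votes" is measured against the votes actually cast for or against (abstentions are not counted), and that "50% participation" means at least half of the roll call cast such a vote. The function and parameter names are made up for illustration.

```python
from fractions import Fraction


def vote_passes(in_favor: int, against: int, roll_call: int) -> bool:
    """Apply the super majority rule: >= 66% in favor, >= 50% participation.

    Assumes only for/against votes count as cast; this is an interpretation,
    not something stated in the governance text.
    """
    if roll_call <= 0 or in_favor < 0 or against < 0:
        raise ValueError("counts must be non-negative and roll_call positive")
    cast = in_favor + against
    if cast > roll_call:
        raise ValueError("more votes cast than members on the roll call")
    if cast == 0:
        return False  # nobody voted, so neither threshold can be met
    # Exact fractions avoid floating point surprises right at the thresholds.
    participation_met = Fraction(cast, roll_call) >= Fraction(1, 2)
    supermajority_met = Fraction(in_favor, cast) >= Fraction(66, 100)
    return participation_met and supermajority_met
```

For example, with a roll call of 15, a result of 8 in favor and 2 against passes under these assumptions, while 6 in favor and 4 against fails the 66% threshold and 5 in favor with nobody against fails on participation.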

Cassandra project status, 2023-01-26

2023-01-26 Thread Josh McKenzie

After a bit of time away, I'm ready to regale you with tales of things you've
already seen on the dev list and JIRA. ;)

Let's start with calling out that registrations for the Cassandra Summit are
open. Patrick did a better job than I ever could summarizing this in his email
poetically titled "Cassandra Summit update for 2023-01-24", which you can find
here: https://lists.apache.org/thread/7roz6z8nvj9cz8o2jwwo1httl85mwjcs. If you
haven't registered yet and are in the area or receptive to travel, you should
seriously consider going - it's always great to be at a conference with other
people brainstorming, lamenting, and celebrating our shared experiences with
this software project.

From a technical perspective, there are 2 things I want to call out. The first is
the epic Mick has put together for an effort to make ASF CI not only stable, but
also repeatable on other containerized cloud-native environments:
https://issues.apache.org/jira/browse/CASSANDRA-18137

There's a lot of context there, but the high level 4 goals are:
1. Reproducible reference ASF CI environment so contributors can clone it.
2. An accepted “test result output” format that will certify a commit
regardless of CI env.
3. Turnaround times as fast as circleci (cloned environment scales to capacity).
4. Intuitive CI implementation accessible to new contributors.

Ultimately, the ideal best-case would be that we could get away from having 2
CI systems, one of which is a paid-for service, and have a "reproducible
runnable CI" deterministic Thing contributors can run to get insight into their
contributions and their stability. Taking this a logical step further, those of
us that are currently spending money on a paid-for CI system could potentially
better spend that money on a shared CI infrastructure that the entire project
could use and benefit from.

Quite a bit of work has fallen out from that epic and is linked from the
ticket; please take 5 minutes to scan through the ticket and some of the
sub-tasks so it's at least on your radar. Stable public CI is something we've
struggled with for _years_, but we've made huge strides in the past year and my
intuition tells me there's a light at the end of this tunnel.

Mick also hit up the dev ML w/a thread on this offering more context:
https://lists.apache.org/thread/fqdvqkjmz6w8c864vw98ymvb1995lcy4

The second thing: The Build Lead role! We need volunteers:
https://cwiki.apache.org/confluence/display/CASSANDRA/Build+Lead. So the TL;DR
on this and why you should consider it: it takes 30-60 minutes *for the entire
week*, it helps us stay on top of our CI infrastructure and test failures,
you'll receive the undying gratitude of many of us on the project, and you also
get some insight into interesting dark corners of the CI infra and testing
system you might otherwise never have known about. You don't need to triage or
attribute failures in the role unless you really want to; getting them
reflected in JIRA is the low hanging fruit here.

[New Contributors Getting Started]
(Unassigned) (Starter Tickets): this is the set of filters you want to pull
from on our project's Kanban board:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2162&quickFilter=2160

We have 26 issues in 4.x (22 really; looks like there's 4 either in progress or
review that need assignee tidied up). 8 issues in 4.0.x, and another 5 floating
around there.

If any of those catch your fancy, join us in the #cassandra-dev channel on
https://the-asf.slack.com (reply to me on this email if you need an invite for
your account), and hit up the @cassandra_mentors alias to reach 13 of us just
waiting with bated breath to help you get oriented. :)

And hey, if any of these _don't_ catch your fancy but you're still interested
in the project and are looking for something interesting to get involved with,
just hop in the slack channel and raise the :batsignal:

[Dev mailing list]
So it's been... a bit. Since I sent the last project status update. Thankfully
it's the holiday season so we didn't accumulate a crushing load of things I
have to summarize for us here:
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2022-12-19|dto=2023-1-23:

The vote for the Trie-indexed SSTable format passed about a month ago -
congratulations Branimir and team!
https://lists.apache.org/thread/d4sr3jkt4xjn86xrf9h708y6s7lc53v5

I sent out an email discussing taking the smallest conceivable baby steps in
formalizing performance testing for the project here:
https://lists.apache.org/thread/kzbv632tm0j99mg10z24wb8f09z0r81z. It seems like
the general consensus is that there's _a lot_ of appetite to engage on this
topic and interesting ideas, and most people aren't all that interested in (nor
disagreeing with) the bare bones v1 of "get a repeatable test with a repeatable
runtime env setup and iterate from there". I think the real challenge h

Re: [ANNOUNCE] Evolving governance in the Cassandra Ecosystem

2023-01-27 Thread Josh McKenzie

I'm told my email came through with a white background and gray font. Apologies 
for that; contortions between editors and trying to get my email client to 
format consistently after copy/paste led to some kind of pathologically bad 
case.

Also: you should be using a dark background on your browser to save your eyes. 
Part of why I didn't realize. :)

(It's on the internet so it must be true: 
https://www.wired.co.uk/article/dark-mode-chrome-android-ios-science)

~Josh

On Thu, Jan 26, 2023, at 6:57 PM, C. Scott Andreas wrote:
> Josh and all PMC members, thank you for your work on this!
> 
> Supportive of the changes and grateful to have scaffolding in place to 
> accommodate current/incoming subprojects.
> 
> – Scott
> 
>> On Jan 26, 2023, at 1:21 PM, Josh McKenzie  wrote:
>> 
>> 
>> The Cassandra PMC is pleased to announce that we're evolving our governance 
>> procedures to better foster subprojects under the Cassandra Ecosystem's 
>> umbrella. Astute observers among you may have noticed that the Cassandra 
>> Sidecar is already a subproject of Apache Cassandra as of CEP-1 
>> (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224) 
>> and Cassandra-14395 (https://issues.apache.org/jira/browse/CASSANDRASC-24), 
>> however up until now we haven't had any structure to accommodate raising 
>> committers on specific subprojects or clarity on the addition or governance 
>> of future subprojects.
>> 
>> Further, with the CEP for the driver donation in motion 
>> (https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#heading=h.xhizycgqxoyo),
>>  the need for a structured and sustainable way to expand the Cassandra 
>> Ecosystem is pressing.
>> 
>> We'll document these changes in the confluence wiki as well as the sidecar 
>> as our first formal subproject after any discussion on this email thread. 
>> The new governance process is as follows:
>> -
>> 
>> Subproject Governance
>> 1. The Apache Cassandra PMC is responsible for governing the broad Cassandra 
>> Ecosystem.
>> 2. The PMC will vote on inclusion of new interested subprojects using the 
>> existing procedural change vote process documented in the confluence wiki 
>> (Super majority voting: 66% of votes must be in favor to pass. Requires 50% 
>> participation of roll call).
>> 3. New committers for these subprojects will be nominated and raised, both 
>> at inclusion as a subproject and over time. Nominations can be brought to 
>> priv...@cassandra.apache.org. Typically we're looking for a mix of 
>> commitment and contribution to the community and project, be it through 
>> code, documentation, presentations, or other significant engagement with the 
>> project. 
>> 4. While the commit-bit is ecosystem wide, code modification rights and 
>> voting rights (technical contribution, binding -1, CEP's) are granted per 
>> subproject
>>  4a. Individuals are trusted to exercise prudence and only commit or 
>> claim binding votes on approved subprojects. Repeated violations of this 
>> social contract will result in losing committer status.
>>  4b. Members of the PMC have commit and voting rights on all subprojects.
>> 5. For each subproject, the PMC will determine a trio of PMC members that 
>> will be responsible for all PMC specific functions (release votes, driving 
>> CVE response, marketing, branding, policing marks, etc) on the subproject.
>> -
>> 
>> Curious to see what thoughts we have as a community!
>> 
>> Thanks!
>> 
>> ~Josh
>> 
> 
>

Re: Merging CEP-15 to trunk

2023-01-27 Thread Josh McKenzie

>> One thing that I wanted to ask for is when you push to CI, you or whoever 
>>>>> does it, to approve all jobs.
>>>> 
>>>> Thanks Ekaterina, we will be sure to fully qualify the CI result, and I 
>>>> will make sure we also run your flaky test runner on the newly introduced 
>>>> tests.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 24 Jan 2023, at 21:42, Henrik Ingo  wrote:
>>>>> 
>>>>> Thanks Josh
>>>>> 
>>>>> Since you mentioned the CEP process, I should also mention one sentiment 
>>>>> you omitted, but worth stating explicitly:
>>>>> 
>>>>> 4. The CEP itself should not be renegotiated at this point. However, the 
>>>>> reviewers should rather focus on validating that the implementation 
>>>>> matches the CEP. (Or if not, that the deviation is for a good reason and 
>>>>> the reviewer agrees to approve it.)
>>>>> 
>>>>> 
>>>>> Although I'm not personally full time working on either producing 
>>>>> Cassandra code or reviewing it, I'm going to spend one more email 
>>>>> defending your point #1, because I think your proposal would lead to a 
>>>>> lot of inefficiencies in the project, and that does happen to be my job 
>>>>> to care about: 
>>>>> 
>>>>>  - Even if you could be right, from some point of view, it's nevertheless 
>>>>> the case that those contributors who didn't actively work on Accord, have 
>>>>> assumed that they will be invited to review now, when the code is about 
>>>>> to land in trunk. Not allowing that to happen would make them feel like 
>>>>> they weren't given the opportunity and that the process in Cassandra 
>>>>> Project Governance was bypassed. We can agree to work differently in the 
>>>>> future, but this is the reality now.
>>>>> 
>>>>>  - Although those who have collaborated on Accord testify that the code 
>>>>> is of the highest quality and ready to be merged to trunk, I don't think 
>>>>> that can be expected of every feature branch all the time. In fact, I'm 
>>>>> pretty sure the opposite must have been the case also for the Accord 
>>>>> branch at some point. After all, if it had been ready to merge to trunk 
>>>>> already a year ago, why wasn't it? It's kind of the point of using a 
>>>>> feature branch that the code in it is NOT ready to be merged yet. (For 
>>>>> example, the existing code might be of high quality, but the work is 
>>>>> incomplete, so it shouldn't be merged to trunk.)
>>>>> 
>>>>>  - Uncertainty: It's completely ok that some feature branches may be 
>>>>> abandoned without ever merging to trunk. Requiring the community (anyone 
>>>>> potentially interested, anyways) to review such code would obviously be a 
>>>>> waste of precious talent.
>>>>> 
>>>>>  - Time uncertainty: Also - and this is also true for Accord - it is 
>>>>> unknown when the merge will happen if it does. In the case of Accord it 
>>>>> is now over a year since the CEP was adopted. If I remember correctly an 
>>>>> initial target date for some kind of milestone may have been Summer of 
>>>>> 2022? Let's say someone in October 2021 was invested in the quality of 
>>>>> Cassandra 4.1 release. Should this person now invest in reviewing Accord 
>>>>> or not? It's impossible to know. Again, in hindsight we know that the 
>>>>> answer is no, but your suggestion again would require the person to 
>>>>> review all active feature branches just in case.
>>>>> 
>>>>> 
>>>>> As for 2 and 3, I certainly observe an assumption that contributors have 
>>>>> expected to review after a rebase. But I don't see this as a significant 
>>>>> topic to argue about. If indeed the rebase is as easy as Benedict 
>>>>> advertised, then we should just do the rebase because apparently it can 
>>>>> be done faster than it took me to write this email :-) (But yes, 
>>>>> conversely, it seems then that the rebase is not a big reason to hold off 
>>>>> from reviewing either.)
>>>>> 
>>>>> henrik

Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

2023-01-28 Thread Josh McKenzie

First off - thanks so much for putting in this effort Maxim! This is excellent 
work.

Some thoughts on the CEP and responses in thread:

> *Considering that JMX is usually not used and disabled in production 
> environments for various performance and security reasons, the operator may 
> not see the same picture from various of Dropwizard's metrics exporters and 
> integrations as Cassandra's JMX metrics provide [1][2].*
I don't think this assertion is true. Cassandra is running in a *lot* of places 
in the world, and JMX has been in this ecosystem for a long time; we need data 
that is basically impossible to get to claim "JMX is usually not used in C* 
environments in prod".

> I also wonder whether we should care about JMX?  I know many wish to migrate 
> (it's going to be a very long time) away from JMX, so do we need a wrapper to 
> make JMX and vtables consistent?
If we can move away from a bespoke vtable or JMX based implementation and 
instead have a templatized solution each of these is generated from, that to me 
is the superior option. There's little harm in adding new JMX endpoints (or 
hell, other metrics framework integration?) as a byproduct of adding new vtable 
exposed metrics; we have the same maintenance obligation to them as we have to 
the vtables and if it generates from the same base data, we shouldn't have any 
further maintenance burden due to its presence right?
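
As a rough illustration of "generate both from the same base data", here is a minimal Java sketch. It is not Cassandra code: `SharedMetricSource`, `asVtableRows` and `asJmxAttributes` are hypothetical names, and plain maps stand in for the real vtable and JMX machinery. The only point is that a metric registered once shows up in both views without extra per-metric code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types exist in Cassandra.
public class SharedMetricSource {
    // Single source of truth: metric name -> supplier of the live value.
    private final Map<String, Supplier<Object>> metrics = new LinkedHashMap<>();

    public void register(String name, Supplier<Object> value) {
        metrics.put(name, value);
    }

    // "vtable" view (assumed shape): one row per metric with named columns.
    public List<Map<String, Object>> asVtableRows() {
        return metrics.entrySet().stream()
                .map(e -> {
                    Map<String, Object> row = new LinkedHashMap<>();
                    row.put("name", e.getKey());
                    row.put("value", e.getValue().get());
                    return row;
                })
                .collect(Collectors.toList());
    }

    // "JMX" view (assumed shape): attribute name -> current value.
    public Map<String, Object> asJmxAttributes() {
        Map<String, Object> attrs = new LinkedHashMap<>();
        metrics.forEach((name, value) -> attrs.put(name, value.get()));
        return attrs;
    }

    public static void main(String[] args) {
        SharedMetricSource source = new SharedMetricSource();
        source.register("pending_tasks", () -> 3);
        source.register("completed_tasks", () -> 42);
        System.out.println(source.asVtableRows());
        System.out.println(source.asJmxAttributes());
    }
}
```

Adding a third metric here is a single `register` call; neither view needs to change, which is the maintenance property being argued for.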

> we wish to move away from JMX
I do, and you do, and many people do, but I don't believe *all* people on the 
project do. The last time this came up in slack the conclusion was "Josh should 
go draft a CEP to chart out a path to moving off JMX while maintaining 
backwards-compat w/existing JMX metrics for environments that are using them" 
(so I'm excited to see this CEP pop up before I got to it! ;)). Moving to a 
system that gives us a 0-cost way to keep JMX and vtable in sync over time on 
new metrics seems like a nice compromise for folks that have built out 
JMX-based maintenance infra on top of C*. Plus removing the boilerplate toil on 
vtables. win-win.

> If we add a column to the end of the JMX row did we just break users?  
I *think* this is arguably true for a vtable / CQL-based solution as well from 
the "you don't know how people are using your API" perspective. Unless we have 
clear guidelines about discretely selecting the columns you want from a vtable 
and trust users to follow them, if people have brittle greedy parsers pulling 
in all data from vtables we could very well break them as well by adding a new 
column right? Could be wrong here; I haven't written anything that consumes 
vtable metric data and maybe the obvious idiom in the face of that is robust in 
the presence of column addition. /shrug

It's certainly more flexible and simpler to write to w/out detonating compared 
to JMX, but it's still an API we'd be revving.

On Sat, Jan 28, 2023, at 4:24 PM, Ekaterina Dimitrova wrote:
> Overall I have similar thoughts and questions as David.
> 
> I just wanted to add a reminder about this thread from last summer[1]. We 
> already have issues with the alignment of JMX and Settings Virtual Table. I 
> guess this is how Maxim got inspired to suggest this framework proposal which 
> I want to thank him for! (I noticed he assigned CASSANDRA-15254)
> 
> Not to open Pandora's box, but to me the most important thing here is to 
> come to an agreement about the future of JMX and what we will do or not as a 
> community. Also, how much time people are able to invest. I guess this will 
> influence any directions to be taken here.
> 
> [1] 
> https://lists.apache.org/thread/8mjcwdyqoobpvw2262bqmskkhs76pp69
> 
> 
> On Thu, 26 Jan 2023 at 12:41, David Capwell  wrote:
>> I took a look and I see the result is an interface that looks like the 
>> vtable interface, that is then used by vtables and JMX?  My first thought is 
>> why not just use the vtable logic?
>> 
>> I also wonder whether we should care about JMX?  I know many wish to 
>> migrate (it's going to be a very long time) away from JMX, so do we need a 
>> wrapper to make JMX and vtables consistent?  I am cool with something like 
>> the following
>> 
>>> registerWithJMX(jmxName, query("SELECT * FROM system_views.streaming"));
>> 
>> So if we want to have a JMX view that matches the table then that’s cool by 
>> me, but one thing that has been brought up in reviews is backwards 
>> compatibility with regard to adding columns… If we add a column to the end 
>> of the JMX row did we just break users?  
>> 
>>> Considering that JMX is usually not used and disabled in production 
>>> environments for various performance and security reasons, the operator may 
>>> not see the same picture from various of Dropwizard's metrics exporters
>> If this is a real problem people are hitting, we can always add the ability 
>> to push metrics to common systems with a pluggable way to add non-standard 
>> solutions.  Dropwizard already supports this so would be low hangi

Re: Merging CEP-15 to trunk

2023-01-31 Thread Josh McKenzie

> Don't we follow a principle of always shippable trunk? This was actually a 
> reason why I sidelined the talk about post-merge review, because it implies 
> that the code wasn't "good enough"/perfect when it was first merged.
We follow a principle of "always shippable trunk according to circleci" as of 
cutting 4.1, which has been adhered to in this case. There's every likelihood 
that ASF CI will detonate when this is merged in, but it's also already 
*currently* detonating repeatedly which we're working on so... /shrug

If we block Accord merging on green ASF CI, we'll be in the same boat we were 
in with 4.1 and never ship until we tear certain things down to the studs and 
rebuild them. I don't think it's reasonable to put that burden (ASF CI must be 
green) on *any* ticket right now, much less one that has a potentially high 
integration cost to the entire project if it stagnates over time.

> plenty of experience with situations where even if an engineer, including 
> myself sometimes, wanted to work on fixing some technical debt, the 
> employer's project management processes simply wouldn't prioritize that work 
> and it was left for years
Seems like it's a healthy mix of debt and bikeshedding for us historically. The 
former we don't want to sneak in, the latter, well, that I'm less concerned 
about personally. :)

And I think part of the "two committers +1" bar is about trying to keep our 
debt low.

On Tue, Jan 31, 2023, at 2:45 AM, Benedict wrote:
> 
> Do you mean as part of a blocking review, or just in general?
> 
> I don’t honestly understand the focus on ninja fixes or rebasing, in either 
> context. Why would a project rebase its work until ready to merge? Why would 
> you worry that a team well resourced with experienced contributors (30+yrs 
> between them) don’t know how to work correctly with ninja fixes? These are 
> all minor details, surely?
> 
> I understand your concern around flaky tests, particularly since it seems 
> other work has let some slip through. I don’t believe this is a blocking 
> review concern, as it represents its own blocking requirement. I believe I 
> have responded positively to this query already though?
> 
> 
>> On 31 Jan 2023, at 01:46, Ekaterina Dimitrova  wrote:
>> 
>> 
>> Benedict, Is it an issue to ask whether flaky tests will be addressed as per 
>> our project agreement? Or about ninja fixes and why a branch was not rebased 
>> during development? What did I miss? 
>> 
>> By the way I do not ask these questions to block anyone, even suggested to 
>> help with CI…it’s a pity this part was dismissed… 
>> 
>>  I can see that Caleb is handling the things around the merge ticket with 
>> high attention to the details as always. I can only thank him! 
>> 
>> At this point I see this thread mostly as - this is the first feature branch 
>> since quite some time, people are unsure about certain things - let’s see 
>> how we handle this for the next time to be more efficient and clear.  I 
>> think you already took some actions in your suggestion earlier today with 
>> the document around comments. 
>> 
>> On Mon, 30 Jan 2023 at 20:16, Benedict  wrote:
>>> 
>>> Review should primarily ask: "is this correct?" and "could this be done 
>>> differently (clearer, faster, more correct, etc)?" Blocking reviews 
>>> especially, because why else would a reasonable contributor want to block 
>>> progress?
>>> 
>>> These questions can of course be asked of implementation details for any 
>>> CEP. 
>>> 
>>> I have said before, a proposal to conduct a blocking review of this kind - 
>>> if very late in my view - would be valid, though timeline would have to be 
>>> debated.
>>> 
>>> Reviewers with weaker aspirations have plenty of time available to them - 
>>> two weeks have already passed, and another couple will likely yet (there 
>>> isn't a rush). But it is novel to propose that such optional reviews be 
>>> treated as blocking.
>>> 
>>> 
>>>> On 30 Jan 2023, at 23:04, Henrik Ingo  wrote:
>>>> 
>>>> Ooops, I missed copy pasting this reply into my previous email:
>>>> 
>>>> On Fri, Jan 27, 2023 at 11:21 PM Benedict  wrote:
>>>>> 
>>>>>> I'm realizing in retrospect this leaves ambiguity
>>>>> 
>>>>> Another misreading at least of the *intent* of these clauses, is that 
>>>>> they were to ensure that concerns about a *design and approach* are 
>>>>> listened to, and addressed to the satisfaction of interested parties. It 
>>>>> was essentially codifying the project’s long term etiquette around pieces 
>>>>> of work with either competing proposals or fundamental concerns. It has 
>>>>> historically helped to avoid escalation to vetoes, or reverting code 
>>>>> after commit. 
>>>>> 
>>>>> It wasn’t intended that *any* reason might be invoked, as seems to have 
>>>>> been inferred, and perhaps this should be clarified, though I had hoped 
>>>>> it would be captured by the word “reasonable". Minor concerns that 
>>>>> haven’t been caught by the initial review process can a

Re: Merging CEP-15 to trunk

2023-01-31 Thread Josh McKenzie

> Don't we follow a principle of always shippable trunk? 
Also, I feel compelled to call out: we don't have perf regression testing we're 
doing publicly on the project, don't have code coverage analysis, and don't 
have long-running harry / nemesis-based soak testing to suss out subtle timing 
issues at this time. So calling what we're doing on the ASF side "always 
shippable trunk" leaves a lot of the lift and toil that goes into each release 
up to folks working on these tags on their own infra.

Which vexes me.

On Tue, Jan 31, 2023, at 11:20 AM, Josh McKenzie wrote:
>> Don't we follow a principle of always shippable trunk? This was actually a 
>> reason why I sidelined the talk about post-merge review, because it implies 
>> that the code wasn't "good enough"/perfect when it was first merged.
> We follow a principle of "always shippable trunk according to circleci" as of 
> cutting 4.1, which has been adhered to in this case. There's every likelihood 
> that ASF CI will detonate when this is merged in, but it's also already 
> *currently* detonating repeatedly which we're working on so... /shrug
> 
> If we block Accord merging on green ASF CI, we'll be in the same boat we were 
> in with 4.1 and never ship until we tear certain things down to the studs and 
> rebuild them. I don't think it's reasonable to put that burden (ASF CI must 
> be green) on *any* ticket right now, much less one that has a potentially 
> high integration cost to the entire project if it stagnates over time.
> 
>> plenty of experience with situations where even if an engineer, including 
>> myself sometimes, wanted to work on fixing some technical debt, the 
>> employer's project management processes simply wouldn't prioritize that work 
>> and it was left for years
> Seems like it's a healthy mix of debt and bikeshedding for us historically. 
> The former we don't want to sneak in, the latter, well, that I'm less 
> concerned about personally. :)
> 
> And I think part of the "two committers +1" bar is about trying to keep our 
> debt low.
> 
> On Tue, Jan 31, 2023, at 2:45 AM, Benedict wrote:
>> 
>> Do you mean as part of a blocking review, or just in general?
>> 
>> I don’t honestly understand the focus on ninja fixes or rebasing, in either 
>> context. Why would a project rebase its work until ready to merge? Why would 
>> you worry that a team well resourced with experienced contributors (30+yrs 
>> between them) don’t know how to work correctly with ninja fixes? These are 
>> all minor details, surely?
>> 
>> I understand your concern around flaky tests, particularly since it seems 
>> other work has let some slip through. I don’t believe this is a blocking 
>> review concern, as it represents its own blocking requirement. I believe I 
>> have responded positively to this query already though?
>> 
>> 
>>> On 31 Jan 2023, at 01:46, Ekaterina Dimitrova  wrote:
>>> 
>>> 
>>> Benedict, Is it an issue to ask whether flaky tests will be addressed as 
>>> per our project agreement? Or about ninja fixes and why a branch was not 
>>> rebased during development? What did I miss? 
>>> 
>>> By the way I do not ask these questions to block anyone, even suggested to 
>>> help with CI…it’s a pity this part was dismissed… 
>>> 
>>>  I can see that Caleb is handling the things around the merge ticket with 
>>> high attention to the details as always. I can only thank him! 
>>> 
>>> At this point I see this thread mostly as - this is the first feature 
>>> branch since quite some time, people are unsure about certain things - 
>>> let’s see how we handle this for the next time to be more efficient and 
>>> clear.  I think you already took some actions in your suggestion earlier 
>>> today with the document around comments. 
>>> 
>>> On Mon, 30 Jan 2023 at 20:16, Benedict  wrote:
>>>> 
>>>> Review should primarily ask: "is this correct?" and "could this be done 
>>>> differently (clearer, faster, more correct, etc)?" Blocking reviews 
>>>> especially, because why else would a reasonable contributor want to block 
>>>> progress?
>>>> 
>>>> These questions can of course be asked of implementation details for any 
>>>> CEP. 
>>>> 
>>>> I have said before, a proposal to conduct a blocking review of this kind - 
>>>> if very late in my view - would be valid, though timeline would have to be

Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Josh McKenzie

Things I think of as APIs:
1. nodetool output (user tooling couples with this)
2. CQL syntax
3. JMX
4. VTables
5. Potential future refactored and deliberately exposed API interfaces 
(SSTables, custom indexes, etc)

APIs persist; I don't think lazy consensus to favor velocity is the right 
tradeoff for them given their weight. They're effectively one-way doors.

I vote B.

On Thu, Feb 2, 2023, at 9:01 AM, Ekaterina Dimitrova wrote:
> “Only that it locks out of the conversation anyone without a Jira login”
> Very valid point I forgot about - since recently people need an invitation in 
> order to create an account…
> Then I would say C until we clarify the scope. Thanks
> 
> On Thu, 2 Feb 2023 at 8:54, Benedict  wrote:
>> 
>> I think lazy consensus is fine for all of these things. If a DISCUSS thread 
>> is crickets, or just positive responses, then definitely it can proceed 
>> without further ceremony.
>> 
>> I think “with heads-up to the mailing list” is very close to B? Only that it 
>> locks out of the conversation anyone without a Jira login.
>> 
>> 
>>> On 2 Feb 2023, at 13:46, Ekaterina Dimitrova  wrote:
>>> 
>>> While I do agree with you, I am thinking that if we include many things 
>>> that we would expect lazy consensus on I would probably have different 
>>> preference. 
>>> 
>>> I definitely don’t mean to stall this though so in that case:
>>> I’d say combination of A+C (jira with heads up on the ML if someone is 
>>> interested into the jira) and regular log on API changes separate from 
>>> CHANGES.txt or we can just add labels to entries in CHANGES.txt as some 
>>> other projects. (I guess this is a detail we can agree on later on, how to 
>>> implement it, if we decide to move into that direction)
>>> 
>>> On Thu, 2 Feb 2023 at 8:12, Benedict  wrote:
>>>> 
>>>> I think it’s fine to separate the systems from the policy? We are agreeing 
>>>> a policy for systems we want to make guarantees about to our users 
>>>> (regarding maintenance and compatibility)
>>>> 
>>>> For me, this is (at minimum) CQL and virtual tables. But I don’t think the 
>>>> policy differs based on the contents of the list, and given how long this 
>>>> topic stalled for. Given the primary point of contention seems to be the 
>>>> *policy* and not the list, I think it’s time to express our opinions 
>>>> numerically so we can move the conversation forwards.
>>>> 
>>>> This isn’t binding, it just reifies the community sentiment.
>>>> 
>>>> 
>>>>> On 2 Feb 2023, at 13:02, Ekaterina Dimitrova  
>>>>> wrote:
>>>>> 
>>>>> “ So we can close out this discussion, let’s assume we’re only discussing 
>>>>> any interfaces we want to make promises for. We can have a separate 
>>>>> discussion about which those are if there is any disagreement.”
>>>>> May I suggest we first clear this topic and then move to voting? I would 
>>>>> say I see confusion, not that much of a disagreement. Should we raise a 
>>>>> discussion for every feature flag for example? In another thread virtual 
>>>>> tables were brought in. I saw also other examples where people expressed 
>>>>> uncertainty. I personally feel I’ll be able to take a more informed 
>>>>> decision and vote if I first see this clarified. 
>>>>> 
>>>>> I will be happy to put down a document and bring it for discussion if 
>>>>> people agree with that
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, 2 Feb 2023 at 7:33, Aleksey Yeshchenko  wrote:
>>>>>> Bringing light to new proposed APIs no less important - if not more, for 
>>>>>> reasons already mentioned in this thread. For it’s not easy to change 
>>>>>> them later.
>>>>>> 
>>>>>> Voting B.
>>>>>> 
>>>>>> 
>>>>>>> On 2 Feb 2023, at 10:15, Andrés de la Peña  wrote:
>>>>>>> 
>>>>>>> If it's a breaking change, like removing a method or property, I think 
>>>>>>> we would need a DISCUSS API thread prior to making changes. However, if 
>>>>>>> the change is an addition, like adding a new yaml property or a JMX 
>>>>>>> method, I think JIRA suffices.

Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Josh McKenzie

> if a patch adds, say, a single JMX method to expose the
> metric, having an ML thread for it may seem redundant
My fear is someone missing that there's an idiom or pattern within the codebase 
for metrics, and then we end up with inconsistent metric names / groups 
exposed to users.

Especially if we move forward with a "vtable + jmx metric" scaffolding we add 
things onto. I think it's worth the tax of a quick email to the dev list to 
prevent our users from getting further buried as APIs continue to evolve and 
gain complexity.

On Thu, Feb 2, 2023, at 11:03 AM, Maxim Muzafarov wrote:
> Hello everyone,
> 
> 
> I would say that having a CEP and a well-defined set of major public
> API changes is a must, and the corresponding discussion of CEP is also
> well-defined here [1]. This ensures that we do not miss any important
> changes. Everything related to the public API is also described in the
> CEP template [2].
> 
> However, if a patch adds, say, a single JMX method to expose the
> metric, having an ML thread for it may seem redundant, and may shift
> the focus away from the really important issues on the dev list. In
> this case, I think we can add to the JIRA issue the `public API
> changed` label and mention all these issues on a weekly or monthly
> basis in a Cassandra status update e-mail. This will help keep the
> balance between important changes and routine.
> 
> 
> [1] 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652201#CassandraEnhancementProposals(CEP)-TheProcess
> [2] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-Template#CEPTemplate-NeworChangedPublicInterfaces
> 
> On Thu, 2 Feb 2023 at 16:56, Jeremiah D Jordan
>  wrote:
> >
> > I think we need a DISCUSS thread at minimum for API changes.  And for 
> > anything changing CQL syntax, I think a CEP is warranted.  Even if it is 
> > only a small change to the syntax.
> >
> > On Feb 2, 2023, at 9:32 AM, Patrick McFadin  wrote:
> >
> > API changes are near and dear to my world. The scope of changes could be 
> > minor or major, so I think B is the right way forward.
> >
> > Not to throw off the momentum, but could this even warrant a separate CEP 
> > in some cases? For example, CEP-15 is a huge change, but the CQL syntax 
> > will continuously evolve with more use. Being judicious in those changes is 
> > good for end users. It's also a good reference to point back to after the 
> > fact.
> >
> > Patrick
> >
> > On Thu, Feb 2, 2023 at 6:01 AM Ekaterina Dimitrova  
> > wrote:
> >>
> >> “Only that it locks out of the conversation anyone without a Jira login”
> >> Very valid point I forgot about - since recently people need an invitation in 
> >> order to create an account…
> >> Then I would say C until we clarify the scope. Thanks
> >>
> >> On Thu, 2 Feb 2023 at 8:54, Benedict  wrote:
> >>>
> >>> I think lazy consensus is fine for all of these things. If a DISCUSS 
> >>> thread is crickets, or just positive responses, then definitely it can 
> >>> proceed without further ceremony.
> >>>
> >>> I think “with heads-up to the mailing list” is very close to B? Only that 
> >>> it locks out of the conversation anyone without a Jira login.
> >>>
> >>> On 2 Feb 2023, at 13:46, Ekaterina Dimitrova  
> >>> wrote:
> >>>
> >>> 
> >>>
> >>> While I do agree with you, I am thinking that if we include many things 
> >>> that we would expect lazy consensus on I would probably have different 
> >>> preference.
> >>>
> >>> I definitely don’t mean to stall this though so in that case:
> >>> I’d say combination of A+C (jira with heads up on the ML if someone is 
> >>> interested into the jira) and regular log on API changes separate from 
> >>> CHANGES.txt or we can just add labels to entries in CHANGES.txt as some 
> >>> other projects. (I guess this is a detail we can agree on later on, how 
> >>> to implement it, if we decide to move into that direction)
> >>>
> >>> On Thu, 2 Feb 2023 at 8:12, Benedict  wrote:
> >>>>
> >>>> I think it’s fine to separate the systems from the policy? We are 
> >>>> agreeing a policy for systems we want to make guarantees about to our 
> >>>> users (regarding maintenance and compatibility)
> >>>>
> >>>> For me, this is (at minimum) CQL and virtual tables. But I don’t think 
> >>>> the policy differs based on the contents of the list, and given how long 
> >>>> this topic stalled for. Given the primary point of contention seems to 
> >>>> be the *policy* and not the list, I think it’s time to express our 
> >>>> opinions numerically so we can move the conversation forwards.
> >>>>
> >>>> This isn’t binding, it just reifies the community sentiment.
> >>>>
> >>>> On 2 Feb 2023, at 13:02, Ekaterina Dimitrova  
> >>>> wrote:
> >>>>
> >>>> 
> >>>>
> >>>> “ So we can close out this discussion, let’s assume we’re only 
> >>>> discussing any interfaces we want to make promises for. We can have a 
> >>>> separate discussion about which those are if there is any disagreement.”
> >>>> May I suggest we f

Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-02 Thread Josh McKenzie

Congrats Patrick! Well deserved.

On Thu, Feb 2, 2023, at 5:25 PM, Molly Monroy wrote:
> Congrats, Patrick... much deserved!
> 
> On Thu, Feb 2, 2023 at 1:59 PM Derek Chen-Becker  
> wrote:
>> Congrats!
>> 
>> On Thu, Feb 2, 2023 at 10:58 AM Benjamin Lerer  wrote:
>>> The PMC members are pleased to announce that Patrick McFadin has accepted
>>> the invitation to become committer today.
>>> 
>>> Thanks a lot, Patrick, for everything you have done for this project and 
>>> its community through the years.
>>> 
>>> Congratulations and welcome!
>>> 
>>> The Apache Cassandra PMC members
>> 
>> 
>> --
>> +---------------------------------------------------------------+
>> | Derek Chen-Becker                                             |
>> | GPG Key available at https://keybase.io/dchenbecker and       |
>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>> +---------------------------------------------------------------+
>>

[DISCUSS] Merging incremental feature work

2023-02-03 Thread Josh McKenzie

The topic of how we handle merging large complex bodies of work came up 
recently with the CEP-15 merge and JDK17, and we've faced this question in the 
past as well (CASSANDRA-8099 comes to mind).

The times we've done large bodies of work separately from trunk and then merged 
them in have their own benefits and costs, and the examples I can think of 
where we've merged in work to trunk incrementally with something flagged 
experimental have markedly different cost/benefits. Further, the two approaches 
have shaped the *way* we approached work quite differently with how we 
architected and tested things.

My current thinking: I'd like to propose we all agree to move to merge work 
into trunk incrementally if it's either:
1) New JDK support
2) An approved CEP

The bar for merging anything into trunk should remain:
1) 2 +1's from committers
2) Green CI (presently circle or ASF, in the future ideally ASF or an ASF 
analog env)

I don't know if this is a generally held opinion and we just haven't discussed 
it and switched our general behavior yet, or if this is more controversial, so 
I won't burden this email with enumerating pros and cons of the two approaches 
until I get a gauge of the community's temperature.

So - what do we think?

Re: [DISCUSS] Merging incremental feature work

2023-02-03 Thread Josh McKenzie

Anything we either a) have to do (JDK support) or b) have all agreed up front 
we think we should do (CEP). I.e. things with a lower risk of being left dead 
in the codebase partially implemented.

I don't think it's a coincidence we've set up other processes to help de-risk 
and streamline the consensus building portion of this work given our history 
with it. We haven't taken steps to optimize the tactical execution of it yet.

On Fri, Feb 3, 2023, at 7:09 AM, Brandon Williams wrote:
> On Fri, Feb 3, 2023 at 6:06 AM Josh McKenzie  wrote:
> >
> > My current thinking: I'd like to propose we all agree to move to merge work 
> > into trunk incrementally if it's either:
> > 1) New JDK support
> > 2) An approved CEP
> 
> So basically everything?  I'm not sure what large complex bodies of
> work would be left.
>

Re: [DISCUSS] Merging incremental feature work

2023-02-03 Thread Josh McKenzie

The deeply coupled nature of some areas of our codebase does impose some 
constraints on us here, to your point. Without sensible internal APIs 
a lot of this type of work expands into two phases, one to refactor out said 
APIs and the other to introduce new functionality. 

It probably depends on what systems we’re extending or replacing and how 
tightly coupled the original design is as to which approach is feasible given 
resourcing.

On Fri, Feb 3, 2023, at 7:48 AM, Sam Tunnicliffe wrote:
> This is quite timely as we're just gearing up to begin pushing the work we've 
> been doing on CEP-21 into the public domain. 
> 
> This CEP is slightly different from others that have gone before in that it 
> touches almost every area of the system. This presents a few implementation 
> challenges, most obviously around feature flagging and incremental merging. 
> When we began prototyping and working on the design presented in CEP-21 it 
> quickly became apparent that doing things incrementally would push an already 
> large changeset into gargantuan proportions. Keeping changes isolated and 
> abstracted would itself have required a vast amount of refactoring and rework 
> of existing code and tests. 
> 
> I'll go into more detail in a CEP-21 specific mail shortly, but the plan we 
> were hoping to follow was to work in a long lived topic branch, with JIRAs, 
> sensible commit history and CI, and defer merging to trunk until the work as 
> a whole is useable and meets all the existing bars for quality, review and 
> the like. 
> 
> 
>> On 3 Feb 2023, at 12:43, Josh McKenzie  wrote:
>> 
>> Anything we either a) have to do (JDK support) or b) have all agreed up 
>> front we think we should do (CEP). I.e. things with a lower risk of being 
>> left dead in the codebase partially implemented.
>> 
>> I don't think it's a coincidence we've set up other processes to help 
>> de-risk and streamline the consensus building portion of this work given our 
>> history with it. We haven't taken steps to optimize the tactical execution 
>> of it yet.
>> 
>> On Fri, Feb 3, 2023, at 7:09 AM, Brandon Williams wrote:
>>> On Fri, Feb 3, 2023 at 6:06 AM Josh McKenzie  wrote:
>>> >
>>> > My current thinking: I'd like to propose we all agree to move to merge 
>>> > work into trunk incrementally if it's either:
>>> > 1) New JDK support
>>> > 2) An approved CEP
>>> 
>>> So basically everything?  I'm not sure what large complex bodies of
>>> work would be left.

Re: [DISCUSS] Merging incremental feature work

2023-02-03 Thread Josh McKenzie

> worse performance than Java 11, or some other blocking problem is 
> encountered. But in practice we probably estimate that this risk is small. In 
> such a case JDK17 support could indeed be developed with small patches 
> directly against trunk, but this would be an exception to the rule!
> 
> 2) To take an example of an approved CEP, surprisingly enough, the humongous 
> Accord patch is actually very clean and self contained, and would be easy to 
> remove (or disable with a feature flag, which has been done). So it could 
> have been developed against trunk. (But I'm not sure that was obvious in the 
> beginning of development?)
> 
> 3) The work on SSTable tries and Memtable tries was even explicitly split 
> into separate CEPs for the API refactor and the new functionality.
> 
> 
> Perhaps Linus Torvalds said the above more succinctly than me: 
> 
>> *So the name of the game is to _avoid_ decisions, at least the big and 
>> painful ones. Making small and non-consequential decisions is fine, and 
>> makes you look like you know what you're doing, so what a kernel manager 
>> needs to do is to turn the big and painful ones into small things where 
>> nobody really cares.
>> 
>> It helps to realize that the key difference between a big decision and a 
>> small one is whether you can fix your decision afterwards. *Any decision can 
>> be made small by just always making sure that if you were wrong (and you 
>> _will_ be wrong), you can always undo the damage later by backtracking.* 
>> Suddenly, you get to be doubly managerial for making _two_ inconsequential 
>> decisions - the wrong one _and_ the right one.* 
> 
>> https://www.openlife.cc/onlinebook/epilogue-linux-kernel-management-style-linus-torvalds
> 
> (I particularly like the last sentence!)
> 
> henrik
> 
> 
> On Fri, Feb 3, 2023 at 2:06 PM Josh McKenzie  wrote:
>> The topic of how we handle merging large complex bodies of work came up 
>> recently with the CEP-15 merge and JDK17, and we've faced this question in 
>> the past as well (CASSANDRA-8099 comes to mind).
>> 
>> The times we've done large bodies of work separately from trunk and then 
>> merged them in have their own benefits and costs, and the examples I can 
>> think of where we've merged in work to trunk incrementally with something 
>> flagged experimental have markedly different cost/benefits. Further, the two 
>> approaches have shaped the *way* we approached work quite differently with 
>> how we architected and tested things.
>> 
>> My current thinking: I'd like to propose we all agree to move to merge work 
>> into trunk incrementally if it's either:
>> 1) New JDK support
>> 2) An approved CEP
>> 
>> The bar for merging anything into trunk should remain:
>> 1) 2 +1's from committers
>> 2) Green CI (presently circle or ASF, in the future ideally ASF or an ASF 
>> analog env)
>> 
>> I don't know if this is a generally held opinion and we just haven't 
>> discussed it and switched our general behavior yet, or if this is more 
>> controversial, so I won't burden this email with enumerating pros and cons 
>> of the two approaches until I get a gauge of the community's temperature.
>> 
>> So - what do we think?
> 
> 
> --
> 
> 
> 
> 
> *Henrik Ingo*
> 
> *c*. +358 40 569 7354 
> 
> *w*. _www.datastax.com_
> 
>

Re: Implicitly enabling ALLOW FILTERING on virtual tables

2023-02-03 Thread Josh McKenzie

> they would start to set ALLOW FILTERING here and there in order to not think 
> twice about their data model so they can just call it a day.
Setting this on a per-table basis or having users set this on specific queries 
that hit tables and forgetting they set it are 6 of one and half-a-dozen of 
another.

I like the table property idea personally. It communicates intent about the 
data model, and the expected size and usage of the data, directly in the 
schema; there's currently no mechanism to communicate that context.
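
To make the table property idea a bit more concrete, here is a minimal Python 
sketch (not Cassandra code). The property name `allow_filtering_by_default` and 
the `TableMeta` / `filtering_permitted` names are made up for illustration; a 
real property, if we add one, could look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical stand-in for table metadata; not a real Cassandra class."""
    name: str
    # Hypothetical table property: the schema author states that filtering
    # this table is expected to be cheap (small, bounded data).
    allow_filtering_by_default: bool = False


def filtering_permitted(table: TableMeta, query_has_allow_filtering: bool) -> bool:
    """A filtering query runs if either the query or the table opts in."""
    return query_has_allow_filtering or table.allow_filtering_by_default


# Illustrative tables: one opted in via the (hypothetical) property, one not.
clients = TableMeta("system_views.clients", allow_filtering_by_default=True)
events = TableMeta("ks.events")
```

The point of the sketch is that the opt-in lives with the schema rather than 
with each query, so the intent is visible to anyone who inspects the table.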

On Fri, Feb 3, 2023, at 5:00 PM, Miklosovic, Stefan wrote:
> Yes, there would be discrepancy. I do not like that either. If it was only 
> about "normal tables vs virtual tables", I could live with that. But the fact 
> that there are going to be differences among vtables themselves, that starts 
> to be a little bit messy. Then we would need to let operators know which 
> tables are always allowed to be filtered on and which are not, and that just 
> complicates it. Putting that information in a comment so it is visible in 
> DESCRIBE is a nice idea.
> 
> That flag we talk about ... that flag would be used purely internally, it 
> would not be in schema to be gossiped.
> 
> Also, I am starting to like the suggestion to have something like ALLOW 
> FILTERING ON in CQLSH so it would be turned on for the whole CQL session. That leaves 
> tables as they are and it should not be a big deal for operators to set. We 
> would have to make sure to add "ALLOW FILTERING" clause to every SELECT 
> statement (to virtual tables only?) a user submits. I am not sure if this is 
> doable yet though.
> 
> 
> From: David Capwell 
> Sent: Friday, February 3, 2023 22:42
> To: dev
> Cc: Maxim Muzafarov
> Subject: Re: Implicitly enabling ALLOW FILTERING on virtual tables
> 
> 
> I don't think the assumption that "virtual tables will always be small and 
> always fit in memory" is a safe one.
> 
> Agree, there is a repair ticket to have the coordinating node do network 
> queries to peers to resolve the table (rather than operator querying 
> everything, allow the coordinator node to do it for you)… so this assumption 
> may not be true down the line.
> 
> I could be open to a table property that says ALLOW FILTERING on by default 
> or not… then we can pick and choose vtables (or have vtables opt-out)…. I 
> kinda dislike the lack of consistency with this approach though
> 
> On Feb 3, 2023, at 11:24 AM, C. Scott Andreas  wrote:
> 
> There are some ideas that development community members have kicked around 
> that may falsify the assumption that "virtual tables are tiny and will fit in 
> memory."
> 
> One example is CASSANDRA-14629: Abstract Virtual Table for very large result 
> sets
> https://issues.apache.org/jira/browse/CASSANDRA-14629
> 
> Chris's proposal here is to enable query results from virtual tables to be 
> streamed to the client rather than being fully materialized. There are some 
> neat possibilities suggested in this ticket, such as debug functionality to 
> dump the contents of a raw SSTable via the CQL interface, or the contents of 
> the database's internal caches. One could also imagine a feature like this 
> providing functionality similar to a foreign data wrapper in other databases.
> 
> I don't think the assumption that "virtual tables will always be small and 
> always fit in memory" is a safe one.
> 
> I don't think we should implicitly add "ALLOW FILTERING" to all queries 
> against virtual tables because of this, in addition to concern with departing 
> from standard CQL semantics for a type of tables deemed special.
> 
> – Scott
> 
> On Feb 3, 2023, at 6:52 AM, Maxim Muzafarov  wrote:
> 
> 
> Hello Stefan,
> 
> Regarding the decision to implicitly enable ALLOW FILTERING for
> virtual tables, which also makes sense to me, it may be necessary to
> consider changing the clustering columns in the virtual table metadata
> to regular columns as well. The reasons are the same as mentioned
> earlier: the virtual tables hold their data in memory, thus we do not
> benefit from the advantages of ordered data (e.g. the ClientsTable and
> its ClusteringColumn(PORT)).
> 
> Changing the clustering column to a regular column may simplify the
> virtual table data model, but I'm afraid it may affect users who rely
> on the table metadata.
> 
> 
> 
> On Fri, 3 Feb 2023 at 12:32, Andrés de la Peña  wrote:
> 
> I think removing the need for ALLOW FILTERING on virtual tables makes sense 
> and would be quite useful for operators.
> 
> That guard exists for performance issues that shouldn't occur on virtual 
> tables. We also have a flag in case some future virtual table implementation 
> has limitations regarding filtering, although it seems it's not the case with 
> any of

Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-06 Thread Josh McKenzie

+1

On Mon, Feb 6, 2023, at 2:53 PM, Dinesh Joshi wrote:
> +1
> 
>> 
>> On Feb 6, 2023, at 8:16 AM, Sam Tunnicliffe  wrote:
>> 
>> Hi everyone,
>> 
>> I would like to start a vote on this CEP.
>> 
>> Proposal:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
>> 
>> Discussion:
>> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
>> 
>> The vote will be open for 72 hours.
>> A vote passes if there are at least three binding +1s and no binding vetoes.
>> 
>> Thanks,
>> Sam

Re: [VOTE] Release Apache Cassandra 4.0.8

2023-02-13 Thread Josh McKenzie

+1

On Fri, Feb 10, 2023, at 3:13 AM, Tommy Stendahl via dev wrote:
> +1 (nb)
> 
> -Original Message-
> *From*: Berenguer Blasi
> *Reply-To*: dev@cassandra.apache.org
> *To*: dev@cassandra.apache.org
> *Subject*: Re: [VOTE] Release Apache Cassandra 4.0.8
> *Date*: Thu, 09 Feb 2023 13:49:01 +0100
> 
> +1
> 
> On 9/2/23 12:50, Brandon Williams wrote:
>> +1
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Thu, Feb 9, 2023 at 4:56 AM Miklosovic, Stefan 
>> <stefan.mikloso...@netapp.com> wrote:
>>> Proposing the test build of Cassandra 4.0.8 for release.
>>> 
>>> sha1: 32c56df067b72da8593c1ddaaf143fe8668459dd
>>> Git: 
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.8-tentative
>>> 
>>> Maven Artifacts: 
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1283/org/apache/cassandra/cassandra-all/4.0.8/
>>> 
>>> 
>>> The Source and Build Artifacts, and the Debian and RPM packages and 
>>> repositories, are available here: 
>>> https://dist.apache.org/repos/dist/dev/cassandra/4.0.8/
>>> 
>>> 
>>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>>> tested the build is invited to vote. Votes by PMC members are considered 
>>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>> 
>>> [1]: CHANGES.txt: 
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.8-tentative
>>> 
>>> [2]: NEWS.txt: 
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.8-tentative
>>>

Re: Downgradability

2023-02-22 Thread Josh McKenzie

> why not implement backwards write compatibility?
+1 to this from a philosophical perspective. Keeping prior releases completely 
in the dark about new release sstable formats is a clean approach, and we 
should already have the code around to ser/deser the prior version's data on 
the next version.
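
As a rough illustration of what a backwards-compatible write could look like, 
here is a minimal Python sketch of the TTL case Branimir mentions in the quoted 
thread: when writing in the older format, the expiration is capped at 2038. The 
names `LEGACY_MAX_EXPIRATION` and `expiration_for_format` are hypothetical, and 
the exact cap the old format uses is an assumption rather than something taken 
from the serializer.

```python
from datetime import datetime, timezone

# Largest value of a signed 32-bit seconds-since-epoch field:
# 2038-01-19T03:14:07Z. The exact cap the old sstable format uses is an
# assumption here; check the serializer before relying on it.
LEGACY_MAX_EXPIRATION = 2**31 - 1


def expiration_for_format(expires_at: int, legacy_format: bool) -> int:
    """Return the expiration (epoch seconds) to write for the target format."""
    if legacy_format:
        # Old readers cannot represent later times, so cap instead of failing.
        return min(expires_at, LEGACY_MAX_EXPIRATION)
    return expires_at


# Illustrative expiration well past the 32-bit limit.
far_future = int(datetime(2050, 1, 1, tzinfo=timezone.utc).timestamp())
```

The same value is written unchanged in the current format and capped in the 
legacy one, so the choice is driven by the target format alone.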

On Wed, Feb 22, 2023, at 10:07 AM, Jeff Jirsa wrote:
> When people are serious about this requirement, they’ll build the downgrade 
> equivalents of the upgrade tests and run them automatically, often, so people 
> understand what the real gap is and when something new makes it break.
> 
> Until those tests exist, I think collectively we should all stop pretending 
> like this is dogma. Best effort is best effort. 
> 
> 
> 
>> On Feb 22, 2023, at 6:57 AM, Branimir Lambov  
>> wrote:
>> 
>> > 1. Major SSTable changes should begin with forward-compatibility in a 
>> > prior release.
>> 
>> This requires "feature" changes, i.e. new non-trivial code for previous 
>> patch releases. It also entails porting over any further format modification.
>> 
>> Instead of this, in combination with your second point, why not implement 
>> backwards write compatibility? The opt-in is then clearer to define (i.e. 
>> upgrades start with e.g. a "4.1-compatible" settings set that includes file 
>> format compatibility and disabling of new features, new nodes start with 
>> "current" settings set). When the upgrade completes and the user is happy 
>> with the result, the settings set can be replaced.
>> 
>> Doesn't this achieve what you want (and we all agree is a worthy goal) with 
>> much less effort for everyone? Supporting backwards-compatible writing is 
>> trivial, and we even have a proof-of-concept in the stats metadata 
>> serializer. It also simplifies by a serious margin the amount of work and 
>> thinking one has to do when a format improvement is implemented -- e.g. the 
>> TTL patch can just address this in exactly the way the problem was addressed 
>> in earlier versions of the format, by capping to 2038, without any need to 
>> specify, obey or test any configuration flags.
>> 
>> >> It’s a commitment, and it requires every contributor to consider it as 
>> >> part of work they produce.
>> 
>> > But it shouldn't be a burden. Ability to downgrade is a testable problem, 
>> > so I see this work as a function of the suite of tests the project is 
>> > willing to agree on supporting.
>> 
>> I fully agree with this sentiment, and I feel that the current "try to not 
>> introduce breaking changes" approach is adding the burden, but not the 
>> benefits -- because the latter cannot be proven, and are most likely already 
>> broken.
>> 
>> Regards,
>> Branimir
>> 
>> On Wed, Feb 22, 2023 at 1:01 AM Abe Ratnofsky  wrote:
>>> Some interesting existing work on this subject is "Understanding and 
>>> Detecting Software Upgrade Failures in Distributed Systems" 
>>> (https://dl.acm.org/doi/10.1145/3477132.3483577), also summarized by 
>>> Andrey Satarin here: 
>>> https://asatarin.github.io/talks/2022-09-upgrade-failures-in-distributed-systems/
>>> 
>>> 
>>> They specifically tested Cassandra upgrades, and have a solid list of 
>>> defects that they found. They also describe their testing mechanism 
>>> DUPTester, which includes a component that confirms that the leftover state 
>>> from one version can start up on the next version. There is a wider scope 
>>> of upgrade defects highlighted in the paper, beyond SSTable version support.
>>> 
>>> I believe the project would benefit from expanding our test suite 
>>> similarly, by parametrizing more tests on upgrade version pairs.
>>> 
>>> Also, per Benedict's comment:
>>> 
>>> > It’s a commitment, and it requires every contributor to consider it as 
>>> > part of work they produce.
>>> 
>>> But it shouldn't be a burden. Ability to downgrade is a testable problem, 
>>> so I see this work as a function of the suite of tests the project is 
>>> willing to agree on supporting.
>>> 
>>> Specifically - I agree with Scott's proposal to emulate the HDFS 
>>> upgrade-then-finalize approach. I would also support automatic finalization 
>>> based on a time threshold or similar, to balance the priorities of safe and 
>>> straightforward upgrades. Users need to be aware of the range of SSTable 
>>> formats supported by a given version, and how to handle when their SSTables 
>>> wouldn't be supported by an upcoming upgrade.
>>> 
>>> --
>>> Abe
>> 
>> 
>> --
>> Branimir Lambov
>> e. branimir.lam...@datastax.com
>> w. www.datastax.com
>>

Cassandra project status, 2023-03-02

2023-03-02 Thread Josh McKenzie

Trying out a monthly-ish cadence as traffic over the holiday season was a bit
sparse. Let's see how this goes.

Congratulations to Patrick McFadin on being made a committer on the project! So
many of us on the project have benefited from your assistance, input, guidance,
and efforts over the years. Well deserved!

Some of us have even had the opportunity to have to get physical therapy thanks
to you. (≖_≖ )

So that aside, there was the disappointing but understandable news that the
Cassandra Summit is being delayed until later in the year, December 12-13 to be
exact. See Patrick's email on the topic here:
https://lists.apache.org/thread/g4rsqjysn4fqw34z6ro5q6ywml6mnrzd.

In lieu of the summit, DataStax is sponsoring a virtual session covering some
of what's coming up in Cassandra 5.0 from a variety of participants in the
community. See here for details:
https://www.cassandrasummit.org/cassandra-forward

The world Post-Covid remains unpredictable; it's good to see our community
rising to the challenge and driving forward.

There are a _lot_ of these flagged as good starter tickets; focus on the column
marked "To Do" on the far left.

If there's anything else specifically you're interested in on the project, just
hop in the slack channel and let everyone know.

[Dev mailing list]
Let's see whether this time span is sustainable for a summary:
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-1-23|dto=2023-3-2:

46 Topics. This is going to be interesting.

Benedict reached out about our contribution guidelines looking for feedback:
https://lists.apache.org/thread/v4wn4jh23nv3xpvmrky0zwt2sdbkb5xh. The gdoc with
quite a bit of engagement can be found here:
https://docs.google.com/document/d/1qt7xmvEAXm_w9ARb2lVutapB_3il2WWEWuq_zYlu_9s/edit#heading=h.it3eez49gg10

Maxim Muzafarov has been doing some interesting work around building a
framework to keep our VTables and JMX params in sync and make maintenance lower
effort: https://lists.apache.org/thread/26j9hhy39okw0wy79mtylb753w6xjclg. Some
more about the progression of this work can be found in this thread here:
https://lists.apache.org/thread/8kywzv24n0dp07mhvch7hwhjypssoh0l and on this
jira: https://issues.apache.org/jira/browse/CASSANDRA-15254.

The discussion surrounding review of long-lived feature branches continued
(CEP-15 prompted the initial discussion):
https://lists.apache.org/thread/rotjm3dr36mbx24wbflkt35g7z7wz7ks. This prompted
a follow up discussion about how we want to manage long-lived feature branches:
https://lists.apache.org/thread/c39yrbdszgz9s34vr17wpjdhv6h2oo61. While we
didn't hit a formal (or informal) consensus on the topic, Henrik did summarize
both his position and many of the conversation points succinctly and I think
it's worth reiterating here:

"Basically what you want to avoid is to paint yourself into a corner, and
particularly the wrong corner. So the way I would answer this question is that
large bodies of work should:
- Refactoring that is a) harmless, and/or b) improves the codebase anyway,
should be merged early into trunk.
- The main body of the new functionality should be developed in a feature branch
up until some kind of MVP stage. This means that by the time it is proposed for
merge, a) it has been tested to both be of good quality and that it actually
provides the benefit it set out to implement. This means that merging it to
trunk will be a net improvement, always.
- After that first big MVP merge, additional functionality of course could be
developed directly against trunk.
- For patches that are very clean and self contained, for example in their own
Java package, it doesn't really matter, because they are easy to roll back if
needed. They can be developed against trunk."

As my partner has told me for years "Very little is black and white in this
world. The right answer is nuanced and somewhere in the middle". I think if we
keep taking this on a case-by-case basis and try and be proactive when we think
we might be doing something where we might disrupt others, we'll be good.

Lorina reached out about a documentation plan for C* 5.0:
https://lists.apache.org/thread/on116v3thbqjhp001z1ovko2kdf97z6n. The gdoc with
this work can be found here:
https://docs.google.com/document/d/1FAACcAxtV9qLJS05RLt85S_6Gb4C4t4g9DhmfLufjOk/edit#heading=h.f2d8nyksbd13.

Stefan Miklosovic opened up a discussion about how we want to handle ALLOW
FILTERING on VTables:
https://lists.apache.org

Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-04 Thread Josh McKenzie

(for convenience's sake, I'm referring to both Major and Minor semver releases as 
"major" in this email)

> The big feature from our perspective for 5.0 is ACCORD (CEP-15) and I would 
> advocate to delay until this has sufficient quality to be in production. 
This approach can be pretty unpredictable in this domain; often unforeseen 
things come up in implementation that can give you a long tail on something 
being production ready. For the record - I don't intend to single Accord out 
*at all* on this front, quite the opposite given how much rigor's gone into the 
design and implementation. I'm just thinking from my personal experience: 
everything I've worked on, overseen, or followed closely on this codebase 
always has a few tricks up its sleeve along the way to having edge-cases 
stabilized.

Much like on some other recent topics, I think there's a nuanced middle ground 
where we take things on a case-by-case basis. Some factors that have come up in 
this thread that resonated with me:

For a given potential release date 'X':
1. How long has it been since the last release?
2. How long do we expect qualification to take from a "freeze" (i.e. no new 
improvement or features, branch) point?
3. What body of merged production ready work is available?
4. What body of new work do we have high confidence will be ready within Y time?

I think it's worth defining a loose "minimum bound and upper bound" on release 
cycles we want to try and stick with barring extenuating circumstances. For 
instance: try not to release sooner than maybe 10 months out from a prior 
major, and try not to release later than 18 months out from a prior major. Make 
exceptions if truly exceptional things land, are about to land, or bugs are 
discovered around those boundaries.
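
A minimal Python sketch of that loose window, just to show the arithmetic. The 
10 and 18 month bounds are the example numbers from the paragraph above; the 
function names and the dates in the tests are made up for illustration and are 
not real release dates.

```python
from datetime import date

MIN_MONTHS = 10  # try not to release sooner than this after the prior major
MAX_MONTHS = 18  # try not to release later than this after the prior major


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month has not fully elapsed yet
    return months


def classify_release_date(prior_major: date, candidate: date) -> str:
    """Place a candidate release date relative to the loose window."""
    elapsed = months_between(prior_major, candidate)
    if elapsed < MIN_MONTHS:
        return "too soon"
    if elapsed > MAX_MONTHS:
        return "too late"
    return "in window"
```

This only covers the bounds; the exceptions (something exceptional landing, or 
bugs found near the boundaries) stay a judgment call.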

Applying the above framework to what we have in flight, our last release date, 
expectations on CI, etc - targeting an early fall freeze (pending CEP status) 
and mid to late fall or December release "feels right" to me.

With the exception, of course, that if something merges earlier, is stable, and 
we feel is valuable enough to cut a major based on that, we do it.

~Josh

On Fri, Mar 3, 2023, at 7:37 PM, German Eichberger via dev wrote:
> Hi,
> 
> We shouldn't release just for release's sake. Are there enough new features 
> and are they working well enough (quality!)?
> 
> The big feature from our perspective for 5.0 is ACCORD (CEP-15) and I would 
> advocate to delay until this has sufficient quality to be in production. 
> 
> Just because something is released doesn't mean anyone is gonna use it. To 
> add some operator perspective: Every time there is a new release we need to 
> decide 
> 1) are we supporting it 
> 2) which other release can we deprecate 
> 
> and potentially migrate people - which is also a tough sell if there are no 
> significant features and/or breaking changes.  So from my perspective less 
> frequent releases are better - after all we haven't gotten around to supporting 
> 4.1 🙂
> 
> The 5.0 release is also coupled with deprecating 3.11, which is what a 
> significant number of people are using - given 4.1 took longer I am not sure 
> how many people are assuming that 5 will be delayed and haven't made plans 
> (OpenJDK support for 8 is longer than Java 17 🙂). So being a bit more 
> deliberate with releasing 5.0 and having a longer beta phase are all things 
> we should consider.
> 
> My 2cts,
> German
> 
> *From:* Benedict 
> *Sent:* Wednesday, March 1, 2023 5:59 AM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: [DISCUSS] Next release date
>  
> 
> It doesn’t look like we agreed to a policy of annual branch dates, only 
> annual releases and that we would schedule this for 4.1 based on 4.0’s branch 
> date. Given this was the reasoning proposed I can see why folk would expect 
> this would happen for the next release. I don’t think there was a strong 
> enough commitment here to be bound by it, if we think different maths would 
> work better.
> 
> I recall the goal for an annual cadence was to ensure we don’t have lengthy 
> periods between releases like 3.x to 4.0, and to try to reduce the pressure 
> certain contributors might feel to hit a specific release with a given 
> feature.
> 
> I think it’s better to revisit these underlying reasons and check how they 
> apply than to pick a mechanism and stick to it too closely. 
> 
> The last release was quite recent, so we aren’t at risk of slow releases 
> here. Similarly, there are some features that the *project* would probably 
> benefit from landing prior to release, if this doesn’t push release back too 
> far.
> 
> 
> 
> 
> 
>> On 1 Mar 2023, at 13:38, Mick Semb Wever  wrote:
>> 
>>> My thoughts don't touch on CEPs inflight. 
>> 
>> 
>> 
>> For the sake of broadening the discussion, additional questions I think 
>> worthwhile to raise are…
>> 
>> 1. What third parties, or other initiatives, are invested and/or working 
>> against the May deadline? and what are th

Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-09 Thread Josh McKenzie

Added an "Epics" quick filter; could help visualize what our high priority 
features are for given releases:

https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2649

Our cumulative flow diagram of 5.0 related tickets is pretty large. Probably 
not a great indicator for the body of what we expect to land in the release:

https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&projectKey=CASSANDRA&view=reporting&chart=cumulativeFlowDiagram&swimlane=1212&swimlane=1412&swimlane=1413&column=2116&column=2117&column=2118&column=2130&column=2133&column=2124&column=2127&from=2021-12-20&to=2023-03-09

One place we've been weak historically is in distinguishing between tickets we 
consider "nice to have" and things that are "blockers". We don't have any 
metadata that currently distinguishes those two, so determining what our 
burndown leading up to 5.0 looks like is a lot more data massaging and 
hand-waving than I'd prefer right now.

I've been deep on some other issues for a while but hope to get more involved in 
this space + CI within the next month or so.

On Thu, Mar 9, 2023, at 9:15 AM, Mick Semb Wever wrote:
>> I've also found some useful Cassandra's JIRA dashboards for previous
>> releases to track progress and scope, but we don't have anything
>> similar for the next release. Should we create it?
>> Cassandra 4.0 GA scope
>> Cassandra 4.1 GA scope
> 
> 
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484  
>

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Josh McKenzie

> Personally, I'd like to see the fix for this issue come after CEP-21. It 
> could be feasible to implement a fix before then, that detects bit-errors on 
> the read path and refuses to respond to the coordinator, implicitly having 
> speculative execution handle the retry against another replica while repair 
> of that range happens. But that feels suboptimal to me when a better 
> framework is on the horizon.
I originally typed something in agreement with you but the more I think about 
this, the more a node-local "reject queries for specific token ranges" 
degradation profile seems like it _could_ work. I don't see an obvious way to 
remove the need for a human-in-the-loop on fixing things in a pre-CEP-21 world 
without opening Pandora's box (Gossip + TMD + non-deterministic agreement on 
ownership state cluster-wide /cry).

And even in a post CEP-21 world you're definitely in the "at what point is it 
better to declare a host dead and replace it" fuzzy territory where there are no 
immediately correct answers.

A system_distributed table of corrupt token ranges that are currently being 
rejected by replicas with a mechanism to kick off a repair of those ranges 
could be interesting.
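
A minimal Python sketch of the node-local "reject queries for specific token 
ranges" idea, assuming (start, end] token ranges. `CorruptRangeRegistry` and 
its methods are hypothetical names; wrap-around ranges, persistence to a 
system_distributed table, and kicking off the repair are all left out.

```python
from dataclasses import dataclass, field


@dataclass
class CorruptRangeRegistry:
    """Node-local set of (start, end] token ranges known to be corrupt.

    Hypothetical sketch only: wrap-around ranges and persistence are ignored.
    """
    ranges: set = field(default_factory=set)

    def mark_corrupt(self, start: int, end: int) -> None:
        """Record a range after a read hits corrupt data in it."""
        self.ranges.add((start, end))

    def mark_repaired(self, start: int, end: int) -> None:
        """Clear a range once a repair of it has completed."""
        self.ranges.discard((start, end))

    def should_reject(self, token: int) -> bool:
        """True if a read for this token should be refused by this replica."""
        return any(start < token <= end for start, end in self.ranges)


registry = CorruptRangeRegistry()
registry.mark_corrupt(100, 200)
```

A replica that refuses a read this way leaves it to the coordinator to get the 
data from another replica, which is the behavior Abe describes in the quoted 
text.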

On Thu, Mar 9, 2023, at 1:45 PM, Abe Ratnofsky wrote:
> Thanks for proposing this discussion Bowen. I see a few different issues here:
> 
> 1. How do we safely handle corruption of a handful of tokens without taking 
> an entire instance offline for re-bootstrap? This includes refusal to serve 
> read requests for the corrupted token(s), and correct repair of the data.
> 2. How do we expose the corruption rate to operators, in a way that lets them 
> decide whether a full disk replacement is worthwhile?
> 3. When CEP-21 lands it should become feasible to support ownership draining, 
> which would let us migrate read traffic for a given token range away from an 
> instance where that range is corrupted. Is it worth planning a fix for this 
> issue before CEP-21 lands?
> 
> I'm also curious whether there's any existing literature on how different 
> filesystems and storage media accommodate bit-errors (correctable and 
> uncorrectable), so we can be consistent with those behaviors.
> 
> Personally, I'd like to see the fix for this issue come after CEP-21. It 
> could be feasible to implement a fix before then, that detects bit-errors on 
> the read path and refuses to respond to the coordinator, implicitly having 
> speculative execution handle the retry against another replica while repair 
> of that range happens. But that feels suboptimal to me when a better 
> framework is on the horizon.
> 
> --
> Abe
> 
>> On Mar 9, 2023, at 8:23 AM, Bowen Song via dev  
>> wrote:
>> 
>> Hi Jeremiah,
>> 
>> I'm fully aware of that, which is why I said that deleting the affected 
>> SSTable files is "less safe".
>> 
>> If the "bad blocks" logic is implemented and the node aborts the current read 
>> query when hitting a bad block, it should remain safe, as the data in other 
>> SSTable files will not be used. The streamed data should contain the 
>> unexpired tombstones, and that's enough to keep the data consistent on the 
>> node.
>> 
>> 
>> Cheers,
>> Bowen
>> 
>> 
>> 
>> On 09/03/2023 15:58, Jeremiah D Jordan wrote:
>>> It is actually more complicated than just removing the sstable and running 
>>> repair.
>>> 
>>> In the face of expired tombstones that might be covering data in other 
>>> sstables the only safe way to deal with a bad sstable is wipe the token 
>>> range in the bad sstable and rebuild/bootstrap that range (or wipe/rebuild 
>>> the whole node which is usually the easier way).  If there are expired 
>>> tombstones in play, it means they could have already been compacted away on 
>>> the other replicas, but may not have compacted away on the current replica, 
>>> meaning the data they cover could still be present in other sstables on 
>>> this node.  Removing the sstable will mean resurrecting that data.  And 
>>> pulling the range from other nodes does not help because they can have 
>>> already compacted away the tombstone, so you won’t get it back.
>>> 
>>> TL;DR: you can’t just remove the one sstable; you have to remove all data in 
>>> the token range covered by the sstable (aka all data that sstable may have 
>>> had a tombstone covering).  Then you can stream from the other nodes to get 
>>> the data back.
>>> 
>>> -Jeremiah
>>> 
>>>> On Mar 8, 2023, at 7:24 AM, Bowen Song via dev wrote:
>>>> 
>>>> At the moment, when a read error, such as an unrecoverable bit error or data 
>>>> corruption, occurs in the SSTable data files, regardless of the 
>>>> disk_failure_policy configuration, manual (or to be precise, external) 
>>>> intervention is required to recover from the error.
>>>> 
>>>> Commonly, there are two approaches to recover from such an error:
>>>> 
>>>>  1. The safer, but slower recovery strategy: replace the entire node.
>>>>  2. The less safe, but faster recovery strategy: shut down the node, delete 
>>>> the a

Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-09 Thread Josh McKenzie

> We do have the metadata, but yes it requires some work…
My wording was poor; we have the *potential* to have this metadata, but to my 
knowledge we don't have a muscle of consistently setting this, or any kind of 
heuristic to determine when something should block a release or not. At least 
on 4.0 and 4.1, it seemed this was a bridge we crossed informally in the run up 
to a date trying to figure out what to include or discard.

> The project previously made an agreement to one release a year,
I don't recall the details (and searching our... rather active threads is an 
undertaking) - our site has a blog post here: 
https://cassandra.apache.org/_/blog/Apache-Cassandra-Changelog-7-May-2021.html, 
that states: "The community has agreed to one release every year, plus periodic 
trunk snapshots". While it reads like "one a calendar year" to me, at the end 
of the day what's important to me is we do right by our users. So whether we 
interpret that as every 12 months, once per calendar year, once every July with 
a freeze in May train style, all fine by me actually. I more or less stand by 
"just not 'release monthly' and not 'release once every three years'". :) Got 
any clarity there?

> I (and others) wish to do the exercise of running through our 5.x list and 
> pushing out everything we can see with no commitment or activity (and also 
> closing out old and now irrelevant/inapplicable tickets) (and this will be 
> done via a proposed filter). But a question here is the fixVersion can infer 
> where a ticket can be applied (appropriateness) or where we foresee it 
> landing (roadmap). 
I'm +1 to this. If people want something to be different they can just toggle 
it back and bring it to the ML or slack.

For everything not urgent or a blocker, does it matter whether something has a 
fixver of where we think it's going to land or where we'd like to see it land? 
At the end of the day, neither of those scenarios will actually shift a release 
date if we're proactively putting "blocker / urgent" status on new features, 
improvements, and bugs we think are significant enough to delay a release right?

On Thu, Mar 9, 2023, at 3:17 PM, Mick Semb Wever wrote:
>> One place we've been weak historically is in distinguishing between tickets 
>> we consider "nice to have" and things that are "blockers". We don't have any 
>> metadata that currently distinguishes those two, so determining what our 
>> burndown leading up to 5.0 looks like is a lot more data massaging and 
>> hand-waving than I'd prefer right now.
> 
> 
> We distinguish "blockers" with `Priority=Urgent` or `Severity=Critical`, or 
> by linking the ticket as blocking to a specific ticket that spells it out. We 
> do have the metadata, but yes it requires some work…
> 
> The project previously made an agreement to one release a year, akin to a 
> release train model, which helps justify why fixVersion 5.x has just fallen 
> to be "next". (And then there is no "burn-down" in such a model.) 
> 
> Our release criteria, especially post-branch, demonstrates that we do 
> introduce and rely on "blockers". If we agree that certain exceptional CEPs 
> are "blockers", a la warrant delaying the release date, using this approach 
> seems to fit in appropriately.
> 
> When I (just) folded fixVersion 4.2 into 5.0 (and 4.x into 5.x), I also 
> created 5.1.x and 6.x.  I (and others) wish to do the exercise of running 
> through our 5.x list and pushing out everything we can see with no commitment 
> or activity (and also closing out old and now irrelevant/inapplicable 
> tickets) (and this will be done via a proposed filter). But a question here 
> is whether the fixVersion should indicate where a ticket can be applied 
> (appropriateness) or where we foresee it landing (roadmap). For example, we 
> mark bugs with the fixVersions they should ideally be applied to, regardless 
> of whether anyone comes to address them or not. 
> 
> 
>

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Josh McKenzie

> I'm not seeing any reasons why CEP-21 would make this more difficult to 
> implement
I think I communicated poorly - I was just trying to point out that there's a 
point at which a host limping along is better put down and replaced than 
piecemeal flagging range after range dead and working around it, and there's no 
immediately obvious "Correct" answer to where that point is regardless of what 
mechanism we're using to hold a cluster-wide view of topology.

> ...CEP-21 makes this sequencing safe...
For sure - I wouldn't advocate for any kind of "automated corrupt data repair" 
in a pre-CEP-21 world.
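
The node-local "reject queries for specific token ranges" idea quoted below can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: TokenRange and CorruptRangeRegistry are made-up names, not Cassandra classes, and the sketch assumes ranges are start-exclusive and end-inclusive with wrap-around on the ring. It says nothing about how the ranges would be shared cluster-wide or repaired.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TokenRange:
    """Hypothetical (start, end] token range; wraps around the ring if start >= end."""
    start: int
    end: int

    def contains(self, token: int) -> bool:
        if self.start < self.end:
            return self.start < token <= self.end
        # Wrap-around range: covers everything after start and everything up to end.
        return token > self.start or token <= self.end


@dataclass
class CorruptRangeRegistry:
    """Hypothetical node-local record of ranges this replica refuses to serve."""
    corrupt: set = field(default_factory=set)

    def mark_corrupt(self, rng: TokenRange) -> None:
        self.corrupt.add(rng)

    def mark_repaired(self, rng: TokenRange) -> None:
        self.corrupt.discard(rng)

    def should_reject(self, token: int) -> bool:
        # A read is rejected if its token falls in any range flagged as corrupt.
        return any(rng.contains(token) for rng in self.corrupt)
```

The open question from the thread is untouched by this: at what number of flagged ranges is it better to declare the host dead and replace it.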

On Thu, Mar 9, 2023, at 2:56 PM, Abe Ratnofsky wrote:
> I'm not seeing any reasons why CEP-21 would make this more difficult to 
> implement, besides the fact that it hasn't landed yet.
> 
> There are two major potential pitfalls that CEP-21 would help us avoid:
> 1. Bit-errors beget further bit-errors, so we ought to be resistant to a high 
> frequency of corruption events
> 2. Avoid token ownership changes when attempting to stream a corrupted token
> 
> I found some data supporting (1) - 
> https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2014/20140806_T1_Hetzler.pdf
> 
> If we detect bit-errors and store them in system_distributed, then we need a 
> capacity to throttle that load and ensure that consistency is maintained.
> 
> When we attempt to rectify any bit-error by streaming data from peers, we 
> implicitly take a lock on token ownership. A user needs to know that it is 
> unsafe to change token ownership in a cluster that is currently in the 
> process of repairing a corruption error on one of its instances' disks. 
> CEP-21 makes this sequencing safe, and provides abstractions to better expose 
> this information to operators.
> 
> --
> Abe
> 
>> On Mar 9, 2023, at 10:55 AM, Josh McKenzie  wrote:
>> 
>>> Personally, I'd like to see the fix for this issue come after CEP-21. It 
>>> could be feasible to implement a fix before then, that detects bit-errors 
>>> on the read path and refuses to respond to the coordinator, implicitly 
>>> having speculative execution handle the retry against another replica while 
>>> repair of that range happens. But that feels suboptimal to me when a better 
>>> framework is on the horizon.
>> I originally typed something in agreement with you but the more I think 
>> about this, the more a node-local "reject queries for specific token ranges" 
>> degradation profile seems like it _could_ work. I don't see an obvious way 
>> to remove the need for a human-in-the-loop on fixing things in a pre-CEP-21 
>> world without opening pandora's box (Gossip + TMD + non-deterministic 
>> agreement on ownership state cluster-wide /cry).
>> 
>> And even in a post CEP-21 world you're definitely in the "at what point is 
>> it better to declare a host dead and replace it" fuzzy territory where 
>> there's no immediately correct answers.
>> 
>> A system_distributed table of corrupt token ranges that are currently being 
>> rejected by replicas with a mechanism to kick off a repair of those ranges 
>> could be interesting.
>> 
>> On Thu, Mar 9, 2023, at 1:45 PM, Abe Ratnofsky wrote:
>>> Thanks for proposing this discussion Bowen. I see a few different issues 
>>> here:
>>> 
>>> 1. How do we safely handle corruption of a handful of tokens without taking 
>>> an entire instance offline for re-bootstrap? This includes refusal to serve 
>>> read requests for the corrupted token(s), and correct repair of the data.
>>> 2. How do we expose the corruption rate to operators, in a way that lets 
>>> them decide whether a full disk replacement is worthwhile?
>>> 3. When CEP-21 lands it should become feasible to support ownership 
>>> draining, which would let us migrate read traffic for a given token range 
>>> away from an instance where that range is corrupted. Is it worth planning a 
>>> fix for this issue before CEP-21 lands?
>>> 
>>> I'm also curious whether there's any existing literature on how different 
>>> filesystems and storage media accommodate bit-errors (correctable and 
>>> uncorrectable), so we can be consistent with those behaviors.
>>> 
>>> Personally, I'd like to see the fix for this issue come after CEP-21. It 
>>> could be feasible to implement a fix before then, that detects bit-errors 
>>> on the read path and refuses to respond to the coordinator, implicitly 
>>> having speculative execution handle the retry ag

Re: [DISCUSS] New dependencies with Chronicle-Queue update

2023-03-13 Thread Josh McKenzie

> I think we should use the most recent versions of all libraries where 
> possible?
To clarify, are we talking "most recent versions of all libraries *when we have 
to update them anyway for a dependency*"? Not *all libraries all libraries*...

If the former, I agree. If the latter, here be dragons. :)

On Mon, Mar 13, 2023, at 1:13 PM, Ekaterina Dimitrova wrote:
> “Given we need to upgrade to support JDK17 it seems fine to me.  The only 
> concern I have is that some of those libraries are already pretty old, for 
> example the most recent jna-platform is 5.13.0 and 5.5.0 is almost 4 years 
> old.  I think we should use the most recent versions of all libraries 
> where possible?”
> +1
> 
> On Mon, 13 Mar 2023 at 12:10, Brandon Williams  wrote:
>> I know it was just an example but we upgraded JNA to 5.13 in
>> CASSANDRA-18050 as part of the JDK17 effort, so at least that is taken
>> care of.
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Mon, Mar 13, 2023 at 10:39 AM Jeremiah D Jordan
>>  wrote:
>> >
>> > Given we need to upgrade to support JDK17 it seems fine to me.  The only 
>> > concern I have is that some of those libraries are already pretty old, for 
>> > example the most recent jna-platform is 5.13.0 and 5.5.0 is almost 4 years 
>> > old.  I think we should use the most recent versions of all libraries 
>> > where possible?
>> >
>> > > On Mar 13, 2023, at 7:42 AM, Mick Semb Wever  wrote:
>> > >
>> > > JDK17 requires us to update our chronicle-queue dependency: 
>> > > CASSANDRA-18049
>> > >
>> > > We use chronicle-queue for both audit logging and fql.
>> > >
>> > > This update pulls in a number of new transitive dependencies.
>> > >
>> > > affinity-3.23ea1.jar
>> > > asm-analysis-9.2.jar
>> > > asm-commons-9.2.jar
>> > > asm-tree-9.2.jar
>> > > asm-util-9.2.jar
>> > > jffi-1.3.9.jar
>> > > jna-platform-5.5.0.jar
>> > > jnr-a64asm-1.0.0.jar
>> > > jnr-constants-0.10.3.jar
>> > > jnr-ffi-2.2.11.jar
>> > > jnr-x86asm-1.0.2.jar
>> > > posix-2.24ea4.jar
>> > >
>> > >
>> > > More info here:
>> > > https://issues.apache.org/jira/browse/CASSANDRA-18049?focusedCommentId=17699393&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17699393
>> > >
>> > >
>> > > Objections?
>> >

Re: Should we cut some new releases?

2023-03-14 Thread Josh McKenzie

+1

On Tue, Mar 14, 2023, at 7:50 AM, Aleksey Yeshchenko wrote:
> +1
> 
>> On 14 Mar 2023, at 05:50, Berenguer Blasi  wrote:
>> 
>> +1
>> 
>> On 13/3/23 21:25, Jacek Lewandowski wrote:
>>> +1
>>> 
>>> On Mon, 13 Mar 2023, 20:36, Miklosovic, Stefan wrote:
 Yes, I was waiting for CASSANDRA-18125 to be in.

 I can release 4.1.1 to staging tomorrow morning CET if nobody objects to that.

 Not sure about 4.0.9. We released 4.0.8 just a few weeks ago. I would do 
 4.1.1 first.

 From: Ekaterina Dimitrova 
 Sent: Monday, March 13, 2023 18:12
 To: dev@cassandra.apache.org
 Subject: Re: Should we cut some new releases?

 +1

 On Mon, 13 Mar 2023 at 12:23, Benjamin Lerer wrote:
 Hi everybody,

 Benedict and Jon recently committed the patch for 
 CASSANDRA-18125 
 which fixes some serious problems at the memtable/flush level. Should we 
 consider cutting some releases that contain this fix?

Re: [DISCUSS] Drop support for sstable formats m* (in trunk)

2023-03-14 Thread Josh McKenzie

It's always seemed a little odd to me that we drop all the "read old format" 
code given how little maintenance that code takes over time. The ability to 
have a C* node read older format SSTables into perpetuity *seems* like a pretty 
compelling usability feature to me (for some of the reasons mentioned in this 
thread).

So personally, -1 to removing the code for 3.0, and generally think we should 
reconsider how long we maintain support specifically for reading older format 
files as we move forward given this is long-lived infra software. Probably 
pain-points I'm not thinking of here, but worth a deeper discussion IMO.

On Tue, Mar 14, 2023, at 12:02 PM, Brandon Williams wrote:
> On Mon, Mar 13, 2023 at 5:54 PM Mick Semb Wever  wrote:
> 
> > Personally I am not in favour of keeping, or recommending users use,
> > code we don't test.
> 
> How much effort would it be to have some simple smoke tests?  I think
> we should make sure nothing gets indirectly broken if we're going to
> keep it around.
>

Re: [DISCUSS] Change the useage of nodetool tablehistograms

2023-03-16 Thread Josh McKenzie

We could also consider augmenting the tool with new named arguments with the 
functionality you described and leave the positional usage intact.
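
For reference, the documented positional behaviour described below (one dot-separated argument, or a keyspace and a table as two arguments) can be sketched as follows. This is an illustration only: resolve_positional is a hypothetical helper, not nodetool code, and rejecting extra arguments is an assumption of this sketch rather than what the tool does today.

```python
def resolve_positional(args):
    """Map positional arguments to (keyspace, table), or None for "all tables".

    Hypothetical helper mirroring the documented usage:
      no arguments           -> all tables
      "ks.tb"                -> keyspace ks, table tb
      "ks" "tb"              -> keyspace ks, table tb
    """
    if len(args) == 0:
        return None
    if len(args) == 1:
        keyspace, sep, table = args[0].partition(".")
        if not sep or not keyspace or not table:
            raise ValueError("expected <keyspace.table>, got %r" % args[0])
        return keyspace, table
    if len(args) == 2:
        return args[0], args[1]
    # Assumption of this sketch: fail loudly instead of silently printing everything.
    raise ValueError("too many positional arguments: %r" % (args,))
```

Any new named arguments would be parsed separately, so the three positional cases above keep behaving exactly as documented.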

On Thu, Mar 16, 2023, at 6:43 AM, Bowen Song via dev wrote:
> The documented command options are:
> 
>> nodetool tablehistograms [<keyspace> <table> | <keyspace.table>]
>> 
> 
> 
> That means one parameter will be treated as dot separated keyspace and table. 
> Alternatively, two parameters will be treated as the keyspace and table 
> respectively.
> 
> To remain compatible with the documented behaviour, my suggestion is to 
> change the command options to:
> 
>> nodetool tablehistograms [<keyspace> <table> [<table> [...]] | 
>> <keyspace.table> [<keyspace.table> [...]]]
>> 
> Feel free to add the "all except ..." feature to the above.
> 
> This doesn't break backward compatibility in documented ways. It only changes 
> the undocumented behaviour. If someone is using the undocumented behaviour, 
> they must know things may break when the software is upgraded. We can just 
> add a line to the NEWS.txt and let them update their scripts.
> 
> 
> On 16/03/2023 08:53, guo Maxwell wrote:
>> Hello everyone,
>> The nodetool tablehistograms command takes one argument, a single table 
>> name in the format "keyspace_name.table_name" or "keyspace_name 
>> table_name", so that you can get the histograms of the specified table.
>> 
>> If no arguments are set, all the tables' histograms will be printed out. 
>> And if more than 2 arguments (no matter whether the format is right or 
>> wrong) are set, all the tables' histograms will also be printed out (which 
>> is a bug in my mind).
>> 
>> So nodetool tablehistograms has some usage restrictions: it outputs either 
>> one table's information or all of it.
>> 
>> As CASSANDRA-18296 describes, I will change the usage of nodetool 
>> tablehistograms to support the features below:
>> 1. nodetool tablehistograms ks.tb1 ks.tb2  // print out the histograms of 
>> the listed tables, given in the format keyspace.table
>> 2. nodetool tablehistograms ks1 ks2 ks3 ...  // print out the histograms of 
>> the listed keyspaces
>> 3. nodetool tablehistograms -i ks1 ks2  // print out all table histograms 
>> except for the keyspaces listed after the option -i
>> 4. nodetool tablehistograms -i ks ks.tb  // print out all table histograms 
>> except for the tables in keyspace ks and the table ks.tb
>> 5. no option specified: all tables' histograms will be printed out.
>> 
>> This change will break compatibility with how it was done previously, and 
>> this is a user-facing tool.
>> 
>> So, what do you think? 
>> 
>> Thanks~~~
>>

Re: [DISCUSS] CEP-26: Unified Compaction Strategy

2023-03-17 Thread Josh McKenzie

Could we get a JIRA for this too so we can get some reviewers collaborating on 
this? Only see Lorina's ticket for documenting it in JIRA atm.

On Fri, Mar 17, 2023, at 9:53 AM, Branimir Lambov wrote:
> The prototype of UCS can now be found in this pull request: 
> https://github.com/apache/cassandra/pull/2228
> 
> Its description is given in the included markdown documentation: 
> https://github.com/blambov/cassandra/blob/UCS-density/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
> 
> The latest code includes some new elements compared to the link Henrik 
> posted, including density levelling, bucketing based solely on overlap, and 
> output splitting by expected density. It goes a little further than what is 
> described in the CEP-26 proposal as prototyping showed that we can make the 
> selection of sstables to compact and the sharding decisions independent of 
> each other. This makes the strategy more stable and better able to react to 
> changes in configuration and environment.
> 
> Regards,
> Branimir
> 
> On Wed, Dec 21, 2022 at 10:01 AM Benedict  wrote:
>> 
>> I’m personally very excited by this work. Compaction could do with a spring 
>> clean and this feels to formalise things much more cleanly, but density 
>> tiering in particular is something I’ve wanted to incorporate for years now, 
>> as it should significantly improve STCS behaviour (most importantly reducing 
>> read amplification and the amount of disk space required, narrowing the 
>> performance delta to LCS in these important dimensions), and simplifies 
>> re-levelling of LCS, making large streams much less painful.
>> 
>> 
>>> On 21 Dec 2022, at 07:19, Henrik Ingo  wrote:
>>> 
>>> I noticed the CEP doesn't link to this, so it should be worth mentioning 
>>> that the UCS documentation is available here: 
>>> https://github.com/datastax/cassandra/blob/ds-trunk/doc/unified_compaction.md
>>> 
>>> Both of the above seem to do a poor job referencing the literature we've 
>>> been inspired by. I will link to Mark Callaghan's blog on the subject:
>>> 
>>> http://smalldatum.blogspot.com/2018/07/tiered-or-leveled-compaction-why-not.html?m=1
>>>  
>>> 
>>> 
>>> ...and lazily will also borrow from Mark a post that references a bunch of 
>>> LSM (not just UCS related) academic papers: 
>>> http://smalldatum.blogspot.com/2018/08/name-that-compaction-algorithm.html?m=1
>>>  
>>> 
>>> 
>>> Finally, it's perhaps worth mentioning that UCS has been in production in 
>>> our Astra Serverless cloud service since it was launched in March 2021. The 
>>> version described by the CEP therefore already incorporates some 
>>> improvements based on observed production behaviour.
>>> 
>>> Henrik 
>>> 
>>> On Mon, 19 Dec 2022, 15:41 Branimir Lambov,  wrote:
 Hello everyone,
 
 I would like to open the discussion on our proposal for a unified 
 compaction strategy that aims to solve well-known problems with compaction 
 and improve parallelism to permit higher levels of sustained write 
 throughput.
 
 The proposal is here: 
 https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
 
 The strategy is based on two main observations:
 - that tiered and levelled compaction can be generalized as the same thing 
 if one observes that both form exponentially-growing levels based on the 
 size of sstables (or non-overlapping sstable runs) and trigger a 
 compaction when more than a given number of sstables are present on one 
 level;
 - that instead of "size" in the description above we can use "density", 
 i.e. the size of an sstable divided by the width of the token range it 
 covers, which permits sstables to be split at arbitrary points when the 
 output of a compaction is written and still produce a levelled hierarchy.
 
 The latter allows us to shard the compaction space into progressively 
 higher numbers of shards as data moves to the higher levels of the 
 hierarchy, improving parallelism, space requirements and the duration of 
 compactions, and the former allows us to cover the existing strategies, as 
 well as hybrid mixtures that can prove more efficient for some workloads.
 
 Thank you,
 Branimir
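
To make the "density" observation above concrete, here is a minimal sketch of the arithmetic. It only illustrates the definition given in the proposal (size divided by the width of the covered token range) and the idea of exponentially growing levels. The function names, the base_density and fanout parameters, and the way the level index is derived are assumptions of this sketch, not the UCS implementation.

```python
def density(size_bytes, token_fraction):
    """Size of an sstable divided by the fraction of the token ring it covers."""
    if not 0 < token_fraction <= 1:
        raise ValueError("token_fraction must be in (0, 1]")
    return size_bytes / token_fraction


def level(d, base_density, fanout):
    """Hypothetical level index: level n covers densities in
    [base_density * fanout**n, base_density * fanout**(n + 1)),
    with everything below base_density * fanout falling in level 0."""
    lvl = 0
    threshold = base_density * fanout
    while d >= threshold:
        lvl += 1
        threshold *= fanout
    return lvl


MIB = 2 ** 20
whole = density(1024 * MIB, 1.0)    # 1 GiB sstable covering the whole ring
quarter = density(256 * MIB, 0.25)  # 256 MiB sstable covering a quarter of the ring
same_level = level(whole, MIB, 4) == level(quarter, MIB, 4)
```

Both sstables have the same density and so land on the same level, which is why splitting compaction output at arbitrary token boundaries still yields a levelled hierarchy.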

Re: [VOTE] Release Apache Cassandra 4.1.1

2023-03-17 Thread Josh McKenzie

+1

On Fri, Mar 17, 2023, at 12:18 PM, Aleksey Yeshchenko wrote:
> +1
> 
>> On 17 Mar 2023, at 13:54, Mick Semb Wever  wrote:
>> 
>>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>>> tested the build is invited to vote. Votes by PMC members are considered 
>>> binding. A vote passes if there are at least three binding +1s and no -1's.
>> 
>> 
>>  
>> +1
>> Checked
>> - signing correct
>> - checksums are correct
>> - source artefact builds (JDK 8+11)
>> - binary artefact runs (JDK 8+11)
>> - debian package runs (JDK 8+11)
>> - debian repo runs (JDK 8+11)
>> - redhat* package runs (JDK 8+11)
>> - redhat* repo runs (JDK 8+11)
>>

Re: [DISCUSS] Drop support for sstable formats m* (in trunk)

2023-03-17 Thread Josh McKenzie

> we (including me) have done a lot of stupid shit over the years on this 
> project. Half the time “this is how we’ve historically done X” to me is a 
> strong argument to start doing things differently. 
Oof. The truth (when applied to myself) hurts doesn't it? :)

> I suggest we should have a way to read/write from/to all sstable versions, I 
> absolutely agree this is useful (e.g. backups in storage). And we should be 
> better at thorough testing. 
Having an external library that both C* and other tools could rely on that 
handles SSTable reading and writing could actually be a very clean solution to 
helping encourage a broader ecosystem of things that want to interface with 
Cassandra but don't necessarily want to go through the StorageEngine to do it. 
Nevermind the value it'd bring to the table internally in terms of supporting 
longer upgrade cycles in C*, making what Claude is wrestling with on downgrades 
a lot simpler, etc.

Would probably be much cleaner to test thoroughly and less overhead to continue 
to maintain support for longer term as well.

On Fri, Mar 17, 2023, at 12:28 PM, Jeremiah D Jordan wrote:
> > As for precedent - we (including me) have done a lot of stupid shit over 
> > the years on this project. Half the time “this is how we’ve historically 
> > done X” to me is a strong argument to start doing things differently. This 
> > is one such case.
> 
> +1.  I definitely agree that this is one area of precedent that we should not 
> be following.  The project has historically been fairly hostile towards 
> longer upgrade timelines, I am glad to see all the recent conversations where 
> this seems to be improving.
> 
> -Jeremiah

Cassandra project status, 2023-03-20

2023-03-20 Thread Josh McKenzie

I did say monthly-ish. That goes both earlier and later.

We've had a lot of interesting topics come up on the dev list in the past few
weeks as well as movement on Accord, Transactional Metadata, and SAI, so let's
get to it.

The Cassandra Forward event took place on March 14th with a lot of interesting
talks and attendees (link: https://www.cassandrasummit.org/cassandra-forward).
You can watch recordings of the different talks on the site as well as sign up
for the Cassandra Summit that's been rescheduled to December 12-13th. Hope to
see you there!

[New Contributors Getting Started]
We have a lot of great starter tickets to get started with if you're interested
in diving in on the project - you can see the list on the kanban board w/quick
filters here:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2454&quickFilter=2652&quickFilter=2162&quickFilter=2160.
Anything on this list should be assignee-free and up for grabs so feel free to
take a crack at them.

For assistance on getting set up or orientation working on the code, join
#cassandra-dev channel on https://the-asf.slack.com (reply to me on this email
if you need an invite for your account), and feel free to tag the
@cassandra_mentors alias with questions.

[Dev mailing list]
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-3-3|dto=2023-3-20:

Have 26 threads this time - a bit more manageable but surprisingly busy for 18
days.

We've had a lot of discussion about downgradability; Branimir originally
brought the topic back up here:
https://lists.apache.org/thread/tcp339k5ph8ql35wxr085to4qgp8tpg7, and that
thread was still kicking since the last status update. Attempting a bit of
editorializing, various points that were brought up and not contended:
- Try and not break sstable format compatibility with a change if it's
reasonable not to
- Users should be able to opt-in to major format upgrades and not have access
to new features until such time as they've opted in
- We should have an offline sstabledowngrade tool
- Nodes should be able to write older version sstables if configured to do so
(how many versions and where that code lives is somewhat unclear still)
- We need simple tests (upgrade tests backwards) to see what works and doesn't
work to know the scope of the problem
- Jacek created the epic https://issues.apache.org/jira/browse/CASSANDRA-18300
to track work on downgradability

There's a good bit we discussed on the thread not yet captured in JIRA;
assuming nobody has significant disagreement with the list above I may create
tickets for the things we haven't yet captured so we don't lose that context.
Also - if I missed something you brought up on that thread that you want to see
captured as well, let me know and I'll take care of that.

Another thread that's seen a lot of traffic without yet concluding: "[DISCUSS]
Next release date"
(https://lists.apache.org/thread/fncbr50xg1otw8xtpyn0b3ys02bfnwv1). It seems
like we were headed towards a "set a target release date, back up N weeks based
on how long we think it will take to validate that, and set that as our branch
/ freeze date" conclusion. Jeremiah offered October with a potential September
freeze if we believe ourselves capable of a 4 week validation, and David asked
some pointed questions about why 4.1 took so long to release and whether we
have enough testing to trust trunk today. If you have some thoughts on the
topic, please don't let the thread lie dormant; it's important we come to a
consensus on this and agree on a target to push for.

Stefan created and reminded us of CASSANDRA-18043, "remove deprecated
DateTieredCompactionStrategy". It's been deprecated for years now so it's
probably time to go.

Speaking of deprecation, we've been discussing the role of the hadoop
integration code in the codebase (link:
https://lists.apache.org/thread/q34zsscctgn6kpwkflx03859y7nv3y5z). The general
consensus appears to be for deprecation in 4.x and removal in 5.0 given the
code is unmaintained and very, very old.

Stefan brought up the somewhat problematic case with NetworkTopologyStrategy
where RF > number of racks, since the strategy can place things in a way where
you lose QUORUM if you lose a rack (link:
https://lists.apache.org/thread/dntymkm1b9xjs1bognf3w1lpf1mdrzos). The
consensus on that thread was that we should make NTS do the right thing going
forward but also preserve the ability to do things "the old way". See this JIRA
for more details: https://issues.apache.org/jira/browse/CASSANDRA-16203
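
As a worked example of the problem: QUORUM needs floor(RF / 2) + 1 replicas, so with RF = 3 spread over only 2 racks some rack has to hold 2 replicas, and losing that rack leaves 1 replica where 2 are needed. The sketch below just does that arithmetic. The per-rack replica counts are made-up inputs for illustration and say nothing about how NetworkTopologyStrategy actually chooses placement.

```python
def quorum(rf):
    """Number of replicas a QUORUM operation needs for replication factor rf."""
    return rf // 2 + 1


def racks_whose_loss_breaks_quorum(replicas_per_rack):
    """Given a hypothetical mapping of rack name -> replica count for one
    token range, return the racks whose loss leaves fewer than a quorum."""
    rf = sum(replicas_per_rack.values())
    needed = quorum(rf)
    return sorted(
        rack for rack, count in replicas_per_rack.items() if rf - count < needed
    )


two_racks = racks_whose_loss_breaks_quorum({"rack1": 2, "rack2": 1})
three_racks = racks_whose_loss_breaks_quorum({"rack1": 1, "rack2": 1, "rack3": 1})
```

With two racks, losing rack1 breaks QUORUM; with one replica per rack across three racks, no single rack loss does.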

Bowen Song raised the topic of potentially enhancing how we handle disk errors:
https://lists.apache.org/thread/gwyz9otgokqvmdrq85nw3ds5nyrhz8t3. Some
interesting ideas came up on the thread as well as questions about what we
could potentially do with the current state of the art vs. a future with
transactional metadata. No conclusions quite yet but the notion of having
replicas selectively reject t

Re: [DISCUSS] Change the useage of nodetool tablehistograms

2023-03-22 Thread Josh McKenzie

Agree w/Bowen. I think the straightforward simplicity of "clear inclusion and 
exclusion semantics, default to include all in scope excepting things that are 
explicitly ignored" would be ideal.
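
A minimal sketch of those semantics, assuming a schema given as a plain mapping of keyspace to table names. select_tables and its include/exclude parameters are hypothetical names used only for illustration; they are not nodetool options or code.

```python
def select_tables(schema, include=(), exclude=()):
    """Return the "ks.tb" names to print histograms for.

    schema:  dict mapping keyspace name -> iterable of table names
    include: specs ("ks" or "ks.tb") in scope; empty means everything
    exclude: specs ("ks" or "ks.tb") that are explicitly ignored
    """
    def matches(keyspace, table, spec):
        # A spec names either a whole keyspace or one keyspace.table.
        return spec == keyspace or spec == "%s.%s" % (keyspace, table)

    selected = []
    for keyspace in sorted(schema):
        for table in sorted(schema[keyspace]):
            in_scope = not include or any(
                matches(keyspace, table, s) for s in include
            )
            ignored = any(matches(keyspace, table, s) for s in exclude)
            if in_scope and not ignored:
                selected.append("%s.%s" % (keyspace, table))
    return selected
```

Under this reading, exclusion always wins over inclusion and the order of arguments does not matter, which sidesteps the ambiguity Bowen points out below.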


On Wed, Mar 22, 2023, at 8:45 AM, Bowen Song via dev wrote:
> TBH, the syntax looks unnecessarily complex and confusing to me.
> 
> For example, for this command:
> 
>> nodetool tablehistograms -ks ks1 -i -tbs ks1.tb1 ks2.tb2
>> 
> Which one of the following should it do?
>  1. all tables in the keyspace ks1,  except the table tb1; or
>  2. all tables in all keyspaces, except any table in the keyspace ks1 and the 
> table tb2 in the keyspace ks2
> 
> 
> I personally would prefer the simplicity of this approach:
> 
>> nodetool tablehistograms ks1 tb1 tb2 tb3
>> 
>> nodetool tablehistograms ks1.tb1 ks1.tb2 ks2.tb3
>> 
>> nodetool tablehistograms -i ks1 -i ks2
>> 
>> nodetool tablehistograms -i ks1.tb1 -i ks2.tb2
>> 
>> 
>> 
> They are self-explanatory. You don't need to read comments to understand what 
> they do, as long as you know that "-i" means "exclude".
> 
> A more complex and possibly confusing option could be:
> 
> 
> 
> 
> 
>> nodetool tablehistograms ks1 -i ks1.tb1 -i ks1.tb2  # all tables in the 
>> keyspace ks1, except the table tb1 and tb2
>> 
>> nodetool tablehistograms -i ks1.tb1 -i ks1.tb2 ks1  # identical to the 
>> above, as -i takes only one parameter
>> 
> To avoid the above confusion, the command could enforce that the "-i" option 
> may only be used after any positional options, thus making the 2nd command a 
> syntax error.
> 
>> 
>> 
> Beyond that, I don't see why the user can't make multiple invocations of the 
> nodetool tablehistograms command if they have more complex or specific needs.
> 
> For example, in this case:
> 
>> *> 6. nodetool tablehistograms -i -tbs ks.tb1 ks.tb2 -ks ks1 // print out 
>> all tables' histograms except for tables ks.tb1 and ks.tb2 and all tables in 
>> ks1*
>> 
> The same result can be achieved by concatenating the outputs of the following 
> two commands:
> 
>> nodetool tablehistograms -i ks -i ks1
>> 
>> nodetool tablehistograms ks -i ks.tb1 -i ks.tb2
>> 
> 
> 
> On 22/03/2023 05:12, guo Maxwell wrote:
>> Thanks everyone. So it seems that it is better to add new parameter options 
>> to meet our needs, while keeping the original parameter functions unaffected 
>> to achieve backward compatibility. 
>> So the new options are:
>> 1. nodetool tablehistograms ks.tb1 or ks tb1 ... // this is *one of the old 
>> ways* of using tablehistograms. It will print out the histograms of table 
>> ks.tb1; we keep the old format to print out the table histograms. Besides, 
>> if more than two arguments are provided, such as nodetool tablehistograms 
>> system.local system_schema.columns system_schema.tables, then all tables' 
>> histograms will be printed out (I think this is a bug, as it is not what is 
>> expected from the document's description; we should remind the user that 
>> this is an incorrect usage)
>> 
>> 2. nodetool tablehistograms -tbs ks.tb1 ks.tb2 // print out the histograms 
>> of the listed tables, given in the format keyspace.table
>> 3. nodetool tablehistograms -ks ks1 ks2 ks3 ... // print out the histograms 
>> of the listed keyspaces
>> 4. nodetool tablehistograms -i -ks ks1 ks2 // print out all table 
>> histograms except for the keyspaces listed after the option -i
>> 5. nodetool tablehistograms -i -tbs ks.tb1 ks.tb2 // print out all tables' 
>> histograms except for tables ks.tb1 and ks.tb2
>> 6. nodetool tablehistograms -i -tbs ks.tb1 ks.tb2 -ks ks1 // print out all 
>> tables' histograms except for tables ks.tb1 and ks.tb2 and all tables in ks1
>> 7. no option specified: all tables' histograms will be printed out. // 
>> this is *another one of the old ways* of using tablehistograms.
>> 
>> So we add some more options like "-i", "-ks", "-tbs". We can combine these 
>> options and we can also use any of them individually. Besides, we can also 
>> use the tool in the old way if a table in the format ks.tb is provided.
>> 
>> 
>> On Thu, Mar 16, 2023 at 23:14, Jeremiah D Jordan wrote:
>>> -1 on any change which breaks the previously documented usage.
>>> +1 any additions to what the tool can do without breaking previously 
>>> documented behavior.
>>> 
>>>> On Mar 16, 2023, at 7:42 AM, Josh McKenzie  wrote:
>>>> 
>>>> We could also cons

Re: Welcome our next PMC Chair Josh McKenzie

2023-03-23 Thread Josh McKenzie

Definitely want to +1 the appreciation for all the work Mick's put into the 
role.

Looking forward to continuing to help out where I can!

On Thu, Mar 23, 2023, at 9:27 AM, J. D. Jordan wrote:
> 
> Congrats Josh!
> 
> And thanks Mick for your time spent as Chair!
> 
>> On Mar 23, 2023, at 8:21 AM, Aaron Ploetz  wrote:
>> 
>> Congratulations, Josh!
>> 
>> And of course, thank you Mick for all you've done for the project while in 
>> the PMC Chair role!
>> 
>> On Thu, Mar 23, 2023 at 7:44 AM Derek Chen-Becker  
>> wrote:
>>> Congratulations, Josh!
>>> 
>>> On Thu, Mar 23, 2023, 4:23 AM Mick Semb Wever  wrote:
>>>> It is time to pass the baton on, and on behalf of the Apache Cassandra 
>>>> Project Management Committee (PMC) I would like to welcome and 
>>>> congratulate our next PMC Chair Josh McKenzie (jmckenzie).
>>>> 
>>>> Most of you already know Josh, especially through his regular and valuable 
>>>> project oversight and status emails, always presenting a balance and 
>>>> understanding to the various views and concerns incoming. 
>>>> 
>>>> Repeating Paulo's words from last year: The chair is an administrative 
>>>> position that interfaces with the Apache Software Foundation Board, by 
>>>> submitting regular reports about project status and health. Read more 
>>>> about the PMC chair role on Apache projects:
>>>> - https://www.apache.org/foundation/how-it-works.html#pmc
>>>> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
>>>> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>>>> 
>>>> The PMC as a whole is the entity that oversees and leads the project and 
>>>> any PMC member can be approached as a representative of the committee. A 
>>>> list of Apache Cassandra PMC members can be found on: 
>>>> https://cassandra.apache.org/_/community.html

Apache TAC: assistance for travel to Berlin Buzzwords

2023-03-24 Thread Josh McKenzie

Cassandra Community!

The Travel Assistance Committee with the Apache Foundation is supporting travel 
to Berlin Buzzwords 2023 (https://2023.berlinbuzzwords.de, 18-20 June 2023) for 
up to 6 people. This conference has lined up pretty well with our project in 
the past and would probably be a great opportunity for folks from our community 
to attend: *"Germany’s most exciting conference on storing, processing, 
streaming and searching large amounts of digital data, with a focus on open 
source software projects"*.

Please see the below message from Gavin McDonald w/the TAC:



Hi All,

The ASF Travel Assistance Committee is supporting taking up to six (6)
people to attend Berlin Buzzwords in June this year.

This includes Conference passes, and travel & accommodation as needed.

Please see our website at https://tac.apache.org for more information and
how to apply.

Applications close on 15th April.

Good luck to those that apply.

Gavin McDonald (VP TAC)



~Josh

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread Josh McKenzie

> making sure that joining and leaving nodes update some state via Paxos 
> instead of via gossip
What kind of delivery-time risk does coupling CEP-15 with CEP-21 introduce 
(i.e. unknown unknowns on CEP-21 leading to delays that cascade to CEP-15)? 
Seems like having a table where we CAS state on epochs wouldn't be *too* 
challenging, but I'm not familiar w/the details so I could be completely off here.

Being able to deliver both of these things on their own timetable seems like a 
pretty valuable thing assuming the lift required would be modest.
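
To make the "table we CAS state on" idea concrete, here is a minimal sketch, assuming an in-memory stand-in for a single-row table. The names `EpochTable` and `change_membership` are hypothetical and not part of Cassandra or Accord; a real patch would go through Paxos rather than a local lock.

```python
import threading


class EpochTable:
    """In-memory stand-in for a one-row table updated via compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._epoch = 0
        self._members = frozenset()

    def read(self):
        """Return a consistent (epoch, members) snapshot."""
        with self._lock:
            return self._epoch, self._members

    def cas(self, expected_epoch, new_members):
        """Apply the update only if the epoch has not moved since it was read."""
        with self._lock:
            if self._epoch != expected_epoch:
                return False, self._epoch
            self._epoch += 1
            self._members = frozenset(new_members)
            return True, self._epoch


def change_membership(table, node, joining=True):
    """Retry loop: read the state, then CAS in the join or leave under a new epoch."""
    while True:
        epoch, members = table.read()
        updated = members | {node} if joining else members - {node}
        applied, new_epoch = table.cas(epoch, updated)
        if applied:
            return new_epoch
```

Each successful join or leave gets exactly one new epoch, and a writer that raced with another change simply retries against the newer state.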

On Fri, Mar 24, 2023, at 6:15 AM, Benedict wrote:
> 
> Accord doesn’t have a hard dependency on CEP-21 fwiw, it just needs 
> linearizable epochs. This could be achieved with a much more modest patch, 
> essentially avoiding almost all of the insertion points of cep-21, just 
> making sure that joining and leaving nodes update some state via Paxos 
> instead of via gossip, and assign an epoch as part of the update.
> 
> It would be preferable to use cep-21 since it introduces this functionality, 
> and our intention is to use cep-21 for this. But it isn’t a hard dependency.
> 
> 
>> On 22 Mar 2023, at 20:58, Henrik Ingo  wrote:
>> 
>> Since Accord depends on transactional meta-data... is there really any 
>> alternative than what you propose?
>> 
>> Sure, if there is some subset of Accord that could be merged, while work 
>> continues on a branch that is based on the CEP-21 branch, that would be great. 
>> Merging even a prototype of Accord to trunk probably has marketing value. 
>> (Don't laugh, many popular databases have had "atomic transactions, except 
>> if anyone executes DDL simultaneously".)
>> 
>> On Tue, Mar 14, 2023 at 8:39 PM Caleb Rackliffe  
>> wrote:
>>> We've already talked a bit about how and when the current Accord feature 
>>> branch should merge to trunk. Earlier today, the cep-21-tcm branch was 
>>> created to house ongoing work on Transactional Metadata.
>>> 
>>> Returning to CASSANDRA-18196 (merging Accord to trunk) after working on 
>>> some other issues, I want to propose changing direction slightly, and make 
>>> sure this makes sense to everyone else.
>>> 
>>> 1.) There are a few minor Accord test issues in progress that I'd like to 
>>> wrap up before doing anything, but those shouldn't take long. (See 
>>> CASSANDRA-18302 and CASSANDRA-18320.)
>>> 2.) Accord logically depends on Transactional Metadata.
>>> 3.) The new cep-21-tcm branch is going to have to be rebased to trunk on a 
>>> regular basis.
>>> 
>>> So...
>>> 
>>> 4.) I would like to pause merging cep-15-accord to trunk, and instead 
>>> rebase it on cep-21-tcm until such time as the latter merges to trunk, at 
>>> which point cep-15-accord can be rebased to trunk again and then merged 
>>> when ready, nominally taking up the work of CASSANDRA-18196 again.
>>> 
>>> Any objections to this?
>> 
>> 
>>

Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-24 Thread Josh McKenzie

> I would like to propose a partial freeze of 5.0 in June
My .02:
+1 to:
* partial freeze on an agreed upon date w/agreed upon other things that can 
optionally go in after
* setting a hard limit on when we ship from that frozen branch regardless of 
whether the features land or not

-1 to:
* ever feature freezing trunk again. :)

I worry about the labor involved with having very large work like this target a 
frozen branch and then also needing to pull it up to trunk. That doesn't sound 
fun.

If we resurrected the discussion about cutting alpha snapshots from trunk, 
would that change people's perspectives on the weight of this current decision? 
We'd probably also have to re-open Pandora's box talking about the solidity of 
our APIs on trunk as well if we positioned those alphas as being stable enough 
to start prototyping and/or building future applications against.

On Fri, Mar 24, 2023, at 9:59 AM, Brandon Williams wrote:
> I am +1 on a 5.0 branch freeze.
> 
> Kind Regards,
> Brandon
> 
> On Fri, Mar 24, 2023 at 8:54 AM Benjamin Lerer  wrote:
> >>
> >> Would that be a trunk freeze, or freeze of a cassandra-5.0 branch?
> >
> >
> > I was thinking of a cassandra-5.0 branch freeze. So branching 5.0 and 
> > allowing only CEP-15 and 21 + bug fixes there.
> > On Fri, Mar 24, 2023 at 13:55, Paulo Motta wrote:
> >>
> >> >  I would like to propose a partial freeze of 5.0 in June.
> >>
> >> Would that be a trunk freeze, or freeze of a cassandra-5.0 branch? I agree 
> >> with a branch freeze, but not with trunk freeze.
> >>
> >> I might work on small features after June and would be happy to delay 
> >> releasing these on 5.0+, but delaying merge to trunk until 5.0 is released 
> >> could be disruptive to contributors' workflows and I would prefer to avoid 
> >> that if possible.
> >>
> >> On Fri, Mar 24, 2023 at 6:37 AM Mick Semb Wever  wrote:
> >>>
> >>>
> >>>> I would like to propose a partial freeze of 5.0 in June.
> >>>>
> >>>> …
> >>>>
> >>>> This partial freeze will be valid for every new feature except CEP-21 
> >>>> and CEP-15.
> >>>
> >>>
> >>>
> >>> +1
> >>>
> >>> Thanks for summarising the thread this way Benjamin. This addresses my 
> >>> two main concerns: letting the branch/release date slip too much into the 
> >>> unknown, squeezing GA QA efforts, while putting in place exceptional 
> >>> waivers for CEP-21 and CEP-15.
> >>>
> >>> I hope that in the future we will be more willing to commit to the 
> >>> release train model: less concerned about "what the next release 
> >>> contains"; more comfortable letting big features land where they land. 
> >>> But this is opinion and discussion for another day… possibly looping back 
> >>> to the discussion on preview releases…
> >>>
> >>>
> >>> Does anyone have a (rough) ETA yet on CEP-15 (post CEP-21) landing 
> >>> in trunk?
> >>>
> >>>
>

Re: [EXTERNAL] [DISCUSS] Next release date

2023-03-24 Thread Josh McKenzie

freeze will allow us to find those early and facilitate the 
>>>> integration of CEP 21 and 15 in my opinion. 
>>>> 
>>>> On Fri, Mar 24, 2023 at 16:23, Jeremiah D Jordan wrote:
>>>>> Given the fundamental change to how cluster operations work coming from 
>>>>> CEP-21, I’m not sure what freezing early for “extra QA time” really buys 
>>>>> us?  I wouldn’t trust any multi-node QA done pre commit.
>>>>> What “stabilizing” do we expect to be doing during this time?  How much 
>>>>> of it do we just have to do again after those things merge?  I for one do 
>>>>> not like to have release branches cut months before their expected 
>>>>> release.  It just adds extra merge forward and “where should this go” 
>>>>> questions/overhead.  It could make sense to me to branch when 
>>>>> CEP-21 merges and only let in CEP-15 after that.  CEP-15 is mostly “net 
>>>>> new stuff” and not “changes to existing stuff” from my understanding?  So 
>>>>> no QA effort wasted if it is done before it merges.
>>>>> 
>>>>> -Jeremiah
>>>>> 
>>>>>> On Mar 24, 2023, at 9:38 AM, Josh McKenzie  wrote:
>>>>>> 
>>>>>>> I would like to propose a partial freeze of 5.0 in June
>>>>>> My .02:
>>>>>> +1 to:
>>>>>> * partial freeze on an agreed upon date w/agreed upon other things that 
>>>>>> can optionally go in after
>>>>>> * setting a hard limit on when we ship from that frozen branch 
>>>>>> regardless of whether the features land or not
>>>>>> 
>>>>>> -1 to:
>>>>>> * ever feature freezing trunk again. :)
>>>>>> 
>>>>>> I worry about the labor involved with having very large work like this 
>>>>>> target a frozen branch and then also needing to pull it up to trunk. 
>>>>>> That doesn't sound fun.
>>>>>> 
>>>>>> If we resurrected the discussion about cutting alpha snapshots from 
>>>>>> trunk, would that change people's perspectives on the weight of this 
>>>>>> current decision? We'd probably also have to re-open Pandora's box 
>>>>>> talking about the solidity of our APIs on trunk as well if we 
>>>>>> positioned those alphas as being stable enough to start prototyping 
>>>>>> and/or building future applications against.
>>>>>> 
>>>>>> On Fri, Mar 24, 2023, at 9:59 AM, Brandon Williams wrote:
>>>>>>> I am +1 on a 5.0 branch freeze.
>>>>>>> 
>>>>>>> Kind Regards,
>>>>>>> Brandon
>>>>>>> 
>>>>>>> On Fri, Mar 24, 2023 at 8:54 AM Benjamin Lerer  
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Would that be a trunk freeze, or freeze of a cassandra-5.0 branch?
>>>>>>> >
>>>>>>> >
>>>>>>> > I was thinking of a cassandra-5.0 branch freeze. So branching 5.0 and 
>>>>>>> > allowing only CEP-15 and 21 + bug fixes there.
>>>>>>> > On Fri, Mar 24, 2023 at 13:55, Paulo Motta wrote:
>>>>>>> >>
>>>>>>> >> >  I would like to propose a partial freeze of 5.0 in June.
>>>>>>> >>
>>>>>>> >> Would that be a trunk freeze, or freeze of a cassandra-5.0 branch? I 
>>>>>>> >> agree with a branch freeze, but not with trunk freeze.
>>>>>>> >>
>>>>>>> >> I might work on small features after June and would be happy to 
>>>>>>> >> delay releasing these on 5.0+, but delaying merge to trunk until 5.0 
>>>>>>> >> is released could be disruptive to contributors' workflows and I 
>>>>>>> >> would prefer to avoid that if possible.
>>>>>>> >>
>>>>>>> >> On Fri, Mar 24, 2023 at 6:37 AM Mick Semb Wever  
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>> I would like to propose a partial freeze of 5.0 in June.
>>>>>>> >>>>
>>>>>>> >>>> …
>>>>>>> >>>>
>>>>>>> >>>> This partial freeze will be valid for every new feature except 
>>>>>>> >>>> CEP-21 and CEP-15.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> +1
>>>>>>> >>>
>>>>>>> >>> Thanks for summarising the thread this way Benjamin. This addresses 
>>>>>>> >>> my two main concerns: letting the branch/release date slip too much 
>>>>>>> >>> into the unknown, squeezing GA QA efforts, while putting in place 
>>>>>>> >>> exceptional waivers for CEP-21 and CEP-15.
>>>>>>> >>>
>>>>>>> >>> I hope that in the future we will be more willing to commit to the 
>>>>>>> >>> release train model: less concerned about "what the next release 
>>>>>>> >>> contains"; more comfortable letting big features land where they 
>>>>>>> >>> land. But this is opinion and discussion for another day… possibly 
>>>>>>> >>> looping back to the discussion on preview releases…
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Does anyone have a (rough) ETA yet on CEP-15 (post CEP-21) 
>>>>>>> >>> landing in trunk?
>>>>>>> >>>
>>>>>>> >>>

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread Josh McKenzie

> FWIW, I'd still rather just integrate w/ TCM ASAP, avoiding integration risk 
> while accepting the possible delivery risk.
What does the chain of rebases against trunk look like here? cep-21-tcm rebase, 
then cep-15 on cep-21-tcm, then cep-7 on cep-21-tcm, then a race on whichever 
of 15 or 7 merges after 21 goes into trunk? Or 7 based on 15, or the other way 
around...

I'm not actively working on any of these branches so take my perspective with 
that appropriate grain of salt, but the coupling of these things seems to have 
its own breed of integration pain to upkeep over time depending on how 
frequently we're rebasing against trunk.

> the question we want to answer is whether or not we build a throwaway patch 
> for linearizable epochs
Do we have an informed opinion on how long we think this would take? Seems like 
that'd help clarify whether there are contributors with the bandwidth and 
desire to even do that, or whether having everyone depend on cep-21 is our option.

On Fri, Mar 24, 2023, at 1:30 PM, Caleb Rackliffe wrote:
> I actually did a dry run rebase of cep-15-accord on top of cep-21-tcm here: 
> https://github.com/apache/cassandra/pull/2227
> 
> It wasn't too terrible, and I was actually able to get the main CQL-based 
> Accord tests working as long as I disabled automatic forwarding of CAS and 
> SERIAL read operations to Accord. The bigger issue was test stability in 
> cep-21-tcm. I'm sure that will mature quickly here, and I created a few 
> issues to attach to the Transactional Metadata epic 
> <https://issues.apache.org/jira/browse/CASSANDRA-18330>.
> 
> In the meantime, I rebased cep-15-accord on trunk at commit 
> 3eb605b4db0fa6b1ab67b85724a9cfbf00aae7de. The option to finish the remaining 
> bits of CASSANDRA-18196 
> <https://issues.apache.org/jira/browse/CASSANDRA-18196> and merge w/o TCM is 
> still available, but it sounds like the question we want to answer is whether 
> or not we build a throwaway patch for linearizable epochs in lieu of TCM?
> 
> FWIW, I'd still rather just integrate w/ TCM ASAP, avoiding integration risk 
> while accepting the possible delivery risk.
> 
> On Fri, Mar 24, 2023 at 9:32 AM Josh McKenzie  wrote:
>>> making sure that joining and leaving nodes update some state via Paxos 
>>> instead of via gossip
>> What kind of a time delivery risk does coupling CEP-15 with CEP-21 introduce 
>> (i.e. unk-unk on CEP-21 leading to delay cascades to CEP-15)? Seems like 
>> having a table we CAS state for on epochs wouldn't be *too* challenging, but 
>> I'm not familiar w/the details so I could be completely off here.
>> 
>> Being able to deliver both of these things on their own timetable seems like 
>> a pretty valuable thing assuming the lift required would be modest.
>> 
>> On Fri, Mar 24, 2023, at 6:15 AM, Benedict wrote:
>>> 
>>> Accord doesn’t have a hard dependency on CEP-21 fwiw, it just needs 
>>> linearizable epochs. This could be achieved with a much more modest patch, 
>>> essentially avoiding almost all of the insertion points of cep-21, just 
>>> making sure that joining and leaving nodes update some state via Paxos 
>>> instead of via gossip, and assign an epoch as part of the update.
>>> 
>>> It would be preferable to use cep-21 since it introduces this 
>>> functionality, and our intention is to use cep-21 for this. But it isn’t a 
>>> hard dependency.
>>> 
>>> 
>>>> On 22 Mar 2023, at 20:58, Henrik Ingo  wrote:
>>>> 
>>>> Since Accord depends on transactional meta-data... is there really any 
>>>> alternative than what you propose?
>>>> 
>>>> Sure, if there is some subset of Accord that could be merged, while work 
>>>> continues on a branch that is based on the CEP-21 branch, that would be 
>>>> great. Merging even a prototype of Accord to trunk probably has marketing 
>>>> value. (Don't laugh, many popular databases have had "atomic transactions, 
>>>> except if anyone executes DDL simultaneously".)
>>>> 
>>>> On Tue, Mar 14, 2023 at 8:39 PM Caleb Rackliffe  
>>>> wrote:
>>>>> We've already talked a bit 
>>>>> <https://lists.apache.org/list?dev@cassandra.apache.org:2023-1:Merging%20CEP-15%20to%20trunk>
>>>>>  about how and when the current Accord feature branch should merge to 
>>>>> trunk. Earlier today, the cep-21-tcm branch was created 
>>>>> <https://lists.apache.org/thread/qkwnhgq02cn12jon2h565kh2gpzp9rry> to 

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread Josh McKenzie

> If this is in a release, we then need to maintain that feature, so would be 
> against it.
Isn't the argument that cep-21 provides this so we could just remove the 
temporary impl and point to the new facility for this generation?

On Fri, Mar 24, 2023, at 3:22 PM, David Capwell wrote:
>> the question we want to answer is whether or not we build a throwaway patch 
>> for linearizable epochs
> 
> If this is in a release, we then need to maintain that feature, so would be 
> against it.
> 
> If this is for testing, then I would argue the current world is “fine”… 
> current world is hard to use and brittle (users need to tell accord that the 
> cluster changed), but if accord is rebasing on txn metadata then this won’t 
> be that way long (currently blocked from doing that due to txn metadata not 
> passing all tests yet).
> 
>> On Mar 24, 2023, at 12:12 PM, Josh McKenzie  wrote:
>> 
>>> FWIW, I'd still rather just integrate w/ TCM ASAP, avoiding integration 
>>> risk while accepting the possible delivery risk.
>> What does the chain of rebases against trunk look like here? cep-21-tcm 
>> rebase, then cep-15 on cep-21-tcm, then cep-7 on cep-21-tcm, then a race on 
>> whichever of 15 or 7 merge after 21 goes into trunk? Or 7 based on 15, the 
>> other way around...
>> 
>> I'm not actively working on any of these branches so take my perspective 
>> with that appropriate grain of salt, but the coupling of these things seems 
>> to have its own kind of breed of integration pain to upkeep over time 
>> depending on how frequently we're rebasing against trunk.
>> 
>>> the question we want to answer is whether or not we build a throwaway patch 
>>> for linearizable epochs
>> Do we have an informed opinion on how long we think this would take? Seems 
>> like that'd help clarify whether or not there's contributors with the 
>> bandwidth and desire to even do that or whether everyone depending on cep-21 
>> is our option.
>> 
>> On Fri, Mar 24, 2023, at 1:30 PM, Caleb Rackliffe wrote:
>>> I actually did a dry run rebase of cep-15-accord on top of cep-21-tcm here: 
>>> https://github.com/apache/cassandra/pull/2227
>>> 
>>> It wasn't too terrible, and I was actually able to get the main CQL-based 
>>> Accord tests working as long as I disabled automatic forwarding of CAS and 
>>> SERIAL read operations to Accord. The bigger issue was test stability in 
>>> cep-21-tcm. I'm sure that will mature quickly here, and I created a few 
>>> issues to attach to the Transactional Metadata epic 
>>> <https://issues.apache.org/jira/browse/CASSANDRA-18330>.
>>> 
>>> In the meantime, I rebased cep-15-accord on trunk at commit 
>>> 3eb605b4db0fa6b1ab67b85724a9cfbf00aae7de. The option to finish the 
>>> remaining bits of CASSANDRA-18196 
>>> <https://issues.apache.org/jira/browse/CASSANDRA-18196> and merge w/o TCM 
>>> is still available, but it sounds like the question we want to answer is 
>>> whether or not we build a throwaway patch for linearizable epochs in lieu 
>>> of TCM?
>>> 
>>> FWIW, I'd still rather just integrate w/ TCM ASAP, avoiding integration 
>>> risk while accepting the possible delivery risk.
>>> 
>>> On Fri, Mar 24, 2023 at 9:32 AM Josh McKenzie  wrote:
>>>>> making sure that joining and leaving nodes update some state via Paxos 
>>>>> instead of via gossip
>>>> What kind of a time delivery risk does coupling CEP-15 with CEP-21 
>>>> introduce (i.e. unk-unk on CEP-21 leading to delay cascades to CEP-15)? 
>>>> Seems like having a table we CAS state for on epochs wouldn't be *too* 
>>>> challenging, but I'm not familiar w/the details so I could be completely 
>>>> off here.
>>>> 
>>>> Being able to deliver both of these things on their own timetable seems 
>>>> like a pretty valuable thing assuming the lift required would be modest.
>>>> 
>>>> On Fri, Mar 24, 2023, at 6:15 AM, Benedict wrote:
>>>>> 
>>>>> Accord doesn’t have a hard dependency on CEP-21 fwiw, it just needs 
>>>>> linearizable epochs. This could be achieved with a much more modest 
>>>>> patch, essentially avoiding almost all of the insertion points of cep-21, 
>>>>> just making sure that joining and leaving nodes update some state via 
>>>>> Paxos instead of via gossip, and assign an epoch as part of the update.
>>>>>

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-26 Thread Josh McKenzie

I want to second what Yifan's spoken to, specifically in terms of resource 
isolation and availability.

While the sidecar hasn't seen a ton of traffic and contributions since the 
acceptance into the project and clearance of CEP-1, my intuition is that that's 
due to the entrenched maturity of alternative sidecars out there since we were 
slow as a project to build one, not out of a lack of demand for a fully fleshed 
out sidecar. As functionality shows up in the ASF C* Sidecar, there's going to 
be tension as operators are incentivized to run both the bespoke sidecars they 
may already be running and the ASF C* one. That's to be expected and a 
necessary pain to take on during a transition that I personally think is sorely 
needed.

Having bulk operations for analytics and for reading and writing SSTables is a 
pretty compelling carrot, and the more folks we can get running the sidecar and 
the more contributors active on it, the more we can expect to see interest and 
work show up there (repair coordination, REST APIs, etc - all of which we've 
talked about before on ML or slack).

So I'm a strong +1 to it living in the sidecar.

On Sat, Mar 25, 2023, at 11:05 AM, Brandon Williams wrote:
> Oh, that's significantly different and great news, please do!  Thanks
> for the clarification, Doug!
> 
> Kind Regards,
> Brandon
> 
> On Fri, Mar 24, 2023 at 4:42 PM Doug Rohrer  wrote:
> >
> > I agree that the analytics library will need to support vnodes. To be 
> > clear, there’s nothing preventing the solution from working with vnodes 
> > right now, and no assumptions about a 1:1 topology between a token and a 
> > node. However, we don’t, today, have the ability to test vnode support 
> > end-to-end. We are working towards that, however, and should be able to 
> > remove the caveat from the released analytics library once we can properly 
> > test vnode support.
> > If it helps, I can update the CEP to say something more like “Caveat: 
> > Currently untested with vnodes - work is ongoing to remove this limitation”.
> >
> > Doug
> >
> > > On Mar 24, 2023, at 11:43 AM, Brandon Williams  wrote:
> > >
> > > On Fri, Mar 24, 2023 at 10:39 AM Jeremiah D Jordan
> > >  wrote:
> > >>
> > >> I have concerns with the majority of this being in the sidecar and not 
> > >> in the database itself.  I think it would make sense for the server side 
> > >> of this to be a new service exposed by the database, not in the sidecar. 
> > >>  That way it can be able to properly integrate with the authentication 
> > >> and authorization apis, and to make it a first class citizen in terms of 
> > >> having unit/integration tests in the main DB ensuring no one breaks it.
> > >
> > > I don't think this can/should happen until it supports the database's
> > > default configuration with vnodes.
> >
>

Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-03-27 Thread Josh McKenzie

I'll take build lead for the next 2 weeks.

On Sat, Mar 25, 2023, at 4:50 PM, Mick Semb Wever wrote:
>> Here comes Cassandra CI status for  2023-3-13 - 2023-23-179 :
>> 
>> *** CASSANDRA-18338  
>> -  dtest.bootstrap_test.TestBootstrap.test_cleanup trunk
>> ***  CASSANDRA-18338  
>> - test: 
>> org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest , this 
>> failed twice with jdk 8 and jdk 11, on trunk and  4.1
>> others are some timeout exception.
> 
> 
> New failures from Week 12
> *** A possible regression from CASSANDRA-18328 on 2.x to 3.x dtest upgrades
> 
> otherwise all test failures are timeouts.
> 
> We need volunteers for the Build Lead the weeks ahead. 
> 
> 
>

Re: [EXTERNAL] [DISCUSS] Next release date

2023-03-30 Thread Josh McKenzie

So to confirm, let's make sure we all agree on the definition of "stabilize".

Using the definition "green run of all tests on CircleCI, no regressions on 
ASF CI" that we used to get 4.1 out the door, and combined with the metric of 
"feature branches don't merge until their CI is green on at least CircleCI and 
don't regress on ASF CI"... that boils down to:

a) do we have test failures on circle on trunk right now, and
b) do we have regressions on trunk on ASF CI compared to 4.1

Whether new features land near the cutoff date or not shouldn't impact 
the above, right?

I'm receptive to another definition of "stabilize", such as having a time-boxed 
calendar window for people to run a beta or RC, or whatever. But my 
understanding is that the above was our general consensus in the 4.1 window.

I definitely could be wrong. :)
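
As a rough illustration of checks (a) and (b) above, here is a minimal sketch. The function name and the idea of feeding it plain lists of failing test names are assumptions for illustration only; nothing here reflects actual CI tooling.

```python
def stabilization_check(circle_failures, asf_failures, asf_baseline_failures):
    """Return (stable, regressions) for a branch.

    circle_failures:       failing tests on CircleCI for the branch (check a)
    asf_failures:          failing tests on ASF CI for the branch
    asf_baseline_failures: failing tests on ASF CI for the previous release,
                           e.g. 4.1 (check b compares against this)
    """
    # A regression is a test that fails now but did not fail on the baseline.
    regressions = sorted(set(asf_failures) - set(asf_baseline_failures))
    stable = not circle_failures and not regressions
    return stable, regressions
```

Under this reading, a test that was already failing on the baseline does not block the branch, while any new ASF CI failure or any CircleCI failure does.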

On Thu, Mar 30, 2023, at 5:22 AM, Benjamin Lerer wrote:
>> but otherwise I don't recall anything that we could take as an indicator 
>> that a next release would take a comparable amount of time to 4.1?
> 
> Do we have any indicator that proves that it will take less time? We never 
> managed to do a release in 2 or 3 months so far. Until we have actually 
> proven that we could do it, I simply prefer assuming that we cannot and plan 
> for the worst.
> 
> We have a lot of significant features that have or will land soon and our 
> experience suggests that those merges usually bring their set of 
> instabilities. The goal of the proposal was to make sure that we get rid of 
> them before TCM and Accord land to allow us to more easily identify the root 
> causes of problems. Gaining time on the overall stabilization process. I am 
> fine with people not liking the proposal. Nevertheless, simply hoping that it 
> will take us 2 months to stabilize the release seems pretty optimistic to me. 
> Do people have another plan in mind for ensuring a short stabilization period?
> 
> 
> On Mon, Mar 27, 2023 at 09:20, Henrik Ingo wrote:
>> Not so fast...
>> 
>> There's certainly value in spending that time stabilizing the already done 
>> features. It's valuable triaging information to say this used to work before 
>> CEP-21 and only broke after it.
>> 
>> That said, having a very long freeze of trunk, or alternatively having a 
>> very long lived 5.0 branch that is waiting for Accord and diverging with a 
>> trunk that is not frozen... are both undesirable options. (A month or two 
>> could IMO be discussed though.) So I agree with the concern from that point 
>> of view, I just don't agree that having one batch of big features in 
>> stabilization period is zero value.
>> 
>> 
>> henrik
>> 
>> 
>> 
>> On Fri, Mar 24, 2023 at 5:23 PM Jeremiah D Jordan 
>>  wrote:
>>> Given the fundamental change to how cluster operations work coming from 
>>> CEP-21, I’m not sure what freezing early for “extra QA time” really buys 
>>> us?  I wouldn’t trust any multi-node QA done pre commit.
>>> What “stabilizing” do we expect to be doing during this time?  How much of 
>>> it do we just have to do again after those things merge?  I for one do not 
>>> like to have release branches cut months before their expected release.  It 
>>> just adds extra merge forward and “where should this go” 
>>> questions/overhead.  It could make sense to me to branch when CEP-21 
>>> merges and only let in CEP-15 after that.  CEP-15 is mostly “net new stuff” 
>>> and not “changes to existing stuff” from my understanding?  So no QA effort 
>>> wasted if it is done before it merges.
>>> 
>>> -Jeremiah
>>> 
>>>> On Mar 24, 2023, at 9:38 AM, Josh McKenzie  wrote:
>>>> 
>>>>> I would like to propose a partial freeze of 5.0 in June
>>>> My .02:
>>>> +1 to:
>>>> * partial freeze on an agreed upon date w/agreed upon other things that 
>>>> can optionally go in after
>>>> * setting a hard limit on when we ship from that frozen branch regardless 
>>>> of whether the features land or not
>>>> 
>>>> -1 to:
>>>> * ever feature freezing trunk again. :)
>>>> 
>>>> I worry about the labor involved with having very large work like this 
>>>> target a frozen branch and then also needing to pull it up to trunk. That 
>>>> doesn't sound fun.
>>>> 
>>>> If we resurrected the discussion about cutting alpha snapshots from trunk, 
>>>> would that change people's perspectives on the weight of this current 

Re: [EXTERNAL] [DISCUSS] Next release date

2023-04-01 Thread Josh McKenzie

> in practice we wait and receive bug reports from downstream testing efforts. 
> Such testing isn't necessarily possible pre-commit, e.g. third-party and not 
> feasible to continuously run, nor appropriate to upstream/open-source.
> 
> We want GA releases to be production ready for any cluster at any scale. So I 
> guess in practice for us Stable Trunk != GA, but that's ok
I agree with this sentiment, and I also am uncomfortable with how vague this 
presently is. I'd be happier with something concrete like the following 
expected release flow:

1) We freeze a branch
2) To hit RC, we need green circle + no regression on ASF (or green ASF in the 
future when stable)
3) We need N weeks in this frozen state for people to test it out
4) Once we have both 2 and 3, we RC and GA

I definitely think there should be a time-boxed element of it but as far as I 
know we haven't really settled on that and it's pretty hand-wavy. Probably moot 
as in the past getting tests to green gave us enough calendar time that folks 
had plenty of time to test out the betas and RCs, but in a world where tests 
are stable, that shifts us to needing to set a minimum time on calendar to give 
folks time to test.
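
The four-step flow above could be sketched as a single gate, assuming the freeze date and CI results are passed in by hand. The function name `ready_for_rc` and its parameters are hypothetical, and `min_weeks` is deliberately left for the caller to choose since the N in step 3 has not been agreed.

```python
from datetime import date, timedelta


def ready_for_rc(freeze_date, today, circle_green, asf_regressions, min_weeks):
    """Step 4: cut the RC only once both step 2 and step 3 hold."""
    # Step 3: the branch has been frozen for at least min_weeks.
    soaked = today - freeze_date >= timedelta(weeks=min_weeks)
    # Step 2: green CircleCI and no regressions on ASF CI.
    ci_ok = circle_green and asf_regressions == 0
    return soaked and ci_ok
```

For example, `ready_for_rc(date(2023, 6, 1), date(2023, 7, 27), True, 0, min_weeks=8)` passes the gate, while the same call one day earlier does not.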

Is it too prescriptive to say "we'll be frozen on a branch for at least 8 weeks 
so folks can test out the betas"? (I ask because I know I can get a little 
"structure-happy" at times).


On Fri, Mar 31, 2023, at 9:17 AM, Mick Semb Wever wrote:
> 
>> We have a lot of significant features that have or will land soon and our 
>> experience suggests that those merges usually bring their set of 
>> instabilities. The goal of the proposal was to make sure that we get rid of 
>> them before TCM and Accord land to allow us to more easily identify the root 
>> causes of problems. 
>>> 
>> 
>>  
> Agree with Benjamin that testing in phases, i.e. separate periods of time, 
> has positives that we can take advantage of.
> 
> 
> 
>> a) do we have test failures on circle on trunk right now, and
>> b) do we have regressions on trunk on ASF CI compared to 4.1
>> 
>> Whether or not new features land near the cutoff date or not shouldn't 
>> impact the above right?
> 
> 
> I don't think it can be limited to the above. They are our minimum 
> requirements to getting to beta, to rc, and to GA. But in practice we wait 
> and receive bug reports from downstream testing efforts. Such testing isn't 
> necessarily possible pre-commit, e.g. third-party and not feasible to 
> continuously run, nor appropriate to upstream/open-source.
> 
> We want GA releases to be production ready for any cluster at any scale. So I 
> guess in practice for us Stable Trunk != GA, but that's ok – just being 
> honest to the ideal we are moving towards.
>

Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-04 Thread Josh McKenzie

I think there's competing dynamics here.

1) KEYSPACE isn't that great of a name; it's not a space in which keys are 
necessarily unique, and you can't address things just by key w/out their 
respective tables
2) DATABASE isn't that great of a name either due to the aforementioned 
ambiguity.

Something like "TABLESPACE" or "TABLEGROUP" would *theoretically* better 
satisfy points 1 and 2 above but subjectively I kind of recoil at both equally. 
So there's that.

On Tue, Apr 4, 2023, at 12:30 PM, Abe Ratnofsky wrote:
> I agree with Bowen - I find Keyspace easier to communicate with. There are 
> plenty of situations where the use of "database" is ambiguous (like "Could 
> you help me connect to database x?"), but Keyspace refers to a single thing. 
> I think more software is moving towards calling these things "namespaces" 
> (like Kubernetes), and while "Keyspaces" is not a term used in this way 
> elsewhere, I still find it leads to clearer communication.
> 
> --
> Abe
> 
> 
>> On Apr 4, 2023, at 9:24 AM, Andrés de la Peña  wrote:
>> 
>> I think supporting DATABASE is a great idea. 
>> 
>> It's better aligned with SQL databases, and can save new users one of the 
>> first troubles they find. 
>> 
>> Probably anyone starting to use Cassandra for the first time is going to 
>> face the "what is a keyspace?" question in the first minutes. Sparing users 
>> that with a more common name would be a victory for usability IMO.
>> 
>> On Tue, 4 Apr 2023 at 16:48, Mike Adamson  wrote:
>>> Hi,
>>> 
>>> I'd like to propose that we add DATABASE to the CQL grammar as an 
>>> alternative to KEYSPACE. 
>>> 
>>> Background: While TABLE was introduced as an alternative for COLUMNFAMILY 
>>> in the grammar we have kept KEYSPACE for the container name for a group of 
>>> tables. Nearly all traditional SQL databases use DATABASE as the container 
>>> name for a group of tables so it would make sense for Cassandra to adopt 
>>> this naming as well.
>>> 
>>> KEYSPACE would be kept in the grammar but we would update some logging and 
>>> documentation to encourage use of the new name. 
>>> 
>>> Mike Adamson
>>> 
>>>

Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Josh McKenzie

> KEYSPACE is fine. If we want to introduce a standard nomenclature like 
> DATABASE that’s also fine. Inventing brand new ones is not fine, there’s no 
> benefit.
I'm with Benedict in principle, with Aleksey in practice; I think KEYSPACE and 
SCHEMA are actually fine enough.

If and when we get to any kind of multi-tenancy, having a more metaphorical 
abstraction that users are familiar with like these becomes more valuable; it's 
pretty clear that things in different keyspaces, different databases, or even 
different schemas could have different access rules, resourcing, etc from one 
another.

While the off-the-cuff logical TABLEGROUP thing is a *literal* statement about 
what the thing is, it'd be another unique term to us;  we have enough things in 
our system where we've charted our own path. My personal .02 is we don't need 
to go adding more. :)

On Thu, Apr 6, 2023, at 8:54 AM, Mick Semb Wever wrote:
> 
>> … but that should be a different discussion about how we evolve config.
> 
>  
> I disagree. Nomenclature being difficult can benefit from holistic and 
> forward thinking.
> Sure you can label this off-topic if you like, but I value our discuss 
> threads being collaborative in an open-mode. Sometimes the best idea is on 
> the tail end of a sequence of bad and/or unpopular ideas.
> 
> 
>> 
>

Re: [VOTE] CEP-26: Unified Compaction Strategy

2023-04-06 Thread Josh McKenzie

+1

On Thu, Apr 6, 2023, at 12:18 PM, Joseph Lynch wrote:
> +1
> 
> This proposal looks really exciting!
> 
> -Joey
> 
> On Wed, Apr 5, 2023 at 2:13 AM Aleksey Yeshchenko  wrote:
> >
> > +1
> >
> > On 4 Apr 2023, at 16:56, Ekaterina Dimitrova  wrote:
> >
> > +1
> >
> > On Tue, 4 Apr 2023 at 11:44, Benjamin Lerer  wrote:
> >>
> >> +1
> >>
> >> Le mar. 4 avr. 2023 à 17:17, Andrés de la Peña  a 
> >> écrit :
> >>>
> >>> +1
> >>>
> >>> On Tue, 4 Apr 2023 at 15:09, Jeremy Hanna  
> >>> wrote:
> 
>  +1 nb, will be great to have this in the codebase - it will make nearly 
>  every table's compaction work more efficiently.  The only possible 
>  exception is tables that are well suited for TWCS.
> 
>  On Apr 4, 2023, at 8:00 AM, Berenguer Blasi  
>  wrote:
> 
>  +1
> 
>  On 4/4/23 14:36, J. D. Jordan wrote:
> 
>  +1
> 
>  On Apr 4, 2023, at 7:29 AM, Brandon Williams  wrote:
> 
>  
>  +1
> 
>  On Tue, Apr 4, 2023, 7:24 AM Branimir Lambov  wrote:
> >
> > Hi everyone,
> >
> > I would like to put CEP-26 to a vote.
> >
> > Proposal:
> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
> >
> > JIRA and draft implementation:
> > https://issues.apache.org/jira/browse/CASSANDRA-18397
> >
> > Up-to-date documentation:
> > https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
> >
> > Discussion:
> > https://lists.apache.org/thread/8xf5245tclf1mb18055px47b982rdg4b
> >
> > The vote will be open for 72 hours.
> > A vote passes if there are at least three binding +1s and no binding 
> > vetoes.
> >
> > Thanks,
> > Branimir
> 
> 
> >
>

Re: [VOTE] Release Apache Cassandra 4.0.9 - SECOND ATTEMPT

2023-04-13 Thread Josh McKenzie

+1

On Thu, Apr 13, 2023, at 3:17 AM, Benjamin Lerer wrote:
> +1
> 
> Le jeu. 13 avr. 2023 à 08:56, Tommy Stendahl via dev 
>  a écrit :
>> +1 (nb)
>> 
>> -Original Message-
>> *From*: Brandon Williams
>> *Reply-To*: dev@cassandra.apache.org
>> *To*: dev@cassandra.apache.org
>> *Subject*: Re: [VOTE] Release Apache Cassandra 4.0.9 - SECOND ATTEMPT
>> *Date*: Tue, 11 Apr 2023 05:30:59 -0500
>> 
>> +1
>> 
>> 
>> On Tue, Apr 11, 2023 at 2:54 AM Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
>>> 
>>> Let's just vote on that straight away. Nothing significant has changed 
>>> except the zstd-jni update to 1.5.5. If all goes well it would be nice to 
>>> have the vote resolved by this Friday's noon UTC.
>>> 
>>> Proposing the test build of Cassandra 4.0.9 for release.
>>> 
>>> sha1: e9f8f2efa2ba75f223f31ca6801aff3fe2964745
>>> Git: 
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.9-tentative
>>> 
>>> Maven Artifacts: 
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1286/org/apache/cassandra/cassandra-all/4.0.9/
>>> 
>>> 
>>> The Source and Build Artifacts, and the Debian and RPM packages and 
>>> repositories, are available here: 
>>> https://dist.apache.org/repos/dist/dev/cassandra/4.0.9/
>>> 
>>> 
>>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>>> tested the build is invited to vote. Votes by PMC members are considered 
>>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>> 
>>> [1]: CHANGES.txt: 
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.9-tentative
>>> 
>>> [2]: NEWS.txt: 
>>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.9-tentative
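
The pass condition quoted in the vote email above ("at least three binding +1s and no -1's") can be sketched as a small tally function. This is only an illustration of the rule as literally stated in the email; the `Vote` type, the function name, and the voter labels are hypothetical and not part of any Apache or Cassandra tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical label, not a real voter
    value: int     # +1, 0, or -1
    binding: bool  # the email treats votes by PMC members as binding


def release_vote_passes(votes):
    """Apply the rule as worded in the email: >= 3 binding +1s and no -1 at all."""
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    any_minus_one = any(v.value == -1 for v in votes)
    return binding_plus_ones >= 3 and not any_minus_one


if __name__ == "__main__":
    votes = [
        Vote("pmc-a", 1, True),
        Vote("pmc-b", 1, True),
        Vote("pmc-c", 1, True),
        Vote("contributor-d", 1, False),  # a "+1 (nb)" style non-binding vote
    ]
    print(release_vote_passes(votes))
```

Non-binding +1s are counted as participation only; they neither help nor hurt the outcome under this reading of the rule.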

Re: (CVE only) support for 3.11 beyond published EOL

2023-04-13 Thread Josh McKenzie

> We already have an understanding and precedent in place that CVEs on
> the previous unmaintained branch are addressed and released.
Correct me if I'm wrong German, but the question I got from your email was 
effectively "If we consider formalizing our commitment to fixing CVEs on 
older branches that are out of formal bugfix support as a community, what are 
the benefits and costs to doing that?"

On Thu, Apr 13, 2023, at 2:47 PM, Mick Semb Wever wrote:
> >
> > There have been several discussions on slack [1], [2] to support 3.11 
> > beyond the date stated on the web [3] which is May-July 23 and given it's 
> > April that's an unlikely date.
> >
> 
> 
> Strictly speaking it is maintained until the 5.0 GA release. We should
> update the downloads page accordingly.
> 
> 
> >
> > So we will support anyway but I would like to start a broader discussion if 
> > we, the community, are interested in at a minimum CVE only support, maybe 
> > bug fixes as well,  after 5.0 is released for 3.11 and if so for how long - 
> > something like a Cassandra LTS policy.
> >
> 
> 
> 
> The community's resources are limited, and the statement is intended
> to avoid tying up resources and to avoid letting users down. This is
> open source and "to upgrade" is often our easy and pragmatic answer.
> 
> It is not a statement that fixes to older branches will be rejected. Two
> committers can still push to older branches, and a release can
> still happen if you find someone to do it (and three PMCs to +1 it).
> This is why the 2.2 branch is still present on ci-cassandra.a.o. If
> vendors want to provide support for versions longer and can make the
> commitment to upstream those efforts (whether that's bug-fixes and
> releases, or only bug-fixes) the machinery is in place to accept it.
> 
> We already have an understanding and precedent in place that CVEs on
> the previous unmaintained branch are addressed and released.
>

Re: [DISCUSS] Next release date

2023-04-16 Thread Josh McKenzie

> 2. When CEP-15 lands we cut alpha1,
> 2a. The deadline is first week of October, anything not yet in
> cassandra-5.0 is not in 5.0,
> 2b. We expect a minimum two months of testing and beta+rc releases
> to get to GA.
To clarify, is the intent here to say "The deadline for cutoff is 1st week of 
October for everything, including CEP-15"? Or is the intent to say "we don't 
cut alpha1 until CEP-15 lands"?

On Sun, Apr 16, 2023, at 7:11 PM, Mick Semb Wever wrote:
> >
> >> We have a lot of significant features that have or will land soon and our 
> >> experience suggests that those merges usually bring their set of 
> >> instabilities. The goal of the proposal was to make sure that we get rid 
> >> of them before TCM and Accord land to allow us to more easily identify the 
> >> root causes of problems.
> >
> >
> > Agree with Benjamin that testing in phases, i.e. separate periods of time, 
> > has positives that we can take advantage of.
> >
> 
> 
> Where did we land on this?
> 
> With the following intentions:
> - moving towards the goal of annual releases, with a cadence 12±3 months 
> apart,
> - the branch to GA period being 2-3 months,
> - avoiding any type of freeze on trunk,
> - getting a release out by December's Summit,
> - freeing up folk to start QA (that makes sense to start) immediately
> 
> I'm going to suggest the following…
> 
> 1. Once all CEPs except CEP-21 and CEP-15 land we branch cassandra-5.0,
> 1a. QA starts on cassandra-5.0,
> 1b. CEP-21 and CEP-15 are waivered to land in cassandra-5.0, and
> forward-merge to trunk,
> 
> 2. When CEP-15 lands we cut alpha1,
> 2a. The deadline is first week of October, anything not yet in
> cassandra-5.0 is not in 5.0,
> 2b. We expect a minimum two months of testing and beta+rc releases
> to get to GA.
> 
> 
> Additional notes,
> 1) "Once all CEPs" includes jdk17 and extending TTL tickets.
> 1) We ask folk to be considerate of what they commit in trunk wrt to
> the inbound CEP-21.
> 1a) There's an understanding of what needs to be re-tested after CEP-21.
> 2) The initial release may be beta1, we make that call at that time.
> 2a) features not complete can still be in 5.0 as experimental and not
> enabled by default.
> 2b) If CEP-15 lands Aug/Sept, then the earliest possible GA release
> date is in October.
> 
> I feel this proposal will give us evidence and help put us back on
> track for a release train model with a shorter QA2GA period, and
> aiming for a 5.1 release a bit earlier in the 2024 year (e.g. Q3).
> 
> If we agree on this proposal I will update our downloads page (ref
> German's thread).
>

Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-04-17 Thread Josh McKenzie

Sorry for the delay on the summary of build lead for those two weeks.

We had quite a few new failures during the 2 week period:

CASSANDRA-18427

Test failure: pdtest: 
dtest-novnode.ttl_test.TestDistributedTTL.test_ttl_is_respected_on_delayed_replication

CASSANDRA-18426

Test failure: pdtest: 
dtest-upgrade.upgrade_tests.cql_tests.TestCQLNodes2RF1_Upgrade_current_2_1_x_To_indev_3_11_x.test_clustering_order_and_functions

CASSANDRA-18425

Test failure: utest: 
org.apache.cassandra.db.RepairedDataTombstonesTest.readTestPartitionTombstones-.jdk11


CASSANDRA-18393

Flaky test: 
org.apache.cassandra.cql3.validation.operations.InsertUpdateIfConditionTest.testConditionalUpdate[1: clusterMinVersion=3.11]-compression.jdk1.8 
on trunk

CASSANDRA-18392

flaky test 
org.apache.cassandra.net.ConnectionTest.testMessageDeliveryOnReconnect-compression.jdk1.8 
on trunk

CASSANDRA-18391

consistent timeout: 
dtest-upgrade.upgrade_tests.cql_tests.cls.test_cql3_non_compound_range_tombstones 
on trunk


Quite a few other failures were timeouts as well. You can see the spikiness 
on Butler for trunk here: https://butler.cassandra.apache.org/#/. However, it 
looks like we've settled back down to ~6 failures right now.

On Mon, Mar 27, 2023, at 12:27 PM, Josh McKenzie wrote:
> I'll take build lead for the next 2 weeks.
> 
> On Sat, Mar 25, 2023, at 4:50 PM, Mick Semb Wever wrote:
>>> Here comes Cassandra CI status for  2023-3-13 - 2023-23-179 :
>>> 
>>> *** CASSANDRA-18338 <https://issues.apache.org/jira/browse/CASSANDRA-18338> 
>>> -  dtest.bootstrap_test.TestBootstrap.test_cleanup trunk
>>> ***  CASSANDRA-18338 
>>> <https://issues.apache.org/jira/browse/CASSANDRA-18338> - test: 
>>> org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest , 
>>> this failed twice with jdk 8 and jdk 11, on trunk and  4.1
>>> others are some timeout exception.
>> 
>> 
>> New failures from Week 12
>> *** A possible regression from CASSANDRA-18328 on 2.x to 3.x dtest upgrades
>> 
>> otherwise all test failures are timeouts.
>> 
>> We need volunteers for the Build Lead the weeks ahead. 
>> 
>> 
>>

Re: [DISCUSS] Next release date

2023-04-17 Thread Josh McKenzie

So to bring us back to the goals and alignment here:

> With the following intentions:
> - moving towards the goal of annual releases, with a cadence 12±3 months 
> apart,
> - the branch to GA period being 2-3 months,
> - avoiding any type of freeze on trunk,
> - getting a release out by December's Summit,
> - freeing up folk to start QA (that makes sense to start) immediately
So what I *think* falls out logically:

1. We branch cassandra-5.0 on August 1st
2. We expect an 8-12 week validation cycle which means GA Oct1-Nov1.
3a. If we allow merge of CEP-15 / CEP-21 after branch, we risk invalidating 
stabilization and risk our 2023 GA date
3b. If we don't allow merge of CEP-15 / CEP-21 after branch, we risk needing a 
fast-follow release and don't have functional precedent for the snapshots we 
earlier agreed upon doing.

Does that distill it and match everyone else's understanding?

On Mon, Apr 17, 2023, at 2:20 PM, Mick Semb Wever wrote:
> 
> 
> On Mon, 17 Apr 2023 at 19:31, Caleb Rackliffe  
> wrote:
>> ...or just cutting a 5.0 branch when CEP-21 is ready.
>> 
>> There's nothing stopping us from testing JDK17 and TTL bits in trunk before 
>> that.
>> 
>> On Mon, Apr 17, 2023 at 11:25 AM Caleb Rackliffe  
>> wrote:
>>> > Once all CEPs except CEP-21 and CEP-15 land we branch cassandra-5.0
>>> 
>>> For the record, I'm not convinced this is necessarily better than just 
>>> cutting a cassandra-5.0 branch on 1 October.
> 
> 
> How else would this work without being akin to a feature freeze on trunk?
> 
> We want (need) as much time as possible to test. We have no evidence that it 
> will be quicker than 4.1, we have to create that evidence. Those folk that 
> free up and are ready to get ahead and de-risk our testing efforts should be 
> given a release branch to make their work easier and to give us that evidence 
> in a more controlled manner so that we can plan better next time. Appreciate 
> that there's one too many variables here, but I'm sticking up for the testing 
> efforts here.

Re: [DISCUSS] Next release date

2023-04-17 Thread Josh McKenzie

I failed to address:
> - freeing up folk to start QA (that makes sense to start) immediately
I think there's a pre-freeze set of QA that makes sense to do and a 
post-freeze. What we decide on mergeability of large bodies of work around that 
branch date will inform what qualifies as a "freeze" in this regard.

On Mon, Apr 17, 2023, at 3:06 PM, Josh McKenzie wrote:
> So to bring us back to the goals and alignment here:
> 
>> With the following intentions:
>> - moving towards the goal of annual releases, with a cadence 12±3 months 
>> apart,
>> - the branch to GA period being 2-3 months,
>> - avoiding any type of freeze on trunk,
>> - getting a release out by December's Summit,
>> - freeing up folk to start QA (that makes sense to start) immediately
> So what I *think* falls out logically:
> 
> 1. We branch cassandra-5.0 on August 1st
> 2. We expect an 8-12 week validation cycle which means GA Oct1-Nov1.
> 3a. If we allow merge of CEP-15 / CEP-21 after branch, we risk invalidating 
> stabilization and risk our 2023 GA date
> 3b. If we don't allow merge of CEP-15 / CEP-21 after branch, we risk needing 
> a fast-follow release and don't have functional precedent for the snapshots 
> we earlier agreed upon doing.
> 
> Does that distill it and match everyone else's understanding?
> 
> On Mon, Apr 17, 2023, at 2:20 PM, Mick Semb Wever wrote:
>> 
>> 
>> On Mon, 17 Apr 2023 at 19:31, Caleb Rackliffe  
>> wrote:
>>> ...or just cutting a 5.0 branch when CEP-21 is ready.
>>> 
>>> There's nothing stopping us from testing JDK17 and TTL bits in trunk before 
>>> that.
>>> 
>>> On Mon, Apr 17, 2023 at 11:25 AM Caleb Rackliffe  
>>> wrote:
>>>> > Once all CEPs except CEP-21 and CEP-15 land we branch cassandra-5.0
>>>> 
>>>> For the record, I'm not convinced this is necessarily better than just 
>>>> cutting a cassandra-5.0 branch on 1 October.
>> 
>> 
>> How else would this work without being akin to a feature freeze on trunk?
>> 
>> We want (need) as much time as possible to test. We have no evidence that it 
>> will be quicker than 4.1, we have to create that evidence. Those folk that 
>> free up and are ready to get ahead and de-risk our testing efforts should be 
>> given a release branch to make their work easier and to give us that 
>> evidence in a more controlled manner so that we can plan better next time. 
>> Appreciate that there's one too many variables here, but I'm sticking up for 
>> the testing efforts here.
>

Re: [DISCUSS] Next release date

2023-04-17 Thread Josh McKenzie

> it's (b) for me, and everything minus 21 and 15 is defining enough to warrant 
> the branching and a checkpoint where testing can start
Ok, I don't follow.

There's three different ways I can read what you're saying here:
 1. "Everything we have targeting 5.x is substantial and we can branch when 
it's done", that'd mean 600+ open tickets, so it can't be that: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20and%20fixversion%20%3D%205.x%20and%20resolution%20%3D%20unresolved
 2. "Everything we've *already merged today* targeting 5.x is substantial and 
we can branch now"... I don't think that's quite right? Since that'd put the 
release far too early after December '22.
 3. "Everything we expect to merge by August 1st, regardless of CEP-21 and 
CEP-15, is substantial enough for us to cut a release then" - that's arguing 
for a feature-driven release rather than a train right?
Sorry; I'm definitely not *trying* to be obtuse, I'm just having a hard time 
understanding how what the two of you are saying actually lines up.

My personal .02: I think we should consider branching 5.0 September 1st. That 
gives us basically 12 weeks for folks to do their testing and for us to 
stabilize anything that's flaky in circle or regressed in ASF CI.

If CEP-15 or CEP-21 land shortly after (early September), we can cross that 
bridge when the time comes.

That's my hot take. We move to a train model and stick with it, and we start to 
get comfortable with cutting feature previews or snapshot alphas like we agreed 
to for earlier access to new stuff.

On Mon, Apr 17, 2023, at 4:25 PM, Mick Semb Wever wrote:
> 
>> b.) Cut a 5.0 branch when the major release-defining element (maybe CEP-21?) 
>> merges to trunk, with the shared understanding (possibly what we're 
>> disagreeing about) that very little of what we need to test/de-risk is going 
>> to be inhibited by not cutting that branch earlier (and that certain testing 
>> efforts would be almost wholesale wasted if done beforehand).
> 
> 
> Yup, it's (b) for me, and everything minus 21 and 15 is defining enough to 
> warrant the branching and a checkpoint where testing can start and not be 
> wasted.  I understand that cep-21 changes a lot and that impacts testing, but 
> I wholeheartedly trust testers to be taking this into consideration. 
>

Re: [DISCUSS] Next release date

2023-04-17 Thread Josh McKenzie

> WFM, if that means we branch there and anything not already merged has to wait
I think there might be value in us exploring the "how we cut snapshots" in 
terms of allowing us to fast-follow for big features folks may want to get 
their hands on. If we stick to the same "green circle no regression ASF", I 
suspect we'd be in a pretty good position overall to cut a snapshot from trunk 
quarterly as we discussed.

And to be clear, I am 100% uninterested in us re-opening the Pandora's Box Of 
Sadness that was the semver discussion on this thread :). If we all still agree 
that cutting snapshots is good, and that's a way for us to "have our cake and 
eat it too" when it comes to sticking with a train model, then I think the ends 
justify the means and we can zombie the other thread and power through it.

On Mon, Apr 17, 2023, at 4:39 PM, Caleb Rackliffe wrote:
> > My personal .02: I think we should consider branching 5.0 September 1st. 
> > That gives us basically 12 weeks for folks to do their testing and for us 
> > to stabilize anything that's flaky in circle or regressed in ASF CI.
> 
> WFM, if that means we branch there and anything not already merged has to wait
> 
> 
> On Mon, Apr 17, 2023 at 3:37 PM Josh McKenzie  wrote:
>>> it's (b) for me, and everything minus 21 and 15 is defining enough to 
>>> warrant the branching and a checkpoint where testing can start
>> Ok, I don't follow.
>> 
>> There's three different ways I can read what you're saying here:
>>  1. "Everything we have targeting 5.x is substantial and we can branch when 
>> it's done", that'd mean 600+ open tickets, so it can't be that: 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20and%20fixversion%20%3D%205.x%20and%20resolution%20%3D%20unresolved
>>  2. "Everything we've *already merged today* targeting 5.x is substantial 
>> and we can branch now"... I don't think that's quite right? Since that'd put 
>> the release far too early after December '22.
>>  3. "Everything we expect to merge by August 1st, regardless of CEP-21 and 
>> CEP-15, is substantial enough for us to cut a release then" - that's arguing 
>> for a feature-driven release rather than a train right?
>> Sorry; I'm definitely not *trying* to be obtuse, I'm just having a hard time 
>> understanding how what the two of you are saying actually lines up.
>> 
>> My personal .02: I think we should consider branching 5.0 September 1st. 
>> That gives us basically 12 weeks for folks to do their testing and for us to 
>> stabilize anything that's flaky in circle or regressed in ASF CI.
>> 
>> If CEP-15 or CEP-21 land shortly after (early September), we can cross that 
>> bridge when the time comes.
>> 
>> That's my hot take. We move to a train model and stick with it, and we start 
>> to get comfortable with cutting feature previews or snapshot alphas like we 
>> agreed to for earlier access to new stuff.
>> 
>> On Mon, Apr 17, 2023, at 4:25 PM, Mick Semb Wever wrote:
>>> 
>>>> b.) Cut a 5.0 branch when the major release-defining element (maybe 
>>>> CEP-21?) merges to trunk, with the shared understanding (possibly what 
>>>> we're disagreeing about) that very little of what we need to test/de-risk 
>>>> is going to be inhibited by not cutting that branch earlier (and that 
>>>> certain testing efforts would be almost wholesale wasted if done 
>>>> beforehand).
>>> 
>>> 
>>> Yup, it's (b) for me, and everything minus 21 and 15 is defining enough to 
>>> warrant the branching and a checkpoint where testing can start and not be 
>>> wasted.  I understand that cep-21 changes a lot and that impacts testing, 
>>> but I wholeheartedly trust testers to be taking this into consideration. 
>>> 
>>

Re: [DISCUSS] Next release date

2023-04-17 Thread Josh McKenzie

> If this is true, why do we even bother running any CI before the CEP-21 
> merge? It will all be invalidated anyway, right?
I'm referring to manual validation or soak testing in qa environments rather 
than automated. Just because a soft-frozen branch without those features works 
in QA doesn't mean a branch after those features merge will be equally 
qualified.

There's certainly value in rolling out a soft-frozen branch before those 
features merge, but we will also need to be very deliberate and clear about 
having a minimum bound of time for people to perform testing *after* these two 
features merge as well if we go the "freeze but let them merge after" route.

On Mon, Apr 17, 2023, at 5:12 PM, Mick Semb Wever wrote:
>> My personal .02: I think we should consider branching 5.0 September 1st. 
>> That gives us basically 12 weeks for folks to do their testing and for us to 
>> stabilize anything that's flaky in circle or regressed in ASF CI.
> 
> 
> I'm not for this, sorry. I see the real risk here of there being no GA 
> release this year.
> 
> My proposal was based on reading through the thread and gathering what I saw 
> to be the best middle ground for everyone. It's not my first choice, but as a 
> middle ground I can accept it.
> 
> Caleb, you appear to be the only one objecting, and it does not appear that 
> you have made any compromises in this thread. Can I ask that you do?  I (and 
> others) do see that letting testing start as soon as it can, where they can, 
> as an important tactic to de-risking an important 5.0 release.
> 
>

Re: [DISCUSS] [PATCH] Enable Direct I/O For CommitLog Files

2023-04-18 Thread Josh McKenzie

I took the liberty of creating 
https://issues.apache.org/jira/browse/CASSANDRA-18464 linking to this email 
thread w/the contents of your email and applying the patch to that ticket. 
Probably want to have some lower level discussions there when we find you a 
reviewer.

On Tue, Apr 18, 2023, at 2:10 PM, Pawar, Amit wrote:
> [Public]
> 
> 
> Hi,
>  
> I shared my investigation of the CommitLog I/O issue on large-core-count 
> systems in my previous email dated July-22; the link to the thread is given 
> below.
> https://lists.apache.org/thread/xc5ocog2qz2v2gnj4xlw5hbthfqytx2n
>  
> Basically, two solutions looked possible to improve the CommitLog I/O.
>  1. Multi-threaded syncing
>  2. Using Direct-IO through JNA
>  
> I worked on the 2nd option, considering the following benefits compared to 
> the first one:
>  1. Direct I/O read/write throughput is very high compared to non-Direct I/O, 
> as learnt through FIO benchmarking.
>  2. Reduces kernel file cache usage, which in turn reduces kernel I/O 
> activity for Commitlog files only.
>  3. Overall CPU usage is reduced for flush activity. JVisualVM shows CPU usage 
> < 30% for the Commitlog syncer thread with the Direct I/O feature.
>  4. The Direct I/O implementation is easier compared to multi-threaded syncing.
>  
> As per the community suggestion, lower code complexity is preferable. Direct 
> I/O enablement looked promising, but there was one issue: Java 8 does not 
> have native support for enabling Direct I/O, so use of the JNA library is a 
> must. The same implementation should also work across other versions of Java 
> (like 11 and beyond).
>  
> I have completed the Direct I/O implementation; a summary of the changes in 
> the attached patch is given below.
>  1. This implementation does not use Java file channels; the file is opened 
> through JNA to use the Direct I/O feature.
>  2. New segment types are defined: “DirectIOSegment” for Direct I/O and 
> “NonDirectIOSegment” for non-direct I/O (NonDirectIOSegment is for test 
> purposes only).
>  3. The JNA write call is used to flush the changes.
>  4. New helper functions are defined in NativeLibrary.java and a 
> platform-specific file. Currently tested on Linux only.
>  5. The patch allows the user to configure the optimum block size and 
> alignment if the default values are not suitable for the CommitLog disk.
>  6. The following configuration options are provided in the cassandra.yaml 
> file:
>     1. use_jna_for_commitlog_io: use the JNA feature
>     2. use_direct_io_for_commitlog: use the Direct I/O feature
>     3. direct_io_minimum_block_alignment: 512 (default)
>     4. nvme_disk_block_size: 32MiB (default; can be changed to the required 
> size)
>  
> The test matrix is complex, so CommitLog-related test cases and the TPCx-IoT 
> benchmark were run. It works with both Java 8 and 11. Compressed and 
> encrypted segments are not supported yet; they can be enabled later based on 
> community feedback.
>  
> The following improvements are seen with Direct I/O enabled:
>  1. 32 cores >= ~15%
>  2. 64 cores >= ~80%
>  
> I would also like to share another observation: reading Commitlog files 
> with Direct I/O might help reduce node bring-up time after a node crash.
>  
> Tested with commit ID: 91f6a9aca8d3c22a03e68aa901a0b154d960ab07
>  
> The attached patch enables Direct I/O feature for Commitlog files. Please 
> check and share your feedback.
>  
> Thanks,
> Amit
> 
> *Attachments:*
>  • UseDirectIOFeatureForCommitLogFiles.patch
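
For readers wondering why the patch summary above exposes a `direct_io_minimum_block_alignment` option: on Linux, direct I/O generally requires transfer lengths and file offsets to be multiples of the device's logical block size. The sketch below shows that rounding step only. It is not taken from the patch; the function names are hypothetical, zero-padding is an assumption about how a writer might fill the tail of a block, and the default of 512 simply mirrors the value quoted in the email.

```python
def align_up(length: int, alignment: int = 512) -> int:
    """Round length up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding (alignment - 1) and masking off the low bits rounds up.
    return (length + alignment - 1) & ~(alignment - 1)


def pad_for_direct_write(payload: bytes, alignment: int = 512) -> bytes:
    """Zero-pad payload so its length is a multiple of alignment (assumed policy)."""
    padded_len = align_up(len(payload), alignment)
    return payload + b"\x00" * (padded_len - len(payload))


if __name__ == "__main__":
    for n in (0, 1, 512, 513):
        print(n, align_up(n))
```

A real implementation would also need the in-memory buffer address itself to be aligned, which is outside what this pure-Python sketch can show.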

Re: [DISCUSS] Next release date

2023-04-19 Thread Josh McKenzie

Let me try to break this down another way:

I see a few competing concerns, each with QA related time requirements 
(asserting 8 weeks minimum, 16 weeks maximum we should plan for to stabilize a 
GA):
 1. A freeze to a branch to stabilize for release (8-16 weeks of QA required 
after we branch)
 2. A freeze to a branch to make room for large complex work to have increased 
velocity on merge due to having a more stable destination (8-16 weeks of QA 
required after they merge)
 3. A commitment to release once a year (for our purposes, we've defined this 
as calendar year) (8-16 weeks of QA required *before*).
If we walk backwards from Dec 1, that means our latest date to freeze and 
validate a 5.0 branch would be Friday August 11; let's go with 1st Friday in 
August for simplicity, 2023-08-04. That would give us just over 16 weeks 
worst-case to stabilize.
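
The date arithmetic in the paragraph above can be checked with a few lines of Python. This is just a sanity check of the numbers in this email, assuming a Dec 1, 2023 GA target and a 16-week worst-case QA window; the variable names are illustrative only.

```python
from datetime import date, timedelta

GA_TARGET = date(2023, 12, 1)       # "walk backwards from Dec 1"
MAX_QA = timedelta(weeks=16)        # worst-case stabilization window

# Latest possible branch date if the full 16 weeks are needed.
latest_branch = GA_TARGET - MAX_QA  # 2023-08-11

# The simpler date proposed in the email: first Friday in August.
proposed_branch = date(2023, 8, 4)
weeks_available = (GA_TARGET - proposed_branch).days / 7

if __name__ == "__main__":
    print("latest branch date:", latest_branch.isoformat())
    print("weeks from proposed branch to GA target:", weeks_available)
```

Branching on 2023-08-04 leaves 17 weeks before Dec 1, which is consistent with the "just over 16 weeks" figure above.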

So we branch for 5.0 *at the latest* on 2023-08-04; I think we can all agree on 
this?

So the next question: when do we branch for 5.0 *at the earliest*? Pros and 
cons of an earlier branch:
Pros:
 • Earlier start of validation testing on a more stable base (no improvements 
or new features excepting CEP-15 and CEP-21)
 • Theoretically higher velocity of completion of CEP-15 and CEP-21 (the team 
doing this can speak to the degree to which this is true)
Cons:
 • Smaller amount of improvements and new features go into 5.0
 • The rest of the dev community has another branch they need to target with 
bugfixes (annoying but not _too_ costly since bugfixes are often a bit smaller 
in scope)

Through this lens, we are weighing the belief that CEP-15 and CEP-21 will land 
by August 1st and be accelerated by branching early against the belief that 
other improvements and features will go in if we branch later; if we freeze 
today and neither CEP-15 nor CEP-21 land for unforeseen reasons, we will have a 
GA release that had a shortened amount of time for new features and 
improvements to be merged in.

Lastly, as input data to the discussion, here's a list of all the new features 
and improvements in 5.0 as of today; hypothetically were we to freeze 5.0 today 
and worst-case unforeseen things lead to CEP-21 and CEP-15 not landing by 
cutoff, this would be our feature-set for our next GA: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20fixversion%20%3D%205.0%20and%20fixversion%20not%20in%20(4.0.x%2C%204.1.x%2C%204.0%2C%204.1%2C%204.1.1%2C%204.1.2%2C%204.0.8%2C%204.1-alpha1%2C%204.1-alpha1%2C%204.1-beta2%2C%204.1-beta1%2C%204.1-rc1)%20and%20type%20in%20(%22New%20Feature%22%2C%20%22Improvement%22)%20and%20component%20!%3D%20Accord%20order%20by%20type%20desc%2C%20resolved%20desc

Phew. Ok, so using the above framework, I'm personally ok with us freezing 5.0 
earlier than August 1st if the engineers actively on CEP-15 and CEP-21 indicate 
that it will appreciably increase their velocity. The list of improvements and 
features is substantial enough that an earlier freeze would still have enough 
in it to be "meaty" in my opinion; especially given the likelihood of CEP-25 
(Trie-indexed SSTable format) landing relatively soon.

So the next question to me is: "when"? On that I defer to Sam, Alex, Benedict, 
Blake, David, et al.: how much would freezing 5.0 early help in terms of your 
development velocity on TrM and Accord?

On Wed, Apr 19, 2023, at 6:22 AM, Henrik Ingo wrote:
> I'm going to repeat the point from my own thread: rather than thinking of 
> this as some kind of concession to two exceptional CEPs, could we rather take 
> the point of view that they get their own space and time precisely because 
> they are large and invasive and both the merge and testing of them will 
> benefit from everything else in the branch quieting down?
> 
> I'm also not particularly interested in a long feature freeze beyond 1-3 
> months that would serve the above purpose well.
> 
> In short: the proposal should not be that everyone else just has to sit 
> still and wait for two late stragglers. The proposal is merely to organise 
> work such that we maximise velocity and quality for merging cep-15&21. 
> Anything beyond that should be judged differently.
> 
> On Tue, 18 Apr 2023, 23:48 J. D. Jordan wrote:
>> 
>> I also don’t really see the value in “freezing with exceptions for two giant 
>> changes to come after the freeze”.
>> 
>> -Jeremiah
>> 
>>> On Apr 18, 2023, at 1:08 PM, Caleb Rackliffe  
>>> wrote:
>>> 
>>> > Caleb, you appear to be the only one objecting, and it does not appear 
>>> > that you have made any compromises in this thread.
>>> 
>>> All I'm really objecting to is making special exceptions for particular 
>>> CEPs in relation to our freeze date. In other words, let's not have a 
>>> pseudo-freeze date and a "real" freeze date, when the thing that makes the 
>>> latter supposedly necessary is a very invasive change to the database that 
>>> risks our desired GA date. Also, again, I don't understand how cutting a 
>>> 5.0 branch

Cassandra project status, 2023-04-25

2023-04-25 Thread Josh McKenzie

We have a town hall coming up! The URL for the meetup can be found here:
https://www.meetup.com/cassandra-global/events/292858262/. This will be held
tomorrow at 12pm EST.

Jon Haddad (https://www.linkedin.com/in/rustyrazorblade/) will be discussing
performance tuning on Apache Cassandra, I'll be chatting about what's going on
with the project leading up to 5.0, and Lorina will be covering how to get
involved contributing to docs on the project. The full agenda can be found
here:
https://docs.google.com/document/d/14U4IGnKn8r7PPxF8Lc_leTcVfD8oW_p9oHW-BRbL3yY/edit#.
Looking forward to seeing you there!

Apache Cassandra 4.0.9 was released back on April 15th - see the release thread
here: https://lists.apache.org/thread/ymr90v3l6fokwr885l1fsmfzr04tgpmn

[New Contributors Getting Started]
We've hand-curated tickets we consider good to get started with on the project
- check the list out here:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2454&quickFilter=2652&quickFilter=2162&quickFilter=2160.
We have 30 tickets for the next upcoming major to pick from there, so don't
wait; shop now!

Come hang out with us in the #cassandra-dev channel on
https://the-asf.slack.com (reply to me on this email if you need an invite for
your account), and reach out to the @cassandra_mentors alias with questions.

[Dev mailing list]
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-3-20|dto=2023-4-21:

Up against 40 threads in the past month. Let's do this.

We've had a long and glorious discussion about our next release date, when
we're going to branch, when we're going to freeze, and what code goes where.
https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3. There's no
real resolution as yet on the thread, but I think the tail end of it has a
fairly clear summation of where we got (thanks to yours truly), with the caveat
that some folks are very wary of a date-driven freeze date as our "drop dead
date" for cutting the 5.0 branch.

Jonathan Ellis came forward with a very promising prototype to add ANN
vector-based search (as well as support for the data type) in Cassandra
natively. There's a lot of interesting stuff going on in the ML space right
now, and having the linear scale of Cassandra for a vectorized dataset could
provide some really impressive scaling and augmentation characteristics for
LLM's post-training and model building. Check out the thread here:
https://lists.apache.org/thread/xl8shmknrrxp3w06s7byjlytn260781g

Amit Parwal opened up a thread about using Direct I/O for the CommitLog with
some very promising benchmarks. We opened up the following JIRA to track future
collaboration on the topic
(https://issues.apache.org/jira/browse/CASSANDRA-18464, email thread here:
https://lists.apache.org/thread/j6ny17q2rhkp7jxvwxm69dd6v1dozjrg)

German Eichberger had some questions around how we as a project community
intend to support branches that fall out of formal bugfix support in the event
of CVEs. That can be found here:
https://lists.apache.org/thread/owqlclzbq333dz68ryqw8z1md7s3fcmx. The loose
consensus there seems to be that it's a very infrequent occurrence, we can
cross that bridge when we come to it, and it'd be a valuable thing for a vendor
to be able to step in and offer.

The vote for CEP-26, Unified Compaction Strategy, passed easily:
https://lists.apache.org/thread/k5fg3mn43j701pdskpc1j8r1h9c20qk1. Excited to
see this land in the project, Branimir.

Mike Adamson apparently walked directly into a third rail when it comes to how
we name things, with the seemingly innocuous question as to whether we'd
considered adding an alternative to "keyspace" in the form of "database".
https://lists.apache.org/thread/9hf6x577ggf4r4lwss5jx22p8zy210b5. The TL;DR: we
probably shouldn't, but we already support the usage of "schema" in its place.
TIL (back when the thread hit...)

Stefan Miklosovic reached out looking for more feedback on unifying the system
properties and environment variables in the CassandraRelevantProperties and
CassandraRelevantEnv classes. This is definitely one that's going to impact all
of us so if you have either experience in the space and/or strong opinions on
things you want heard before the trigger is pulled, now's your chance:
https://issues.apache.org/jira/browse/CASSANDRA-17797

Maxim Muzafarov reached out around building a consensus regarding whether
settings in vtables should be updatable.
https://lists.apache.org/thread/z169kk31lzmvyor7pkwn2h17nor59bfq. This was on
the tail end of a long thread on a related topic
(https://lists.apache.org/list?dev@cassandra.apache.org:gte=1d:Allow%20UPDATE%20on%20settings),
but I don't see that anyone engaged. Aleksey keeps up with the Sisyphean task
of pointing out that most, if not all, of our config fields shouldn't be
volatile and we've been carrying along that pattern for reasons largely
unknown. I commend you, sir.

And last and arguably leas

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-25 Thread Josh McKenzie

To be fair Dinesh kind of primed that:

> Do you intend to make this part of CEP-7 or as an incremental update to SAI 
> once it is committed?
;)

I think this body of work more than stands on its own. Great work Jonathan, 
Mike, and Zhao; having native support for more ML-oriented workloads in C* 
would be a big win for a bunch of our users and plays into our architectural 
strengths in a lot of ways too.

On Tue, Apr 25, 2023, at 7:35 PM, Henrik Ingo wrote:
> Jonathan what a great proposal/code. An enjoyable read. And at least for me 
> educational! (Which is notable, as you're on my turf, I'm a Data Science 
> major.)
> 
> Sorry for splitting hairs but CEP-7 (as a spec, and wiki page) is approved 
> and voted on and I assume there's no proposal to change that. That said, work 
> of course continues beyond CEP-7 and this is not the only SAI feature that 
> adds on top of the CEP-7 foundation.
> 
> I just wanted to clarify so there's no confusion later.
> 
> henrik
> 
> On Sat, Apr 22, 2023 at 10:41 PM Jonathan Ellis  wrote:
>> My guess is that I will be able to get this ready to upstream before the 
>> rest of CEP-7 goes in, so it would make sense to me to roll it into that.
>> 
>> On Fri, Apr 21, 2023 at 5:34 PM Dinesh Joshi  wrote:
>>> Interesting proposal Jonathan. Will grok it over the weekend and play 
>>> around with the branch.
>>> 
>>> Do you intend to make this part of CEP-7 or as an incremental update to SAI 
>>> once it is committed?
>>> 
 On Apr 21, 2023, at 2:19 PM, Jonathan Ellis  wrote:

 Happy Friday, everyone!

 Rich text formatting ahead, I've attached a PDF for those who prefer that.

 I propose adding approximate nearest neighbor (ANN) vector search 
 capability to Apache Cassandra via storage-attached indexes (SAI). This is 
 a medium-sized effort that will significantly enhance Cassandra’s 
 functionality, particularly for AI use cases. This addition will not only 
 provide a new and important feature for existing Cassandra users, but also 
 attract new users to the community from the AI space, further expanding 
 Cassandra’s reach and relevance.
 Introduction

 Vector search is a powerful document search technique that enables 
 developers to quickly find relevant content within an extensive collection 
 of documents, which is useful as a standalone technique, but it is 
 particularly hot now because it significantly enhances the performance of 
 LLMs.

 Vector search uses ML models to match the semantics of a question rather 
 than just the words it contains, avoiding the classic false positives and 
 false negatives associated with term-based search.  Alessandro Benedetti 
 gives some good examples in his _excellent talk_:

 You can search across any set of vectors, which are just ordered sets of 
 numbers.  In the context of natural language queries and document search, 
 we are specifically concerned with a type of vector called an *embedding*. 

 An embedding is a high-dimensional vector that captures the underlying 
 semantic relationships and contextual information of words or phrases. 
 Embeddings are generated by ML models trained for this purpose; OpenAI 
 provides an API to do this, but open-source and self-hostable models like 
 BERT are also popular. Creating more accurate and smaller embeddings are 
 active research areas in ML.

 Large language models (LLMs) can be described as a mile wide and an inch 
 deep. They are not experts on any narrow domain (although they will 
 hallucinate that they are, sometimes convincingly).  You can remedy this 
 by giving the LLM additional context for your query, but the context 
 window is small (4k tokens for GPT-3.5, up to 32k for GPT-4), so you want 
 to be very selective about giving the LLM the most relevant possible 
 information.

 Vector search is red-hot now because it allows us to easily answer the 
 question “what are the most relevant documents to provide as context” by 
 performing a similarity search between the embeddings vector of the query, 
 and those of your document universe.  Doing exact search is prohibitively 
 expensive, since you necessarily have to compare with each and every 
 document; this is intractable when you have billions or trillions of docs. 
  However, there are well-understood algorithms for turning this into a 
 logarithmic problem if you are willing to accept *approximately* the most 
 similar documents.  This is the “approximate nearest neighbor” problem.  
 (You will see these referred to as kNN – k nearest neighbors – or ANN.)
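
To make the cost of exact search concrete, here is a minimal Python sketch. It is not taken from the proposal: the function names and the toy 3-dimensional vectors are made up for illustration, and cosine similarity is assumed as the similarity measure. It scores the query against every stored embedding, so the work grows linearly with the number of documents, which is the cost an ANN index is meant to avoid.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, documents, k):
    """Return the ids of the k documents most similar to the query.

    Every document is compared against the query, so the cost is
    O(number of documents * vector dimension).
    """
    scored = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
print(exact_top_k([1.0, 0.05, 0.0], docs, k=2))
```

An approximate index trades a small amount of recall for not having to touch every entry in `docs`.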

 _Pinecone DB has a

Re: [DISCUSS] New data type for vector search

2023-04-27 Thread Josh McKenzie

From a machine learning perspective, vectors are a well-known concept that are 
effectively immutable fixed-length n-dimensional values that are then later 
used either as part of a model or in conjunction with a model after the fact.

While we could have this be non-frozen and not call it a vector, I'd be 
inclined to still make the argument for a layer of syntactic sugar on top that 
met ML users where they were with concepts they understood rather than forcing 
them through the cognitive lift of figuring out the Cassandra specific 
contortions to replicate something that's ubiquitous in their space. We did the 
same "Cassandra-first" approach with our JSON support and that didn't do us any 
favors in terms of adoption and usage as far as I know.

So is the goal here to provide something specific and idiomatic for the ML 
community or is the goal to make a primitive that's C*-centric that then 
another layer can write to? I personally argue for the former; I don't see this 
specific data type going away any time soon.
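
For anyone skimming, here is a minimal Python sketch of what "fixed-length, non-null, immutable" amounts to when a value is checked on the write path. The function name and error messages are hypothetical, and this is only an illustration of the constraints being discussed in this thread, not how Cassandra does or would implement the type.

```python
def validate_vector(values, dimension):
    """Check a candidate vector: exactly `dimension` elements, none null."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    for index, value in enumerate(values):
        if value is None:
            raise ValueError(f"element {index} is null")
    # Return an immutable copy, loosely mirroring the "frozen" semantics.
    return tuple(float(v) for v in values)


print(validate_vector([0.25, 0.5, 1.0], 3))
```

Both checks are cheap and local to the value, which is why they can plausibly live in the type itself rather than in the index.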

On Thu, Apr 27, 2023, at 12:39 PM, David Capwell wrote:
>> but as you point out it has the problem of allowing nulls.
> 
> If nulls are not allowed for the elements, then either we need  a) a new 
> type, or b) add some way to say elements may not be null…. As much as I do 
> like b, I am leaning towards new type for this use case.
> 
> So, to flesh out the type requirements I have seen so far
> 
> 1) represents a fixed size array of element type
> * on write path we will need to validate this
> 2) element may not be null
> * on write path we will need to validate this
> 3) “frozen” (is this really a requirement for the type or is this just 
> simpler for the ANN work?  I feel that this shouldn’t be a requirement)
> 4) works for all types (my requirement; original proposal is float only, but 
> could logically expand to primitive types)
> 
> Anything else?
> 
>> The key thing about a vector is that unlike lists or tuples you really don't 
>> care about individual elements, you care about doing vector and matrix 
>> multiplications with the thing as a unit. 
> 
> That may be true for this use case, but “should” this be true for the type 
> itself?  I feel like no… if a user wants the Nth element of a vector why 
> would we block them?  I am not saying the first patch, or even 5.0 adds 
> support for index access, I am just trying to push back saying that the type 
> should not block this.
> 
>> (Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT 
>> VECTOR[N].)
> 
> Now that nulls are not allowed, I have mixed feelings about FLOAT[N], I 
> prefer this syntax but that limitation may not be desired for all use cases… 
> we could always add LIST and ARRAY later to address that 
> case.
> 
> In terms of syntax I have seen, here is my ordered preference:
> 
> 1) TYPE[size] - have mixed feelings due to non-null, but still prefer it
> 2) QUALIFIER TYPE[size] - QUALIFIER is just a Term we use to denote this 
> semantic…. Could even be NON NULL TYPE[size]
> 
>> On Apr 27, 2023, at 9:00 AM, Benedict  wrote:
>> 
>> 
>> That’s a bounded ring buffer, not a fixed length array.
>> 
>> This definitely isn’t a tuple because the types are all the same, which is 
>> pretty crucial for matrix operations. Matrix libraries generally work on 
>> arrays of known dimensionality, or sparse representations.
>> 
>> Whether we draw any semantic link between the frozen list and whatever we do 
>> here, it is fundamentally a frozen list with a restriction on its size. What 
>> we’re defining here are “statically” sized arrays, whereas a frozen list is 
>> essentially a dynamically sized array.
>> 
>> I do not think vector is a good name because vector is used in some other 
>> popular languages to mean a (dynamic) list, which is confusing when we also 
>> have a list concept.
>> 
>> I’m fine with just using the FLOAT[N] syntax, and drawing no direct link 
>> with list. Though it is a bit strange that this particular type declaration 
>> looks so different to other collection types.
>> 
>>> On 27 Apr 2023, at 16:48, Jeff Jirsa  wrote:
>>> 
>>> 
>>> 
>>> On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis  wrote:
 It's been a while, so I may be missing something, but do we already have 
 fixed-size lists?  If not, I don't see why we'd try to make this fit into 
 a List-shaped problem.
>>> 
>>> We do not. The proposal got closed as wont-fix  
>>> https://issues.apache.org/jira/browse/CASSANDRA-9110
>>> 
>>>

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread Josh McKenzie

> If we want to make an ML-specific data type, it should be in an ML plug-in.
How can we encourage a healthier plug-in ecosystem? As far as I know it's been 
pretty anemic historically:

cassandra: https://cassandra.apache.org/doc/latest/cassandra/plugins/index.html
postgres: https://www.postgresql.org/docs/current/contrib.html

I'm really interested to hear if there's more in the ecosystem I'm not aware of 
or if there have been strides made in this regard; users in the ecosystem being 
able to write durable extensions to Cassandra that they can then distribute and 
gain momentum could potentially be a great incubator for new features or 
functionality in the ecosystem.

If our support for extensions remains as bare as I believe it to be, I wouldn't 
recommend anyone go that route.

On Mon, May 1, 2023, at 4:17 PM, Benedict wrote:
> 
> I have explained repeatedly why I am opposed to ML-specific data types. If we 
> want to make an ML-specific data type, it should be in an ML plug-in. We 
> should not pollute the general purpose language with hastily-considered 
> features that target specific bandwagons - at best partially - no matter how 
> exciting the bandwagon.
> 
> I think a simple and easy case can be made for fixed length array types that 
> do not seem to create random bits of cruft in the language that dangle by 
> themselves should this play not pan out. This is an easy way for this effort 
> to make progress without negatively impacting the language.
> 
> That is, unless we want to start supporting totally random types for every 
> use case at the top level language layer. I don’t think this is a good idea, 
> personally, and I’m quite confident we would now be regretting this approach 
> had it been taken for earlier bandwagons.
> 
> Nor do I think anyone’s priors about how successful this effort will be 
> should matter. As a matter of principle, we should simply never deliver a 
> specialist functionality as a high level CQL language feature without at 
> least baking it for several years as a plug-in.
> 
>> On 1 May 2023, at 21:03, Mick Semb Wever  wrote:
>> 
>> 
>> Yes!  What you (David) and Benedict write beautifully supports `VECTOR 
>> FLOAT[n]` imho.
>> 
>> You are definitely bringing up valid implementation details, and that can be 
>> dealt with during patch review. This thread is about the CQL API addition.  
>> 
>> No matter which way the technical review goes with the implementation 
>> details, `VECTOR FLOAT[n]` does not limit it, and gives us the most ML 
>> idiomatic approach and the best long-term CQL API.  It's a win-win situation 
>> – no matter how you look at it imho it is the best solution api wise.  
>> 
>> Unless the suggestion is that an ideal implementation can give us a better 
>> CQL API – but I don't see what that could be.   Maybe the suggestion is we 
>> deny the possibility of using the VECTOR keyword and bring us back to 
>> something like `NON-NULL FROZEN`.   This is odd to me because 
>> `VECTOR` here can be just an alias for `NON-NULL FROZEN` while meeting the 
>> patch's audience and their idioms.  I have no problems with introducing such 
>> an alias to meet the ML crowd.
>> 
>> Another way I think of this is
>>  `VECTOR FLOAT[n]` is the porcelain ML cql api,
>>  `NON-NULL FROZEN` and `FROZEN` and `FLOAT[n]` are the 
>> general-use plumbing cql apis. 
>> 
>> This would allow implementation details to be moved out of this thread and 
>> to the review phase.
>> 
>> 
>> 
>> 
>> On Mon, 1 May 2023 at 20:57, David Capwell  wrote:
>>> > I think it is totally reasonable that the ANN patch (and Jonathan) is not 
>>> > asked to implement on top of, or towards, other array (or other) new data 
>>> > types.
>>> 
>>> 
>>> This impacts serialization, if you do not think about this day 1 you then 
>>> can’t add later on without having to worry about migration and versioning… 
>>> 
>>> Honestly I wanted to better understand the cost to be generic and the 
>>> impact to ANN, so I took 
>>> https://github.com/jbellis/cassandra/blob/vsearch/src/java/org/apache/cassandra/db/marshal/VectorType.java
>>>  and made it handle every requirement I have listed so far (size, null, all 
>>> types)… the current patch has several bugs at the type level that would 
>>> need to be fixed, so had to fix those as well…. Total time to do this was 
>>> 10 minutes… and this includes adding a method "public float[] 
>>> composeAsFloats(ByteBuffer bytes)” which made the change to existing logic 
>>> small (change VectorType.Serializer.instance.deserialize(buffer) to 
>>> type.composeAsFloats(buffer))….
>>> 
>>> Did this have any impact to the final ByteBuffer?  Nope, it had identical 
>>> layout for the FloatType case, but works for all types…. I didn’t change 
>>> the fact we store the size (felt this could be removed, but then we could 
>>> never support expanding the vector in the future…)
>>> 
>>> So, given the fact it takes a few minutes to implement all these 
>>> requirements, I do find it v

Re: [POLL] Vector type for ML

2023-05-05 Thread Josh McKenzie

Idiomatically, to my mind, there's a question of "what space are we thinking 
about this datatype in"?

- In the context of mathematics, nullability in a vector would be 0
- In the context of Cassandra, nullability tends to mean a tombstone (or 
nothing)
- In the context of programming languages, it's all over the place

Given many models are exploring quantizing to int8 and other data types, 
there's definitely a "support other data types easily in the future" piece 
that I think we need to keep in mind.

So with the above and the "meet the user where they are and don't make them 
understand more of Cassandra than absolutely critical to use it", I lean:

1. DENSE_VECTOR
2. VECTOR
3. type[dimension]

This leaves the path open for us to expand on it in the future with sparse 
support and allows us to introduce some semantics that indicate idioms around 
nullability for the users coming from a different space.
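
To illustrate the dense/sparse distinction, here is a minimal Python sketch. The dict-based sparse form and the function name are assumptions for illustration only, not a proposed storage format. A dense vector stores every dimension, while a sparse one stores only the non-zero entries, and a missing dimension reads as 0 rather than as a null.

```python
def densify(sparse, dimension):
    """Expand {index: value} into a dense list, filling gaps with 0.0."""
    dense = [0.0] * dimension
    for index, value in sparse.items():
        if not 0 <= index < dimension:
            raise IndexError(f"index {index} outside dimension {dimension}")
        dense[index] = value
    return dense


print(densify({0: 1.5, 3: -2.0}, 5))
```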

"NON-NULL FROZEN" is strictly correct, however it requires 
understanding idioms of how Cassandra thinks about data (nulls mean different 
things to us, we have differences between frozen and non-frozen due to 
constraints in our storage engine and materialization of data, etc) that get in 
the way of users doing things in the pattern they're familiar with without 
learning more about the DB than they're probably looking to learn. Historically 
this has been a challenge for us in adoption; the classic "Why can't I just 
write and delete and write as much as I want? Why are deletes filling up my 
disk?" problem comes to mind.

I'd also be happy with us supporting:
* NON-NULL FROZEN
* DENSE_VECTOR as syntactic sugar for the above

If getting into the "built-in syntactic sugar mapping for communities and 
specific use-cases" is something we're willing to consider.

On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
> I think we are still discussing implementation here when I'm talking about 
> developer experience. I want developers to adopt this quickly, easily and be 
> successful. Vector search is already a thing. People use it every day. A 
> successful outcome, in my view, is developers picking up this feature without 
> reading a manual. (Because they don't anyway and get in trouble) I did some 
> more extensive research about what other DBs are using for syntax. The 
> consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'
> 
> Pinecone[1] - dense_vector, sparse_vector
> Elastic[2]: dense_vector
> Milvus[3]: float_vector, binary_vector
> pgvector[4]: vector
> Weaviate[5]: Different approach. All typed arrays can be indexed
> 
> Based on that I'm advocating a similar syntax:
> 
> - DENSE VECTOR
> or
> - VECTOR
> 
> [1] https://docs.pinecone.io/docs/hybrid-search
> [2] 
> https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
> [3] https://milvus.io/docs/create_collection.md
> [4] https://github.com/pgvector/pgvector
> [5] https://weaviate.io/developers/weaviate/config-refs/datatypes
> 
> On Fri, May 5, 2023 at 6:07 AM Mike Adamson  wrote:
>>> Then we can have the indexing apparatus only accept *frozen* for 
>>> the HNSW case.
>> I'm inclined to agree with Benedict that the index will need to be 
>> specifically selected by option rather than inferred based on type. As such 
>> there is no real reason for the *frozen* requirement on the type. The hnsw 
>> index can be built just as easily from a non-frozen array.
>> 
>> I am in favour of enforcing non-null on the elements of an array by default. 
>> I would prefer that allowing nulls in the array would be a later addition if 
>> and when a use case arose for it.
>> 
>> On Fri, 5 May 2023 at 03:02, Caleb Rackliffe  
>> wrote:
>>> Even in the ML case, sparse can just mean zeros rather than nulls, and they 
>>> should compress similarly anyway.
>>> 
>>> If we really want null values, I'd rather leave that in collections space.
>>> 
>>> On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe  
>>> wrote:
 I actually still prefer *type[dimension]*, because I think I intuitively 
 read this as a primitive (meaning no null elements) array. Then we can 
 have the indexing apparatus only accept *frozen* for the HNSW 
 case.

 If that isn't intuitive to anyone else, I don't really have a strong 
 opinion...but...conflating "frozen" and "dense" seems like a bad idea. One 
 should indicate single vs. multi-cell, and the other the presence or 
 absence of nulls/zeros/whatever.

 On Thu, May 4, 2023 at 12:51 PM Patrick McFadin  wrote:
> I agree with David's reasoning and the use of DENSE (and maybe eventually 
> SPARSE). This is terminology well established in the data world, and it 
> would lead to much easier adoption from users. VECTOR is close, but I can 
> see having to create a lot of content around "How to use it and not get 
> in trouble." (I have a lot of that content already)
> 
>  - We don't have to explain what it is. A lot of prior art out there 
> alre

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Josh McKenzie

+1

On Tue, May 9, 2023, at 2:42 PM, Patrick McFadin wrote:
> +1
> 
> On Tue, May 9, 2023 at 10:58 AM Caleb Rackliffe  
> wrote:
>> +1
>> 
>> On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski  
>> wrote:
>>> Let's vote.
>>> 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
>>> 
>>> Piotr Kołaczkowski
>>> e. pkola...@datastax.com
>>> w. www.datastax.com

[DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-16 Thread Josh McKenzie

Similar to what we've done with accord in 
https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd like to discuss 
bringing cassandra-harry in-tree as a submodule. repo link: 
https://github.com/apache/cassandra-harry

Given the value it's brought to the project's stabilization efforts and the 
movement of other things in the ecosystem to being more integrated (accord, 
build-scripts https://issues.apache.org/jira/browse/CASSANDRA-18133), I think 
having the testing framework better localized and integrated would be a net 
benefit for adoption, awareness, maintenance, and tighter workflows as we 
troubleshoot future failures it surfaces.

I'd also like to see us get a Harry run integrated as part of our pre-commit CI 
(a 5 minute simple soak test for instance) and having that local in this 
fashion should make that a cleaner integration as well.

Thoughts?

Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Josh McKenzie

CEP-N seems like a good compromise. NextMajorRelease bumps into our 
interchangeable use of "Major" and "Minor" from a semver perspective and could 
get confusing. Suppose we could do NextFeatureRelease, but at that point why 
not just have it linked to the CEP and have the epic set?

On Thu, May 18, 2023, at 12:26 AM, Caleb Rackliffe wrote:
> ...otherwise I'm fine w/ just the CEP name, like "CEP-7" for SAI, etc.
> 
> On Wed, May 17, 2023 at 11:24 PM Caleb Rackliffe  
> wrote:
>> So when a CEP slips, do we have to create a 5.1-cep-N? Could we just have a 
>> version that's "NextMajorRelease" or something like that? It should still be 
>> pretty easy to bulk replace if we have something else to filter on, like 
>> belonging to an epic?
>> 
>> On Wed, May 17, 2023 at 6:42 PM Mick Semb Wever  wrote:
>>> 
>>> 
>>> On Tue, 16 May 2023 at 13:02, J. D. Jordan  
>>> wrote:
 Process question/discussion. Should tickets that are merged to CEP feature 
 branches, like https://issues.apache.org/jira/browse/CASSANDRA-18204, 
 have a fixver of 5.0 on them after merging to the feature branch?

 For the SAI CEP, which is also using the feature branch method, the 
 "reviewed and merged to feature branch" tickets seem to be given a version 
 of NA.

 Not sure that's the best “waiting for cep to merge” version either?  But 
 it seems better than putting 5.0 on them to me.

 Why I’m not keen on 5.0 is because if we cut the release today those 
 tickets would not be there.

 What do other people think?  Is there a better version designation we can 
 use?

 On a different project I have in the past made a “version number” in JIRA 
 for each long running feature branch. Tickets merged to the feature branch 
 got the epic ticket number as their version, and then it got updated to 
 the “real” version when the feature branch was merged to trunk.
 
>>> 
>>> 
>>> Thanks for raising the thread, I remember there was some confusion early 
>>> wrt feature branches too.
>>> 
>>> To rehash, for everything currently resolved in trunk 5.0 is the correct 
>>> fixVersion.  (And there should be no unresolved issues today with 5.0 
>>> fixVersion, they should be 5.x)
>>> 
>>> 
>>> When alpha1 is cut, then the 5.0-alpha1 fixVersion is created and 
>>> everything with 5.0 also gets  5.0-alpha1. At the same time 5.0-alpha2, 
>>> 5.0-beta, 5.0-rc, 5.0.0 fixVersions are created. Here both 5.0-beta and 
>>> 5.0-rc are blocking placeholder fixVersions: no resolved issues are left 
>>> with this fixVersion, the same as the .x placeholder fixVersions. The 5.0.0 
>>> is also used as a blocking version, though it is also an eventual 
>>> fixVersion for resolved tickets. Also note, all tickets up to and including 
>>> 5.0.0 will also have the 5.0 fixVersion.
>>> 
>>> 
>>> 
>>> A particular reason for doing things the way they are is to make it easy 
>>> for the release manager to bulk correct fixVersions, at release time or 
>>> even later, i.e. without having to read the ticket or go talk to authors or 
>>> painstakingly crawl CHANGES.txt.
>>> 
>>> 
>>> 
>>> For feature branches my suggestion is that we create a fixVersion for each 
>>> of them, e.g. 5.0-cep-15
>>> 
>>> Yup, that's your suggestion Jeremiah (I wrote this up on the plane before I 
>>> got to read your post properly).
>>> 
>>> 
>>> 
>>> (As you say) This then makes it easy to see where the code is (or what the 
>>> patch is currently being based on). And when the feature branch is merged 
>>> then it is easy to bulk replace it with trunk's fixVersion, e.g.  
>>> 5.0-cep-15 with 5.0
>>> 
>>> 
>>> 
>>> The NA fixVersion was introduced for the other repositories, e.g. website 
>>> updates.
>>>

Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Josh McKenzie

> My personal view is that 5.0 should not be used for any resolved tickets - 
> they should go to 5.0-alpha1, since this is the correct release for them. 5.0 
> can then be the target version, which makes more sense given it isn’t a 
> concrete release.
Well now you're just opening Pandora's box about our strange idioms with 
FixVersion usage. ;)

> every ticket targeting 5.0 could use fixVersion 5.0.x, since it is pretty 
> clear what this means.
I think this diverges from our current paradigm where "5.x" == next feature 
release, "5.0.x" == next patch release (i.e. bugfix only). Not to say it's bad, 
just an adjustment... which if we're open to adjustment...

I'm receptive to transitioning the discussion to that either on this thread or 
another; IMO we remain in a strange and convoluted place with our 
FixVersioning. My understanding of our current practice:
 • .x is used to denote target version. For example: 5.x, 5.0.x, 5.1.x, 4.1.x
 • When a ticket is committed, the FixVersion is transitioned to resolve the X 
to the next unreleased version in which it'll release
 • Weird Things are done to make this work for the release process and release 
manager on feature releases (alpha, beta, etc)
 • There's no clear fit for feature branch tickets in the above schema
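
As a rough illustration of the current practice listed above, here is a minimal Python sketch of how a ".x" placeholder could be resolved at commit time. The function name, the lookup table, and the version numbers in it are hypothetical and exist only for this example; this is not how Jira or the release tooling actually does it.

```python
# Hypothetical snapshot: next unreleased version for each release line.
# These values are assumptions for illustration, not project data.
NEXT_UNRELEASED = {
    "4.1": "4.1.3",
    "5": "5.0-alpha1",
}


def resolve_fix_version(fix_version, next_unreleased):
    """Resolve a '.x' target placeholder to a concrete version on commit.

    A FixVersion that does not end in '.x' is treated as already concrete
    and returned unchanged.
    """
    if not fix_version.endswith(".x"):
        return fix_version
    release_line = fix_version[: -len(".x")]  # "4.1.x" -> "4.1"
    return next_unreleased[release_line]


if __name__ == "__main__":
    for placeholder in ("4.1.x", "5.x", "4.0.10"):
        print(placeholder, "->", resolve_fix_version(placeholder, NEXT_UNRELEASED))
```

The sketch also shows the gap called out in the last bullet: nothing in the lookup says where a ticket committed to a feature branch should go.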

And if I take what I think you're proposing here and extrapolate it out:
 • .0 is used to denote target version. For example: 5.0, 5.0.0, 5.1.0, 4.1.0
 • This appears to break down for patch releases: we _do_ release .0 versions 
of them rather than alpha/beta/etc, so a ticket targeting 4.1.0 would initially 
mean 2 different things based on resolved vs. unresolved status (resolved == in 
release, unresolved == targeting next unreleased) and that distinction would 
disappear on resolution (i.e. resolved + 4.1.0 would no longer definitively 
mean "contained in .0 release")
 • When a release is cut, we bulk update FixVersion ending in .0 to the release 
version in which they're contained (not clear how to disambiguate the things 
from the above bullet point)
 • For feature releases, .0 will transition to -alpha1
One possible solution would be to just no longer release a .0 version of things 
and reserve .0 to indicate "parked". I don't particularly like that but it's 
not the worst.

Another possible solution would be to just scrap this approach entirely and go 
with:
 • FixVersion on unreleased _and still advocated for tickets_ always targets 
the next unreleased version. For other tickets where nobody is advocating for 
their work / inclusion, we either FixVersion "Backlog" or close as "Later"
 • When a release is cut, roll all unresolved tickets w/that FixVersion to the 
next unreleased FixVersion
 • When we're gearing up to a release, we can do a broad pass on everything 
that's unreleased w/the next feature release's FixVersion and move tickets that 
are desirable but not blockers to the next unreleased FixVersion (patch for 
bug, minor/major for improvements or new features)
 • CEP tickets target the same FixVersion (i.e. next unreleased Feature 
release) as their parents. When the parent epic gets a new FixVersion on 
resolution, all children get that FixVersion (i.e. when we merge the CEP and 
update its FixVersion, we bulk update all children tickets)

On Thu, May 18, 2023, at 9:08 AM, Benedict wrote:
> 
> I don’t think we should over complicate this with special CEP release 
> targets. If we do, they shouldn’t be versioned.
> 
> My personal view is that 5.0 should not be used for any resolved tickets - 
> they should go to 5.0-alpha1, since this is the correct release for them. 5.0 
> can then be the target version, which makes more sense given it isn’t a 
> concrete release.
> 
> But, in lieu of that, every ticket targeting 5.0 could use fixVersion 5.0.x, 
> since it is pretty clear what this means. Some tickets that don’t hit 5.0.0 
> can then be postponed to a later version, but it’s not like this is 
> burdensome. Anything marked feature/improvement and 5.0.x gets bumped to 
> 5.1.x.
> 
> 
>> On 18 May 2023, at 13:58, Josh McKenzie  wrote:
>> 
>> CEP-N seems like a good compromise. NextMajorRelease bumps into our 
>> interchangeable use of "Major" and "Minor" from a semver perspective and 
>> could get confusing. Suppose we could do NextFeatureRelease, but at that 
>> point why not just have it linked to the CEP and have the epic set.
>> 
>> On Thu, May 18, 2023, at 12:26 AM, Caleb Rackliffe wrote:
>>> ...otherwise I'm fine w/ just the CEP name, like "CEP-7" for SAI, etc.
>>> 
>>> On Wed, May 17, 2023 at 11:24 PM Caleb Rackliffe  
>>> wrote:
>>>> So when a CEP slips, do we have to create a 5.1

Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Josh McKenzie

> My mental model, though, is that anything that’s not a concrete release 
> number is a target version. Which is where 5.0 goes wrong - it’s not a 
> release so it should be a target, but for some reason we use it as a 
> placeholder to park work arriving in 5.0.0.
Ahhh.

> So tickets go to 5.0-target if they target 5.0, and to 5.0.0 once they are 
> resolved (with additional labels as necessary)
Adding -target would definitely make things more clear. If we moved to "5.0 == 
unreleased, always move to something on commit" then you still have to find 
some external source to figure out what's going on w/our versioning.

I like 5.0-target. Easy to query for "FixVersion = 5.0-target AND type != 
'Bug'" to find stragglers after GA is cut to move to 5.1-target.
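
A minimal Python sketch of that straggler sweep, run against plain dictionaries rather than a real Jira client. The ticket keys, field names, and helper names are hypothetical; the filter mirrors the query quoted above ("FixVersion = 5.0-target AND type != 'Bug'").

```python
def find_stragglers(tickets, target="5.0-target"):
    """Return non-bug tickets still parked on the given target FixVersion."""
    return [
        t for t in tickets
        if t["fix_version"] == target and t["type"] != "Bug"
    ]


def roll_forward(tickets, old_target="5.0-target", new_target="5.1-target"):
    """Move every straggler from old_target to new_target, in place.

    Returns the list of tickets that were moved.
    """
    moved = find_stragglers(tickets, old_target)
    for ticket in moved:
        ticket["fix_version"] = new_target
    return moved


if __name__ == "__main__":
    # Hypothetical tickets, for illustration only.
    tickets = [
        {"key": "EX-1", "type": "Improvement", "fix_version": "5.0-target"},
        {"key": "EX-2", "type": "Bug", "fix_version": "5.0-target"},
        {"key": "EX-3", "type": "New Feature", "fix_version": "5.0.0"},
    ]
    print([t["key"] for t in roll_forward(tickets)])
```

Bugs stay on 5.0-target in this sketch because the quoted query excludes them; where they should go after GA is not settled in this thread.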

Still have the "update children FixVersion for feature branch when branch is 
merged" bit but that's not so onerous.

On Thu, May 18, 2023, at 10:28 AM, Benedict wrote:
> 
> The .x approach only breaks down for unreleased majors, for which all of our 
> intuitions break down and we rehash it every year.
> 
> My mental model, though, is that anything that’s not a concrete release 
> number is a target version. Which is where 5.0 goes wrong - it’s not a 
> release so it should be a target, but for some reason we use it as a 
> placeholder to park work arriving in 5.0.0.
> 
> If we instead use 5.0.0 for this purpose, we just need to get 5.0-alpha1 
> labels added when those releases are cut.
> 
> Then I propose we break the confusion in both directions by scrapping 5.0 
> entirely and introducing 5.0-target.
> 
> So tickets go to 5.0-target if they target 5.0, and to 5.0.0 once they are 
> resolved (with additional labels as necessary)
> 
> Simples?
> 
>> On 18 May 2023, at 15:21, Josh McKenzie  wrote:
>> 
>>> My personal view is that 5.0 should not be used for any resolved tickets - 
>>> they should go to 5.0-alpha1, since this is the correct release for them. 
>>> 5.0 can then be the target version, which makes more sense given it isn’t a 
>>> concrete release.
>> Well now you're just opening Pandora's box about our strange idioms with 
>> FixVersion usage. ;)
>> 
>>> every ticket targeting 5.0 could use fixVersion 5.0.x, since it is pretty 
>>> clear what this means.
>> I think this diverges from our current paradigm where "5.x" == next feature 
>> release, "5.0.x" == next patch release (i.e. bugfix only). Not to say it's 
>> bad, just an adjustment... which if we're open to adjustment...
>> 
>> I'm receptive to transitioning the discussion to that either on this thread 
>> or another; IMO we remain in a strange and convoluted place with our 
>> FixVersioning. My understanding of our current practice:
>>  • .x is used to denote target version. For example: 5.x, 5.0.x, 5.1.x, 4.1.x
>>  • When a ticket is committed, the FixVersion is transitioned to resolve the 
>> X to the next unreleased version in which it'll release
>>  • Weird Things are done to make this work for the release process and 
>> release manager on feature releases (alpha, beta, etc)
>>  • There's no clear fit for feature branch tickets in the above schema
>> 
>> And if I take what I think you're proposing here and extrapolate it out:
>>  • .0 is used to denote target version. For example: 5.0, 5.0.0, 5.1.0, 4.1.0
>>  • This appears to break down for patch releases: we _do_ release .0 
>> versions of them rather than alpha/beta/etc, so a ticket targeting 4.1.0 
>> would initially mean 2 different things based on resolved vs. unresolved 
>> status (resolved == in release, unresolved == targeting next unreleased) and 
>> that distinction would disappear on resolution (i.e. resolved + 4.1.0 would 
>> no longer definitively mean "contained in .0 release")
>>  • When a release is cut, we bulk update FixVersion ending in .0 to the 
>> release version in which they're contained (not clear how to disambiguate 
>> the things from the above bullet point)
>>  • For feature releases, .0 will transition to -alpha1
>> One possible solution would be to just no longer release a .0 version of 
>> things and reserve .0 to indicate "parked". I don't particularly like that 
>> but it's not the worst.
>> 
>> Another possible solution would be to just scrap this approach entirely and 
>> go with:
>>  • FixVersion on unreleased _and still advocated for tickets_ always targets 
>> the next unreleased version. For other tickets where nobody is advocating 
>> for their work / inclusion, we

Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Josh McKenzie

> They stay on 5.0-target after close and move to 5.0.0 when the epic is merged 
> and closes
Yep. When they merge to the feature branch they're still not on trunk (or 
whatever release branch) so they're still targeting it.

That leaves us with the one indicative case: something with "resolution = Fixed 
AND FixVersion ~ 'target'" means it's on a feature branch. At time of feature 
branch merge folks will need to update the FixVersion on the epic + all child 
tickets that are merged onto that branch.
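
A minimal Python sketch of that rule and of the bulk update at merge time, again over plain dictionaries. The field names, ticket keys, and function names are hypothetical, and the substring check is only a stand-in for the "FixVersion ~ 'target'" match.

```python
def on_feature_branch(ticket):
    """Resolved as Fixed but still carrying a '-target' FixVersion."""
    return ticket["resolution"] == "Fixed" and "target" in ticket["fix_version"]


def merge_feature_branch(epic, children, release_version):
    """Bulk-update the epic and every child already merged to the branch.

    Children that are still unresolved keep their target FixVersion.
    Returns the list of tickets that were updated.
    """
    updated = [epic] + [c for c in children if on_feature_branch(c)]
    for ticket in updated:
        ticket["fix_version"] = release_version
    return updated


if __name__ == "__main__":
    # Hypothetical epic and children, for illustration only.
    epic = {"key": "EX-10", "resolution": "Fixed", "fix_version": "5.0-target"}
    children = [
        {"key": "EX-11", "resolution": "Fixed", "fix_version": "5.0-target"},
        {"key": "EX-12", "resolution": None, "fix_version": "5.0-target"},
    ]
    print([t["key"] for t in merge_feature_branch(epic, children, "5.0.0")])
```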

On Thu, May 18, 2023, at 11:11 AM, Jeremiah D Jordan wrote:
> So what do we do with feature branch merged tickets in this model?  *They 
> stay on 5.0-target after close and move to 5.0.0 when the epic is merged and 
> closes*?
> 
>> On May 18, 2023, at 9:33 AM, Josh McKenzie  wrote:
>> 
>>> My mental model, though, is that anything that’s not a concrete release 
>>> number is a target version. Which is where 5.0 goes wrong - it’s not a 
>>> release so it should be a target, but for some reason we use it as a 
>>> placeholder to park work arriving in 5.0.0.
>> Ahhh.
>> 
>>> So tickets go to 5.0-target if they target 5.0, and to 5.0.0 once they are 
>>> resolved (with additional labels as necessary)
>> Adding -target would definitely make things more clear. If we moved to "5.0 
>> == unreleased, always move to something on commit" then you still have to 
>> find some external source to figure out what's going on w/our versioning.
>> 
>> I like 5.0-target. Easy to query for "FixVersion = 5.0-target AND type != 
>> 'Bug'" to find stragglers after GA is cut to move to 5.1-target.
>> 
>> Still have the "update children FixVersion for feature branch when branch is 
>> merged" bit but that's not so onerous.
>> 
>> On Thu, May 18, 2023, at 10:28 AM, Benedict wrote:
>>> 
>>> The .x approach only breaks down for unreleased majors, for which all of 
>>> our intuitions break down and we rehash it every year.
>>> 
>>> My mental model, though, is that anything that’s not a concrete release 
>>> number is a target version. Which is where 5.0 goes wrong - it’s not a 
>>> release so it should be a target, but for some reason we use it as a 
>>> placeholder to park work arriving in 5.0.0.
>>> 
>>> If we instead use 5.0.0 for this purpose, we just need to get 5.0-alpha1 
>>> labels added when those releases are cut.
>>> 
>>> Then I propose we break the confusion in both directions by scrapping 5.0 
>>> entirely and introducing 5.0-target.
>>> 
>>> So tickets go to 5.0-target if they target 5.0, and to 5.0.0 once they are 
>>> resolved (with additional labels as necessary)
>>> 
>>> Simples?
>>> 
>>>> On 18 May 2023, at 15:21, Josh McKenzie  wrote:
>>>> 
>>>>> My personal view is that 5.0 should not be used for any resolved tickets 
>>>>> - they should go to 5.0-alpha1, since this is the correct release for 
>>>>> them. 5.0 can then be the target version, which makes more sense given it 
>>>>> isn’t a concrete release.
>>>> Well now you're just opening Pandora's box about our strange idioms with 
>>>> FixVersion usage. ;)
>>>> 
>>>>> every ticket targeting 5.0 could use fixVersion 5.0.x, since it is pretty 
>>>>> clear what this means.
>>>> I think this diverges from our current paradigm where "5.x" == next 
>>>> feature release, "5.0.x" == next patch release (i.e. bugfix only). Not to 
>>>> say it's bad, just an adjustment... which if we're open to adjustment...
>>>> 
>>>> I'm receptive to transitioning the discussion to that either on this 
>>>> thread or another; IMO we remain in a strange and convoluted place with 
>>>> our FixVersioning. My understanding of our current practice:
>>>>  • .x is used to denote target version. For example: 5.x, 5.0.x, 5.1.x, 
>>>> 4.1.x
>>>>  • When a ticket is committed, the FixVersion is transitioned to resolve 
>>>> the X to the next unreleased version in which it'll release
>>>>  • Weird Things are done to make this work for the release process and 
>>>> release manager on feature releases (alpha, beta, etc)
>>>>  • There's no clear fit for feature branch tickets in the above schema
>>>> 
>>>> And if I take what I think you're proposing here and extrapolate it out

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-23 Thread Josh McKenzie

I'll hold off on this until Alex Petrov chimes in. @Alex -> got any thoughts 
here?

On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
> I think it would be great to onboard Harry more officially into the project.  
> However it would be nice to perhaps do some sanity checking outside of Apple 
> folks to see how approachable it is.  That is, can someone take it and just 
> run it with the current readme without any additional context?
> 
> I wonder if a mini-onboarding session would be good as a community session - 
> go over Harry, how to run it, how to add a test?  Would that be the right 
> venue?  I just would like to see how we can not only plug it in to regular CI 
> but get everyone that wants to add a test be able to know how to get started 
> with it.
> 
> Jeremy
> 
>> On May 16, 2023, at 1:34 PM, Abe Ratnofsky  wrote:
>> 
>> Just to make sure I'm understanding the details, this would mean 
>> apache/cassandra-harry maintains its status as a separate repository, 
>> apache/cassandra references it as a submodule, and clones and builds Harry 
>> locally, rather than pulling a released JAR. We can then reference Harry as 
>> a library without maintaining public artifacts for it. Is that in line with 
>> what you're thinking?
>> 
>> > I'd also like to see us get a Harry run integrated as part of our 
>> > pre-commit CI
>> 
>> I'm a strong supporter of this, of course.
>> 
>>> On May 16, 2023, at 11:03 AM, Josh McKenzie  wrote:
>>> 
>>> Similar to what we've done with accord in 
>>> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd like to discuss 
>>> bringing cassandra-harry in-tree as a submodule. repo link: 
>>> https://github.com/apache/cassandra-harry
>>> 
>>> Given the value it's brought to the project's stabilization efforts and the 
>>> movement of other things in the ecosystem to being more integrated (accord, 
>>> build-scripts https://issues.apache.org/jira/browse/CASSANDRA-18133), I 
>>> think having the testing framework better localized and integrated would be 
>>> a net benefit for adoption, awareness, maintenance, and tighter workflows 
>>> as we troubleshoot future failures it surfaces.
>>> 
>>> I'd also like to see us get a Harry run integrated as part of our 
>>> pre-commit CI (a 5 minute simple soak test for instance) and having that 
>>> local in this fashion should make that a cleaner integration as well.
>>> 
>>> Thoughts?

Re: Vector search demo, and query syntax

2023-05-24 Thread Josh McKenzie

+1 to the flow of:

1: ORDER BY?

2:  Oh. Yeah. That *does* make sense.

;)

(sending from fastmail in the hopes the image doesn't get stripped. Thanks ASF 
smtp server...)

~Josh

On Wed, May 24, 2023, at 1:00 AM, Jeremiah D Jordan wrote:
> At first I wasn’t sure about using ORDER BY, but the more I think about what 
> is actually going on, I think it does make sense.
> 
> This also matches up with some ideas that have been floating around about 
> being able to ORDER BY a sorted SAI index.
> 
> -Jeremiah
> 
>> On May 22, 2023, at 2:28 PM, Jonathan Ellis  wrote:
>> 
>> Hi all,
>> 
>> I have a branch of vector search based on cep-7-sai at 
>> https://github.com/datastax/cassandra/tree/cep-vsearch. Compared to the 
>> original POC branch, this one is based on the SAI code that will be mainline 
>> soon, and handles distributed scatter/gather.  Updates and deletes to vector 
>> values are still not supported.
>> 
>> I also put together a demo that uses this branch to provide context to 
>> OpenAI’s GPT, available here: https://github.com/jbellis/cassgpt.  
>> 
>> Here is the query that gets executed:
>> 
>> SELECT id, start, end, text 
>> FROM {self.keyspace}.{self.table} 
>> WHERE embedding ANN OF %s 
>> LIMIT %s
>> 
>> The more I used the proposed `ANN OF` syntax, the less I liked it.  This is 
>> because we don’t want an actual boolean predicate; we just want to order 
>> results.  Put another way, `ANN OF` will include all rows of the table given 
>> a high enough `LIMIT`, and that makes it a bad fit for expression processing 
>> that expects to be able to filter out rows before it starts LIMIT-ing.  And 
>> in fact the code to support executing the query looks suspiciously like what 
>> you’d want for `ORDER BY`.
>> 
>> I propose that we adopt `ORDER BY` syntax, supporting it for vector indexes 
>> first and eventually for all SAI indexes.  So this query would become
>> 
>> SELECT id, start, end, text 
>> FROM {self.keyspace}.{self.table} 
>> ORDER BY embedding ANN OF %s 
>> LIMIT %s
>> 
>> And it would compose with other SAI indexes with syntax like
>> 
>> SELECT id, start, end, text 
>> FROM {self.keyspace}.{self.table} 
>> WHERE publish_date > %s
>> ORDER BY embedding ANN OF %s 
>> LIMIT %s
>> 
>> Related work:
>> 
>> This is similar to the approach used by pgvector, except they invented the 
>> symbolic operator `<->` that has the same semantics as `ANN OF`.  I am okay 
>> with adopting their operator, but I think ANN OF is more readable.
>> 
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
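
The argument in the quoted proposal, that `ANN OF` orders rows rather than filtering them, can be sketched in a few lines of Python. This is a brute-force, exact nearest-neighbour ordering over an in-memory list, standing in for the approximate index search; it is not the SAI implementation, and the row layout and function names are assumptions for illustration.

```python
import math


def order_by_ann(rows, query_vector, limit):
    """Order rows by Euclidean distance to query_vector, then apply LIMIT.

    No row is ever rejected by the ordering itself; only LIMIT trims the result.
    """
    ranked = sorted(rows, key=lambda r: math.dist(r["embedding"], query_vector))
    return ranked[:limit]


def where_then_order(rows, predicate, query_vector, limit):
    """Compose a real boolean predicate (WHERE) with the ordering (ORDER BY)."""
    return order_by_ann([r for r in rows if predicate(r)], query_vector, limit)


if __name__ == "__main__":
    rows = [
        {"id": 1, "publish_date": 2019, "embedding": [0.0, 0.0]},
        {"id": 2, "publish_date": 2022, "embedding": [1.0, 1.0]},
        {"id": 3, "publish_date": 2023, "embedding": [0.2, 0.1]},
    ]
    query = [0.0, 0.0]
    print([r["id"] for r in order_by_ann(rows, query, limit=2)])
    print([r["id"] for r in order_by_ann(rows, query, limit=100)])
    print([r["id"] for r in where_then_order(
        rows, lambda r: r["publish_date"] > 2020, query, limit=2)])
```

With a high enough limit the first function returns every row, which is why treating it as a `WHERE` predicate is an awkward fit, while the second function shows how a genuine filter such as `publish_date > %s` composes with the ordering.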

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-24 Thread Josh McKenzie

> importantly it’s a million times better than the dtest-api process - which 
> stymies development due to the friction.
This is my major concern.

What prompted this thread was harry being external to the core codebase and the 
lack of adoption and usage of it having led to atrophy of certain aspects of 
it, which then led to redundant implementation of some fuzz testing and lost 
time.

We'd all be better served to have this closer to the main codebase as a forcing 
function to smooth out the rough edges, integrate it, and make it a collective 
artifact and first class citizen IMO.

I have similar opinions about the dtest-api.


On Wed, May 24, 2023, at 4:05 AM, Benedict wrote:
> 
> It’s not without hiccups, and I’m sure we have more to learn. But it mostly 
> just works, and importantly it’s a million times better than the dtest-api 
> process - which stymies development due to the friction.
> 
>> On 24 May 2023, at 08:39, Mick Semb Wever  wrote:
>> 
>> 
>> WRT git submodules and CASSANDRA-18204, are we happy with how it is working 
>> for accord ? 
>> 
>> The time spent on getting that running has been a fair few hours, where we 
>> could have cut many manual module releases in that time. 
>> 
>> David and folks working on accord ? 
>> 
>> 
>> 
>> On Tue, 23 May 2023 at 20:09, Josh McKenzie  wrote:
>>> I'll hold off on this until Alex Petrov chimes in. @Alex -> got any 
>>> thoughts here?
>>> 
>>> On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
>>>> I think it would be great to onboard Harry more officially into the 
>>>> project.  However it would be nice to perhaps do some sanity checking 
>>>> outside of Apple folks to see how approachable it is.  That is, can 
>>>> someone take it and just run it with the current readme without any 
>>>> additional context?
>>>> 
>>>> I wonder if a mini-onboarding session would be good as a community session 
>>>> - go over Harry, how to run it, how to add a test?  Would that be the 
>>>> right venue?  I just would like to see how we can not only plug it in to 
>>>> regular CI but get everyone that wants to add a test be able to know how 
>>>> to get started with it.
>>>> 
>>>> Jeremy
>>>> 
>>>>> On May 16, 2023, at 1:34 PM, Abe Ratnofsky  wrote:
>>>>> 
>>>>> Just to make sure I'm understanding the details, this would mean 
>>>>> apache/cassandra-harry maintains its status as a separate repository, 
>>>>> apache/cassandra references it as a submodule, and clones and builds 
>>>>> Harry locally, rather than pulling a released JAR. We can then reference 
>>>>> Harry as a library without maintaining public artifacts for it. Is that 
>>>>> in line with what you're thinking?
>>>>> 
>>>>> > I'd also like to see us get a Harry run integrated as part of our 
>>>>> > pre-commit CI
>>>>> 
>>>>> I'm a strong supporter of this, of course.
>>>>> 
>>>>>> On May 16, 2023, at 11:03 AM, Josh McKenzie  wrote:
>>>>>> 
>>>>>> Similar to what we've done with accord in 
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd like to 
>>>>>> discuss bringing cassandra-harry in-tree as a submodule. repo link: 
>>>>>> https://github.com/apache/cassandra-harry
>>>>>> 
>>>>>> Given the value it's brought to the project's stabilization efforts and 
>>>>>> the movement of other things in the ecosystem to being more integrated 
>>>>>> (accord, build-scripts 
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18133), I think having 
>>>>>> the testing framework better localized and integrated would be a net 
>>>>>> benefit for adoption, awareness, maintenance, and tighter workflows as 
>>>>>> we troubleshoot future failures it surfaces.
>>>>>> 
>>>>>> I'd also like to see us get a Harry run integrated as part of our 
>>>>>> pre-commit CI (a 5 minute simple soak test for instance) and having that 
>>>>>> local in this fashion should make that a cleaner integration as well.
>>>>>> 
>>>>>> Thoughts?
>>>

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-24 Thread Josh McKenzie

> I have submitted a proposal to Cassandra Summit for a 4-hour Harry workshop,
I'm about to need to harry test for the paging across tombstone work for 
https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my own 
overlapping fuzzing came in). In the process, I'll see if I can't distill 
something really simple along the lines of how React approaches it 
(https://react.dev/learn).

Ideally we'd be able to get something together that's a high level "In the next 
15 minutes, you will know and understand A-G and have access to N% of the power 
of harry" kind of offer.

Honestly, there's a *lot* in our ecosystem where we could benefit from taking a 
page from their book in terms of onboarding and getting started IMO.

On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
> > I wonder if a mini-onboarding session would be good as a community session 
> > - go over Harry, how to run it, how to add a test?  Would that be the right 
> > venue?  I just would like to see how we can not only plug it in to regular 
> > CI but get everyone that wants to add a test be able to know how to get 
> > started with it.
> 
> I have submitted a proposal to Cassandra Summit for a 4-hour Harry workshop, 
> but unfortunately it got declined. Goes without saying, we can still do it 
> online, time and resources permitting. But again, I do not think it should be 
> barring us from making Harry a part of the codebase, as it already is. In 
> fact, we can be iterating on the development quicker having it in-tree. 
> 
> We could go over some interesting examples such as testing 2i (SAI), 
> modelling Group By tests, or testing repair. If there is enough appetite and 
> collaboration in the community, I will see if we can pull something like that 
> together. Input on _what_ you would like to see / hear / tested is also 
> appreciated. Harry was developed out of a strong need for large-scale 
> testing, which also has informed many of its APIs, but we can make it easier 
> to access for interactive testing / unit tests. We have been doing a lot of 
> that with Transactional Metadata, too. 
> 
> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got any 
> > thoughts here?
> 
> Yes, sorry for not responding on this thread earlier. I cannot overstate 
> how excited I am about this, and how important I think this is. Time 
> constraints are somehow hard to overcome, but I hope the results brought by 
> TCM will make it all worth it.
> 
> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>> I think pulling Harry into the tree will make adoption easier for the folks. 
>> I have been a bit swamped with Transactional Metadata work, but I wanted to 
>> make some of the things we were using for testing TCM available outside of 
>> TCM branch. This includes a bunch of helper methods to perform operations on 
>> the clusters, data generation, and more useful stuff. Of course, the 
>> question always remains about how much time I want to spend porting it all 
>> to Gossip, but I think we can find a reasonable compromise. 
>> 
>> I would not set this improvement as a prerequisite to pulling Harry into the 
>> main branch, but rather interpret it as a commitment from myself to take 
>> community input and make it more approachable by the day. 
>> 
>> On Wed, May 24, 2023, at 2:44 PM, Josh McKenzie wrote:
>>>> importantly it’s a million times better than the dtest-api process - which 
>>>> stymies development due to the friction.
>>> This is my major concern.
>>> 
>>> What prompted this thread was harry being external to the core codebase and 
>>> the lack of adoption and usage of it having led to atrophy of certain 
>>> aspects of it, which then led to redundant implementation of some fuzz 
>>> testing and lost time.
>>> 
>>> We'd all be better served to have this closer to the main codebase as a 
>>> forcing function to smooth out the rough edges, integrate it, and make it a 
>>> collective artifact and first class citizen IMO.
>>> 
>>> I have similar opinions about the dtest-api.
>>> 
>>> 
>>> On Wed, May 24, 2023, at 4:05 AM, Benedict wrote:
>>>> 
>>>> It’s not without hiccups, and I’m sure we have more to learn. But it 
>>>> mostly just works, and importantly it’s a million times better than the 
>>>> dtest-api process - which stymies development due to the friction.
>>>> 
>>>>> On 24 May 2023, at 08:39, Mick Semb Wever  wrote:
>>>>> 
>>>>> 
>>>>> WRT git submodules and CASSANDRA-18204, are w

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Josh McKenzie

> I would really like us to split out utilities into a common project
+1 to the sentiment.

Would also advocate strongly for it being more tightly integrated with the base 
project than what we've been doing with our ecosystem (i.e. completely separate 
projects, not submodules), mostly from a discoverability and workflow 
standpoint.

I'm definitely salty about having to have 4 IDEs / projects open just to work 
on the entire stack.

On Thu, May 25, 2023, at 5:05 AM, Alex Petrov wrote:
> This was not a talk but rather an interactive workshop, so unfortunately it 
> will not work in recorded form, but I am trying to work out ways to preserve this.
> 
> On Thu, May 25, 2023, at 10:26 AM, Claude Warren, Jr via dev wrote:
>> Since the talk was not accepted for Cassandra Summit, would it be possible 
>> to record it as a simple youtube video and publish it so that the detailed 
>> information about how to use Harry is not lost?
>> 
>> On Thu, May 25, 2023 at 7:36 AM Alex Petrov  wrote:
>>> While we are at it, we may also want to pull the in-jvm dtest API as a 
>>> submodule, and actually move some tests that are common between the 
>>> branches there.
>>> 
>>> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>>>> Isn’t the other reason Accord works well as a submodule that it has no 
>>>> dependencies on C* proper? Harry does at the moment, right? (Not that we 
>>>> couldn’t address that…just trying to think this through…)
>>>> 
>>>>> On May 24, 2023, at 6:54 PM, Benedict  wrote:
>>>>> 
>>>>> 
>>>>> In this case Harry is a testing module - it’s not something we will 
>>>>> develop in tandem with C* releases, and we will want improvements to be 
>>>>> applied across all branches.
>>>>> 
>>>>> So it seems a natural fit for submodules to me.
>>>>> 
>>>>> 
>>>>>> On 24 May 2023, at 21:09, Caleb Rackliffe  
>>>>>> wrote:
>>>>>> 
>>>>>> > Submodules do have their own overhead and edge cases, so I am mostly 
>>>>>> > in favor of using for cases where the code must live outside of tree 
>>>>>> > (such as jvm-dtest that lives out of tree as all branches need the 
>>>>>> > same interfaces)
>>>>>> 
>>>>>> Agreed. Basically where I've ended up on this topic.
>>>>>> 
>>>>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>>>> 
>>>>>> +100
>>>>>> 
>>>>>> 
>>>>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov  wrote:
>>>>>>> > I'm about to need to harry test for the paging across tombstone work 
>>>>>>> > for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's 
>>>>>>> > where my own overlapping fuzzing came in). In the process, I'll see 
>>>>>>> > if I can't distill something really simple along the lines of how 
>>>>>>> > React approaches it (https://react.dev/learn).
>>>>>>> 
>>>>>>> We can pick that up as an example, sure. 
>>>>>>> 
>>>>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>> workshop,
>>>>>>>> I'm about to need to harry test for the paging across tombstone work 
>>>>>>>> for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's 
>>>>>>>> where my own overlapping fuzzing came in). In the process, I'll see if 
>>>>>>>> I can't distill something really simple along the lines of how React 
>>>>>>>> approaches it (https://react.dev/learn).
>>>>>>>> 
>>>>>>>> Ideally we'd be able to get something together that's a high level "In 
>>>>>>>> the next 15 minutes, you will know and understand A-G and have access 
>>>>>>>> to N% of the power of harry" kind of offer.
>>>>>>>> 
>>>>>>>> Honestly, there's a *lot* in our ecosystem where we could benefit from 
>>>>>>>> taking a page from their book in terms of onboarding and getti

Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Josh McKenzie

+1

On Thu, May 25, 2023, at 8:33 PM, Jake Luciani wrote:
> +1
> 
> On Thu, May 25, 2023 at 11:45 AM Jonathan Ellis  wrote:
>> Let's make this official.
>> 
>> CEP: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>> 
>> POC that demonstrates all the big rocks, including distributed queries: 
>> https://github.com/datastax/cassandra/tree/cep-vsearch
>> 
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
> --
> http://twitter.com/tjake

Re: [DISCUSS] CEP-8 Drivers Donation - take 2

2023-05-30 Thread Josh McKenzie

> Is the vote for the CEP to be for all drivers, but we will introduce each 
> driver one by one?  What determines when we are comfortable with one driver 
> subproject and can move on to accepting the next ? 
Curious to hear on this as well. There are 2 implications from the CEP as written:

1. The Java and Python drivers hold special importance due to their language 
proximity and/or project's dependence upon them 
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation#CEP8:DatastaxDriversDonation-Scope)
2. Datastax is explicitly offering all 7 drivers for donation 
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation#CEP8:DatastaxDriversDonation-Goals)

This is the most complex contribution via CEP thus far from a governance 
perspective; I suggest we chart a bespoke path to navigate this. Having a top 
level indication of "the CEP is approved" logically separate from a 
per-language indication of "the project is ready to absorb this language driver 
now" makes sense to me. This could look like:

* Vote on the CEP itself
* Per language (processing one at a time):
  * Identify 3 PMC members willing to take on the governance role for the 
    language driver
  * Identify 2 contributors who are active on a given driver and stepping 
    forward for a committer role on the driver
  * Vote on inclusion of that language driver in the project + commit bits
  * Integrate that driver into the project ecosystem (build, ci, docs, etc)

Not sure how else we could handle committers / contributors / PMC members other 
than on a per-driver basis.

On Tue, May 30, 2023, at 5:36 AM, Mick Semb Wever wrote:
> 
> Thank you so much Jeremy and Greg (+others) for all the hard work on this.
>  
>> 
>> At this point, we'd like to propose CEP-8 for consideration, starting the 
>> process to accept the DataStax Java driver as an official ASF project.
> 
> 
> Is the vote for the CEP to be for all drivers, but we will introduce each 
> driver one by one?  What determines when we are comfortable with one driver 
> subproject and can move on to accepting the next ? 
> 
> Are there key committers and contributors on each driver that want to be 
> involved?  Should they be listed before the vote?
> We also need three PMC for the new subproject.  Are we to assign these before 
> the vote?  
> 
>

Cassandra project status, 2023-05-30

2023-05-30 Thread Josh McKenzie

Been a bit over a month; let's check in and see how things are looking.

We released the following:
- 3.11.15
- 3.0.29
- 4.0.10
- 4.1.2

Thanks to all the release managers who worked on getting these out the door.

[New Contributors Getting Started]
First off, come hang out with us in the #cassandra-dev channel on
https://the-asf.slack.com (reply to me on this email if you need an invite for
your account), and reach out to the @cassandra_mentors alias with any questions
about the code. We have a list of hand-curated "starter tickets" available
here:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160&quickFilter=2162.
Anything in the "ToDo" column is a great candidate to pick up if you want to
get your feet wet with the project. Some other useful links:

Getting Started with Development on C*:
https://cassandra.apache.org/_/development/gettingstarted.html
Building and IDE integration (worktrees are your friend):
https://cassandra.apache.org/_/development/ide.html
Code Style: https://cassandra.apache.org/_/development/code_style.html

[Dev mailing list]
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-4-25|dto=2023-5-31:

52 threads since last email. I have made a mistake in waiting this long. ;)

Jonathan Ellis' thread on vector search reached a conclusion, a follow-up
discussion about APIs took place, and a CEP was proposed, voted upon, and
passed! Phew.

Thread: https://lists.apache.org/thread/16lc6d02xsfvlvqgn3ooy53pgfddyglc
Proposal on adding a new type for vector search:
https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0
Poll on syntax: https://lists.apache.org/thread/lkowo1qkxjb5wc3n8v6ov4f0r538h13c
CEP-30 proposal:
https://lists.apache.org/thread/v32tgofo0w47bl7stbb9141obfbg5r0x
CEP-30 vote: https://lists.apache.org/thread/7s581j65wtst6968c86hzncbnzrr09oj

Congratulations to Jonathan and everyone else involved for getting that up and
running and driving to consensus that rapidly.

Speaking of CEPs, the patch for CEP-28 (Spark bulk writer / reader) via the
sidecar was posted:
https://lists.apache.org/thread/7pwvlwkg49qm72xnlf0m322fy4fmvxk3. Doug Rohrer
also called a vote for CEP-28 and it passed as well!
https://lists.apache.org/thread/7kndoo6rjchrlk41hbl8v7sclkvdzkgt Congrats to
Doug and everyone else who collaborated on that effort as well. Quite a month
for us as a project!

Maxim Muzafarov keeps fighting the good fight on vtables and updating running
configurations:
https://lists.apache.org/thread/gdtr3vp375d3nyj6h8xo7owth1s556lz.

Jakub's working on getting the ant target for generate-idea-files to behave
with JDK17: https://lists.apache.org/thread/o2fdkyv2skvf9ngy9jhpnhvo92qvr17m.
Looks like he has a few reviewers on the ticket but if you're curious you can
find that here: https://issues.apache.org/jira/browse/CASSANDRA-18467

Discussion around CEP-29 (CQL NOT operator) continued:
https://lists.apache.org/thread/cl4d7yo9q6ygnqstk8hhgm597ywg69d1
And was voted upon:
https://lists.apache.org/thread/rwxc8y0c8johrhqcpxsdkns85rop0fxg and passed!
Congratulations Piotr and crew on that; that's a feature I'm sure a lot of our
users will appreciate.

Claude Warren has a PR open working with an SSTableDowngrader tool:
https://lists.apache.org/thread/wvb8c5svvyvny0b61ybbw0jvxxflog4p. The PR can be
found here: https://github.com/apache/cassandra/pull/2045, and this is in
relation to the C* JIRA issue
https://issues.apache.org/jira/browse/CASSANDRA-8928.

A new release of the in-jvm dtest API went out:
https://lists.apache.org/thread/tsn70ox1th1x2vcsc7kfky9jsv1foq61

Maxim Muzafarov reached out to let everyone know about the migration of
properties into the CassandraRelevantProperties class:
https://lists.apache.org/thread/3g5g5kmk64m54qlyhpmdvxcw8m2vsytz. I'm very
happy SonarLint will stop yelling at me about this class of warnings going
forward. :)

With SAI appearing as well as ANN Vector search, the topic of how we handle our
CREATE INDEX DDL came up courtesy of Caleb Rackliffe:
https://lists.apache.org/thread/4jxq1tghvb10f848q5vkq241w39lyw57. Looks like
we've managed to distill things down to something we can wrangle to consensus:
https://lists.apache.org/thread/oswfj6rsq298dfffw3yzy12q82ybczn7

Our usage of FixVersion continues to evolve:
https://lists.apache.org/thread/5ompnd3l76kpwc831h80o1jd1g87dcgy. This thread
came up around what FixVersion we apply to tickets that are sub-tasks of epics
for approved CEPs that may or may not land in a major. Since we don't know if
they're going to be done by the hard cutoff for 5.0 for instance, 5.0 as a
release version would be incorrect. And since 5.X is historically reserved for
"5.0-targeting but not yet merged", we end up in a bind there.

Benedict definitely brought me around to the approach of having: FIXVERSION =
5.0-target, and upon merge of the parent epic we can update all children
tickets to whatever the parent has. No real strong conse

Re: Is simplenative in cassandra-stress still relevant?

2023-05-31 Thread Josh McKenzie

> The main issue I see with maintaining the SimpleClient in cassandra-stress is 
> the burden it puts on a user to understand the options available when 
> connecting with *-mode*:
How frequently do we expect users or devs to use the built-in cassandra-stress 
tool? Between tlp-stress and NoSQLBench, it's not clear to me that keeping 
cassandra-stress (which has been largely unmaintained for years as I understand 
it?) is the best option.

On Wed, May 31, 2023, at 9:00 AM, Brad wrote:
> We all agree that we're not suggesting removing SimpleClient from Cassandra, 
> just from its use in cassandra-stress.
> 
> For debugging the native transport protocol, in addition to the standalone 
> Java Driver, there are the python drivers and ODBC drivers which can be 
> exercised with cqlsh and Intellij respectively.  Are they not sufficient?
> 
> The main issue I see with maintaining the SimpleClient in cassandra-stress is 
> the burden it puts on a user to understand the options available when 
> connecting with *-mode*:
> 
>> > cassandra-stress help -mode
>> 
>> Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?] 
>> [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] 
>> [protocolVersion=?]
>> 
>>  OR 
>> 
>> Usage: -mode simplenative [prepared] cql3 [port=?]
>> 
>> 
>> 
>> 
>> 
> 
> 
> A user trying to determine how to specify credentials for usr/pwd is 
> presented with the option to use simplenative and prepared statements (which 
> appear broken).  It can lead down a rabbit hole of sparse documentation 
> trying to figure out what the simplenative option is, and whether it is 
> better than cql3.
> 
>  
> 
> 
> 
> On Wed, May 31, 2023 at 1:58 AM Miklosovic, Stefan 
>  wrote:
>> Interesting point about the debuggability.
>> 
>> Yes, I agree that SimpleClient (as class) should not be removed because we 
>> are using it in tests. I have already mentioned in my original e-mail that 
>> for this reason that class is not going anywhere and we still need to use it.
>> 
>> The cost of keeping it there is not big, sure, but we clearly see that e.g. 
>> the usage of "prepared" is buggy and it does not work. That somehow 
>> indicates to me that it kind of atrophied and nobody seems to notice which 
>> further supports my case that it is actually not used too much if it went 
>> undetected for so long.
>> 
>> Anyway, I think that we might just look at that bug with "prepared" and fix 
>> it and keep it all there. I do not see any tests which would test 
>> cassandra-stress command, similarly what we have for nodetool in JUnit. We 
>> could cover cassandra-stress similarly, just to be sure that its invocation 
>> on the most important commands does not fail over time.
>> 
>> 
>> 
>> From: Brandon Williams 
>> Sent: Wednesday, May 31, 2023 2:33
>> To: dev@cassandra.apache.org
>> Subject: Re: Is simplenative in cassandra-stress still relevant?
>> 
>> On Tue, May 30, 2023 at 7:15 PM Brad  wrote:
>> > If you're performing stress testing, why would you not want to use the 
>> > official driver?  I've spoken to several people who all have said they've 
>> > never used simplenative mode.
>> 
>> I agree that it shouldn't be used normally, but I'm not sure we should
>> remove it, because we can't remove it fully: SimpleClient is still
>> used in many tests, and I think that should continue.
>> 
>> If you suspect any kind of native proto or driver issue it may be
>> useful to have another implementation easily accessible to aid in
>> debugging the problem, and the maintenance cost of keeping it in
>> stress is roughly zero in my opinion.  We can make it clear that it's
>> not recommended for use and is intended only as a debugging tool,
>> though.
>> 
>> Kind Regards,
>> Brandon

Re: Is simplenative in cassandra-stress still relevant?

2023-05-31 Thread Josh McKenzie

> I think that Cassandra should have some basic tool available to stress-test 
> itself
> I do not think that the current cassandra-stress is completely "useless"
Think there was an implication in my statement I didn't intend. I wasn't 
talking about not having *any* stress tool as the reference in our code-base, 
just trying to poke a bit at whether or not the one we have, which isn't 
maintained, should continue to be the one we use going forward.

I'd be fine with us collectively investing more energy into cassandra-stress, 
or seeing about reaching out to the tlp folks about tlp-stress replacing it as 
the default, or NoSQLBench, or some cassandra-stress argument compatible 
wrapper over one of them so things can migrate transparently, etc.

On Wed, May 31, 2023, at 11:42 AM, Miklosovic, Stefan wrote:
> Well this is a completely different kind of discussion, Josh, let's explore 
> it, shall we?
> 
> I think that Cassandra should have some basic tool available to stress-test 
> itself. Why not? I do not want to depend on some 3rd party tools even if they 
> might be objectively better. I do not think that the current cassandra-stress 
> is completely "useless". It is doing its job, more or less. If a user wants 
> to have something more advanced she is welcome to use that but I do not like 
> that we are trying to outsource the basic tooling outside of the project.
> 
> As I see it, we just spice it up with some tests to be sure that it will not 
> break without us knowing it and that's it. The fact that it is not actively 
> contributed to does not necessarily make it eligible for deletion as a whole.
> 
> Anyway, I am not calling the shots here; if the community decides it has to 
> go, so it will, but I would be sad to see it.
> 
> Regards
> 
> 
> 
> From: Josh McKenzie 
> Sent: Wednesday, May 31, 2023 15:15
> To: dev
> Subject: Re: Is simplenative in cassandra-stress still relevant?
> 
> The main issue I see with maintaining the SimpleClient in cassandra-stress is 
> the burden it puts on a user to understand the options available when 
> connecting with -mode:
> How frequently do we expect users or devs to use the built-in 
> cassandra-stress tool? Between tlp-stress and NoSQLBench, it's not clear to 
> me that keeping cassandra-stress (which has been largely unmaintained for 
> years as I understand it?) is the best option.
> 
> On Wed, May 31, 2023, at 9:00 AM, Brad wrote:
> We all agree that we're not suggesting removing SimpleClient from Cassandra, 
> just from its use in cassandra-stress.
> 
> For debugging the native transport protocol, in addition to the standalone 
> Java Driver, there are the python drivers and ODBC drivers which can be 
> exercised with cqlsh and Intellij respectively.  Are they not sufficient?
> 
> The main issue I see with maintaining the SimpleClient in cassandra-stress is 
> the burden it puts on a user to understand the options available when 
> connecting with -mode:
> 
> > cassandra-stress help -mode
> 
> 
> Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?] 
> [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] 
> [protocolVersion=?]
> 
> OR
> 
> Usage: -mode simplenative [prepared] cql3 [port=?]
> 
> 
> 
> 
> 
> A user trying to determine how to specify credentials for usr/pwd is 
> presented with the option to use simplenative and prepared statements (which 
> appear broken).  It can lead down a rabbit hole of sparse documentation 
> trying to figure out what the simplenative option is, and whether it is better than cql3.
> 
> 
> 
> 
> On Wed, May 31, 2023 at 1:58 AM Miklosovic, Stefan wrote:
> Interesting point about the debuggability.
> 
> Yes, I agree that SimpleClient (as class) should not be removed because we 
> are using it in tests. I have already mentioned in my original e-mail that 
> for this reason that class is not going anywhere and we still need to use it.
> 
> The cost of keeping it there is not big, sure, but we clearly see that e.g. 
> the usage of "prepared" is buggy and it does not work. That somehow indicates 
> to me that it kind of atrophied and nobody seems to notice which further 
> supports my case that it is actually not used too much if it went undetected 
> for so long.
> 
> Anyway, I think that we might just look at that bug with "prepared" and fix 
> it and keep it all there. I do not see any test

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-31 Thread Josh McKenzie

Bumping into worktree + submodule pain on some harry related work; it looks 
like "git worktree" and submodules are not currently fully implemented:

https://git-scm.com/docs/git-worktree#_bugs
BUGS

Multiple checkout in general is still experimental, and the support for 
submodules is incomplete. It is NOT recommended to make multiple checkouts of a 
superproject.

I rely pretty heavily on worktrees and I know a lot of other folks who do too. 
This is a dealbreaker for me in terms of adding anything else as a submodule 
and I'd like to know if the accord folks have been running into any worktree 
related woes w/the accord integration.


On Sun, May 28, 2023, at 10:14 AM, Alex Petrov wrote:
> Regarding approachability, one of the things I thought is worth adding is a 
> DSL. I feel like there's enough functionality in Harry and there's enough 
> information for anyone who needs to write even an involved test out there, 
> but adoption doesn't usually start with complex use-cases, so it could be 
> that making it extremely simple to generate the data and validating that 
> written data is where it's supposed to be, should help adoption a lot. 
> Unfortunately, more complex use-cases such as group-by support, or SAI 
> testing will require a bit more knowledge and writing an involved model, so I 
> do not see any shortcuts we can take here.
> 
> > I do think that moving Harry in-tree would improve approachability
> 
> I think it's similar as it is with in-jvm dtest api. I feel like we would 
> evolve it more actively if we didn't have to cut a release before every 
> commit. In other words, I think that changing Harry code and extending 
> functionality will be easier, which I think will eventually lead to quicker 
> adoption. But of course the act of moving itself does not increase adoption, 
> it just comes from better ergonomics.
> 
> 
> On Thu, May 25, 2023, at 8:03 PM, Abe Ratnofsky wrote:
>> I'm seeing a few distinct topics here:
>> 
>> 1. Harry's adoption and approachability
>> 
>> I agree that approachability is one of Harry's main improvement areas right 
>> now. If our goal is to produce a fuzz testing framework for the Cassandra 
>> project, then adoption by contributors and usage for new feature development 
>> are reasonable indicators for whether we're achieving that goal. If Harry is 
>> not getting adopted by contributors outside of Apple, and is not getting 
>> used for new feature development, then we should make an effort to 
>> understand why. I don't think that a several-hour seminar is the best point 
>> of leverage to achieve those goals.
>> 
>> Here's what I think we do need:
>> 
>> - The README should be understandable by anyone interested in writing a fuzz 
>> test
>> - Example tests should be runnable from a fresh clone of Cassandra, in an 
>> IDE or on the command line
>> - Examples of how we would test new features (like CEP-7, CEP-29, etc) with 
>> the fuzz testing framework
>> 
>> I find the JVM dtest framework accomplishes similar goals, and one reason is 
>> because there are plenty of examples, and it's relatively easy to copy and 
>> paste one example and have it do what you'd like. I believe the same 
>> approach would work for a fuzz testing framework.
>> 
>> Some of these tasks above are already done for Harry, such as better IDE 
>> support for samples. This will be available in OSS Harry shortly.
>> 
>> 2. Moving Harry in-tree vs. in submodule
>> 
>> As I understand it, making Harry a submodule of Cassandra would make it 
>> easier to deal with versioning, since we wouldn't have to do the entire 
>> release dance we need to do for dtest-api, but I don't see this as a big 
>> improvement to approachability.
>> 
>> I do think that moving Harry in-tree would improve approachability, for the 
>> same reason as the JVM dtests. It's nice to write a feature or fix, find a 
>> similar JVM dtest, copy, paste, and edit, and have something useful.
>> 
>> 3. General subdivision of Cassandra projects
>> 
>> This topic has come up quite a few times recently - around shared utilities 
>> (CEP-10 concurrency primitives, etc), dtest-api, query parser, etc. The 
>> project has tried out a few different approaches on composition of separate 
>> projects. Hopefully in the near future we find the one that works best and 
>> can start this process of splitting out libraries.
>> 
>> --
>> Abe
>> 
>>> On May 25, 2023, at 6:36 AM, Josh McKenzie  wrote:
>>> 
>>>> I would really like us to

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-06-01 Thread Josh McKenzie

> Josh, do you see any reports on what isn’t working?  I think most people 
> don’t touch 1% of what git can do… so it might be that 10% is broken but that 
> no one in our domain actually touches that path?
Was changing .gitmodules in harry to point to a branch and git just straight up 
went out to lunch when I tried to "git submodule update --init --recursive 
--remote" or any derivation thereof. Reproducing today in a worktree with 
GIT_TRACE, and it looks like the submodule command is hanging on:

> 16:00:48.253406 git.c:460   trace: built-in: git index-pack 
> --stdin --fix-thin '--keep=fetch-pack 32955 on Joshuas-MacBook-Pro.local' 
> --check-self-contained-and-connected
> 

On a whim I just let it run and it finally got unstuck after probably 5+ 
minutes; this might just be down to me being impatient and the default logging 
on git being... completely silent. =/

Looks like subsequent runs aren't hanging on that and are hopping right 
through, so perhaps this is a "first run tax" for submodule + worktree.
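
For anyone hitting the same silent stall, here is a minimal sketch of wrapping that update so trace output and a timeout make a slow first fetch visible. Only the git command and GIT_TRACE come from the message above; the helper names and the 15-minute timeout are assumptions for illustration, not part of any harry or accord tooling.

```python
import os
import subprocess

# The exact command quoted in the message above.
SUBMODULE_UPDATE = ["git", "submodule", "update", "--init", "--recursive", "--remote"]


def traced_env(base=None):
    """Return a copy of the environment with GIT_TRACE switched on."""
    env = dict(os.environ if base is None else base)
    env["GIT_TRACE"] = "1"  # git writes trace lines to stderr
    return env


def update_submodules(worktree, timeout_s=900):
    """Run the update inside `worktree`.

    timeout_s is a hypothetical default, chosen to be longer than the
    5+ minute first run described above. subprocess raises TimeoutExpired
    if git runs past it, instead of hanging with no output.
    """
    return subprocess.run(
        SUBMODULE_UPDATE,
        cwd=worktree,
        env=traced_env(),
        timeout=timeout_s,
        check=True,
    )
```

Calling update_submodules("/path/to/worktree") streams the trace to the terminal, so a long index-pack step shows up as a visible trace line rather than a blank prompt.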

On Thu, Jun 1, 2023, at 2:05 PM, David Capwell wrote:
> To be clear, we only use the relative syntax during development and not long 
> lived feature branches like cep-15-accord; we use https address there.  So 
> when you create a PR you switch to relative paths (if-and-only-if you change 
> the submodule), then on merge you switch back to https pointing to apache.  
> So the main issue has been when 2 authors try to work together (such as 
> during review of a PR)
> 
>> On Jun 1, 2023, at 10:15 AM, David Capwell  wrote:
>> 
>> Most edge cases we have seen in Accord are working with feature branches 
>> from other authors where we use relative paths to make sure the git@ vs 
>> https:// doesn’t become a problem for CI (submodule points to https:// to 
>> work in CI, but if you do that during feature development it gets annoying 
>> to push to GitHub… so we do ../cassandra-accord.git so git respects w/e 
>> protocol you are using).  In 1-2 peoples environments, when they checked out 
>> another authors logic the C* remote was correct, but the Accord one was 
>> still pointing to Apache (which doesn’t have the feature branch)…. This is 
>> trivial to fix, and might be a bug with our git hooks…. But still calling 
>> out as it has been an issue.
>> 
>> Josh, do you see any reports on what isn’t working?  I think most people 
>> don’t touch 1% of what git can do… so it might be that 10% is broken but 
>> that no one in our domain actually touches that path?
>> 
>>> On May 31, 2023, at 12:36 PM, Josh McKenzie  wrote:
>>> 
>>> Bumping into worktree + submodule pain on some harry related work; it looks 
>>> like "git worktree" and submodules are not currently fully implemented:
>>> 
>>> https://git-scm.com/docs/git-worktree#_bugs
>>> BUGS
>>> 
>>> Multiple checkout in general is still experimental, and the support for 
>>> submodules is incomplete. It is NOT recommended to make multiple checkouts 
>>> of a superproject.
>>> 
>>> I rely pretty heavily on worktrees and I know a lot of other folks who do 
>>> too. This is a dealbreaker for me in terms of adding anything else as a 
>>> submodule and I'd like to know if the accord folks have been running into 
>>> any worktree related woes w/the accord integration.
>>> 
>>> 
>>> On Sun, May 28, 2023, at 10:14 AM, Alex Petrov wrote:
>>>> Regarding approachability, one of the things I thought is worth adding is 
>>>> a DSL. I feel like there's enough functionality in Harry and there's 
>>>> enough information for anyone who needs to write even an involved test out 
>>>> there, but adoption doesn't usually start with complex use-cases, so it 
>>>> could be that making it extremely simple to generate the data and 
>>>> validating that written data is where it's supposed to be, should help 
>>>> adoption a lot. Unfortunately, more complex use-cases such as group-by 
>>>> support, or SAI testing will require a bit more knowledge and writing an 
>>>> involved model, so I do not see any shortcuts we can take here.
>>>> 
>>>> > I do think that moving Harry in-tree would improve approachability
>>>> 
>>>> I think it's similar as it is with in-jvm dtest api. I feel like we would 
>>>> evolve it more actively if we didn't have to cut a release before every 
>>>> commit. In other words, I think that changing Harry code and extending 
>>>> functionality will be easier, which I think

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie

> I do not have in mind a scenario where it could be useful to specify a LIMIT 
> in bytes. The LIMIT clause is usually used when you know how many rows you 
> wish to display or use. Unless somebody has a useful scenario in mind I do 
> not think that there is a need for that feature.
If you have rows that vary significantly in their size, your latencies could 
end up being pretty unpredictable using a LIMIT BY . Being able to 
specify a limit by bytes at the driver / API level would allow app devs to get 
more deterministic results out of their interaction w/the DB if they're looking 
to respond back to a client within a certain time frame and / or determine next 
steps in the app (continue paging, stop, etc) based on how long it took to get 
results back.

I'm seeing similar tradeoffs working on gracefully paging over tombstones; 
there's a strong desire to be able to have more confidence in the statement "If 
I ask the server for a page of data, I'll very likely get it back within time 
X".

There's an argument that it's a data modeling problem and apps should model 
differently to have more consistent row sizes and/or tombstone counts; I'm 
sympathetic to that but the more we can loosen those constraints on users the 
better their experience in my opinion.
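
A rough sketch of bounding the work done per page, assuming rows are plain byte strings and an in-memory list stands in for the data read from replicas. The names build_page, max_rows, and max_bytes are made up for illustration; they are not Cassandra or driver APIs.

```python
def build_page(rows, start, max_rows, max_bytes):
    """Return (page, next_index, has_more_pages) for a list of byte-string rows.

    The page closes as soon as either the row limit or the byte budget is
    reached. A single row larger than max_bytes is still returned on its own,
    so paging always makes progress.
    """
    page, used, i = [], 0, start
    while i < len(rows) and len(page) < max_rows:
        size = len(rows[i])
        if page and used + size > max_bytes:
            break  # byte budget reached first: return a short page
        page.append(rows[i])
        used += size
        i += 1
    return page, i, i < len(rows)
```

With one 200-byte row among 10-byte rows and max_bytes=100, the pages come back short around the large row instead of one page carrying all of it, which is the more predictable per-page cost described above.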

On Mon, Jun 12, 2023, at 5:39 AM, Jacek Lewandowski wrote:
> Yes, LIMIT BY  provided by the user in CQL does not make much sense to 
> me either
> 
> 
> On Mon, 12 Jun 2023 at 11:20, Benedict wrote:
>> 
>> I agree that this is more suitable as a paging option, and not as a CQL 
>> LIMIT option. 
>> 
>> If it were to be a CQL LIMIT option though, then it should be accurate 
>> regarding result set IMO; there shouldn’t be any further results that could 
>> have been returned within the LIMIT.
>> 
>> 
>>> On 12 Jun 2023, at 10:16, Benjamin Lerer  wrote:
>>> 
>>> Thanks Jacek for raising that discussion.
>>> 
>>> I do not have in mind a scenario where it could be useful to specify a 
>>> LIMIT in bytes. The LIMIT clause is usually used when you know how many 
>>> rows you wish to display or use. Unless somebody has a useful scenario in 
>>> mind I do not think that there is a need for that feature.
>>> 
>>> Paging in bytes makes sense to me as the paging mechanism is transparent 
>>> for the user in most drivers. It is simply a way to optimize your memory 
>>> usage from end to end.
>>> 
>>> I do not like the approach of using both of them simultaneously because if 
>>> you request a page with a certain number of rows and do not get it then it 
>>> is really confusing and can be a problem for some use cases. We have users 
>>> keeping their session open and the page information to display pages of data.
>>> 
>>> On Mon, 12 Jun 2023 at 09:08, Jacek Lewandowski wrote:
>>>> Hi,
>>>> 
>>>> I was working on limiting query results by their size expressed in bytes, 
>>>> and some questions arose that I'd like to bring to the mailing list.
>>>> 
>>>> The semantics of queries (without aggregation) - data limits are applied 
>>>> on the raw data returned from replicas - while it works fine for the row 
>>>> number limits as the number of rows is not likely to change after 
>>>> post-processing, it is not that accurate for size based limits as the cell 
>>>> sizes may be different after post-processing (for example due to applying 
>>>> some transformation function, projection, or whatever). 
>>>> 
>>>> We can truncate the results after post-processing to stay within the 
>>>> user-provided limit in bytes, but if the result is smaller than the limit 
>>>> - we will not fetch more. In that case, the meaning of "limit" being an 
>>>> actual limit is valid though it would be misleading for the page size 
>>>> because we will not fetch the maximum amount of data that does not exceed 
>>>> the page size.
>>>> 
>>>> Such a problem is much more visible for "group by" queries with 
>>>> aggregation. The paging and limiting mechanism is applied to the rows 
>>>> rather than groups, as it has no information about how much memory a 
>>>> single group uses. For now, I've approximated a group size as the size of 
>>>> the largest participating row. 
>>>> 
>>>> The problem concerns the allowed interpretation of the size limit 
>>>> expressed in bytes. Whether we want to use this mechanism to let the users 
>>>> precisely control the size of the resultset, or we instead want to use 
>>>> this mechanism to limit the amount of memory used internally for the data 
>>>> and prevent problems (assuming restricting size and rows number can be 
>>>> used simultaneously in a way that we stop when we reach any of the 
>>>> specified limits).
>>>> 
>>>> https://issues.apache.org/jira/browse/CASSANDRA-11745
>>>> 
>>>> thanks,
>>>> - - -- --- -  -
>>>> Jacek Lewandowski

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie

Yeah, my bad. I have paging on the brain. Seriously.

I can't think of a use-case in which a LIMIT based on # bytes makes sense from 
a user perspective.

On Mon, Jun 12, 2023, at 1:35 PM, Jeff Jirsa wrote:
> 
> 
> On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer  wrote:
>>> If you have rows that vary significantly in their size, your latencies 
>>> could end up being pretty unpredictable using a LIMIT BY . Being 
>>> able to specify a limit by bytes at the driver / API level would allow app 
>>> devs to get more deterministic results out of their interaction w/the DB if 
>>> they're looking to respond back to a client within a certain time frame and 
>>> / or determine next steps in the app (continue paging, stop, etc) based on 
>>> how long it took to get results back.
>> 
>> Are you talking about the page size or the LIMIT. Once the LIMIT is reached 
>> there is no "continue paging". LIMIT is also at the CQL level not at the 
>> driver level.
>> I can totally understand the need for a page size in bytes not for a LIMIT.
> 
> Would only ever EXPECT to see a page size in bytes, never a LIMIT specifying 
> bytes.
> 
> I know the C-11745 ticket says LIMIT, too, but that feels very odd to me.
>

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie

> As long as it is valid in the paging protocol to return a short page, but 
> still say “there are more pages”, I think that is fine to do that.
Thankfully the v3-v5 specs all make it clear that clients need to respect what 
the server has to say about there being more pages: 
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L1247-L1253

>   - Clients should not rely on the actual size of the result set returned to
>     decide if there are more results to fetch or not. Instead, they should always
>     check the Has_more_pages flag (unless they did not enable paging for the query
>     obviously). Clients should also not assert that no result will have more than
>     <result_page_size> results. While the current implementation always respects
>     the exact value of <result_page_size>, we reserve the right to return
>     slightly smaller or bigger pages in the future for performance reasons.
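
A minimal sketch of a client loop that follows that rule, assuming a hypothetical fetch_page callable standing in for a driver call. The function name and the tuple it returns are illustrative assumptions, not a real driver API.

```python
def fetch_all(fetch_page):
    """Drain a paged query by following the server's has-more-pages flag.

    fetch_page(paging_state) -> (rows, has_more_pages, next_paging_state)
    is a hypothetical stand-in for a driver call. The loop never looks at
    how many rows a page holds, so short or empty pages do not end it early.
    """
    rows, state, has_more = [], None, True
    while has_more:
        page, has_more, state = fetch_page(state)
        rows.extend(page)
    return rows
```

Because termination depends only on the flag, a server that starts returning byte-bounded (and therefore shorter) pages would not break a client written this way.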

On Mon, Jun 12, 2023, at 3:19 PM, Jeremiah Jordan wrote:
> As long as it is valid in the paging protocol to return a short page, but 
> still say “there are more pages”, I think that is fine to do that.  For an 
> actual LIMIT that is part of the user query, I think the server must always 
> have returned all data that fits into the LIMIT when all pages have been 
> returned.
> 
> -Jeremiah
> 
> On Jun 12, 2023 at 12:56:14 PM, Josh McKenzie  wrote:
>> 
>> Yeah, my bad. I have paging on the brain. Seriously.
>> 
>> I can't think of a use-case in which a LIMIT based on # bytes makes sense 
>> from a user perspective.
>> 
>> On Mon, Jun 12, 2023, at 1:35 PM, Jeff Jirsa wrote:
>>> 
>>> 
>>> On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer  wrote:
>>>>> If you have rows that vary significantly in their size, your latencies 
>>>>> could end up being pretty unpredictable using a LIMIT BY . 
>>>>> Being able to specify a limit by bytes at the driver / API level would 
>>>>> allow app devs to get more deterministic results out of their interaction 
>>>>> w/the DB if they're looking to respond back to a client within a certain 
>>>>> time frame and / or determine next steps in the app (continue paging, 
>>>>> stop, etc) based on how long it took to get results back.
>>>> 
>>>> Are you talking about the page size or the LIMIT? Once the LIMIT is 
>>>> reached there is no "continue paging". LIMIT is also at the CQL level not 
>>>> at the driver level.
>>>> I can totally understand the need for a page size in bytes not for a LIMIT.
>>> 
>>> Would only ever EXPECT to see a page size in bytes, never a LIMIT 
>>> specifying bytes.
>>> 
>>> I know the C-11745 ticket says LIMIT, too, but that feels very odd to me.
>>> 
>>

Re: [DISCUSS] Remove deprecated keyspace_count_warn_threshold and table_count_warn_threshold

2023-06-13 Thread Josh McKenzie

> have subsequently been deprecated since 4.1-alpha in CASSANDRA-17195 when 
> they were replaced/migrated to guardrails as part of CEP-3 (Guardrails).
Have we been dropping support entirely for old params or using the @Replaces 
annotation into perpetuity?

I dislike the idea of operators having to remember to update things between 
versions and being surprised when things change roughly equally to us carrying 
along undocumented deprecated param name mappings. :)

On Mon, Jun 12, 2023, at 5:56 PM, Dan Jatnieks wrote:
> Hello everyone,
> 
> I would like to propose removing the non-guardrail thresholds 
> 'keyspace_count_warn_threshold' and 'table_count_warn_threshold' 
> configuration settings on the trunk branch for the next major release.
> 
> These thresholds were first added with CASSANDRA-16309 in 4.0-beta4 and have 
> subsequently been deprecated since 4.1-alpha in CASSANDRA-17195 when they 
> were replaced/migrated to guardrails as part of CEP-3 (Guardrails).
> 
> I'd appreciate any thoughts about this. I will open a ticket to get started 
> if there is support for doing this.
> 
> Reference:
> https://issues.apache.org/jira/browse/CASSANDRA-16309
> https://issues.apache.org/jira/browse/CASSANDRA-17195
> CEP-3: Guardrails 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-3%3A+Guardrails
> 
> 
> Thanks,
> Dan Jatnieks
>

Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread Josh McKenzie

+1

On Tue, Jun 13, 2023, at 10:55 AM, Jeremiah Jordan wrote:
> +1 nb
> 
> On Jun 13, 2023 at 9:14:35 AM, Jeremy Hanna  
> wrote:
>> 
>> Calling for a vote on CEP-8 [1].
>> 
>> To clarify the intent, as Benjamin said in the discussion thread [2], the 
>> goal of this vote is simply to ensure that the community is in favor of the 
>> donation. Nothing more.
>> The plan is to introduce the drivers, one by one. Each driver donation will 
>> need to be accepted first by the PMC members, as it is the case for any 
>> donation. Therefore the PMC should have full control on the pace at which 
>> new drivers are accepted.
>> 
>> If this vote passes, we can start this process for the Java driver under the 
>> direction of the PMC.
>> 
>> Jeremy
>> 
>> 1. 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp

Re: [DISCUSS] Remove deprecated keyspace_count_warn_threshold and table_count_warn_threshold

2023-06-13 Thread Josh McKenzie

.apache.cassandra.cql3.statements.schema.CreateTableStatement#validate
>>> 
>>> The only difference I see is that table_count_warn_threshold includes 
>>> system tables whereas tables_warn_threshold is only user tables…
>>> 
>>> > I would like to propose removing the non-guardrail thresholds 
>>> > 'keyspace_count_warn_threshold' and 'table_count_warn_threshold' 
>>> > configuration settings on the trunk branch for the next major release.
>>> 
>>> Deprecation in 4.1 is way too new for me to accept that, and it's low effort 
>>> to keep; breaking users is always a bad idea and doing it when not needed 
>>> is bad…
>>> 
>>> Honestly, I don’t see why we couldn’t use @Replaces here to solve the 
>>> semantic gap… table_count_warn_threshold includes the system tables, so we 
>>> just need a Converter that takes w/e the value the user put in and 
>>> subtracts the system tables… which then gives us the user tables (matching 
>>> tables_warn_threshold)
>>> 
>>> > On Jun 13, 2023, at 7:57 AM, Josh McKenzie  wrote:
>>> > 
>>> >> have subsequently been deprecated since 4.1-alpha in CASSANDRA-17195 
>>> >> when they were replaced/migrated to guardrails as part of CEP-3 
>>> >> (Guardrails).
>>> > Have we been dropping support entirely for old params or using the 
>>> > @Replaces annotation into perpetuity?
>>> > 
>>> > I dislike the idea of operators having to remember to update things 
>>> > between versions and being surprised when things change roughly equally 
>>> > to us carrying along undocumented deprecated param name mappings. :)
>>> > 
>>> > On Mon, Jun 12, 2023, at 5:56 PM, Dan Jatnieks wrote:
>>> >> Hello everyone,
>>> >> 
>>> >> I would like to propose removing the non-guardrail thresholds 
>>> >> 'keyspace_count_warn_threshold' and 'table_count_warn_threshold' 
>>> >> configuration settings on the trunk branch for the next major release.
>>> >> 
>>> >> These thresholds were first added with CASSANDRA-16309 in 4.0-beta4 and 
>>> >> have subsequently been deprecated since 4.1-alpha in CASSANDRA-17195 
>>> >> when they were replaced/migrated to guardrails as part of CEP-3 
>>> >> (Guardrails).
>>> >> 
>>> >> I'd appreciate any thoughts about this. I will open a ticket to get 
>>> >> started if there is support for doing this.
>>> >> 
>>> >> Reference:
>>> >> https://issues.apache.org/jira/browse/CASSANDRA-16309
>>> >> https://issues.apache.org/jira/browse/CASSANDRA-17195
>>> >> CEP-3: Guardrails 
>>> >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-3%3A+Guardrails
>>> >> 
>>> >> 
>>> >> Thanks,
>>> >> Dan Jatnieks
>>>
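
The conversion suggested in the quoted message (take the old value, subtract the system tables, get a user-table threshold) can be sketched as below. This is only the arithmetic, written in Python for brevity; it is not the project's actual Converter code. The function name, the `system_table_count` parameter, and the treatment of negative values as "disabled" are all assumptions for illustration.

```python
def to_user_tables_threshold(legacy_threshold: int,
                             system_table_count: int) -> int:
    """Map a threshold that counts all tables to one counting user tables only.

    legacy_threshold stands in for table_count_warn_threshold (system tables
    included); the result stands in for tables_warn_threshold (user tables).
    """
    if legacy_threshold < 0:
        # Assumption: a negative value means "disabled" and passes through.
        return legacy_threshold
    # Assumption: clamp at zero when the legacy value was smaller than the
    # number of system tables, rather than producing a negative threshold.
    return max(0, legacy_threshold - system_table_count)
```

For example, a legacy value of 150 with 45 system tables (a made-up count) maps to a user-table threshold of 105. How a too-small legacy value should really be handled is a design question this sketch does not settle.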
