Re: [VOTE] Release Apache Cassandra 4.1.0 (take2)

2022-12-11 Thread Mick Semb Wever
On Sat, 10 Dec 2022 at 23:09, Abe Ratnofsky  wrote:

> Sorry - responded on the take1 thread:
>
> Could we defer the close of this vote til Monday, December 12th after 6pm
> Pacific Time?
>
> Jon Meredith and I have been working thru an issue blocking streaming on
> 4.1 for the last couple months, and are now testing a promising fix. We're
> currently working on a write-up, and we'd like to hold the release until
> the community is able to review our findings.
>


Update on behalf of Jon and Abe.

The issue raised is CASSANDRA-18110.
Host replacements can fail when performed concurrently, or when performed
by nodes with a high CPU count and a large number of tables.

It is still unclear if this is applicable to OSS C*, and if so to what
extent users might ever be impacted.
More importantly, there's a simple workaround for anyone that hits the
problem.

Without further information on the table, I'm inclined to continue with
4.1.0 GA (closing the vote in 32 hours), but add a clear message about the
issue and its workaround to the release announcement. I'm interested in
hearing others' positions; don't be afraid to veto if that's where you're at.


[RESULT][VOTE] Release Apache Cassandra 4.1.0 (take2)

2022-12-12 Thread Mick Semb Wever
> Proposing the (second) test build of Cassandra 4.1.0 for release.
>
> sha1: f9e033f519c14596da4dc954875756a69aea4e78
> Git:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.1.0-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1282/org/apache/cassandra/cassandra-all/4.1.0/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/4.1.0/
>
> The vote will be open for 96 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>


Vote passes with ten +1s (six binding).


[RELEASE] Apache Cassandra 4.1.0 GA released

2022-12-13 Thread Mick Semb Wever
The Cassandra team is pleased to announce the GA release of Apache
Cassandra version 4.1.0.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first GA release of the 4.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

[WARNING] Debian and RedHat package repositories have moved! Debian
/etc/apt/sources.list.d/cassandra.sources.list and RedHat
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat
it is now https://redhat.cassandra.apache.org/41x/ .

[SOCIAL MEDIA] Dev community, FYI: this is the social media guide, shared to
help include you if you have not been in the loop. Constantia has been
amazing, and many of you will be aware of all this already. These
activities are all already assigned to folk. It's shared here so you know
the amount of effort and coordination happening in parallel with this
launch.
https://docs.google.com/document/d/1OrfH9wtQjkBHo1-1SxpiMP0g_6CLSApmrAhZdzNvjXo/edit?usp=sharing


Enjoy!

[1]: CHANGES.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.1.0
[2]: NEWS.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.1.0
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-12-15 Thread Mick Semb Wever
>
> Another angle I forgot to mention is that this is quite a big patch and
> there are quite big pieces of work coming, being it CEP-15, for example. So
> I am trying to figure out if we are ok to just merge this work first and
> devs doing CEP-15 will need to rework their imports or we merge this after
> them so we will fix their stuff. I do not know what is more preferable.
>


Thank you for bringing this point up Stefan.

I would actively reach out to all those engaged with current CEPs, asking
them what rebase impact this would cause and whether they are OK with it.
The CEPs are our priority, and we have significantly more of them in
progress than we've had for many years.


Re: [DISCUSS] Slack notifications for new Stack Overflow, Stack Exchange questions

2022-12-19 Thread Mick Semb Wever
> In any case, does anyone have concerns about the notifications in Slack?
> Do you foresee any issues with it? Cheers!
>


Thanks for writing it up Erick. Perfectly fine to have a bias for action
here, it being an easy-to-undo action.
No objections; my only concern is that some may consider it too noisy and
that it reduces interaction on #cassandra.
Can you take a poll there to gauge folks' receptiveness to it?


Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-12-22 Thread Mick Semb Wever
>
>
> 3. Total 5 groups, 2968 files to change
>
> ```
> org.apache.cassandra.*
> [blank line]
> java.*
> [blank line]
> javax.*
> [blank line]
> all other imports
> [blank line]
> static all other imports
> ```
>


3, then 5.
There's a lot under com.*, net.* and org.* that is essentially the same as
"all other imports", so what's the reason to separate those?

My preference for 3 is simply that imports are collapsed by default, and when
I expand them it's the dependencies on other cassandra code that I'm grokking
first. They're also the only imports that lead to cyclic dependencies
(which we're not good at).
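To make option 3 concrete, here is a minimal Python sketch (not the checkstyle or IDE configuration under discussion) that sorts Java import lines into its five groups. Two assumptions: "static all other imports" is read as every static import going into the last group, and imports are sorted lexicographically within a group.

```python
# Sketch: order Java import statements into the five groups of option 3.
# Assumptions: all static imports go in the last group, and statements are
# sorted lexicographically within each group.

def _group_index(name, is_static):
    if is_static:
        return 4  # static all other imports
    if name.startswith("org.apache.cassandra."):
        return 0
    if name.startswith("java."):
        return 1
    if name.startswith("javax."):
        return 2
    return 3  # all other imports (com.*, net.*, org.*, ...)


def group_imports(lines):
    buckets = [[] for _ in range(5)]
    for line in lines:
        stmt = line.strip()
        if not stmt.startswith("import "):
            continue  # ignore anything that is not an import statement
        body = stmt[len("import "):].rstrip(";").strip()
        is_static = body.startswith("static ")
        name = body[len("static "):].strip() if is_static else body
        buckets[_group_index(name, is_static)].append(stmt)

    ordered = []
    for bucket in buckets:
        if not bucket:
            continue
        if ordered:
            ordered.append("")  # blank line between non-empty groups
        ordered.extend(sorted(bucket))
    return ordered


example = [
    "import java.util.List;",
    "import com.google.common.collect.ImmutableList;",
    "import static org.junit.Assert.assertEquals;",
    "import org.apache.cassandra.db.Keyspace;",
    "import javax.annotation.Nullable;",
]
print("\n".join(group_imports(example)))
```

The cassandra group lands first, which is the property that matters when expanding a collapsed import block.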


Re: Event Report -- Cassandra Day China 2022 was a big success

2022-12-27 Thread Mick Semb Wever
> Thank you all again for introducing and approving of this Cassandra Day China 
> event, we also thank the people at China Golden Bridge for their dedication 
> throughout the weeks of preparation, without these, we wouldn’t be able to 
> pull this off in such a short amount of time.
>
> We look forward to future cooperation in the Cassandra community in China,


Thank you, Tom, for writing up the retro! I am very happy that you put
the effort into organising this and that it was a huge success. What a
great contribution and initiative for our community. We would all love
to hear more from these Cassandra production users. Given the language
challenges involved, I am interested to hear any and all suggestions,
particularly any successful approaches other Apache projects have taken.

For those that missed it: the turnaround was quick, and the blog post
for it is here:
https://cassandra.apache.org/_/events/20221222-cday-china.html


Cassandra CI Status 2023-01-07

2023-01-09 Thread Mick Semb Wever
   Happy 2023 everyone!

With only four months in front of us before the first 5.0 release I'm
hoping we can re-energize our focus on CI and Stable Trunk.

This post covers the following
 * Recap of CI improvements
 * State of Affairs
 * The Butler (Build Lead)
 * Proposal for a Repeatable Containerised CI

and it calls for the following actions
 ** we need you to sign up for a week's rotation as Build Lead !
 ** please reply in-thread any CI issues I've forgotten,
 ** does CASSANDRA-18137 warrant a CEP?


*** Recap of CI improvements

It's been over two years since my last CI Status post, with Adam and
Josh covering much of it in their general Status emails (which are
deeply appreciated).  I'm hoping we can continue with both, given
their importance to a successful 5.0 release and the debt we would
otherwise face going from the initial alpha release to the eventual GA.


We have made good efforts on moving towards a Stable Trunk.
Special mentions to
 - improving parity between CircleCI and ci-cassandra.a.o (CASSANDRA-17930)
 - introducing Butler and the Build Lead role
 - pre-commit workflow, and automated multiplexing, in CircleCI
(CASSANDRA-16625)
 - single-digit flaky failures per build on 4.0, 4.1 and trunk in
ci-cassandra.a.o !!
 - CircleCI is as stable on Large as XLarge containers (CASSANDRA-18127)


*** State of Affairs

None of our CI systems are consistently green yet.  Flakies occur in
both CircleCI and ci-cassandra.a.o.  We had to lower the 4.1 release
CI criteria to accept three consecutive green runs on CircleCI, as
it would have been unlikely to achieve the same on ci-cassandra.a.o.
While the flaky rate is lower than in 4.0, the higher number of tests we
run is making it harder to get those green runs.
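Why more tests make green runs harder, even at a lower flaky rate, can be put in rough numbers. A back-of-the-envelope sketch, assuming each test fails independently with the same probability (real flakies are neither independent nor uniform); the rates and test counts are made up and are not our actual figures.

```python
# Back-of-the-envelope model: each test fails independently with probability p.
# The rates and test counts below are made up for illustration only.

def green_run_probability(flaky_rate, num_tests):
    """Chance that a single CI run has no flaky failure."""
    return (1.0 - flaky_rate) ** num_tests


def consecutive_green_probability(flaky_rate, num_tests, runs=3):
    """Chance of `runs` green runs in a row (runs assumed independent)."""
    return green_run_probability(flaky_rate, num_tests) ** runs


older = green_run_probability(0.002, 400)    # higher flaky rate, fewer tests: ~0.45
newer = green_run_probability(0.001, 1000)   # lower flaky rate, more tests:   ~0.37
print(f"older={older:.2f} newer={newer:.2f}")
print(f"three in a row: older={consecutive_green_probability(0.002, 400):.3f} "
      f"newer={consecutive_green_probability(0.001, 1000):.3f}")
```

Halving the per-test flaky rate is not enough if the test count more than doubles, and requiring three consecutive green runs cubes the gap.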

Despite the overhead we continue to face with flakies when getting
major releases out, 4.1 needed fewer releases to reach GA than 4.0, and I
think all will agree things are improving.  But the challenge in front of
us up to the 5.0 release is huge, with nine CEPs slated to land.  Pre-commit
and post-commit CI need investment if we want our stable trunk efforts to
continue to improve.


*** The Butler (Build Lead)

The introduction of Butler and the Build Lead was a wonderful
improvement to our CI efforts.  It has brought a lot of hygiene in
listing out flakies as they happened.  Admittedly this has in turn
increased the burden in getting our major releases out, but that
should be seen as a one-off cost.  This initiative lost traction and
volunteers midway through last year.

We really need you to take part in the Build Lead weekly rotation.

I've signed myself up for this week; please jump in and sign yourself
up for the weeks ahead.  If you are a coach/manager for a team, please
permit and encourage your engineers to be involved in this activity;
it shouldn't take more than an hour over the week.  Further instructions
can be found at https://cwiki.apache.org/confluence/display/CASSANDRA/Build+Lead

If it's your first time being a Build Lead the community is here to
help you, just reach out.  It's also a great way into our community
for newcomers!

When it comes to Butler, its UX for history is a bit clumsy.  TIL that
you can indeed list the full history of failures per test: see 'Full
History' on a test's page*.  Please use this information to help
create jira tickets for flakies, specifically noting the versions each
applies to and the rough failure rate observed so far.

*) e.g. 
https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/snapshot_test/TestArchiveCommitlog/test_archive_commitlog_point_in_time_ln
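As a hypothetical helper for writing up such a ticket: given a test's history transcribed from the 'Full History' page as (branch, passed) pairs, this sketch lists the affected branches and the observed failure rate on each. The input format is an assumption, not something Butler provides.

```python
from collections import defaultdict

# Hypothetical input: (branch, passed) pairs transcribed from a test's
# 'Full History' page; this is not a Butler export format.

def summarise_history(results):
    """Return {branch: (failures, runs, failure_rate)} for branches that failed."""
    runs = defaultdict(int)
    failures = defaultdict(int)
    for branch, passed in results:
        runs[branch] += 1
        if not passed:
            failures[branch] += 1
    return {
        branch: (failures[branch], runs[branch], failures[branch] / runs[branch])
        for branch in runs
        if failures[branch]
    }


history = [
    ("trunk", True), ("trunk", False), ("trunk", True), ("trunk", True),
    ("cassandra-4.1", True), ("cassandra-4.1", True),
    ("cassandra-4.0", False), ("cassandra-4.0", True),
]
for branch, (failed, total, rate) in summarise_history(history).items():
    print(f"{branch}: {failed}/{total} runs failed ({rate:.0%})")
```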


*** Proposal for a Repeatable Containerised CI

Building on what Josh writes in his "Cassandra project status, Year in
Review Holiday Edition" post, and many discussions offline with many
folk, I've written up the ticket epic for creating a reproducible
containerised ci-cassandra.a.o.

Please read https://issues.apache.org/jira/browse/CASSANDRA-18137

The tl;dr of it is to create a script that, using the jenkins k8s
operator, can set up a ci-cassandra.a.o clone in your k8s context.

The ticket is lengthy, despite being in bullet form.  I don't believe
it warrants a CEP; speak up if you disagree.  The idea is to provide
us with a turnkey solution:
 - a script based on the jenkins k8s operator (create a ci-cassandra.a.o
clone, run the pipeline, save results, tear down the clone),
 - our existing build and test scripts (including their docker images)
brought in-tree from cassandra-builds,
 - a declarative jenkins pipeline that, in a simple intuitive manner,
maps stages to CI-agnostic build and test scripts (which can be run
locally without a CI system if you so desire),
 - all branch-specific testing context (jdks, pythons, dists) defined
outside of the CI code.
Its success depends upon providing a CI system that is stable and fast
for pre-commit testing.
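A minimal sketch of what "branch-specific testing context defined outside of the CI code" could look like: per-branch jdks and pythons live in data, and the pipeline only expands them into jobs that call CI-agnostic scripts. All branch values, stage names and script paths below are hypothetical and not taken from CASSANDRA-18137.

```python
from itertools import product

# Hypothetical per-branch testing context, kept as data outside the CI code.
TEST_CONTEXT = {
    "cassandra-4.1": {"jdks": ["8", "11"], "pythons": ["3.8"]},
    "trunk": {"jdks": ["11", "17"], "pythons": ["3.8", "3.11"]},
}

# Hypothetical CI-agnostic scripts; each could equally be run locally.
STAGES = {
    "unit": "scripts/run-unit-tests.sh",
    "python-dtest": "scripts/run-python-dtests.sh",
}
PYTHON_STAGES = {"python-dtest"}  # stages that also fan out over pythons


def expand_jobs(branch):
    """Expand a branch's context into (stage, script, jdk, python) jobs."""
    ctx = TEST_CONTEXT[branch]
    jobs = []
    for stage, jdk in product(STAGES, ctx["jdks"]):
        pythons = ctx["pythons"] if stage in PYTHON_STAGES else [None]
        for python in pythons:
            jobs.append((stage, STAGES[stage], jdk, python))
    return jobs


for job in expand_jobs("trunk"):
    print(job)
```

The design choice illustrated is that adding a jdk to a branch becomes a data change in-tree rather than a change to the pipeline.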


Re: Should we change 4.1 to G1 and offheap_objects ?

2023-01-12 Thread Mick Semb Wever
> Ok, wrt G1 default, this is won't go ahead for 4.1-rc1
>
> We can revisit it for 4.1.x
>
> We have a lot of voices here adamantly positive for it, and those of us that 
> have done the performance testing over the years know why. But being called 
> to prove it is totally valid, if you have data to any such tests please add 
> them to the ticket 18027


Revisiting. Are there any vetoes to making G1 the default (and
updating the G1 settings, see the patch on
https://issues.apache.org/jira/browse/CASSANDRA-18027 ) for 4.1.1 ?

IIUC, the summary of this thread so far is: there were no vetoes
to the change in trunk, but there were vetoes to 4.1.0 (because we
were inside the beta to GA window), and there was a desire to have
benchmarking data presented.

WRT benchmarking, we have a separate thread for performance testing in
the project.  The ticket admittedly does not do due diligence on data
presentation and analysis of smaller heaps (a precedent we should be
setting), but instead relies upon the experience of many. Are we OK
with this, this time around, or shall the patch only be applied to
trunk (where we have no choice with jdk17 landing)?


Re: Should we change 4.1 to G1 and offheap_objects ?

2023-01-13 Thread Mick Semb Wever
> *+1* to changing to G1 on trunk for 5.0 and 4.1.1.  We have over a
> thousand clusters and over 10K nodes running on J8 and 11 with G1GC and
> memory management is excellent.
>


Thanks for the support, Brad; you're definitely not alone. Alas, the project
works on a consensus model, i.e. it works off the objections raised, which
have all been sound. A good compromise has been offered that I will move
forward on, and I'll also update the commented-out G1 settings in 4.1.1 to
match those becoming the default in trunk.



> Excellent. Two observations: first we reverted MaxGCPauseMillis=200,
> which is the JVM default. Cassandra's jvm{8,11}-server.options has 500
> (commented out) for some reason. Second on some clusters with 'humongous
> allocations' we've had to increase G1HeapRegionSize in a few cases on
> clusters with very large partitions.
>
> CMS was deprecated in Java 9, so I don't know why Cassandra would still
> use it as the default.
>


Absolutely! Take a look at the patch; it aligns the G1 settings closer to
what you describe.
https://github.com/apache/cassandra/compare/trunk...thelastpickle:cassandra:mck/7486/trunk


My apologies for not creating this ticket earlier.
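On the humongous allocations point, a rough sketch of why raising G1HeapRegionSize helps. It approximates HotSpot's default region sizing on JDK 8/11 (aiming for about 2048 regions, rounded down to a power of two, clamped to 1 MiB to 32 MiB), assuming -Xms equals -Xmx, together with G1's rule that an allocation of at least half a region is humongous. The heap and allocation sizes are illustrative only.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

# Approximation of the JDK 8/11 default, assuming -Xms == -Xmx; other JDK
# versions and explicit -XX:G1HeapRegionSize settings behave differently.
def default_g1_region_size(heap_bytes):
    """~2048 regions, rounded down to a power of two, clamped to 1..32 MiB."""
    size = max(heap_bytes // 2048, 1 * MiB)
    size = 1 << (size.bit_length() - 1)  # round down to a power of two
    return min(size, 32 * MiB)


def is_humongous(allocation_bytes, region_bytes):
    """G1 allocates objects of at least half a region as humongous."""
    return allocation_bytes >= region_bytes // 2


region = default_g1_region_size(8 * GiB)            # 4 MiB for an 8 GiB heap
print(region // MiB, is_humongous(3 * MiB, region))  # 4 True
print(is_humongous(3 * MiB, 16 * MiB))               # False with -XX:G1HeapRegionSize=16m
```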


Re: Cassandra CI Status 2023-01-07

2023-01-15 Thread Mick Semb Wever
>
> *** The Butler (Build Lead)
>
> The introduction of Butler and the Build Lead was a wonderful
> improvement to our CI efforts.  It has brought a lot of hygiene in
> listing out flakies as they happened.  Noted that this has in-turn
> increased the burden in getting our major releases out, but that's to
> be seen as a one-off cost.
>


New Failures from Build Lead Week 3.


*** CASSANDRA-18156 – repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
 - AssertionError: Node logs don't have an error message for the failed repair
 - hard regression
 - 3.0, 3.11,

*** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match what was written with serialize(out, 12) for verb PAXOS2_COMMIT_AND_PREPARE_RSP
 - serializer class org.apache.cassandra.net.Message$Serializer; expected 1077, actual 1079
 - 4.1, trunk

*** CASSANDRA-18158 – org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
 - Cannot achieve consistency level ALL
 - 3.11, trunk

*** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
 - AssertionError: null in MemtablePool$SubPool.released(MemtablePool.java:193)
 - 3.11, 4.0, 4.1, trunk

*** CASSANDRA-18160 – cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
 - Found orphaned index file in after CDC state not in former
 - 4.1, trunk

*** CASSANDRA-18161 – org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
 - AssertionFailedError in CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491)
 - 4.0, 4.1, trunk

*** CASSANDRA-18162 – cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements
 - Inet address 127.0.0.3:7000 is not available: [Errno 98] Address already in use
 - 3.0, 3.11, 4.0, 4.1, trunk

*** CASSANDRA-18163 – transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle
 - AssertionError Incoming stream entireSSTable
 - 4.0, 4.1, trunk
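For those unfamiliar with the CASSANDRA-18164 failure mode: a serializer declares a serializedSize that must equal the number of bytes serialize actually writes. Below is a toy Python sketch of that invariant check. The length-prefixed format and its LEB128-style varint are hypothetical and are not Cassandra's Message serializer or its vint encoding.

```python
import io

# Toy length-prefixed format with an LEB128-style varint; this is NOT
# Cassandra's Message serializer or its vint encoding.

def write_varint(out, value):
    while value >= 0x80:
        out.write(bytes([(value & 0x7F) | 0x80]))  # 7 bits + continuation bit
        value >>= 7
    out.write(bytes([value]))


def varint_size(value):
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


class ToySerializer:
    def serialize(self, payload, out):
        write_varint(out, len(payload))
        out.write(payload)

    def serialized_size(self, payload):
        return varint_size(len(payload)) + len(payload)


class BuggyToySerializer(ToySerializer):
    def serialized_size(self, payload):
        return 1 + len(payload)  # bug: assumes the length prefix is always 1 byte


def check_size_invariant(serializer, payload):
    """Serialize into a buffer and compare with the declared size."""
    buf = io.BytesIO()
    serializer.serialize(payload, buf)
    expected, actual = serializer.serialized_size(payload), len(buf.getvalue())
    if expected != actual:
        raise AssertionError(f"serializedSize does not match what was written: "
                             f"expected {expected}, actual {actual}")
    return actual
```

In this toy, the buggy serializer passes for payloads under 128 bytes and only fails (expected 301, actual 302 for a 300-byte payload) once the length prefix needs a second byte.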


While writing these up, some thoughts…
 - While Butler reports failures against multiple branches, there's no
feedback/sync telling us that a ticket needs its fixVersions updated when
failures happen in other branches after the ticket is created.
 - From 4.0 onwards, a majority of the failures are timeouts (>900s),
reinforcing that the main problem we currently face in ci-cassandra.a.o
is saturation/infra.
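A sketch of the missing fixVersion feedback from the first point, plus the timeout share from the second. The branch-to-fixVersion mapping and all inputs are hypothetical; neither Butler nor Jira is queried here.

```python
# Hypothetical inputs: branches Butler shows failures on, the ticket's current
# fixVersions, and a branch -> fixVersion mapping maintained by hand.

def missing_fix_versions(failing_branches, ticket_fix_versions, branch_to_fix_version):
    """fixVersions implied by the failing branches but absent from the ticket."""
    wanted = {branch_to_fix_version[b] for b in failing_branches}
    return sorted(wanted - set(ticket_fix_versions))


def timeout_share(failure_durations_s, timeout_s=900):
    """Fraction of failures whose duration exceeded the timeout."""
    if not failure_durations_s:
        return 0.0
    timed_out = sum(1 for d in failure_durations_s if d > timeout_s)
    return timed_out / len(failure_durations_s)


mapping = {"cassandra-4.0": "4.0.x", "cassandra-4.1": "4.1.x", "trunk": "5.x"}
print(missing_fix_versions(["cassandra-4.0", "cassandra-4.1", "trunk"], ["4.1.x"], mapping))
print(timeout_share([950, 30, 1200, 901]))
```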


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>
> I think (4) is the only sensible option. It permits different development
> branches to easily reference different versions of a library and also to
> easily co-develop them - from within the same IDE project, even.
>


I've only heard horror stories about submodules. The challenges they bring
should be listed and checked.

Some examples
 - you can no longer just `git clone …`  (and we clone automatically in a
number of places)
 - same with `git pull …` (easy to be left with out-of-sync submodules)
 - permanence from a git SHA no longer exists
 - our releases get more complicated (our source tarballs are the asf
releases)
 - handling patches that cover submodules
 - switching branches, and using git worktrees, during dev

I see (4) as a valid option, but I'm concerned about the amount of work
required to adapt to it, and whether it will only make things more
complicated for new contributors to the project. For example, the first two
points are addressed by remembering to do `git clone --recurse-submodules …`.
And who would be fixing our build/test/release scripts to accommodate it?

Not blockers, just concerns we need to raise and address.



> We might even be able to avoid additional release votes as a matter of
> course, by compiling the library source as part of the C* release, so that
> they adopt the C* release vote (or else we may periodically release the
> library as we do other releases)
>


Yes. Today we do a combination of first (3) and then (1). Having to make a
release of these libraries every time a patch (/feature branch) is
completed is a horror story in itself.

> I might be missing something, does anyone have any other bright ideas for
> approaching this problem? I’m sure there are plenty of opinions out there.
>


Looking at the problem with these libraries,
 - we don't need releases
 - we don't have a clean version/branch parity to in-tree
 - codebase parity between branches is important for upgrade tests (shared
classloaders)

For (2) you mention drift of the "same" version; isn't this only a problem
for dtest-api, in the way it requires the "same version" of a codebase for
compatibility when running upgrade tests? As the library itself no longer
has an explicit version, I presume that is what you meant by logical version.

To begin with, I'm leaning towards (2) because it is a cognitive re-use of
our release branches, and the problems around classpath compatibility can
be solved with tests. I'm sure I'm not seeing the whole picture though…


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>  - permanence from a git SHA no longer exists
>
> With the caveat that I haven't worked w/submodules before and only know
> about them from a cursory search, it looks like git-submodule status would
> show us the sha for submodules and …
>


That isn't one SHA, but a collection of SHAs.

I'm thinking about reproducible builds, switching between branches, and git
bisecting; this stuff needs to just work. A build that fails fast if a
submodule is not on a specific SHA helps, but introduces more problems.



> we could have parent projects reference specific shas to pull for
> submodules to build?
> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
> 
>


Yes, we can enforce a 1:1 relationship from parent SHA to submodule SHAs,
but then what's the point: you have both the headache of submodules and
having to always commit to multiple branches and forward merge.

That is, with fixed parent-to-submodule SHA relationships, these new
challenges are introduced:
- patches are off submodule SHAs, not the submodule's HEAD,
- you need to be making commits to all branches (and forward merging)
anyway to update submodule SHAs,
- if development is active on trunk, and then you need an update on an
older branch, you have to accommodate backporting all those trunk
changes (or introduce the same branching in the submodule).

IMHO submodules are just trading one set of problems for another. And
overall life is simpler if we reduce the cognitive burden to just what we
have today: forward merging.

Benedict, experience based on developing one feature against one branch
doesn't face the problems of working on, and frequently switching between,
branches.

The problem of wanting an external repository for these libraries to
promote external non-cassandra consumers I would solve by exporting the
code out of cassandra (not trying to import it). Git history is easy to
keep/replicate. We were talking about doing this with the jamm library,
given its primary development is currently with C* but we want it to appear
as a standalone library (/github codebase).


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>
> … extrapolating this experience to multiple C* versions
>
>
To include forward-merging, bisecting old history, etc., that's a leap
of faith that I believe deserves discussion.

> - patches are off submodule SHAs, not the submodule's HEAD,
>
>
> A SHA would point to the HEAD of a given branch, at the time of merge,
> just by SHA? I’ve no idea what you imagine here, but this just ensures that
> a given SHA of the importing project continues to compile correctly when it
> is no longer HEAD. It does not mean there’s no HEAD that corresponds
> directly to the SHA of the importing project’s HEAD.
>


That wasn't my concern. Rather that you need to know in advance when the
SHA is not HEAD. You can't commit off a past SHA. Once you find out (and
how does that happen?) that the submodule code is not HEAD, what do you
then do? What if fast-forwarding the submodule to HEAD's SHA breaks things:
do you now have to fix that, or introduce branching in the submodule? If the
submodule doesn't have releases, is it doing versioning, and if not, how are
branches distinguished?

Aren't these all fair questions to raise?

> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,
>
>
> Exactly as you would any library upgrade?
>


Correct. Submodules do not solve/remove the need to commit to multiple
branches and forward merge.
Furthermore, submodules mean at least one additional commit, and possibly
twice as many commits.


> - if development is active on trunk, and then you need an update on an
> older branch, you have to accommodate to backporting all those trunk
> changes (or introduce the same branching in the submodule),
>
>
> If you do feature development against Accord then you will obviously
> branch it? You would only make bug fixes to a bug fix branch. I’m not sure
> what you think is wrong here.
>


That's not obvious; you stated that a goal was to avoid maintaining
multiple branches. Sure, there are benefits to a lazy branching approach,
but it contradicts your initial motivations and introduces methodology
changes that are worth pointing out. What happens when there are multiple
consumers of Accord and (like the situation we face with jamm) its HEAD is
well in front of anything C* is using?

As Henrik states, the underlying problem doesn't change; we're just
choosing between trade-offs. My concern is that we're not even doing a very
good job of choosing between the trade-offs. Based on past experiences with
submodules, which started with great excitement and led to tears and
frustration after a few years, I'm only pushing for a more thorough
discussion and proposal.


Re: Merging CEP-15 to trunk

2023-01-16 Thread Mick Semb Wever
> Could you file a bug report with more detail about which classes you think
> are lacking adequate documentation in each project, and what you would like
> to see?
>


I suggest instead that we open a task ticket for the merge.

I 100% agree with merging work incrementally, under a feature flag, but the
pre-commit gateway here is higher than the previous tickets being worked
on. API changes, pre-commit test results, and high-level (/entry-level)
comments all deserve any extra eyeballs available.


Re: Intra-project dependencies

2023-01-17 Thread Mick Semb Wever
>
> Regarding the use of snapshots, this isn’t impossible Henrik - I floated
> this as an option. But besides the additional overhead during development,
> this does not maintain reproducible builds, as the snapshot can change.
>

You would reference the snapshot dependency by the timestamped snapshot.
This makes it a reproducible build.

We have done this with dtest-api already, and there's already a comment
explaining it:
https://github.com/apache/cassandra/blob/trunk/.build/build-resolver.xml#L59-L60


It introduces some overhead when bisecting to go from the snapshot's
timestamp to the other repo's SHA (this is easily solvable by putting the
SHA inside the jarfile).
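For reference, Maven deploys each SNAPSHOT under a unique timestamped version of the form <base>-<yyyyMMdd.HHmmss>-<buildNumber> (timestamp in UTC), and pinning that exact string instead of -SNAPSHOT is what keeps the build reproducible while the artifact still exists. A small Python sketch of the mapping; the build time and build number are made up.

```python
from datetime import datetime, timezone

SNAPSHOT_SUFFIX = "-SNAPSHOT"


def timestamped_snapshot(base_version, built_at, build_number):
    """Map e.g. 1.0.0-SNAPSHOT to Maven's unique form 1.0.0-yyyyMMdd.HHmmss-N."""
    if not base_version.endswith(SNAPSHOT_SUFFIX):
        raise ValueError(f"not a SNAPSHOT version: {base_version}")
    stamp = built_at.astimezone(timezone.utc).strftime("%Y%m%d.%H%M%S")
    return f"{base_version[:-len(SNAPSHOT_SUFFIX)]}-{stamp}-{build_number}"


# Made-up build time and build number, for illustration only.
print(timestamped_snapshot("1.0.0-SNAPSHOT",
                           datetime(2023, 1, 17, 10, 29, 34, tzinfo=timezone.utc), 2))
```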

I don't see a problem with letting trunk use snapshots during the annual
development cycle, if we accept the overhead of cutting all library
releases before we cut the first alpha/beta.

FTR, I'm sitting on the fence between this and submodules. There are many dev
tasks we do, and different approaches have different pain points. The
amount of dev happening in the library also matters. I also agree with
Derek that linking in the source code into in-tree is a significant thing
to do, just to avoid the rigmarole of dependency management.

Josh, bundling releases gets tricky in that you need to include the library
sources, because the cassandra release is essentially being voted on
(because it has been built) with non-released dependencies.


Re: Intra-project dependencies

2023-01-18 Thread Mick Semb Wever
> You would reference the snapshot dependency by the timestamped snapshot.
> This makes it a reproducible build.
>
>
> How confident are we that the repository will not alter or delete them?
>


They cannot be altered.

I see artefacts there that are more than a decade old. But we cannot rely
on their permanence.

Putting the SHA into the jar's manifest is easy.  And this blog post shows
how you can also expose this info on the command line:
https://medium.com/liveramp-engineering/identifying-maven-snapshot-artifacts-by-git-revision-15b860d6228b
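A sketch of reading such a SHA back out of a jar's META-INF/MANIFEST.MF, following the JAR manifest format (main-section "Name: value" lines, continuation lines starting with a single space, a blank line ending the section). The Git-Commit attribute name is hypothetical; use whatever attribute the build actually writes.

```python
import zipfile


def parse_manifest(text):
    """Parse the main section of a JAR manifest into a dict."""
    attrs = {}
    last = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last is not None:
            attrs[last] += line[1:]  # continuation of the previous value
        else:
            key, _, value = line.partition(": ")
            attrs[key] = value
            last = key
    return attrs


def manifest_attribute(jar_path, name):
    """Return a main-section attribute from a jar, or None if absent."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    return parse_manifest(text).get(name)


# e.g. manifest_attribute("some-library.jar", "Git-Commit")  (hypothetical jar and attribute)
```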


Given there's no guaranteed permanence to the snapshots, we would need to
have the git sha in the version, so that if much older versions can no
longer be downloaded they can still be rebuilt.

This is done like: 1.0.0_${sha1}-SNAPSHOT
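A minimal sketch of building and parsing such a version, so a bisect script could recover the SHA to check out and rebuild when the artifact is gone. The regex (numeric base, 7 to 40 lowercase hex characters) and the example SHA are assumptions.

```python
import re

# Assumed shape: <numeric base>_<7-40 lowercase hex chars>-SNAPSHOT
_VERSION_WITH_SHA = re.compile(r"^(?P<base>\d+(?:\.\d+)*)_(?P<sha>[0-9a-f]{7,40})-SNAPSHOT$")


def version_with_sha(base, sha1):
    return f"{base}_{sha1}-SNAPSHOT"


def sha_from_version(version):
    """Recover the library SHA so it can be checked out and rebuilt."""
    match = _VERSION_WITH_SHA.match(version)
    if match is None:
        raise ValueError(f"no git SHA in version: {version}")
    return match.group("sha")


v = version_with_sha("1.0.0", "0a1b2c3")  # made-up short SHA
print(v, sha_from_version(v))
```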



> linking in the source code into in-tree is a significant thing to do
>
>
> Could you explain why? I thought your preferred alternative was merging
> the source trees permanently
>


Linking or merging while it also remains a separate library and repo.
If we are really not that interested in it as a separate library, and the
rate of dev change is high, or the code is somewhere less accessible, then
in-tree makes sense IMHO.


Re: Intra-project dependencies

2023-01-19 Thread Mick Semb Wever
Thanks David for the detailed write up. Replies inline…



> We tried in-tree for in-jvm dtest and found that this broke every other
> commit… maintaining the APIs across all our supported branches was too hard
> to do and moving it outside of the tree helped make the upgrade tests more
> stable (there were breakage but less frequent)….
>


The in-jvm dtest-api library is unique in this way. I would not use it as
reasoning that other libraries should not be in-tree.




> We tried to do snapshot builds where the version contained the SHA, but
> this has the issue that snapshot builds “may” go away over time and made
> older SHAs no longer building…
>


Only keeping the last snapshot in repository.a.o is INFRA's policy (I've
found out).
We can ask INFRA to set up a separate snapshots repository just for us,
with a longer expiry policy. I'd rather not create extra work for INFRA if
there are other ways we can do this, and this approach would always require
some fallback approach to rebuilding the dependency's SHA from scratch.




> We break python-dtest when cross-cutting changes are added as CI is hard
> to do correctly or not supported (testing downstream users (our 4 supported
> branches) is rarely done).
>


python dtests are also in a different category (the context and
consumption go in the other direction, i.e. it's not a library used
in-tree).



> * [nice to have] be able to work with all subprojects in one IDE and not
> have to switch between windows while making cross-cutting changes
>


Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate
it, just asking…)



>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>
>
> Correct, if you use submodules/script you have a text file saying what we
> “should” use, but this does not enforce actually using them… again we could
> make sure build.xml does the right thing,
>


If we try this approach out, I'm definitely in favour of any build.xml
command immediately failing if `git submodule status` != `git submodule
status --cached`
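A sketch of such a fail-fast check. Rather than diffing the two commands' output, it relies on the documented `git submodule status` prefixes: '-' (not initialized), '+' (checked-out commit differs from the SHA recorded in the superproject) and 'U' (merge conflicts). How build.xml would invoke it is left open, and the function names are hypothetical.

```python
import subprocess
import sys

# Prefixes documented for `git submodule status`:
#   '-' not initialized, '+' checked-out commit differs from the SHA recorded
#   in the superproject index, 'U' merge conflicts.
_BAD_PREFIXES = "-+U"


def out_of_sync_submodules(status_output):
    """Return the paths of submodules whose status line has a bad prefix."""
    bad = []
    for line in status_output.splitlines():
        if line and line[0] in _BAD_PREFIXES:
            fields = line[1:].split()  # <sha1> <path> [(<describe>)]
            bad.append(fields[1] if len(fields) > 1 else line)
    return bad


def check_submodules():
    """Hypothetical build hook: non-zero return if any submodule is off its pinned SHA."""
    status = subprocess.run(["git", "submodule", "status", "--recursive"],
                            capture_output=True, text=True, check=True).stdout
    bad = out_of_sync_submodules(status)
    if bad:
        print("submodules not at the recorded SHA: " + ", ".join(bad), file=sys.stderr)
        print("fix with: git submodule update --init --recursive", file=sys.stderr)
        return 1
    return 0
```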



> but this can be confusing for people who mainly build in IDE and don’t
> depend on build.xml until later in development… this is something we should
> think about…
>


Again, isn't this only IntelliJ?



> A project I am familiar with has their build auto-inject git hooks to make
> sure things “just work”, we may be able to solve this in a similar way?
>


I'd like to hear/see more!

>  - permanence from a git SHA no longer exists
>
>
> Why is this?  The SHA points to other SHAs, so it is still immutable.  If
> we claim that pointing to other SHAs doesn’t count then why do library
> versions?  Both are immutable snapshots of code at a specific point in time?
>


This, and a number of the other points, have already been resolved (the
submodules are on fixed SHAs, not a floating HEAD).

>  - our releases get more complicated (our source tarballs are the asf
> releases)
>
>
> We don’t include our dependencies do we?  If so, then does it really?  If
> Accord is a library we use, why would we include it’s source in the build?
> Isn’t it just another library from this point of view?
>


The build of the source tarball must work. If the source tarball release
switches how it does things, from building the submodule to including it as
a dependency, then we're back to having to make releases (and introducing
risk, as we don't ourselves work frequently with the source tarballs).




> switching between branches,
>
>
> This is a pain point that I feel should be handled by git hooks.  We have
> this issue when you try to reuse the same directory for different release
> branches, and its super annoying when you go back in time when jars were
> in-tree as you need to cleanup after switching back…. I do agree that we
> should flesh this out as its going to be the common case, so how do we “do
> the right thing” should be handled
>


+1



> Rather that you need to know in advance when the SHA is not HEAD.
>
>
> Do you?  Or do you really need to know which “branch” it is following?
> For example, lets say we release 5.0 then 5.1 then 5.2, and there are
> accord versions for each: 1.0, 1.2, 2.0… do we not need to really know
> which branch it is following, and only when you are trying to do a
> cross-cutting change?
>

I'm still a little confused here. If a submodule is following a branch, is
that floating? Does that mean a parent SHA isn't fixed to a submodule SHA?

Say trunk is using accord:a12, where a12 is a SHA on accord's trunk. Other
non-Cassandra people using Accord make commits, but our in-tree trunk isn't
moved forward. Then someone in-tree does some dev that touches Accord; they
work away, but late in the dev cycle find out that in-tree trunk isn't on
the latest Accord trunk and there's a conflict rebasing their work onto the
latest Accord. Is this an accurate description?
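To make the scenario above concrete, here is a minimal sketch of a check a developer could run before starting cross-cutting work. It is not part of any patch in this thread. It reads the Accord SHA pinned by in-tree HEAD and counts how many commits upstream trunk has moved past it. The submodule path `modules/accord` and the upstream ref `origin/trunk` are hypothetical defaults. The script relies only on standard `git ls-tree` and `git rev-list --count` output.

```python
#!/usr/bin/env python3
"""Report how far an in-tree submodule pin lags behind the submodule's upstream branch.

Sketch only: the default submodule path and upstream ref are hypothetical.
"""
import subprocess
import sys


def parse_gitlink(ls_tree_line: str) -> str:
    """Extract the pinned commit SHA from one line of `git ls-tree HEAD <path>`.

    A submodule (gitlink) entry looks like: "160000 commit <sha>\t<path>".
    """
    meta, _path = ls_tree_line.rstrip("\n").split("\t", 1)
    mode, obj_type, sha = meta.split()
    if mode != "160000" or obj_type != "commit":
        raise ValueError(f"not a submodule entry: {ls_tree_line!r}")
    return sha


def git(*args: str) -> str:
    """Run a git command and return its stdout, raising on a non-zero exit."""
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout


def commits_behind(submodule_path: str, upstream_ref: str) -> int:
    """Count commits on upstream_ref that are not reachable from the SHA pinned by HEAD."""
    pinned = parse_gitlink(git("ls-tree", "HEAD", submodule_path))
    # Refresh the submodule's view of upstream before comparing.
    git("-C", submodule_path, "fetch", "--quiet", "origin")
    count = git("-C", submodule_path, "rev-list", "--count", f"{pinned}..{upstream_ref}")
    return int(count.strip())


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "modules/accord"  # hypothetical path
    ref = sys.argv[2] if len(sys.argv) > 2 else "origin/trunk"     # hypothetical upstream ref
    print(f"{path}: pinned SHA is {commits_behind(path, ref)} commit(s) behind {ref}")
```

A non-zero count early in the dev cycle is the cue to bump the pin, or rebase, before the conflict surfaces late.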

Hope that all makes sense.


Re: [DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-20 Thread Mick Semb Wever
>
> *To achieve this, we are proposing the formation of a Publicity &
> Marketing Working Group, and we are requesting your participation.*
>


+1 to the proposal and everything you write Patrick!

I've submitted the request for the ML (can take 24 hours). Who would like
to be a moderator for the list?

Otherwise let's give this a few days for any concerns, questions,
objections to be raised.


Re: Merging CEP-15 to trunk

2023-01-20 Thread Mick Semb Wever
On Tue, 17 Jan 2023 at 10:29, Benedict  wrote:

> but the pre-commit gateway here is higher than the previous tickets being
> worked on
>
> Which tickets, and why?
>


All tickets resolved in the feature branch that you are now merging into
trunk.

From a quick scan I see… 17103, 17109, 18041, 18056, 18057, 18087, 17719, 18154,
18142.

All these tickets are resolved but have not been merged to trunk, and they
have no fixVersion.

Assuming that it won't be a merge as-is (e.g. ninja-fixes will get
squashed), I think the clean-up of the feature branch and its merge to
trunk warrant a separate task ticket. Such a ticket also helps
with the admin (linking to the tickets the merge is bringing in).


Re: Intra-project dependencies

2023-01-20 Thread Mick Semb Wever
replies are inline to your inline replies to my inline replies 🥁



> We can ask INFRA to set up a separate snapshots repository just for us,
> with a longer expiry policy. I'd rather not create extra work for infra if
> there's other ways we can do this, and this approach would always require
> some fallback approach to rebuilding the dependency's SHA from scratch.
>
>
> If they will allow this and allow the snapshots to never be purged, then I
> am ok with this as a solution.
>


They will get purged eventually, and may get lost (no backups).


>
>
>> We break python-dtest when cross-cutting changes are added as CI is hard
>> to do correctly or not supported (testing downstream users (our 4 supported
>> branches) is rarely done).
>>
>
>
> python dtests are also in a different category (the context and
> consumption go in a different direction, i.e. it's not a library used
> in-tree).
>
>
> I disagree.  The point I was making is we have several dependencies and we
> should think about how we maintain them.  My point is still valid that
> python dtests are involved with cross cutting changes to Cassandra, and the
> lack of downstream testing has broken us several times.  The solution to
> this problem may be different than Accord (as C* doesn’t depend on python
> dtest as you point out), but that does not mean we shouldn’t think about it
> in this conversation….
>
> One thing that comes to mind is that dependencies may benefit from running
> a limited C* CI as part of their merge process.  At the moment people are
> expected to create a tmp CI branch for all 4 supported C* versions, point
> it to the python dtest change, then submit to the JIRA as proof that CI was
> run… normally when I find python dtest broke in branch X I find this had
> not happened…
>
> This holds true I believe for JVM dtest as well, as we should be validating
> that the 4 target C* branches still work if you are touching jvm dtest…
>
> Now, with all that, Accord being external will have similar issues, a
> change there may break Cassandra so we should include a subset of Cassandra
> tests in Accord’s CI.
>


Fair enough, and this reasoning also applies to dtest-api. But this is an
additional concern in the discussion, with potentially different solutions.

Part of the testing requirements for dtests (and libraries that are included
in-tree) is downstream CI.
When you make a change in cassandra-dtest, you shouldn't have to go test
the C* branches – it should be part of the CI pipeline for cassandra-dtest
itself.

For dtests the versions tested are explicit. It's different for libraries
that are included in-tree: you have to make the change in-tree anyway, so it
makes sense for it to be part of in-tree CI.



> * [nice to have] be able to work with all subprojects in one IDE and not
>> have to switch between windows while making cross-cutting changes
>>
> Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate
> it, just asking…)
>
> I have not used Eclipse or NetBeans for around 10 years so no clue!
>
>
>
>> but this can be confusing for people who mainly build in IDE and don’t
>> depend on build.xml until later in development… this is something we should
>> think about…
>>
> Again, isn't this only IntelliJ?
>
> Not sure, the only other IDE we support is NetBeans and not sure what we
> do there.
>


Off-topic: NetBeans allows you to have many projects open in the one window
(it's easy to have 20-30 projects open), and it does not handle sources in
its own way – everything is delegated to the project's build system
(ant/gradle/maven).


>> A project I am familiar with has their build auto-inject git hooks to make
>> sure things “just work”, we may be able to solve this in a similar way?
>>
>
> I'd like to hear/see more!
>
> The project wants to make sure commit messages are structured “correctly”
> so enforces this via git hooks.  Gradle (the build they use) makes tasks
> depend on “installGitHooks” which copies 2 hooks to .git/hooks (commit-msg,
> and pre-push)
>
> We could do the same in build.xml, with a target that copies hooks we define into
> .git/hooks to make sure the behaviors we expect are enforced.  See
> https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
> 
>
> There is a “post-checkout” hook we could leverage to detect that the
> dependencies' SHAs are no longer the same and recursively check out the
> “correct” dependencies
>


I like it! But I think we would need this AND a fail-fast in the build.xml
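Here is a minimal Python sketch of such a post-checkout hook. It assumes the build copies the script to `.git/hooks/post-checkout` and marks it executable. That wiring, and the script itself, are hypothetical. The script relies only on documented git behaviour:

- post-checkout receives the previous HEAD, the new HEAD, and a flag that is 1 for a branch checkout.
- `git submodule status` prefixes a line with `-` when the submodule is not initialized.
- It uses `+` when the submodule is not on the recorded SHA.
- It uses `U` when the submodule has merge conflicts.

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/post-checkout: resync submodules after switching branches.

git invokes post-checkout with three arguments: the previous HEAD, the new HEAD,
and a flag that is "1" for a branch checkout and "0" for a file checkout.
"""
import subprocess
import sys


def out_of_sync(status_output: str) -> list:
    """Return paths that `git submodule status` flags as not matching the recorded SHA.

    Each line starts with one status character: ' ' (in sync), '-' (not initialized),
    '+' (checked-out SHA differs from the recorded one) or 'U' (merge conflicts).
    """
    paths = []
    for line in status_output.splitlines():
        if not line:
            continue
        flag, fields = line[0], line[1:].split()
        if flag in ("-", "+", "U"):
            paths.append(fields[1])  # fields[0] is the SHA, fields[1] the submodule path
    return paths


def main(argv: list) -> int:
    if len(argv) < 4 or argv[3] != "1":
        return 0  # file checkout: nothing to do
    status = subprocess.run(["git", "submodule", "status", "--recursive"],
                            check=True, capture_output=True, text=True).stdout
    stale = out_of_sync(status)
    if stale:
        print("post-checkout: updating submodules: " + ", ".join(stale), file=sys.stderr)
        subprocess.run(["git", "submodule", "update", "--init", "--recursive"], check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

`git clone` does not copy hooks in `.git/hooks`, so a hook only helps once the build has installed it. That is one argument for also having the fail-fast in build.xml.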



>  - our releases get more complicated (our source tarballs are the asf
>> releases)
>>
>>
>> We don’t include our dependencies do we?  If so, then does it really?  If
>> Accord is a library we use, why would we include its source in the build?
>> Isn’t it just another library from this point of view?
>>
>
> The build of the source 

Re: Intra-project dependencies

2023-01-20 Thread Mick Semb Wever
>
> Both a git post-checkout and a build fail-fast will protect us here. But
> the post-checkout will need to fail silently if the .git subdirectory
> doesn't exist.
>


Correction: the build fail-fast will need to fail silently if the .git
subdirectory doesn't exist.
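Here is a minimal sketch of that fail-fast. It assumes the build invokes a small script, for example through ant's `exec` task with `failonerror` set. The script and its wiring are hypothetical.

- When there is no `.git` entry, as in a source tarball, it exits 0 without running git.
- Otherwise it fails when `git submodule status` reports any submodule off its recorded SHA.

```python
#!/usr/bin/env python3
"""Hypothetical build-time fail-fast: stop the build if a submodule is not on its recorded SHA.

Skips silently when there is no .git entry, e.g. when building from a source tarball.
"""
import os
import subprocess
import sys


def has_git_metadata(repo_root: str) -> bool:
    # .git is a directory in a normal clone but a file in worktrees and nested
    # submodules, so test for existence rather than isdir().
    return os.path.exists(os.path.join(repo_root, ".git"))


def mismatched_submodules(status_output: str) -> list:
    """Paths whose `git submodule status` line does not start with ' ' (in sync)."""
    return [line[1:].split()[1] for line in status_output.splitlines()
            if line and line[0] != " "]


def check(repo_root: str = ".") -> int:
    if not has_git_metadata(repo_root):
        return 0  # source tarball: SHAs were verified when the release was created and voted on
    status = subprocess.run(["git", "-C", repo_root, "submodule", "status"],
                            check=True, capture_output=True, text=True).stdout
    bad = mismatched_submodules(status)
    if bad:
        print("Submodules not on the SHA recorded by this commit: " + ", ".join(bad)
              + "\nRun: git submodule update --init --recursive", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(check(sys.argv[1] if len(sys.argv) > 1 else "."))
```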


Re: Intra-project dependencies

2023-01-20 Thread Mick Semb Wever
> Both a git post-checkout and a build fail-fast will protect us here. But
>>> the post-checkout will need to fail silently if the .git subdirectory
>>> doesn't exist.
>>>
>>
>> Correction: the build fail-fast will need to fail silently if the .git
>> subdirectory doesn't exist.
>>
>
> How will this work for users downloading source distributions?
>

It is presumed that the source found in the submodule is on the correct
SHA. The integrity checks are in place when creating and when voting on the
source tarball release. This means that the build of the submodule has
to be part of the in-tree build (which I have assumed already).


Re: Merging CEP-15 to trunk

2023-01-20 Thread Mick Semb Wever
These tickets have all met the standard integration requirements, so I’m
> just unclear what “higher pre-commit gateway” you are referring to.
>


A merge into trunk deserves more eyeballs than a merge into a feature
branch.

We can refer to this as a "higher pre-commit gateway" or a "second pass".
Either way I believe it is a good thing.



> I think the existing epics are probably more natural tickets to reference
> in the merge, eg 17091 and 17092.
>


If _all_ tickets in that epic are being merged, sure. Otherwise, how can I
see which resolved tickets are in trunk and which are not…

I would rather not have to be actively working on Accord to be able to see
this stuff quickly.


Re: Merging CEP-15 to trunk

2023-01-20 Thread Mick Semb Wever
> What Benedict says is that the commits into cassandra/cep-15-accord and
> cassandra-accord/trunk branch have all been vetted by at least two
> committers already. Each authored by a Cassandra committer and then
> reviewed by a Cassandra committer. That *is* our bar for merging into
> Cassandra trunk.
>


Yes yes. But we do catch things late, and the eyes on a merge would have a
different PoV than the original reviews, and that can be helpful. And yes,
we can also review things post-commit if we like. I'm not saying it has to
be done, or that our rules enforce it, just that I think it would be
helpful to extend the invitation for more eyeballs (and provide the hygiene
in jira).
Looking through the cep-15-accord branch, there are >10 ninja-fixes.
Won't these get cleaned up?
And if so, shouldn't the changes be open to another review round?

It just seems a more congenial way to collaborate 🤷🏻‍♀️
And maybe no one is interested in such a second-pass pre-merge review. idk.


Re: [DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-20 Thread Mick Semb Wever
I'll add both of you, and anyone else who speaks up.

To clarify, being a moderator of the mailing list is only about
accepting/rejecting posts sent by senders who have not (yet)
subscribed.
This is usually 95% spam and 5% existing users posting from a
different account.


On Fri, 20 Jan 2023 at 19:24, Molly Monroy  wrote:

> 
> I am also happy to be a moderator. Melissa and I together can ensure we
> have a solid level of coverage.
>
> On Jan 20, 2023, at 11:03 AM, Melissa Logan  wrote:
>
> 
> I appreciate the open and more structured approach to publicity &
> marketing so everyone can provide input and for transparency.
>
> I'm also happy to be a moderator.
>
>
> On Fri, Jan 20, 2023 at 7:01 AM Patrick McFadin 
> wrote:
>
>> I would be happy to be one of the moderators. Not sure if that's singular
>> or plural. :D Just need to know how to do it.
>>
>> Patrick
>>
>> On Fri, Jan 20, 2023 at 1:44 AM Mick Semb Wever  wrote:
>>
>>> *To achieve this, we are proposing the formation of a Publicity &
>>>> Marketing Working Group, and we are requesting your participation.*
>>>>
>>>
>>>
>>> +1 to the proposal and everything you write Patrick!
>>>
>>> I've submitted the request for the ML (can take 24 hours). Who would
>>> like to be a moderator for the list?
>>>
>>> Otherwise let's give this a few days for any concerns, questions,
>>> objections to be raised.
>>>
>>>
>
> --
> Melissa Logan (she/her)
> CEO & Founder, Constantia.io
> LinkedIn <https://www.linkedin.com/in/mklogan/> | Twitter <https://twitter.com/Melissa_B2B>
>
>
>


Re: Merging CEP-15 to trunk

2023-01-23 Thread Mick Semb Wever
>
> The sooner it’s in trunk, the more eyes it will draw, IMO, if you are
> right about most contributors not having paid attention to a feature branch.
>


We all agree we want the feature branch incrementally merged sooner rather
than later.
IMHO any merge to trunk, and any rebase and squash of ninja-fix commits,
deserves an invite to reviewers.
Any notion of merge-then-review isn't our community precedent.

I appreciate the desire not to "be left hanging" after creating a merge ticket
that requires a reviewer when no reviewer shows up, and the desire to move
quickly on this.

I don't object if you wish to use this thread as that review process. On
the other hand, if you create the ticket I promise to be a reviewer of it,
so as not to delay.


Re: Merging CEP-15 to trunk

2023-01-24 Thread Mick Semb Wever
>  But it's not merge-then-review, because they've already been
> reviewed, before being merged to the feature branch, by committers
> (actually PMC members)?
>
> You want code that's been written by one PMC member and reviewed by 2
> other PMC members to be put up for review by some random 4th party? For how
> long?
>


It is my hope that the work is not being merged as-is, and that there is a
rebase and some trivial squashing to do. That deserves a quick check by
someone else. Ideally this would be one of the existing reviewers (but like any
other review step, no matter how short and trivial it is, that's still an
open process). I see others already doing this when rebasing larger patches
before the final merge.

Will the branch be rebased and cleaned up?
How will the existing tickets make it clear when and where their final
merge happened?


Re: [DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-24 Thread Mick Semb Wever
The market...@cassandra.apache.org list has been created.

To subscribe send an email to marketing-subscr...@cassandra.apache.org from
the email address you want to subscribe from.

If you are a committer you can alternatively use Whimsy:
https://whimsy.apache.org/committers/subscribe

regards,
Mick


On Fri, 20 Jan 2023 at 00:31, Patrick McFadin  wrote:

> Hello Cassandra Community!
>
> We are at a pivotal moment for the Cassandra community, with the first
> Cassandra Summit in 7 years coming up on March 13th, and a major release
> coming later this year with Cassandra 5.0. It is important that we come
> together to set the publicity strategy and direction for these important
> moments, and that we work together to define how Cassandra shows up across
> the technology industry.
>
> To achieve this, we are proposing the formation of a Publicity & Marketing
> Working Group, and we are requesting your participation.
>
> What is the Publicity & Marketing Working Group?
>
> This is a working group open to community members who have the insight and
> skills to help define Cassandra’s public narrative and participate in our
> marketing strategy and execution. The group will meet once a month for an
> hour to discuss important marketing topics. You can find us on
> #cassandra-events. We also propose adding a mailing list,
> marketing@cassandra.a.o, to handle day-to-day marketing needs and async
> communication. Our publicity and marketing partners from Constantia - Molly
> Monroy and Melissa Logan - will work with us to build this working group.
>
> What will this group be responsible for?
>
> Our initial vision for this group is to accelerate how we do marketing &
> publicity for Cassandra. We will refine and advance Cassandra’s public
> perception of the tech industry, to show how Cassandra has grown,
> innovated, and revitalized itself as a community. We will do this through:
>
> - Participating in marketing strategy for major moments (in particular, C*
>   Summit in March and Cassandra 5.0 release later this year)
> - Expanding our local meetup and events presence
> - Sourcing end-user case studies for marketing and PR collateral
> - Making sure the Cassandra community shows up at third-party events
> - Contributing content - from blogs to documentation - to ensure we have a
>   robust stream of content for our end users
>
> Our first two orders of business will be:
>
> 1. Jointly determine operating model and governance, and get input and
>    alignment on the above goals/responsibilities.
> 2. Discuss marketing for Cassandra Summit, primarily defining the news we
>    will share at the event from the project directly and from our sponsors.
>    This is coming up quickly and we will need community assistance to
>    achieve our publicity goals.
>
> As this is a community-driven group, please share ideas and feedback on the
> purpose of this group and what we need to achieve.
>
> When is the meeting?
>
> We are proposing the meetings take place on the 4th Wednesday of each
> month. We will alternate times of the day to try to accommodate. We can
> adjust based on member attendance.
>
> - Jan, March, May, July, Sept, Nov: 4th Wed of the month, 8a PT
> - Feb, April, June, August, October, Dec: 4th Wed of the month, 4p PT
>
> We will create a centralized document to share and document information
> about the working group, including meeting minutes, monthly tasks, and
> priorities. Decisions will be discussed and finalized using the project
> mailing list.
>
> Patrick
>


Re: Cassandra Summit update for 2023-01-24

2023-01-25 Thread Mick Semb Wever
>
> *To create a more neutral ground that reflects our community better, Linux
> Foundation Events has taken on the considerable task of running Cassandra
> Summit in 2023. We are very grateful they took a chance on our community,
> and we will be better for it.  *
>
> *…*
>
> *Why is this important to mention? Our community needs an independent
> Cassandra Summit, and right now, it needs your support in attending the
> event. Let’s show the Linux Foundation that Cassandra Summit is something
> we value as a community. I know budgets are tight, and it’s hard to get
> approval. If you are able, make the case and register today. *
>


I particularly appreciate and am inspired by this. Well worded, thank you
Patrick!


Re: Merging CEP-15 to trunk

2023-02-01 Thread Mick Semb Wever
> Hi Everyone, I hope you all had a lovely holiday period.
>


Thank you David and Caleb for your continuous work dealing with the
practicalities involved in getting the merge to happen!  And thank you
Benedict for starting the thread – it is an important topic for us to
continue as CEPs and feature branches become a thing!

Reading the thread I see some quite distinct perspectives, and they are making
it difficult to have as constructive a dialogue as we would wish for.  As
usual, this is balanced by the productive collaboration happening, albeit
hidden, in the jira tickets.

I want to raise the following points, not as things to solve necessarily
but to help us appreciate just how different our PoVs are and how easy it
is for us to be talking past each other. If you immediately disagree with
any of them I suggest you read below how my PoV gets me there.

1. Work based on a notion of meritocracy. If you do the work, or have done
the work, your input weighs more.
2. Promote teamwork and value an inclusive culture.
3. Where the work is, or has been committed to, can be considered
irrelevant (in the context of this thread).
4. There are different types of reviews. We want to embrace them all.
5. We gate on either CI (not only circleci). Do not trash or reduce parity
in the other.
6. A feature branch != release branch/trunk.
7. The cep-15-accord branch was not ready to merge; this thread helped smoke
out a number of minor things.
8. Merging of feature branches should be quick and light team-work.


Going into these in more detail…

1+3)
If a patch from a pmc engineer has been reviewed by two other pmc
engineers, CI has been run, and they are about to merge, a new reviewer
interjecting late can expect very little to happen pre-merge.  If there's
something serious it needs to be explained quickly.  We value people's
time, productiveness, and expertise.

This applies everywhere in the project: from tickets with patch files to
CEP work and feature branches.

2)
The more eyes the better.  Everything we do is transparent, and we should
always be receptive and by default open to new input regardless of where in
the process we are and where it comes from.  First be receptive and hear
people out, afterwards make the "now, or later" decision.  Take the time to
explain the reasons for a "we'll do it later" decision: they might not be
obvious to everyone, explaining them helps newcomers understand the project
dynamics, and it sets project precedent.  Use (1) as needed.

4)
A lot of the thread has focused on reviews of Accord and its technical
components.  There are other types of reviews.  Examples are docs (internal
and external), the build system, our CIs and integrations, releases and ASF
policies.  Henrik pointed out that it is wasteful for these reviews to
happen early in the review window.  In addition, they are quite small in
comparison to initial technical reviews and equally quick to address. The
larger the patch the more we can expect others stepping in with these types
of tail-end reviews.

If we think of the review process as promoting teamwork and an inclusive
culture, we can also think of reviews as akin to pair-programming that helps
mentor those less knowledgeable in an area of the code.  This needs to be
time-boxed of course, but IMHO we should keep an open mind to it – these types
of reviews do, surprisingly, lead to bug fixes and improvements, with the
reviewer acting as a rubber-duck. This type of review we would want early,
not late.

There's also another type of review here.  If the author of another CEP has
another feature branch about to merge, they may want to review the branch
to get ahead of possible incoming conflicts.  This is a review that cares
nothing about Accord itself.  Their evaluation is if there's any
project-wide ROI if some changes happen before the merge.  Again, teamwork.

5)
It's been mentioned: "always shippable trunk according to circleci".  That's
not true: we are always shippable according to *either* CI.  There are folk
using only ci-cassandra as their pre-commit gateway.  It is important that
you don't trash the other CI system, particularly when it comes to parity
of the tests that we run.  If you introduce new tests in one CI, it is on
you to introduce them to the other CI, and therefore to run both CIs pre-merge.
This does not imply a merge is blocked on a green ci-cassandra.a.o.

6)
It's been mentioned that code committed to a feature branch can be
considered finalised, i.e. the review window is closed.  I don't buy this.
We don't cut releases off feature branches and don't have a policy of
"always stable feature-branch".  Both (4) and (5) help illustrate this.
Thinking about the different types of reviews and concerns we have on trunk
is what led me to presume that the review window is open until the final
atomic commits to our release branches.  And our stable-trunk agreement and
efforts are why I presume that trunk is to be treated like a release
branch. This supports (3), it makes no difference if the work could be in a

Re: Apache Cassandra 5.0 documentation

2023-02-01 Thread Mick Semb Wever
No objections from me! And yes, 7 days have passed and no one has spoken
up, you're a committer – you can assume silence is consent.

Folks, Lorina's proposal here is very thorough, and it will re-organise the
docs significantly. Make sure to check it out quickly if you think you
might have any concerns or objections!


On Thu, 2 Feb 2023 at 05:08, Lorina Poland  wrote:

> I've not had any comment on this topic. Can I assume that no one has
> objections?
>
> Lorina
>
> On 2023/01/25 19:02:41 Lorina Poland wrote:
> > Greetings!
> >
> > I'm gearing up to help get the Cassandra 5.0 docs in good order before the
> > GA release occurs later this year. Recently, I've been thinking about a
> > more standardized organization to docs, to make it simpler for users to
> > find what they are looking for, separate from searching. [That's the kind
> > of thing docs nerds think about.] To that end, I've created a unified
> > information architecture (IA) that can apply to any kind of documentation,
> > including the Apache C* docs.
> >
> > Up front, I'll say, not every section of this organization applies to
> > Apache C* docs, but reorganizing the docs to follow this pattern as much as
> > possible will help users find what they need.
> >
> > I'd like your input into this IA that I've outlined. Please give me
> > feedback about your opinions! If I can tackle this issue before launching
> > into adding CEP features, working down the existing JIRA tickets for
> > documentation, and backfilling missing items, it would be immensely
> > helpful. No opinion will go unaddressed, so please take a few minutes to
> > take a look.
> >
> > I'm linking a google doc, to make it easy for anyone to make comments:
> > https://docs.google.com/document/d/1A96K73vj9MbJoD7wJNgIKWrOkLq-ZL2cNZAEXSWrciY/edit?usp=sharing
> >
> > I'm also drafting an Apache C* 5.0 Doc Plan for the work, to make it simple
> > for anyone to know what is being done, and will share that next. In
> > addition, I've started consolidating the current Documentation tickets that
> > are open under the JIRA project, component "Documentation".
> >
> > Thanks,
> > Lorina Poland
> >
>


Re: [DISCUSS] Merging incremental feature work

2023-02-05 Thread Mick Semb Wever
Love the write up Henrik :-8



> On Fri, Feb 3, 2023, at 9:20 AM, Henrik Ingo wrote:
>
> …
> 1) I assume JDK17 support is invasive, so that would suggest a feature
> branch. However, the next question is, is there any risk involved in this
> work (like Falcon for MySQL). Hypothetically it could be that Java 17 has
> worse performance than Java 11, or some other blocking problem is
> encountered. But in practice we probably estimate that this risk is small.
> In such a case JDK17 support could indeed be developed with small patches
> directly against trunk, but this would be an exception to the rule!
>
>

The introduction of JDK17 (in progress) has a different challenge as
well: it involves small changes across many different repos. This makes
feature branches cumbersome, and they don't really de-risk the final merge.

And as you point out, Derek, our existing method of introducing and dropping
JDK support is clumsy.  The JDK17 work includes improving how we do this so
it will become easier (the primary tactic here is that the JDKs we support
are defined only in build.xml, and other repos read and use it as context),
leaving just the work of fixing what is actually broken when using the new
JDK.
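The "defined only in build.xml, read by other repos" approach could look roughly like this on the consuming side. This is a sketch under assumptions. The property name `java.supported` and its comma-separated value are hypothetical and not taken from the JDK17 work. A real consumer, such as a CI config generator, would do more than print.

```python
#!/usr/bin/env python3
"""Sketch: another repo derives the supported JDKs from cassandra's build.xml.

The property name "java.supported" and its comma-separated format are assumptions.
"""
import sys
import xml.etree.ElementTree as ET


def supported_jdks(build_xml, prop: str = "java.supported") -> list:
    """Return the JDK versions listed in <property name=prop value="..."/>.

    build_xml may be str or bytes; ET.fromstring accepts either.
    """
    root = ET.fromstring(build_xml)
    for el in root.iter("property"):
        if el.get("name") == prop:
            return [v.strip() for v in el.get("value", "").split(",") if v.strip()]
    raise KeyError(f"property {prop!r} not found in build.xml")


if __name__ == "__main__":
    # Read as bytes so an XML encoding declaration is handled by the parser.
    with open(sys.argv[1] if len(sys.argv) > 1 else "build.xml", "rb") as f:
        for jdk in supported_jdks(f.read()):
            print(jdk)  # e.g. a CI repo could emit one test job per JDK here
```

Because build.xml is the single source, adding or dropping a JDK becomes a one-line change in-tree. Downstream repos pick it up without their own edits.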

So it's another example of incremental roll-out: first the base
improvements, and then the actual changes in safe incremental steps.

To my understanding, this wasn't the original desire and consensus for
JDK17: folk requested that it be introduced complete, though I cannot
actually find the reference for that.  I was about to raise a thread asking
for us to instead take an incremental approach, to help us move faster and
safer, but am doing it here; thanks for raising the thread, Josh.  As
others point out, we can't paint ourselves into the wrong corner with
JDK17, though we can't drop JDK8 support until we're out of the (right)
corner.


ref:
 - https://lists.apache.org/thread/hny49r5vlg4nn9d53n3fksxvjg71joqz
 - https://lists.apache.org/thread/s1bntyk3ykovtw0ph48rf5sy2v9ls8qw


Re: [DISCUSS] Merging incremental feature work

2023-02-05 Thread Mick Semb Wever
> To my understanding this wasn't the original desire and consensus with
> JDK17, folk requested that it be introduced complete, though I cannot
> actually find the reference to that.  I was about to raise a thread asking
> for us to instead take an incremental approach, to help us move faster and
> safer, but am doing it here, thanks for raising the thread Josh.   As
> others point out, we can't paint ourselves into the wrong corner with
> JDK17, though we can't drop JDK8 support until we're out of the (right)
> corner.
>


I forgot to mention something.

Taking an incremental approach here also includes dropping support for
scripted UDFs first, and later on adding hooks for UDFs so users can re-add
the functionality.  (This may have been the "complete" desire expressed,
but idk.)

Implementing the hooks for UDFs is currently a blocker, slowing down the
introduction of JDK17.  We would like to remove the blocker by first
dropping the already deprecated scripted UDFs.  I am for this approach
because everyone having to develop and test against JDK8, when they know
5.0 won't support it, is the bigger headache here.


Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-08 Thread Mick Semb Wever
Long overdue with so much you have done for so many years. Congrats!

On Thu, 2 Feb 2023 at 23:26, Molly Monroy  wrote:

> Congrats, Patrick... much deserved!
>
> On Thu, Feb 2, 2023 at 1:59 PM Derek Chen-Becker 
> wrote:
>
>> Congrats!
>>
>> On Thu, Feb 2, 2023 at 10:58 AM Benjamin Lerer  wrote:
>>
>>> The PMC members are pleased to announce that Patrick McFadin has accepted
>>> the invitation to become committer today.
>>>
>>> Thanks a lot, Patrick, for everything you have done for this project and
>>> its community through the years.
>>>
>>> Congratulations and welcome!
>>>
>>> The Apache Cassandra PMC members
>>>
>>
>>
>> --
>> Derek Chen-Becker
>> GPG Key available at https://keybase.io/dchenbecker and
>> https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org
>> Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC
>>


Re: [VOTE] Release Apache Cassandra 4.0.8

2023-02-09 Thread Mick Semb Wever
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.8-tentative
> [2]: NEWS.txt:
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.8-tentative
>


+1

Checked
- signing is correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11)
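For anyone new to release votes, the "checksums are correct" step can be sketched as below. The sketch assumes each artefact has a `.sha512` file alongside it whose first token is the hex digest. It does not cover the signature check, which is typically done separately with `gpg --verify`.

```python
#!/usr/bin/env python3
"""Sketch: verify a release artefact against a .sha512 file next to it."""
import hashlib
import sys


def sha512_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large tarballs are not read into memory at once."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(artifact: str, checksum_file: str) -> bool:
    # Take the first whitespace-separated token: this covers both a bare digest
    # and the "<digest>  <filename>" layout produced by sha512sum.
    with open(checksum_file, encoding="ascii") as f:
        expected = f.read().split()[0].lower()
    return sha512_hex(artifact) == expected


if __name__ == "__main__":
    artifact = sys.argv[1]
    ok = checksum_matches(artifact, artifact + ".sha512")
    print("checksum OK" if ok else "checksum MISMATCH")
    sys.exit(0 if ok else 1)
```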


Re: JIRA account creation request

2023-02-15 Thread Mick Semb Wever
> I would like to get my JIRA account created as I would like to contribute.
> Here are my details
>
> email address : manishkhandelwa...@gmail.com
>


Your jira account has been created. You should have received an email.

regards,
Mick


Re: JIRA account creation request

2023-02-15 Thread Mick Semb Wever
> HI Mick,
>
> Could you pls. help with JIRA account for me as well ?
>




Done Srinivas. You should have received an email.

Welcome to the Cassandra community.


Re: Intra-project dependencies

2023-02-17 Thread Mick Semb Wever
On Thu, 16 Feb 2023 at 21:43, David Capwell  wrote:

> After a lot of effort I think this branch is in a good state, accord feels
> mostly like it's in-tree and all the complexity of git is hidden mostly.  I
> would love more feedback as the patch is in a usable state
>


This work is very good, thanks David.
It is going to take folk a little while to get familiar with it: there's
no free lunch, so give it a whirl. There's help in CONTRIBUTING.md:
https://github.com/dcapwell/cassandra/tree/accord-submodules


Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-02-27 Thread Mick Semb Wever
New failures from Build Lead week 8

*** CASSANDRA-18290 – SecondaryIndexTest.testUpdatesToMemtableData
4.1, row did not delete

*** CASSANDRA-18289 –
sslnodetonode_test.TestNodeToNodeSSLEncryption.test_ssl_client_auth_required_fail
trunk, error not found in logs

*** CASSANDRA-18287 –
InsertUpdateIfConditionTest.testMultiExistConditionOnSameRowClustering
trunk,

*** CASSANDRA-18288 – TopPartitionsTest.basicRegularTombstonesTest
trunk, missing tombstones

*** CASSANDRA-18286 – TTLTest.testCapWarnExpirationOverflowPolicy
negative deletion and expiry value
3.0, MarshalException: A local expiration time should not be negative



On Tue, 21 Feb 2023 at 03:41, guo Maxwell  wrote:
>
> Hi all :
> Here comes Cassandra CI status for 2023-2-13 - 2023-2-17:
>
> *** CASSANDRA-18274 - Test
> Failures: org.apache.cassandra.utils.binlog.BinLogTest.testTruncationReleasesLogSpace-compression
>  - linked in 4.1
> The other tests below are timeout exceptions, which we can ignore as they
> are considered test-infrastructure failures; we are working on those
> separately (CASSANDRA-18137), and I have already added this note to the
> Build Lead page.
> *** CASSANDRA-18273: Timeout occurred. Please note the time in the report
> does not reflect the time until the timeout. - linked in trunk, 4.0
> *** CASSANDRA-11493: dtest failure in
> consistency_test.TestAccuracy.test_simple_strategy_users - it is a timeout
> exception, so I did not reopen CASSANDRA-11493.
>
>
>
> On Tue, 14 Feb 2023 at 00:29, German Eichberger via dev wrote:
>>
>> First, one thing I learned is that a ticket linked to a failure on one
>> branch in Butler doesn't carry over to other branches, so always search.
>>
>> New failures from build lead week 7:
>>
>> I created a Jira filter for finding the tickets I created: 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20(%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)
>>
>> *** CASSANDRA-18257 - Test Failures: 
>> org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome - linked 
>> in 4.0, 4.1, trunk
>> *** CASSANDRA-18253 - Test Failures: dtest 
>> repair_tests.repair_test.TestRepair.test_simple_sequential_repair - linked 
>> in 4.0, trunk
>> *** CASSANDRA-18246 - Test Failures: 
>> org.apache.cassandra.cql3.validation.operations.TTLTest.testCapNoWarnExpirationOverflowPolicy
>>  - linked in 3.11
>> *** CASSANDRA-18245 - Test Failures: 
>> org.apache.cassandra.db.compaction.CompactionsTest.testDontPurgeAccidentally 
>> - linked in 3.11
>> -
>>
>> 
>> From: Dan Jatnieks 
>> Sent: Friday, February 10, 2023 2:42 PM
>> To: dev@cassandra.apache.org ; Claude Warren, Jr 
>> 
>> Subject: [EXTERNAL] Re: Cassandra CI Status 2023-01-07
>>
>> New Failures from Build Lead Week 6:
>>
>> *** CASSANDRA-18021 - Flaky 
>> org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
>> - This existing ticket has been linked in butler to new failures on 3.11
>>
>> *** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
>> - Re-opened as intermittent failure occurred in build 1445 on trunk
>>
>> Several new failures had only a single occurrence; no new tickets were 
>> opened during this time.
>>
>>
>>
>> On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev 
>>  wrote:
>>
>> New Failures from Build Lead Week 5
>>
>> *** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'" 
>> reported in multiple tests
>> - reported in 4.1, 3.11, and 3.0
>> - identified as a possible class loader issue associated with CASSANDRA-18150
>>
>> *** CASSANDRA-18191 - Native Transport SSL tests failing
>> - TestNativeTransportSSL.test_connect_to_ssl and 
>> TestNativeTransportSSL.test_connect_to_ssl (novnode)
>> - TestNativeTransportSSL.test_connect_to_ssl_optional and 
>> TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)
>>
>>
>> On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe  
>> wrote:
>>
>> New failures from Build Lead Week 4:
>>
>> *** CASSANDRA-18188 - Test failure in 
>> upgrade_tests.cql_tests.cls.test_limit_ranges
>> - trunk
>> - AttributeError: module 'py' has no attribute 'io'
>>
>> *** CASSAN

Re: A Guest invitation to the Slack Dev Community

2023-02-28 Thread Mick Semb Wever
Invite sent to you.

On Tue, 28 Feb 2023 at 08:51, Maxim Chanturiay 
wrote:

> Hello!
>
> I would like to join the Slack dev community as a guest.
> My current email address is the one I'd like to use.
>
> If it is of any help, here's a link to my ASF Jira Account:
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=maximc
>
> There are a couple of submitted patches that I am looking for someone to
> review.
>
> One of the tickets has had a patch waiting for more than a month.
>
> Regarding the other, the person who I've requested a review from replied
> that they are busy in the near future.
> I thought to help them out by asking around for someone with more time on
> their hands.
>
> An additional ticket that I am currently researching will require a design
> insight from a more experienced member.
>
> On past issues I've worked on, JIRA discussions proved too slow.
> Plus, some of the replies take a week or so.
>
> Please tell me if Slack is not the right place to raise these points, and
> where I should go instead. 🙂
>
> Regards,
> Maxim
>


Re: [DISCUSS] Next release date

2023-02-28 Thread Mick Semb Wever
>
> We forked the 4.0 and 4.1 branches at the beginning of May. Unfortunately,
> for 4.1 we were only able to release GA in December, which impacted how
> much time we could spend focussing on the next release and the progress we
> could make. Consequently, I am wondering if it makes sense for us to branch
> 5.0 in May or if we should postpone that date.
>
> What is your opinion?
>


My initial preference is to stick with the May branch and its initial
alpha/beta release.

Giving in to the delays doesn't improve the causes of them.

We should focus on why it took 6 months to go from the first 4.1 alpha to GA
and what happened inside that time window. I'm not convinced summer
holidays are to blame. I think a lack of QA/CI, and of folk dedicating
time to getting it to GA, is the bigger problem.

On the QA/CI front I believe we have made significant improvements already.
And we saw fewer releases of 4.1 before its GA. I also think reducing the
focus and scope of the subsequent release cycle is a cost that creates the
correct incentive, so long as we share the burden of the stabilising-to-GA
journey. While it might be difficult for folk to commit their time over
summer holidays, the three months of May-July should be way more than
enough if we are serious about it.

My thoughts don't touch on CEPs in flight. But my feeling is this should not
be about what we want to "squeeze in" (which only makes the problem worse),
but rather about whether the folk offering their time to stabilise to GA
prefer May-July or September-November.

"Postponing" suggests a one-off move, but I'm presuming this would be a
permanent change?


Re: [DISCUSS] Next release date

2023-03-01 Thread Mick Semb Wever
>
> My thoughts don't touch on CEPs inflight.
>



For the sake of broadening the discussion, additional questions I think
are worth raising are…

1. What third parties, or other initiatives, are invested in and/or working
against the May deadline, and what are their views on changing it?
  1a. If we push branching back to September, how confident are we that
we'll get to GA before the December Summit?
2. Which CEPs that we consider must-haves this year look like they won't
land by May?
  2a. Is it just tail-end commits in those CEPs that won't make it? Can
these land (with or without a waiver) during the alpha phase?
  2b. If the final components of the specified CEPs are not
approved/appropriate to land during alpha, would it be better if the
project committed to a one-off half-year release later in the year?


Re: [EXTERNAL] [DISCUSS] Next release date

2023-03-07 Thread Mick Semb Wever
On Tue, 7 Mar 2023 at 11:20, Sam Tunnicliffe  wrote:

> Currently, we anticipate CEP-21 being in a mergeable state around late
> July/August.
>


For me this is a more important reason to delay the branch date than
CEP-15, CEP-21 being such a foundational change. Also, this is the first
feedback we've had that any CEP won't land by May.

Thank you Sam (and German) for the directness in your posts.

My remaining concern is the unknown branch-to-GA time, and the real risk of
not seeing a GA release (with highly anticipated features) land this
year. I hope that delaying the branch date is accompanied by broad
commitments to fixing flakies, improving QA/CI, etc., so that our
hope of a two-month journey to GA and a more stable trunk is realised.


Re: Removal of DateTieredCompactionStrategy in 5.0

2023-03-07 Thread Mick Semb Wever
>
> Are people OK with the removal of DTCS in 5.0?
>


Yes.


Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-09 Thread Mick Semb Wever
>
> I've also found some useful Cassandra JIRA dashboards for previous
> releases to track progress and scope, but we don't have anything
> similar for the next release. Should we create it?
> Cassandra 4.0GAScope
> Cassandra 4.1 GA scope
>


https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484


Re: Role of Hadoop code in Cassandra 5.0

2023-03-09 Thread Mick Semb Wever
On Thu, 9 Mar 2023 at 18:54, Brandon Williams  wrote:

> I think if we reach consensus here that decides it. I too vote to
> deprecate in 4.1.x.  This means we would remove it in 5.0.
>


+1


Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-09 Thread Mick Semb Wever
>
> One place we've been weak historically is in distinguishing between
> tickets we consider "nice to have" and things that are "blockers". We don't
> have any metadata that currently distinguishes those two, so determining
> what our burndown leading up to 5.0 looks like is a lot more data massaging
> and hand-waving than I'd prefer right now.
>


We distinguish "blockers" with `Priority=Urgent` or `Severity=Critical`, or
by linking the ticket as blocking to a specific ticket that spells it out.
We do have the metadata, but yes it requires some work…
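A sketch of how that metadata could be turned into a blocker list, assuming "blocker" means an unresolved ticket with Priority=Urgent or Severity=Critical for a given fixVersion, and assuming the ASF JIRA lets the "Severity" field be searched by name. The helper names `blocker_jql` and `fetch_blockers` are hypothetical; the query itself goes through JIRA's standard REST search endpoint.

```bash
# Hypothetical helper: build a JQL query for unresolved tickets flagged as
# blockers (Priority=Urgent or Severity=Critical) for one fixVersion.
# Searching "Severity" by name is an assumption about the ASF JIRA setup.
blocker_jql() {
  local version="$1"
  printf 'project = CASSANDRA AND fixVersion = "%s" AND resolution = Unresolved AND (priority = Urgent OR Severity = Critical)\n' "$version"
}

# Hypothetical helper: run that query against JIRA's REST search endpoint
# (GET /rest/api/2/search). curl -G sends the --data-urlencode pairs as
# URL-encoded query parameters.
fetch_blockers() {
  curl -sG 'https://issues.apache.org/jira/rest/api/2/search' \
    --data-urlencode "jql=$(blocker_jql "$1")" \
    --data-urlencode 'fields=key,summary,priority'
}
```

For example, `fetch_blockers 5.0` returns JSON whose `issues` array holds the matching tickets; the JQL printed by `blocker_jql 5.0` can equally be saved as a filter behind a board.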

The project previously agreed to one release a year, akin to a
release train model, which helps explain why fixVersion 5.x has simply come
to mean "next". (And there is no "burn-down" in such a model.)

Our release criteria, especially post-branch, demonstrate that we do
introduce and rely on "blockers". If we agree that certain exceptional CEPs
are "blockers", i.e. they warrant delaying the release date, this approach
seems to fit appropriately.

When I (just) folded fixVersion 4.2 into 5.0 (and 4.x into 5.x), I also
created 5.1.x and 6.x.  I (and others) wish to run through our 5.x list
and push out everything we can see with no commitment or activity, and
also close out old tickets that are now irrelevant/inapplicable (this will
be done via a proposed filter). But a question here is whether fixVersion
implies where a ticket can be applied (appropriateness) or where we foresee
it landing (roadmap). For example, we mark bugs with the fixVersions they
ideally should be applied to, regardless of whether anyone comes to address
them or not.


Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-09 Thread Mick Semb Wever
> > > One place we've been weak historically is in distinguishing between 
> > > tickets we consider "nice to have" and things that are "blockers". We 
> > > don't have any metadata that currently distinguishes those two, so 
> > > determining what our burndown leading up to 5.0 looks like is a lot more 
> > > data massaging and hand-waving than I'd prefer right now.
> >
> > We distinguish "blockers" with `Priority=Urgent` or `Severity=Critical`, or 
> > by linking the ticket as blocking to a specific ticket that spells it out. 
> > We do have the metadata, but yes it requires some work…
>
> For everything not urgent or a blocker, does it matter whether something has 
> a fixver of where we think it's going to land or where we'd like to see it 
> land? At the end of the day, neither of those scenarios will actually shift a 
> release date if we're proactively putting "blocker / urgent" status on new 
> features, improvements, and bugs we think are significant enough to delay a 
> release right?


Oops, actually we were using the -beta and -rc fixVersion
placeholders to denote the blockers once "the bridge was crossed"
(while Urgent and Critical are used more broadly, e.g. for patch releases).
If we use this approach, then we could add a 5.0-alpha placeholder
that indicates a consensus on tickets blocking the branching (if we
agree alpha1 should be cut at the same time we branch…). IMHO such
tickets should also still be marked as Urgent, but I suggest we use
Urgent/Critical as an initial state, and the fixVersion placeholders
where we have consensus or where our release criteria call for it
:shrug:


[DISCUSS] New dependencies with Chronicle-Queue update

2023-03-13 Thread Mick Semb Wever
JDK17 requires us to update our chronicle-queue dependency: CASSANDRA-18049

We use chronicle-queue for both audit logging and full query logging (FQL).

This update pulls in a number of new transitive dependencies.

affinity-3.23ea1.jar
asm-analysis-9.2.jar
asm-commons-9.2.jar
asm-tree-9.2.jar
asm-util-9.2.jar
jffi-1.3.9.jar
jna-platform-5.5.0.jar
jnr-a64asm-1.0.0.jar
jnr-constants-0.10.3.jar
jnr-ffi-2.2.11.jar
jnr-x86asm-1.0.2.jar
posix-2.24ea4.jar


More info here:
https://issues.apache.org/jira/browse/CASSANDRA-18049?focusedCommentId=17699393&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17699393


Objections?


[DISCUSS] Lift MessagingService.minimum_version to 40 in trunk

2023-03-13 Thread Mick Semb Wever
If we do not recommend and do not test direct upgrades from 3.x to
5.x, we have the opportunity to clean up a fair chunk of code by
making `MessagingService.minimum_version=40`.

As Cassandra versions 4.x and 5.0 are all on
`MessagingService.current_version=40`, lifting
`MessagingService.minimum_version` would make it equal to
`current_version`.

We already don't allow mixed-version streaming today. The only
argument I can see for keeping minimum_version=30 is supporting
non-streaming messages between 3.x and 5.0 nodes, and I can't find a
basis for that.

An _example_ of the code that can be cleaned up is in the patch
attached to the ticket:
CASSANDRA-18314 – Lift MessagingService.minimum_version to 40

What do you think?


Re: [DISCUSS] New dependencies with Chronicle-Queue update

2023-03-13 Thread Mick Semb Wever
On Mon, 13 Mar 2023 at 16:39, Jeremiah D Jordan 
wrote:

> Given we need to upgrade to support JDK17 it seems fine to me.  The only
> concern I have is that some of those libraries are already pretty old, for
> example the most recent jna-platform is 5.13.0 and 5.5.0 is almost 4 years
> old.
>


Good catch. I've updated the transitive dependencies to their latest
versions. (Taking this approach is kinda unfortunate, as pinning the
transitive dependency versions requires declaring them explicitly.)

Note that introducing jnr-ffi, jffi, and openhft:posix brings in
platform/machine-dependent differences, as native libraries are used
when available. While we don't have a choice (the
alternative would be to rewrite the o.a.c.utils.binlog package without
chronicle-queue), it's still worth drawing attention to.

The new transitive dependencies are now:

 affinity-3.23ea1.jar
 asm-analysis-9.4.jar
 asm-commons-9.4.jar
 asm-tree-9.4.jar
 asm-util-9.4.jar
 jffi-1.3.11-native.jar
 jffi-1.3.11.jar
 jna-platform-5.13.0.jar
 jnr-a64asm-1.0.0.jar
 jnr-constants-0.10.4.jar
 jnr-ffi-2.2.13.jar
 jnr-x86asm-1.0.2.jar
 posix-2.24ea4.jar


[DISCUSS] Drop support for sstable formats m* (in trunk)

2023-03-13 Thread Mick Semb Wever
If we do not recommend and do not test direct upgrades from 3.x to
5.x, we can clean up a fair bit by removing code related to sstable
formats m*, as Cassandra versions 4.x and 5.0 are all on sstable
formats n*.

We don't allow mixed-version streaming, so it's not possible today to
stream any such older sstable format between nodes. This
compatibility break impacts only node-local and/or offline operations.

Some arguments raised to keep m* sstable formats are:
 - offline cluster upgrade, e.g. direct from 3.x to 5.0,
 - single-invocation sstableupgrade usage
 - third-party tools based on the above

Personally I am not in favour of keeping, or recommending users use,
code we don't test.

An _example_ of the code that can be cleaned up is in the patch
attached to the ticket:
CASSANDRA-18312 – Drop support for sstable formats before `na`

What do you think?


Re: [DISCUSS] New dependencies with Chronicle-Queue update

2023-03-16 Thread Mick Semb Wever
>  asm-analysis-9.4.jar
>  asm-commons-9.4.jar
>  asm-tree-9.4.jar
>  asm-util-9.4.jar


FYI, on further inspection of the posix dependency, I've excluded
these four asm* dependencies.


Re: [VOTE] Release Apache Cassandra 4.1.1

2023-03-17 Thread Mick Semb Wever
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>



+1

Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11)
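For anyone new to release votes, the first two checks above (signing and checksums) can be scripted roughly as follows. This is a sketch, not the project's release-verification tooling: it assumes each artefact X sits next to a detached signature X.asc and a checksum file X.sha256 whose first field is the hex digest, and the function names are hypothetical.

```bash
# Sketch: "checksums are correct" for one artefact.
# Assumes X.sha256 holds the hex SHA-256 digest as its first field.
verify_checksum() {
  local artefact="$1" expected actual
  expected="$(awk '{print $1; exit}' "${artefact}.sha256")"
  actual="$(sha256sum "$artefact" | awk '{print $1}')"
  [[ -n "$expected" && "$expected" == "$actual" ]]
}

# Sketch: "signing correct" for one artefact.
# Needs the release manager's public key, e.g. imported from the
# project's KEYS file with: gpg --import KEYS
verify_signature() {
  gpg --verify "${1}.asc" "$1"
}
```

For example, `verify_signature apache-cassandra-4.1.1-bin.tar.gz && verify_checksum apache-cassandra-4.1.1-bin.tar.gz` would cover the binary tarball; the build and package checks still need to be done by hand.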


Re: [DISCUSS] Drop support for sstable formats m* (in trunk)

2023-03-17 Thread Mick Semb Wever
OK, OK, there are a number of strong arguments for keeping sstable formats
around for much longer than the previous major Cassandra version, so I will
unset fixVersion on 18312  :-)


Taking a look at the history of sstable formats: they were first introduced
in version 0.7, and minor versions were introduced in version 1.0.3 with hb.

Here is when we have dropped support for, and cleaned up the code of, past
formats:

 - Versions before 1.2.5: formats <= ib were removed in CASSANDRA-5511
https://github.com/apache/cassandra/commit/7f2c3a8e40f97c626def5c510d77c1da3d9ae926

 - Version 1.2.5: format ic was removed in CASSANDRA-6869
https://github.com/apache/cassandra/commit/8e172c8563a995808a72a1a7e81a06f3c2a355ce

 - All pre-3.0 formats were removed in CASSANDRA-12716
https://github.com/apache/cassandra/commit/4a2464192e9e69457f5a5ecf26c094f9298bf069


I agree with the point that dropping the m* formats right now is only a
small reduction in code, roughly double the size of 6869's patch. But the
claim that there is never any complexity and that we should keep formats in
perpetuity has me sitting here having a heart attack, srsly.  I can also
appreciate that coming up with a good rule of thumb in advance is difficult
when we just don't know how many formats there will be and what they will
introduce.


From Aleksey:
> But it’s one thing to require two rolling restarts (3.0 to 4.0, 4.0 to
> 5.0), it’s another to require the operator to upgrade every single m*
> sstable to n*.


Good point.

Though I _always_ recommend users upgrade all sstables, before and after
every major upgrade.  But I recognise how easy it is to forget or err in
that process, and we don't need to punish operators unnecessarily.  Also
worth noting since 4.x we have `automatic_sstable_upgrade` (which is wisely
false by default).

Question/Suggestion: should we improve gossip to include the oldest sstable
format a node has, and ensure a joining node on a newer version fails/warns
if it does not support that older format?  That is, should we give a clear
signal back to operators that their rolling upgrade is not going to work
smoothly, and that they are going to hit nodes they will need to stop and
run upgradesstables on (leaving them in a mixed-version state with nodes
busy upgrading…)
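To make the suggestion concrete, here is a hypothetical sketch of the decision a joining node could make. Nothing like this exists today: it assumes each peer advertises its oldest on-disk sstable format via gossip (e.g. `md`), and that the joining node knows the oldest format it can still read (e.g. `na`). Two-letter format versions are compared as plain strings, so `ma` < `md` < `na`. The function name and its arguments are made up for illustration.

```bash
# Hypothetical check: compare the oldest sstable format reported by each
# peer against the oldest format this (newer) node can read.
# Usage: check_oldest_formats MIN_READABLE MODE PEER_FORMAT...
#   MODE is "warn" (report only) or "fail" (non-zero exit on any problem)
check_oldest_formats() {
  local min_readable="$1" mode="$2"
  shift 2
  local fmt status=0
  for fmt in "$@"; do
    # Two-letter versions sort as strings, e.g. "md" < "na".
    if [[ "$fmt" < "$min_readable" ]]; then
      echo "peer has sstable format '$fmt', older than readable minimum '$min_readable'" >&2
      [[ "$mode" == "fail" ]] && status=1
    fi
  done
  return "$status"
}
```

In `warn` mode the function only reports offending peers on stderr and returns 0; in `fail` mode a single older peer makes it return non-zero, which is where a guardrail-style refusal to join would hook in.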


From Scott:

> To expand on the final point he makes re: requiring SSTables be fully
> rewritten prior to rev'ing from 4.x to 5.x (if the cluster previously ran
> 3.x) –
>
> This would also invalidate incremental backups. Operators would either be
> required to perform a full snapshot backup of each cluster to object
> storage prior to upgrading from 4.x to 5.x; or to enumerate the contents of
> all snapshots from an incremental backup series to ensure that no m*-series
> SSTables were present prior to upgrading.
>
> If one failed to take on the work to do so, incremental backup snapshots
> would not be restorable to a 5.x cluster if an m*-series SSTable were
> present.
>
>
Again, I would always recommend a backup before each major upgrade, and I
would think this has become standard advice.  On sstables residing in
backup storage, and the need to do a full backup: that's another good point,
but one which I think we might solve in a smarter way (see below).
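Finding leftover m* sstables, including inside snapshots and incremental backups, is mostly a filename search. A sketch, assuming the naming scheme `<version>-<generation>-<format>-<component>` (e.g. `md-1-big-Data.db`) and the default data directory; the function name is hypothetical.

```bash
# Sketch: list Data.db components still in an m* (3.x era) sstable format
# anywhere under a data directory, including snapshots/ and backups/.
# Assumes names like md-1-big-Data.db; the default path is only an example.
find_m_format_sstables() {
  local root="${1:-/var/lib/cassandra/data}"
  find "$root" -type f -name 'm?-*-Data.db' | sort
}
```

Live sstables it reports can be rewritten with `nodetool upgradesstables`; files under `snapshots/` or `backups/` are not rewritten by it, which is the backup concern Scott raises.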


From Aleksey:

> 2. It’s very stable and battle tested at this point
>
>

I beg to differ on this. We don't test it, and upgrade code gets limited
production time.  And I bet operators are less incentivised to file bug
reports on upgrade issues so long as they get through the upgrade one way
or another (and I bet many issues pop up way too late, like the numerous
range tombstone issues over many 3.11.x versions).

We could be testing it more, and IMHO we should…



> 5. There are third-party tools that I know of which benefit from a single
> C* jar that can read all relevant sstable versions, and relevant here
> includes 3.0 ones
>
>

I suggest we should have a way to read/write from/to all sstable versions;
I absolutely agree this is useful (e.g. backups in storage). And we should
be better at thorough testing.

With such use-cases applying only to node-local and offline scenarios,
we can tackle this cross-branch, i.e. take the best of both worlds: simpler
_tested_ code, and forward (and hopefully backward) compatibility _into
perpetuity_.

One example of this is if we could stream sstableupgrades, e.g.
```
   # read from disk any l* sstables, write to disk latest m format
   sstableupgrade-3.11 --stream-output -f jb-1-big-Data.db  |
sstableupgrade-5.0 --stream-input
```
Sure, this is no longer a "single C* jar", but that seems a minor trade-off
to get something better. The idea of cross-branch functionality and testing
is nothing new to us (e.g. jvm dtests). Note, this approach would likely be
slower unless you threw cpu+mem at it. And it is applicable regardless of
which format compatibility policy we decide on…

The suggestion, even if it's only a strawman, raises some other questions …

- Why doesn't sstableupgrade today upgrade sstables in parallel, o

Re: [DISCUSS] Drop support for sstable formats m* (in trunk)

2023-03-17 Thread Mick Semb Wever
On Fri, 17 Mar 2023 at 17:24, Brandon Williams  wrote:

> On Fri, Mar 17, 2023 at 9:25 AM Mick Semb Wever  wrote:
> > Question/Suggestion: should we improve gossip to include what the oldest
> > format a node has, and ensure newer versioned node joining fail/warn if it
> > does not support that older format?  That is, should we give a clear
> > signal back to operators that their rolling upgrade is not going to work
> > smoothly, that they are going to hit nodes they will need to stop and do
> > upgradesstables on (leaving them in a state of mix-versions and nodes busy
> > upgrading…)
>
> We already have this (even in 3.0!) to facilitate dropping compact
> storage:
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/gms/ApplicationState.java#L59



Nice!
So is there appetite for such a patch, failing or warning (a guardrail?),
to prevent a node from running a new version that does not support sstable
formats still existing on other nodes?


Welcome our next PMC Chair Josh McKenzie

2023-03-23 Thread Mick Semb Wever
It is time to pass the baton on, and on behalf of the Apache Cassandra
Project Management Committee (PMC) I would like to welcome and congratulate
our next PMC Chair Josh McKenzie (jmckenzie).

Most of you already know Josh, especially through his regular and valuable
project oversight and status emails, always bringing balance and
understanding to the various views and concerns coming in.

Repeating Paulo's words from last year: The chair is an administrative
position that interfaces with the Apache Software Foundation Board, by
submitting regular reports about project status and health. Read more about
the PMC chair role on Apache projects:
- https://www.apache.org/foundation/how-it-works.html#pmc
- https://www.apache.org/foundation/how-it-works.html#pmc-chair
- https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers

The PMC as a whole is the entity that oversees and leads the project and
any PMC member can be approached as a representative of the committee. A
list of Apache Cassandra PMC members can be found on:
https://cassandra.apache.org/_/community.html


Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-24 Thread Mick Semb Wever
> I would like to propose a partial freeze of 5.0 in June.
>
…
>
> This partial freeze will be valid for every new feature except CEP-21 and
> CEP-15.
>


+1

Thanks for summarising the thread this way, Benjamin. This addresses my two
main concerns (letting the branch/release date slip too far into the
unknown, and squeezing GA QA efforts) while putting in place exceptional
waivers for CEP-21 and CEP-15.

I hope that in the future we will be more willing to commit to the release
train model: less concerned about "what the next release contains"; more
comfortable letting big features land where they land. But this is opinion
and discussion for another day… possibly looping back to the discussion on
preview releases…


Do we yet have a (rough) ETA from anyone on CEP-15 (post CEP-21) landing in
trunk?


Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-03-25 Thread Mick Semb Wever
>
> Here comes Cassandra CI status for  2023-3-13 - 2023-23-179 :
>
> *** CASSANDRA-18338
>  -  
> dtest.bootstrap_test.TestBootstrap.test_cleanup
> trunk
> ***  CASSANDRA-18338
>  - test:
> org.apache.cassandra.distributed.test.ByteBuddyExamplesTest.countTest ,
> this failed twice with jdk 8 and jdk 11, on trunk and  4.1
> others are some timeout exception.
>


New failures from Week 12:
*** A possible regression from CASSANDRA-18328 on 2.x to 3.x dtest upgrades

Otherwise, all test failures are timeouts.

We need volunteers for Build Lead in the weeks ahead.


Re: [EXTERNAL] [DISCUSS] Next release date

2023-03-31 Thread Mick Semb Wever
> We have a lot of significant features that have or will land soon and our
> experience suggests that those merges usually bring their set of
> instabilities. The goal of the proposal was to make sure that we get rid of
> them before TCM and Accord land to allow us to more easily identify the
> root causes of problems.




I agree with Benjamin that testing in phases, i.e. in separate periods of
time, has benefits that we can take advantage of.



> a) do we have test failures on circle on trunk right now, and
> b) do we have regressions on trunk on ASF CI compared to 4.1
>
> Whether or not new features land near the cutoff date or not shouldn't
> impact the above right?
>


I don't think it can be limited to the above. They are our minimum
requirements for getting to beta, to rc, and to GA. But in practice we also
wait for bug reports from downstream testing efforts. Such testing isn't
necessarily possible pre-commit, e.g. it is third-party, not feasible to
run continuously, or not appropriate to upstream/open-source.

We want GA releases to be production ready for any cluster at any scale. So
I guess in practice for us Stable Trunk != GA, but that's OK; I'm just being
honest about the ideal we are moving towards.


Re: [EXTERNAL] [DISCUSS] Next release date

2023-04-02 Thread Mick Semb Wever
>
> I'd be happier with something concrete like the following expected release
> flow:
>
> 1) We freeze a branch
> 2) To hit RC, we need green circle + no regression on ASF (or green ASF in
> the future when stable)
> 3) We need N weeks in this frozen state for people to test it out
> 4) Once we have both 2 and 3, we RC and GA
>


Yeah, I was thinking (1) would include the beta1 release if we're already
green, i.e. we'd skip (2), or alpha1 if not yet green.
(3) would still hold, but would be N weeks from first beta to first rc.

That is, something like…

1) branch. if green cut beta1 else cut alpha1
  1a) when green then cut beta1
2) wait N weeks from beta1. if no blockers cut rc1
3) wait 2 weeks. if no blockers cut GA
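The steps above can be sketched as a small decision function. It is illustrative only: the stage names and arguments are made up, N defaults to 6 weeks (the low end of the 6-8 weeks suggested further down), and "green" and "blockers" stand for whatever CI status and blocker tickets we agree on.

```bash
# Hypothetical sketch of the proposed branch -> alpha/beta -> rc -> GA flow.
# Usage: next_release_step STAGE CI_GREEN WEEKS BLOCKERS [N]
#   STAGE:    branch | alpha | beta | rc  (the latest thing cut)
#   CI_GREEN: yes | no
#   WEEKS:    weeks since the first release of the current STAGE
#   BLOCKERS: number of open blocker tickets
#   N:        weeks to stay on beta before rc1 (default 6)
next_release_step() {
  local stage="$1" green="$2" weeks="$3" blockers="$4" n="${5:-6}"
  case "$stage" in
    branch)  # 1) branch: if green cut beta1, else cut alpha1
      if [[ "$green" == yes ]]; then echo "cut beta1"; else echo "cut alpha1"; fi ;;
    alpha)   # 1a) when green, cut beta1
      if [[ "$green" == yes ]]; then echo "cut beta1"; else echo "wait"; fi ;;
    beta)    # 2) N weeks from beta1 and no blockers: cut rc1
      if (( weeks >= n && blockers == 0 )); then echo "cut rc1"; else echo "wait"; fi ;;
    rc)      # 3) 2 weeks from rc1 and no blockers: cut GA
      if (( weeks >= 2 && blockers == 0 )); then echo "cut GA"; else echo "wait"; fi ;;
    *)
      echo "unknown stage: $stage" >&2; return 1 ;;
  esac
}
```

For example, `next_release_step beta yes 6 0` prints `cut rc1`, while any open blocker keeps it at `wait`.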

As is evident from both 4.0 and 4.1, the alpha-to-beta timeframe hurts, and
our Stable Trunk (and CI) efforts aim to minimise/remove it. That is,
this incentivises us to get on top of CI issues and flakies ahead of branch
time.


> Is it too prescriptive to say "we'll be frozen on a branch for at least 8
> weeks so folks can test out the betas"? (I ask because I know I can get a
> little "structure-happy" at times).
>


6-8 weeks feels right, if we want to be prescriptive. And there needs to be
a sense of urgency when we make this call to action to downstream testers.
As a release manager I know that an error margin of two weeks is typical.


Re: [VOTE] Release Apache Cassandra 4.0.9

2023-04-04 Thread Mick Semb Wever
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>


+1

Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11)


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Mick Semb Wever
>
> Something like "TABLESPACE" or 'TABLEGROUP" would *theoretically* better
> satisfy point 1 and 2 above but subjectively I kind of recoil at both
> equally. So there's that.
>



TABLEGROUP would work for me.  Immediately intuitive.

Brainstorming…

A keyspace today defines replication strategy, RF, and durable_writes. If
it could also hold table options as defaults for all tables in that group,
and one tablegroup could be a child of, and inherit settings from, another
tablegroup, you could logically group tables in ways that benefit both your
application platform's taxonomy and the spread of keyspace/table settings.
DATABASE, NAMESPACE, whatever, can be aliases to it too, if you like.


Re: [VOTE] Release Apache Cassandra 4.0.9

2023-04-06 Thread Mick Semb Wever
> Up to you to fail the vote and we realistically release 4.0.9 after Easter
>


-1 to the vote.

I support your initial veto and reasoning, and it appears you are willing
to recut once 18429 is resolved.


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Mick Semb Wever
> … but that should be a different discussion about how we evolve config.
>


I disagree. Nomenclature is difficult and benefits from holistic and
forward thinking.

Sure, you can label this off-topic if you like, but I value our DISCUSS
threads being collaborative and open. Sometimes the best idea comes at
the tail end of a sequence of bad and/or unpopular ideas.


Re: [VOTE] CEP-26: Unified Compaction Strategy

2023-04-06 Thread Mick Semb Wever
+1

On Thu, 6 Apr 2023 at 19:32, Francisco Guerrero  wrote:

> +1 (nb)
>
> On 2023/04/06 17:30:37 Josh McKenzie wrote:
> > +1
> >
> > On Thu, Apr 6, 2023, at 12:18 PM, Joseph Lynch wrote:
> > > +1
> > >
> > > This proposal looks really exciting!
> > >
> > > -Joey
> > >
> > > On Wed, Apr 5, 2023 at 2:13 AM Aleksey Yeshchenko 
> wrote:
> > > >
> > > > +1
> > > >
> > > > On 4 Apr 2023, at 16:56, Ekaterina Dimitrova 
> wrote:
> > > >
> > > > +1
> > > >
> > > > On Tue, 4 Apr 2023 at 11:44, Benjamin Lerer 
> wrote:
> > > >>
> > > >> +1
> > > >>
> > > >> Le mar. 4 avr. 2023 à 17:17, Andrés de la Peña <
> adelap...@apache.org> a écrit :
> > > >>>
> > > >>> +1
> > > >>>
> > > >>> On Tue, 4 Apr 2023 at 15:09, Jeremy Hanna <
> jeremy.hanna1...@gmail.com> wrote:
> > > 
> > >  +1 nb, will be great to have this in the codebase - it will make
> nearly every table's compaction work more efficiently.  The only possible
> exception is tables that are well suited for TWCS.
> > > 
> > >  On Apr 4, 2023, at 8:00 AM, Berenguer Blasi <
> berenguerbl...@gmail.com> wrote:
> > > 
> > >  +1
> > > 
> > >  On 4/4/23 14:36, J. D. Jordan wrote:
> > > 
> > >  +1
> > > 
> > >  On Apr 4, 2023, at 7:29 AM, Brandon Williams 
> wrote:
> > > 
> > >  
> > >  +1
> > > 
> > >  On Tue, Apr 4, 2023, 7:24 AM Branimir Lambov 
> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I would like to put CEP-26 to a vote.
> > > >
> > > > Proposal:
> > > >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
> > > >
> > > > JIRA and draft implementation:
> > > > https://issues.apache.org/jira/browse/CASSANDRA-18397
> > > >
> > > > Up-to-date documentation:
> > > >
> https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
> > > >
> > > > Discussion:
> > > > https://lists.apache.org/thread/8xf5245tclf1mb18055px47b982rdg4b
> > > >
> > > > The vote will be open for 72 hours.
> > > > A vote passes if there are at least three binding +1s and no
> binding vetoes.
> > > >
> > > > Thanks,
> > > > Branimir
> > > 
> > > 
> > > >
> > >
>


Re: [VOTE] Release Apache Cassandra 4.0.9 - SECOND ATTEMPT

2023-04-11 Thread Mick Semb Wever
On Tue, 11 Apr 2023 at 10:12, Berenguer Blasi 
wrote:

> +1
>
> On 11/4/23 9:54, Miklosovic, Stefan wrote:
> …
> > The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.



+1
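The pass condition quoted in these vote threads ("at least three binding +1s and no -1's") can be written as a small check. This is a minimal sketch; `VoteTally` and its `Vote` record are hypothetical illustrations, not Apache tooling. It assumes that only binding -1s block the vote, which follows the CEP-26 wording ("no binding vetoes") rather than the literal release template.

```java
import java.util.List;

class VoteTally {
    /** One ballot on the thread: +1, 0 or -1, and whether the voter is a PMC member (binding). */
    record Vote(String voter, int value, boolean binding) {}

    /**
     * Sketch of "at least three binding +1s and no binding -1s".
     * Non-binding votes are recorded on the thread but do not change the outcome here.
     */
    static boolean passes(List<Vote> votes) {
        long bindingPlusOnes = votes.stream()
                .filter(v -> v.binding() && v.value() == 1)
                .count();
        boolean bindingVeto = votes.stream()
                .anyMatch(v -> v.binding() && v.value() == -1);
        return bindingPlusOnes >= 3 && !bindingVeto;
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(
                new Vote("pmc-1", 1, true),
                new Vote("pmc-2", 1, true),
                new Vote("contributor-1", 1, false), // non-binding
                new Vote("pmc-3", 1, true));
        System.out.println(passes(votes)); // true: three binding +1s, no binding -1
    }
}
```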


Re: (CVE only) support for 3,11 beyond published EOL

2023-04-13 Thread Mick Semb Wever
>
> There have been several discussions on slack [1], [2] to support 3.11 beyond 
> the date stated on the web [3] which is May-July 23 and given it's April 
> that's an unlikely date.
>


Strictly speaking it is maintained until the 5.0 GA release. We should
update the downloads page accordingly.


>
> So we will support anyway but I would like to start a broader discussion if 
> we, the community, are interested in at a minimum CVE only support, maybe bug 
> fixes as well,  after 5.0 is released for 3.11 and if so for how long - 
> something like a Cassandra LTS policy.
>



The community's resources are limited, and the statement is intended
to avoid tying up resources and to avoid letting users down. This is
open source and "to upgrade" is often our easy and pragmatic answer.

It is not a statement that fixes to older branches will be rejected.
Committers (two of them) can still push to older branches, and a release can
still happen if you find someone to do it (and three PMC members to +1 it).
This is why the 2.2 branch is still present on ci-cassandra.a.o. If
vendors want to provide support for versions for longer and can commit
to upstreaming those efforts (whether that's bug-fixes and releases, or
only bug-fixes), the machinery is in place to accept it.

We already have an understanding and precedent in place that CVEs on
the previous unmaintained branch are addressed and released.


Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond published EOL

2023-04-13 Thread Mick Semb Wever
>
> Yes, this would be great. Right now users are confused what EOL means and
> what they can expect.
>
>

I think the project would need to land on an agreed position.  I tried to
find any reference to my earlier statement around CVEs on the latest
unmaintained branch but could not find it (I'm sure it was mentioned
somewhere :(

How many past branches?  All CVEs?  What if CVEs are in dependencies?
And is this a slippery slope, will such a formalised and documented
commitment lead to more users on EOL versions? (see below)
How do other committers feel about this?


> I am also asking specifically for 3.11 since this release has been around
> so long that it might warrant longer support than what we would offer for
> 4.0.
>
>

This logic can also be the other way around :-)

We should be sending a clear signal that OSS users are expected to perform
a major upgrade every ~two years.  Vendors can, and are welcome to solve
this, but the project itself does not support any user's production system,
it only maintains code branches and performs releases off them, with our
focus on quality solely on those maintained branches.


Re: [DISCUSS] Next release date

2023-04-16 Thread Mick Semb Wever
>
>> We have a lot of significant features that have or will land soon and our 
>> experience suggests that those merges usually bring their set of 
>> instabilities. The goal of the proposal was to make sure that we get rid of 
>> them before TCM and Accord land to allow us to more easily identify the root 
>> causes of problems.
>
>
> Agree with Benjamin that testing in phases, i.e. separate periods of time, 
> has positives that we can take advantage of.
>


Where did we land on this?

With the following intentions:
 - moving towards the goal of annual releases, with a cadence 12±3 months apart,
 - the branch to GA period being 2-3 months,
 - avoiding any type of freeze on trunk,
 - getting a release out by December's Summit,
 - freeing up folk to start QA (that makes sense to start) immediately

I'm going to suggest the following…

 1. Once all CEPs except CEP-21 and CEP-15 land we branch cassandra-5.0,
1a. QA starts on cassandra-5.0,
1b. CEP-21 and CEP-15 are granted a waiver to land in cassandra-5.0, and
forward-merge to trunk,

 2. When CEP-15 lands we cut alpha1,
2a. The deadline is first week of October, anything not yet in
cassandra-5.0 is not in 5.0,
2b. We expect a minimum two months of testing and beta+rc releases
to get to GA.


Additional notes,
 1) "Once all CEPs" includes jdk17 and extending TTL tickets.
 1) We ask folk to be considerate of what they commit in trunk with respect to
the inbound CEP-21.
 1a) There's an understanding of what needs to be re-tested after CEP-21.
 2) The initial release may be beta1, we make that call at that time.
 2a) features not complete can still be in 5.0 as experimental and not
enabled by default.
 2b) If CEP-15 lands Aug/Sept, then the earliest possible GA release
date is in October.

I feel this proposal will give us evidence and help put us back on
track for a release train model with a shorter QA2GA period, and
aiming for a 5.1 release a bit earlier in the 2024 year (e.g. Q3).

If we agree on this proposal I will update our downloads page (ref
German's thread).


Re: [DISCUSS] Next release date

2023-04-17 Thread Mick Semb Wever
>
> 2. When CEP-15 lands we cut alpha1,
> 2a. The deadline is first week of October, anything not yet in
> cassandra-5.0 is not in 5.0,
> 2b. We expect a minimum two months of testing and beta+rc releases
> to get to GA.
>
> To clarify, is the intent here to say "The deadline for cutoff is 1st week
> of October for everything, including CEP-15"? Or is the intent to say "we
> don't cut alpha1 until CEP-15 lands"?
>


The former. The first week of October will be the full feature freeze on
the cassandra-5.0 branch.


Re: [DISCUSS] Next release date

2023-04-17 Thread Mick Semb Wever
On Mon, 17 Apr 2023 at 19:31, Caleb Rackliffe 
wrote:

> ...or just cutting a 5.0 branch when CEP-21 is ready.
>
> There's nothing stopping us from testing JDK17 and TTL bits in trunk
> before that.
>
> On Mon, Apr 17, 2023 at 11:25 AM Caleb Rackliffe 
> wrote:
>
>> > Once all CEPs except CEP-21 and CEP-15 land we branch cassandra-5.0
>>
>> For the record, I'm not convinced this is necessarily better than just
>> cutting a cassandra-5.0 branch on 1 October.
>>
>

How else would this work without being akin to a feature freeze on trunk?

We want (need) as much time as possible to test. We have no evidence that
it will be quicker than 4.1, we have to create that evidence. Those folk
that free up and are ready to get ahead and de-risk our testing efforts
should be given a release branch to make their work easier and to give us
that evidence in a more controlled manner so that we can plan better next
time. Appreciate that there's one too many variables here, but I'm sticking
up for the testing efforts here.


Re: [DISCUSS] Next release date

2023-04-17 Thread Mick Semb Wever
> b.) Cut a 5.0 branch when the major release-defining element (maybe
> CEP-21?) merges to trunk, with the shared understanding (possibly what
> we're disagreeing about) that very little of what we need to test/de-risk
> is going to be inhibited by not cutting that branch earlier (and that
> certain testing efforts would be almost wholesale wasted if done
> beforehand).
>

Yup, it's (b) for me, and everything minus 21 and 15 is defining enough to
warrant the branching and a checkpoint where testing can start and not be
wasted.  I understand that cep-21 changes a lot and that impacts testing,
but I wholeheartedly trust testers to be taking this into consideration.


Re: [DISCUSS] Next release date

2023-04-17 Thread Mick Semb Wever
>
> My personal .02: I think we should consider branching 5.0 September 1st.
> That gives us basically 12 weeks for folks to do their testing and for us
> to stabilize anything that's flaky in circle or regressed in ASF CI.
>


I'm not for this, sorry. I see the real risk here of there being no GA
release this year.

My proposal was based on reading through the thread and gathering what I
saw to be the best middle ground for everyone. It's not my first choice,
but as a middle ground I can accept it.

Caleb, you appear to be the only one objecting, and it does not appear that
you have made any compromises in this thread. Can I ask that you do?  I
(and others) see letting testing start as soon as it can, wherever it
can, as an important tactic for de-risking an important 5.0 release.


Re: [DISCUSS] Next release date

2023-04-20 Thread Mick Semb Wever
Thanks Caleb and JD, I'm keen to see us move forward.

Josh,
 I (and others) have expressed concern about trying to use dates, and
stating periods of time to achieve X, to work backwards from a desired GA
date.  Dates always slip, and we don't have the evidence (beyond
the extreme of 6 months for 4.1).  I feel this approach and such a
discussion will be far more appropriate and productive next year.

But that said, your breakdown is awesome, and the use of dates to specify
the _latest_ branch date works for me.

I would like to add that branching earlier than August 1st isn't just
impacting cep-21+15 engineers, but also all other engineers who have freed
up and want to start validation testing (i.e. they too want a stable
base).

To my understanding the jdk17 work is still taking time, so I really don't
think we'll be much earlier than the 1st August if at all. Let's discuss it
again when it looks like we are approaching being ready to branch.



On Wed, 19 Apr 2023 at 17:13, Josh McKenzie  wrote:

> Let me try to break this down another way:
>
> I see a few competing concerns, each with QA related time requirements
> (asserting 8 weeks minimum, 16 weeks maximum we should plan for to
> stabilize a GA):
>
> 1. A freeze to a branch to stabilize for release (8-16 weeks of QA
>    required after we branch)
> 2. A freeze to a branch to make room for large complex work to have
>    increased velocity on merge due to having a more stable destination (8-16
>    weeks of QA required after they merge)
> 3. A commitment to release once a year (for our purposes, we've
>    defined this as calendar year) (8-16 weeks of QA required *before*).
>
> If we walk backwards from Dec 1, that means our latest date to freeze and
> validate a 5.0 branch would be Friday August 11; let's go with 1st Friday
> in August for simplicity, 2023-08-04. That would give us just over 16 weeks
> worst-case to stabilize.
>
> So we branch for 5.0 *at the latest* on 2023-08-04; I think we can all
> agree on this?
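The dates above can be double-checked with `java.time`: 16 weeks before Friday 2023-12-01 is Friday 2023-08-11, and branching on the first Friday in August (2023-08-04) leaves 119 days, i.e. 17 weeks, which matches "just over 16 weeks". A minimal sketch; `BranchDates` and `stabilisationDays` are illustrative names only.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;

class BranchDates {
    /** Whole days available between the branch date and the target GA date. */
    static long stabilisationDays(LocalDate branch, LocalDate targetGa) {
        return ChronoUnit.DAYS.between(branch, targetGa);
    }

    public static void main(String[] args) {
        LocalDate targetGa = LocalDate.of(2023, 12, 1);

        // Walking back the 16-week worst case from the target GA date.
        LocalDate sixteenWeeksBack = targetGa.minusWeeks(16);

        // "1st Friday in August for simplicity".
        LocalDate firstFridayOfAugust = LocalDate.of(2023, 8, 1)
                .with(TemporalAdjusters.firstInMonth(DayOfWeek.FRIDAY));

        System.out.println(sixteenWeeksBack + " " + sixteenWeeksBack.getDayOfWeek()); // 2023-08-11 FRIDAY
        System.out.println(firstFridayOfAugust);                                        // 2023-08-04
        System.out.println(stabilisationDays(firstFridayOfAugust, targetGa) / 7.0 + " weeks"); // 17.0 weeks
    }
}
```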
>
> So the next question: when do we branch for 5.0 *at the earliest*? Pros
> and cons of an earlier branch:
> Pros:
>
> - Earlier start of validation testing on a more stable base (no
>   improvements or new features excepting CEP-15 and CEP-21)
> - Theoretically higher velocity of completion of CEP-15 and CEP-21
>   (the team doing this can speak to the degree to which this is true)
>
> Cons:
>
> - Smaller amount of improvements and new features go into 5.0
> - The rest of the dev community has another branch they need to target
>   with bugfixes (annoying but not _too_ costly since bugfixes are often a bit
>   smaller in scope)
>
>
> Through this lens, we are weighing the belief that CEP-15 and CEP-21 will
> land by August 1st and be accelerated by branching early against the belief
> that other improvements and features will go in if we branch later; if we
> freeze today and neither CEP-15 nor CEP-21 land for unforeseen reasons, we
> will have a GA release that had a shortened amount of time for new features
> and improvements to be merged in.
>
> Lastly, as input data to the discussion, here's a list of all the new
> features and improvements in 5.0 as of today; hypothetically were we to
> freeze 5.0 today and worst-case unforeseen things lead to CEP-21 and CEP-15
> not landing by cutoff, this would be our feature-set for our next GA:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20fixversion%20%3D%205.0%20and%20fixversion%20not%20in%20(4.0.x%2C%204.1.x%2C%204.0%2C%204.1%2C%204.1.1%2C%204.1.2%2C%204.0.8%2C%204.1-alpha1%2C%204.1-alpha1%2C%204.1-beta2%2C%204.1-beta1%2C%204.1-rc1)%20and%20type%20in%20(%22New%20Feature%22%2C%20%22Improvement%22)%20and%20component%20!%3D%20Accord%20order%20by%20type%20desc%2C%20resolved%20desc
>
> Phew. Ok, so using the above framework, I'm personally ok with us freezing
> 5.0 earlier than August 1st if the engineers actively on CEP-15 and CEP-21
> indicate that it will appreciably increase their velocity. The list of
> improvements and features is substantial enough that an earlier freeze
> would still have enough in it to be "meaty" in my opinion; especially given
> the likelihood of CEP-25 (Trie-indexed SSTable format) landing relatively
> soon.
>
> So the next question to me is: "when"? On that I defer to Sam, Alex,
> Benedict, Blake, David, et. al: how much would freezing 5.0 early help in
> terms of your development velocity on TrM and Accord?
>
>
> On Wed, Apr 19, 2023, at 6:22 AM, Henrik Ingo wrote:
>
> I'm going to repeat the point from my own thread: rather than thinking of
> this as some kind of concession to two exceptional CEPs, could we rather
> take the point of view that they get their own space and time precisely
> because they are large and invasive and both the merge and testing of them
> will benefit from everything else in the branch quieting down?
>
> I'm also not particularly interested in a long featu

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-25 Thread Mick Semb Wever
I was soo happy when I saw this, I know many users are going to be
thrilled about it.


On Wed, 26 Apr 2023 at 05:15, Patrick McFadin  wrote:

> Not sure if this is what you are saying, Josh, but I believe this needs to
> be its own CEP. It's a change in CQL syntax and changes how clusters
> operate. The change needs to be documented and voted on. Jonathan, you know
> how to find me if you want me to help write it. :)
>

I'd be fine with just a DISCUSS thread to agree on the CQL change, since
it (`DENSE FLOAT32`) appears to be minimal, and the overall patch
builds on SAI. As Henrik mentioned, there are other SAI extensions being
added too without CEPs.  Can you elaborate on how you see this changing how
the cluster operates?

This will be easier to decide once we have a patch to look at, but that
depends on a CEP-7 base (i.e. no feature branch exists yet). If we do want a
CEP, we need to allow a few weeks to get it through, but that can happen in
parallel, and drafting something now may be valuable anyway for an
eventual CEP that proposes the more complete features (e.g.
cosine_similarity(…)).


Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond published EOL

2023-04-26 Thread Mick Semb Wever
On Sat, 15 Apr 2023 at 03:17, C. Scott Andreas  wrote:

> If there’s lack of clarity around EOL policy and dates, we should
> absolutely make this clear.
>


Fix is here:
https://github.com/thelastpickle/cassandra-website/tree/mck/update-5-0_dates_download_page


w/ html generated here:
https://raw.githack.com/thelastpickle/cassandra-website/mck/update-5-0_dates_download_page_generated/content/_/download.html


I'll merge this tomorrow if there's no further input.


Re: [DISCUSS] New data type for vector search

2023-04-26 Thread Mick Semb Wever
>
> My inclination then would be to say you declare an ARRAY (which
> is semantic sugar for FROZEN>). This is very consistent with
> our existing style. We then simply permit such columns to define ANN
> indexes.
>


So long as nulls aren't a problem as David questions, an alternative is:

 FLOAT[N] as semantic sugar for LIST

And ANN requiring FROZEN

Maybe taking a poll in a few days would help keep this
moving forward.


Re: [DISCUSS] New data type for vector search

2023-05-01 Thread Mick Semb Wever
>
>
> > But suggesting that Jonathan should work on implementing general purpose
> arrays seems to fall outside the scope of this discussion, since the result
> of such work wouldn't even fill the need Jonathan is targeting for here.
>
> Every comment I have made so far I have argued that the v1 work doesn’t
> need to do some things, but that the limitations proposed so far are not
> real requirements; there is a big difference between what “could be
> allowed” and what is implemented day one… I am pushing back on what “could
> be allowed”, so far every justification has been that it slows down the ANN
> work…
>
> Simple examples of this already exist in C* (every example could be
> enhanced logically, we just have yet to put in the work)
>
> * updating an element of a list is only allowed for multi-cell
> * appending to a list is only allowed for multi-cell
> * etc.
>
> By saying that the type "shall not support", you actively block future
> work and future possibilities...
>



I am coming around strongly to the `VECTOR FLOAT[n]` option.

This gives Jonathan the simplest path right now with the ANN work, while
also ensuring the CQL API gets the best future potential.

With `VECTOR FLOAT[n]` the 'vector' is the ml sugar that means non-null and
frozen, and that allows both today and in the future, as desired, for its
implementation to be entirely different to `FLOAT[n]`.  This addresses a
number of people's concerns that we meet ML's idioms head on.

IMHO it feels like it will fit into the ideal future CQL, where all
`primitive[N]` types are implemented, and where we have VECTOR FLOAT[n] (and maybe
VECTOR BYTE[n]). This will also permit in the future `FROZEN`
if we wanted nulls in frozen arrays.

I think it is totally reasonable that the ANN patch (and Jonathan) is not
asked to implement on top of, or towards, other array (or other) new data
types.

I also think it is correct that we think about the evolution of CQL's API,
 and how it might exist in the future when we have both ml vectors and
general use arrays.
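To make the "porcelain" semantics concrete: under this proposal `VECTOR FLOAT[n]` would mean a frozen, non-null, fixed-dimension array of float32. A minimal sketch of the value-level rules that implies, written as a hypothetical Java class; `FloatVector` is not Cassandra's `VectorType`, and the CQL syntax itself was still being debated at this point.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical value type for the proposed `VECTOR FLOAT[n]`: frozen (immutable),
 * non-null, fixed dimension. Not Cassandra's VectorType.
 */
final class FloatVector {
    private final float[] values;

    private FloatVector(float[] values) {
        this.values = values;
    }

    /** Checks the declared dimension and copies the input so later mutation cannot leak in. */
    static FloatVector of(int dimension, float[] values) {
        if (dimension <= 0)
            throw new IllegalArgumentException("dimension must be positive");
        if (values == null)
            throw new IllegalArgumentException("a vector value cannot be null");
        if (values.length != dimension)
            throw new IllegalArgumentException(
                    "expected " + dimension + " elements, got " + values.length);
        return new FloatVector(values.clone());
    }

    /** Boxed input, e.g. from a list literal: null elements are rejected. */
    static FloatVector fromList(int dimension, List<Float> elements) {
        if (elements == null)
            throw new IllegalArgumentException("a vector value cannot be null");
        float[] values = new float[elements.size()];
        for (int i = 0; i < values.length; i++) {
            Float f = elements.get(i);
            if (f == null)
                throw new IllegalArgumentException("element " + i + " is null");
            values[i] = f;
        }
        return of(dimension, values);
    }

    int dimension() {
        return values.length;
    }

    /** Returns a copy, so callers cannot modify the frozen value. */
    float[] toArray() {
        return values.clone();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FloatVector other && Arrays.equals(values, other.values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }

    @Override
    public String toString() {
        return Arrays.toString(values);
    }

    public static void main(String[] args) {
        FloatVector v = FloatVector.of(3, new float[] {0.1f, 0.2f, 0.3f});
        System.out.println(v.dimension() + " " + v);
        try {
            FloatVector.fromList(3, Arrays.asList(0.1f, null, 0.3f));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // element 1 is null
        }
    }
}
```

The general-use `FLOAT[n]` plumbing could drop the non-null check without changing anything else in such a sketch, which is the separation the porcelain/plumbing framing is after.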


Re: [DISCUSS] New data type for vector search

2023-05-01 Thread Mick Semb Wever
Yes!  What you (David) and Benedict write beautifully supports `VECTOR
FLOAT[n]` imho.

You are definitely bringing up valid implementation details, and that can
be dealt with during patch review. This thread is about the CQL API
addition.

No matter which way the technical review goes with the implementation
details, `VECTOR FLOAT[n]` does not limit it, and gives us the most ML
idiomatic approach and the best long-term CQL API.  It's a win-win
situation – no matter how you look at it imho it is the best solution api
wise.

Unless the suggestion is that an ideal implementation can give us a better
CQL API – but I don't see what that could be.   Maybe the suggestion is we
deny the possibility of using the VECTOR keyword and bring us back to
something like `NON-NULL FROZEN`.   This is odd to me because
`VECTOR` here can be just an alias for `NON-NULL FROZEN` while meeting the
patch's audience and their idioms.  I have no problems with introducing
such an alias to meet the ML crowd.

Another way I think of this is
 `VECTOR FLOAT[n]` is the porcelain ML cql api,
 `NON-NULL FROZEN` and `FROZEN` and `FLOAT[n]` are the
general-use plumbing cql apis.

This would allow implementation details to be moved out of this thread and
to the review phase.




On Mon, 1 May 2023 at 20:57, David Capwell  wrote:

> > I think it is totally reasonable that the ANN patch (and Jonathan) is
> not asked to implement on top of, or towards, other array (or other) new
> data types.
>
>
> This impacts serialization, if you do not think about this day 1 you then
> can’t add later on without having to worry about migration and versioning…
>
> Honestly I wanted to better understand the cost to be generic and the
> impact to ANN, so I took
> https://github.com/jbellis/cassandra/blob/vsearch/src/java/org/apache/cassandra/db/marshal/VectorType.java
> and made it handle every requirement I have listed so far (size, null, all
> types)… the current patch has several bugs at the type level that would
> need to be fixed, so had to fix those as well…. Total time to do this was
> 10 minutes… and this includes adding a method "public float[]
> composeAsFloats(ByteBuffer bytes)” which made the change to existing logic
> small (change VectorType.Serializer.instance.deserialize(buffer) to
> type.composeAsFloats(buffer))….
>
> Did this have any impact to the final ByteBuffer?  Nope, it had identical
> layout for the FloatType case, but works for all types…. I didn’t change
> the fact we store the size (felt this could be removed, but then we could
> never support expanding the vector in the future…)
>
> So, given the fact it takes a few minutes to implement all these
> requirements, I do find it very reasonable to push back and say we should
> make sure the new type is not leaking details from a special ANN index…. We
> have spent more time debating this than it takes to support… we also have
> fuzz testing on trunk so just updating
> org.apache.cassandra.utils.AbstractTypeGenerators to know about this new
> type means we get type coverage as well…
>
> I have zero issues helping to review this patch and make sure the testing
> is on-par with existing types (this is a strong requirement for me)
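David's point that a generic type can keep the same bytes for the float case is easier to see with a concrete layout. A minimal sketch, assuming (hypothetically) a 4-byte element count followed by big-endian IEEE 754 float32 values; this is not the layout of the vsearch branch, and `composeAsFloats` here is a standalone static helper that only borrows the name from the message above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical layout: a 4-byte element count followed by the elements as
 * big-endian IEEE 754 float32 values. Illustrative only, not the format of
 * any Cassandra branch.
 */
final class FloatVectorCodec {
    static ByteBuffer serialize(float[] values) {
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + values.length * Float.BYTES);
        out.putInt(values.length); // stored size, as in the quoted message
        for (float f : values)
            out.putFloat(f);
        out.flip(); // switch from writing to reading
        return out;
    }

    /** Decodes from a duplicate so the caller's buffer position is left untouched. */
    static float[] composeAsFloats(ByteBuffer bytes) {
        ByteBuffer in = bytes.duplicate();
        int dimension = in.getInt();
        if (dimension < 0 || in.remaining() != (long) dimension * Float.BYTES)
            throw new IllegalArgumentException("corrupt vector: dimension " + dimension
                    + ", remaining bytes " + in.remaining());
        float[] values = new float[dimension];
        for (int i = 0; i < dimension; i++)
            values[i] = in.getFloat();
        return values;
    }

    public static void main(String[] args) {
        ByteBuffer buf = serialize(new float[] {1.5f, -2.0f, 0.25f});
        System.out.println(buf.remaining()); // 16: 4-byte count + 3 * 4 bytes
        System.out.println(Arrays.toString(composeAsFloats(buf))); // [1.5, -2.0, 0.25]
    }
}
```

Keeping the count prefix costs four bytes per value but, as noted above, leaves room to change the dimension rules later without a new format.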
>
>

Re: [DISCUSS] New data type for vector search

2023-05-02 Thread Mick Semb Wever
I have no problem with `VECTOR` hanging around forever as an alias for
`NON-NULL FROZEN`.  Even without ANN, it makes sense and will stick with
new C* users.

A plug-in system would be great, but it shouldn't hold back this work imho.



On Mon, 1 May 2023 at 22:17, Benedict  wrote:

> I have explained repeatedly why I am opposed to ML-specific data types. If
> we want to make an ML-specific data type, it should be in an ML plug-in. We
> should not pollute the general purpose language with hastily-considered
> features that target specific bandwagons - at best partially - no matter
> how exciting the bandwagon.
>
> I think a simple and easy case can be made for fixed length array types
> that do not seem to create random bits of cruft in the language that dangle
> by themselves should this play not pan out. This is an easy way for this
> effort to make progress without negatively impacting the language.
>
> That is, unless we want to start supporting totally random types for every
> use case at the top level language layer. I don’t think this is a good
> idea, personally, and I’m quite confident we would now be regretting this
> approach had it been taken for earlier bandwagons.
>
> Nor do I think anyone’s priors about how successful this effort will be
> should matter. As a matter of principle, we should simply never deliver a
> specialist functionality as a high level CQL language feature without at
> least baking it for several years as a plug-in.
>

Re: [POLL] Vector type for ML

2023-05-02 Thread Mick Semb Wever
On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:

> Should we add a vector type to Cassandra designed to meet the needs of
> machine learning use cases, specifically feature and embedding vectors for
> training, inference, and vector search?
>
> ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
> with no nulls allowed, and with no need for random access. The ML industry
> overwhelmingly uses float32 vectors, to the point that the industry-leading
> special-purpose vector database ONLY supports that data type.
>
> This poll is to gauge consensus subsequent to the recent discussion thread
> at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>
> Please rank the discussed options from most preferred option to least,
> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
> = A (C is my preference, followed by B or A approximately equally.)
>
> (A) I am in favor of adding a vector type for floats; I do not believe we
> need to tie it to any particular implementation details.
>
> (B) I am okay with adding a vector type but I believe we must add array
> types that compose with all Cassandra types first, and make vectors a
> special case of arrays-without-null-elements.
>
> (C) I am not in favor of adding a built-in vector type.
>



A  > B > C

B is stated as "must add array types…".  I think this is a bit loaded.  If
B were (A + the implementation needs to be a non-null frozen float32
array, with serialisation forward-compatible with other frozen arrays
implemented later), I would put it before (A), especially because it has
already been shown that this is easy to implement.
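The ranking notation requested in the poll (">" for "preferred over", "=" for "approximately equal") can be tabulated mechanically. A minimal sketch; `RankedBallot.parse` is a hypothetical helper, not tooling the project uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RankedBallot {
    /**
     * Parses a ballot such as "A > B > C" or "C > B = A" into option -> rank,
     * where rank 1 is most preferred and options joined by '=' share a rank.
     */
    static Map<String, Integer> parse(String ballot) {
        Map<String, Integer> ranks = new LinkedHashMap<>();
        String[] tiers = ballot.split(">");
        for (int i = 0; i < tiers.length; i++) {
            for (String option : tiers[i].split("=")) {
                String name = option.trim();
                if (name.isEmpty())
                    throw new IllegalArgumentException("empty option in: " + ballot);
                if (ranks.putIfAbsent(name, i + 1) != null)
                    throw new IllegalArgumentException("option listed twice: " + name);
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        System.out.println(parse("A  > B > C")); // {A=1, B=2, C=3}
        System.out.println(parse("C > B = A"));  // {C=1, B=2, A=2}
    }
}
```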


Re: [VOTE] Release Apache Cassandra 3.11.15

2023-05-03 Thread Mick Semb Wever
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>


+1

Checked
- signing correct
- checksums are correct
- source artefact builds
- binary artefact runs
- debian package runs
- redhat package runs
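Of the checks listed above, the checksum one is easy to script. A minimal sketch in Java 17+ (for `java.util.HexFormat`), assuming a SHA-256 checksum file named `<artifact>.sha256` next to each artifact whose first token is the hex digest; that naming is an assumption, and signature verification (the "signing correct" check) is not covered.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

class ChecksumCheck {
    /** Lower-case hex SHA-256 of a file, streamed so large tarballs are not read into memory at once. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1)
                digest.update(buffer, 0, n);
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    /** Compares with the first token of a checksum file ("<hex>" or "<hex>  <filename>"). */
    static boolean matches(Path artifact, Path checksumFile)
            throws IOException, NoSuchAlgorithmException {
        String expected = Files.readString(checksumFile).trim().split("\\s+")[0];
        return expected.equalsIgnoreCase(sha256Hex(artifact));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ChecksumCheck.java <artifact>  (expects <artifact>.sha256 alongside)");
            return;
        }
        Path artifact = Path.of(args[0]);
        Path checksum = Path.of(args[0] + ".sha256");
        System.out.println(matches(artifact, checksum) ? "checksum OK" : "checksum MISMATCH");
    }
}
```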


Re: [POLL] Vector type for ML

2023-05-04 Thread Mick Semb Wever
>
> Did we agree on a CQL syntax?
>
> I don’t believe there has been a poll on CQL syntax… my understanding
> reading all the threads is that there are ~4-5 options and none are -1ed, so
> believe we are waiting for majority rule on this?
>


Re-reading that thread, IIUC the valid choices remaining are…

1. VECTOR FLOAT[n]
2. FLOAT VECTOR[n]
3. VECTOR
4. VECTOR[n]
5. ARRAY
6. NON-NULL FROZEN


Yes, I'm putting my preference (1) first ;) because (to keep banging on) if the
future of CQL will have FLOAT[n] and FROZEN, where the VECTOR
keyword, for general CQL users, just means "non-null and frozen", then
these gel best together.

Options (5) and (6) are for those that feel we can and should provide this
type without introducing the vector keyword.


Re: [POLL] Vector type for ML

2023-05-05 Thread Mick Semb Wever
On Fri, 5 May 2023 at 18:43, David Capwell  wrote:

> Went through and created a spreadsheet of current votes… For Patrick and
> Mike, I don’t see a clear vote, so I put a ? where I “think” your
> preference is… for Mick, I only put one vote as the list looked like a
> summary, but you mentioned the first was your preference
>
> Syntax               | Jonathan Ellis | David Capwell | Josh McKenzie | Caleb Rackliffe | Patrick McFadin | Brandon Williams | Mike Adamson | Benedict | Mick Semb Wever
> ---------------------+----------------+---------------+---------------+-----------------+-----------------+------------------+--------------+----------+----------------
> VECTOR               | 1              | 2             | 2             |                 |                 | 1                | ?            | 3        |
> DENSE VECTOR         | 2              | 1             |               |                 | ?               |                  | ?            |          |
> type[dimension]      | 3              | 3             | 3             | 1               |                 | 3                |              | 2        |
> DENSE_VECTOR         |                |               | 1             |                 |                 |                  |              |          |
> NON NULL [dimension] |                | 1             |               |                 |                 |                  |              | 1        |
> VECTOR type[n]       |                |               |               |                 |                 | 2                |              |          | 1
> ARRAY                |                |               |               |                 |                 |                  |              |          |
> NON-NULL FROZEN      |                |               |               |                 |                 |                  |              |          |
>
> 1 = top pick
> 2 = second pick
> 3 = third pick
>


Is what Josh writes always separate ??

My 2 is VECTOR

Thanks David for tallying.


Re: [VOTE] Release Apache Cassandra 3.0.29

2023-05-12 Thread Mick Semb Wever
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>


+1

Checked
- signing correct
- checksums are correct
- source artefact builds
- binary artefact runs
- debian package runs
- redhat package runs


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Mick Semb Wever
On Thu, 11 May 2023 at 05:27, Patrick McFadin  wrote:

> Having pulled a lot of developers out of the 2i fire,
>


Yes.  I'm keen not to leave 2i as the default once SAI lands. Otherwise I
agree with the deprecate-first principle, but 2i is just too problematic.
Just having no default in 5.0, forcing the user to evaluate which index to
use, would be an improvement.

For example, if the default index option exists in cassandra.yaml but is
commented out, `CREATE INDEX` would not work without specifying a `USING`.
Then the yaml documentation would be clear about the choices.  I'd be ok
with that for 5.0, and then make SAI the default in the following release.

Note, having the option in cassandra.yaml is problematic, as this is not a
per-node setting (AFAIK).
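The proposed behaviour can be sketched as a small resolution rule. In this hypothetical Python sketch, an explicit `USING` always wins, a commented-out default (modelled as `None`) forces the user to choose, and a second helper flags the per-node problem by rejecting a cluster whose nodes disagree. The index type labels and function names are illustrative, not Cassandra's configuration keys or internals.

```python
from typing import Mapping, Optional


class IndexTypeRequired(Exception):
    """Raised when CREATE INDEX has no USING clause and no default is configured."""


def resolve_index_type(using: Optional[str], configured_default: Optional[str]) -> str:
    """Pick the index implementation for a CREATE INDEX statement (hypothetical rule)."""
    # An explicit USING clause always wins.
    if using is not None:
        return using.lower()
    # A commented-out default is modelled as None: the user must choose explicitly.
    if configured_default is None:
        raise IndexTypeRequired("no default index configured; specify USING")
    return configured_default.lower()


def uniform_default(defaults_by_node: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the cluster-wide default, or fail if nodes are configured differently."""
    distinct = set(defaults_by_node.values())
    if len(distinct) > 1:
        raise ValueError(f"nodes disagree on the default index: {sorted(map(str, distinct))}")
    return next(iter(distinct), None)
```

Without a uniformity check like `uniform_default`, the same DDL could create different index types depending on which coordinator received it, which is why a per-node yaml option is awkward for this setting.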


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Mick Semb Wever
>
>
> Given it seems most DBs have a default index (see Postgres, etc.), I tend
> to lean toward having one, but that's me...
>


I'm for it too.  It would be nice to enforce that the setting is globally
uniform to avoid the per-node problem. Or add a keyspace option.

For users replaying pre-5.0 DDLs this would just require that they set the
default index to 2i.
This is not a headache, it's a one-off action that can be clearly expressed
in NEWS.
It acts as a deprecation warning too.
This prevents new, uninformed users from creating an unintended index, it
supports existing users, and it does not present SAI as the battle-tested
default.

Agree with the poll, there are a number of different PoVs here already.  I'm
not fond of the LOCAL addition. I appreciate what it informs, but it's
just not important enough IMHO (folk should be reading up on the index
type).


Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Mick Semb Wever
> [POLL] Centralize existing syntax or create new syntax?
>
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>


(1) CREATE INDEX …



> [POLL] Should there be a default? (YES/NO)
>


Yes (but see below).



> [POLL] What to do with the default?
>
> 1.) Allow a default, and switch it to SAI (no configurables)
> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
> 3.) YAML config to override default index (legacy 2i remains the default)
> 4.) YAML config/guardrail to require index type selection (not required by
> default)
>


(4) YAML config, with a commented-out default of 2i.

I agree that the default cannot change in 5.0, but our existing default of
2i can be commented out.

For the user this gives the same feedback, and the same requirement to edit
one line of yaml, as when we disabled MVs and SASI in 4.0.
No one has complained about either of these, which is a clear signal folk
understood how to get their existing DDLs to work from 3.x to 4.x.


Re: [DISCUSS] Feature branch version hygiene

2023-05-17 Thread Mick Semb Wever
On Tue, 16 May 2023 at 13:02, J. D. Jordan 
wrote:

> Process question/discussion. Should tickets that are merged to CEP feature
> branches, like https://issues.apache.org/jira/browse/CASSANDRA-18204, have
> a fixver of 5.0 on them after merging to the feature branch?
>
>
> For the SAI CEP which is also using the feature branch method the
> "reviewed and merged to feature branch" tickets seem to be given a version
> of NA.
>
>
> Not sure that's the best “waiting for cep to merge” version either?  But
> it seems better than putting 5.0 on them to me.
>
>
> I’m not keen on 5.0 because if we cut the release today those
> tickets would not be there.
>
>
> What do other people think?  Is there a better version designation we can
> use?
>
>
> On a different project I have in the past made a “version number” in JIRA
> for each long running feature branch. Tickets merged to the feature branch
> got the epic ticket number as their version, and then it got updated to the
> “real” version when the feature branch was merged to trunk.
>


Thanks for raising the thread. I remember there was some confusion early on
wrt feature branches too.

To rehash, for everything currently resolved in trunk, 5.0 is the correct
fixVersion.  (And there should be no unresolved issues today with the 5.0
fixVersion; they should be 5.x.)


When alpha1 is cut, the 5.0-alpha1 fixVersion is created and everything
with 5.0 also gets 5.0-alpha1. At the same time the 5.0-alpha2, 5.0-beta,
5.0-rc and 5.0.0 fixVersions are created. Here both 5.0-beta and 5.0-rc are
blocking placeholder fixVersions: like the .x placeholder fixVersions, no
resolved issues are left with them. 5.0.0 is also used as a blocking
version, though it is also an eventual fixVersion for resolved tickets.
Also note that all tickets up to and including 5.0.0 will also have the 5.0
fixVersion.
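The fixVersion rules above are mechanical enough to apply and check in bulk. This is a hypothetical Python sketch over an in-memory list of tickets, not the Jira API: cutting alpha1 adds 5.0-alpha1 to every resolved ticket carrying 5.0, and a checker lists tickets that break the stated rules, i.e. resolved tickets left on a blocking placeholder (5.0-beta, 5.0-rc, or any .x version) and unresolved tickets carrying 5.0.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Ticket:
    key: str
    resolved: bool
    fix_versions: Set[str] = field(default_factory=set)


def is_placeholder(version: str) -> bool:
    # 5.0-beta and 5.0-rc are blocking placeholders, like every ".x" version.
    return version in {"5.0-beta", "5.0-rc"} or version.endswith(".x")


def cut_alpha1(tickets: List[Ticket]) -> None:
    """When alpha1 is cut, every resolved ticket carrying 5.0 also gets 5.0-alpha1."""
    for t in tickets:
        if t.resolved and "5.0" in t.fix_versions:
            t.fix_versions.add("5.0-alpha1")


def misfiled(tickets: List[Ticket]) -> List[str]:
    """Keys breaking the rules: resolved on a placeholder, or unresolved on 5.0."""
    bad = []
    for t in tickets:
        on_placeholder = any(is_placeholder(v) for v in t.fix_versions)
        if (t.resolved and on_placeholder) or (not t.resolved and "5.0" in t.fix_versions):
            bad.append(t.key)
    return bad
```

Because the rules depend only on resolution status and existing fixVersions, a release manager can apply them without reading each ticket.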


A particular reason for doing things the way they are is to make it easy
for the release manager to bulk correct fixVersions, at release time or
even later, i.e. without having to read the ticket or go talk to authors or
painstakingly crawl CHANGES.txt.


For feature branches my suggestion is that we create a fixVersion for each
of them, e.g. 5.0-cep-15.

Yup, that's your suggestion Jeremiah (I wrote this up on the plane before I
got to read your post properly).


As you say, this then makes it easy to see where the code is (or what the
patch is currently being based on). And when the feature branch is merged,
it is easy to bulk replace its fixVersion with trunk's, e.g. 5.0-cep-15
with 5.0.
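Merging a feature branch then becomes a bulk replace. A hypothetical Python sketch, with tickets modelled as a dict from ticket key to a set of fixVersions rather than through Jira itself:

```python
from typing import Dict, Set


def merge_feature_branch(
    fix_versions: Dict[str, Set[str]], branch_version: str, trunk_version: str
) -> int:
    """Swap a feature branch's fixVersion for trunk's on every ticket carrying it.

    Returns the number of tickets changed; other fixVersions are left alone.
    """
    changed = 0
    for versions in fix_versions.values():
        if branch_version in versions:
            versions.discard(branch_version)
            versions.add(trunk_version)
            changed += 1
    return changed
```

For example, `merge_feature_branch(tickets, "5.0-cep-15", "5.0")` moves every ticket merged to the CEP-15 branch onto 5.0 in one pass.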


The NA fixVersion was introduced for the other repositories, e.g. website
updates.


Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Mick Semb Wever
> So when a CEP slips, do we have to create a 5.1-cep-N?
>


No, you'd just rename it, which is easy to do in just one place.
I really don't care, but the version would at least help indicate what the
branch is getting rebased off.




> My personal view is that 5.0 should not be used for any resolved tickets -
> they should go to 5.0-alpha1, since this is the correct release for them.
> 5.0 can then be the target version, which makes more sense given it isn’t a
> concrete release.
>


Each time, we don't know if the first release will be an alpha1 or if we're
confident enough to go straight to a beta1.
Reaching our goal of a stable trunk would make the latter possible.

And the additional 5.0 label has been requested by a few people to make it
easy to search for what's new; this has been the simplest way.


Re: [DISCUSS] Feature branch version hygiene

2023-05-19 Thread Mick Semb Wever
On Thu, 18 May 2023 at 07:23, Benedict  wrote:

> So we just rename alpha1 to beta1 if that happens?
>


Yes, agreed Benedict.



> Or, we point resolved tickets straight to 5.0.0, and add 5.0-alpha1 to any
> tickets with *only* 5.0.0
> This is probably the easiest for folk to understand when browsing.
> Finding new features is easy either way - look for 5.0.0.
>


No opinion either way. I suspect 5.0 was used as the additional label
because 5.0.0 and other versions are the specific versions as found in
CHANGES.txt.


Re: Vector search demo, and query syntax

2023-05-23 Thread Mick Semb Wever
>
>
> *I propose that we adopt `ORDER BY` syntax, supporting it for vector
> indexes first and eventually for all SAI indexes.  So this query would
> become:*
>
> SELECT id, start, end, text FROM {self.keyspace}.{self.table}
> ORDER BY embedding ANN OF %s LIMIT %s
>


LGTM.

I first stumbled a bit with "there's no where clause and no filtering
allowed…"

But I doubt that reaction from any experienced cql user will last more than
a moment.
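To make the proposed semantics concrete: `ORDER BY embedding ANN OF %s LIMIT %s` returns the rows whose embeddings are most similar to the query vector, with no WHERE clause at all. The Python sketch below computes the exact answer by brute force, which an ANN index only approximates. Cosine similarity, the `(id, embedding)` row shape and the function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def order_by_ann_of(
    rows: Sequence[Tuple[str, Sequence[float]]], query: Sequence[float], limit: int
) -> List[str]:
    """Exact top-`limit` ids by similarity: the result an ANN index tries to approximate."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(row[1], query), reverse=True)
    return [row_id for row_id, _ in ranked[:limit]]
```

The ordering does the work a filter would normally do, and `LIMIT` bounds the result, which is why the query reads naturally once the initial "no WHERE clause" surprise passes.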


Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-24 Thread Mick Semb Wever
WRT git submodules and CASSANDRA-18204, are we happy with how it is working
for accord?

The time spent on getting that running has been a fair few hours, in which
we could have cut many manual module releases.

David, and folks working on accord?



On Tue, 23 May 2023 at 20:09, Josh McKenzie  wrote:

> I'll hold off on this until Alex Petrov chimes in. @Alex -> got any
> thoughts here?
>
> On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
>
> I think it would be great to onboard Harry more officially into the
> project.  However it would be nice to perhaps do some sanity checking
> outside of Apple folks to see how approachable it is.  That is, can someone
> take it and just run it with the current readme without any additional
> context?
>
> I wonder if a mini-onboarding session would be good as a community session
> - go over Harry, how to run it, how to add a test?  Would that be the right
> venue?  I would just like to see how we can not only plug it into regular
> CI but also make sure everyone who wants to add a test knows how to get
> started with it.
>
> Jeremy
>
> On May 16, 2023, at 1:34 PM, Abe Ratnofsky  wrote:
>
> Just to make sure I'm understanding the details, this would mean
> apache/cassandra-harry maintains its status as a separate repository,
> apache/cassandra references it as a submodule, and clones and builds Harry
> locally, rather than pulling a released JAR. We can then reference Harry as
> a library without maintaining public artifacts for it. Is that in line with
> what you're thinking?
>
> > I'd also like to see us get a Harry run integrated as part of our
> pre-commit CI
>
> I'm a strong supporter of this, of course.
>
> On May 16, 2023, at 11:03 AM, Josh McKenzie  wrote:
>
> Similar to what we've done with accord in
> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd like to
> discuss bringing cassandra-harry in-tree as a submodule. repo link:
> https://github.com/apache/cassandra-harry
>
> Given the value it's brought to the project's stabilization efforts and
> the movement of other things in the ecosystem to being more integrated
> (accord, build-scripts
> https://issues.apache.org/jira/browse/CASSANDRA-18133), I think having
> the testing framework better localized and integrated would be a net
> benefit for adoption, awareness, maintenance, and tighter workflows as we
> troubleshoot future failures it surfaces.
>
> I'd also like to see us get a Harry run integrated as part of our
> pre-commit CI (a 5 minute simple soak test for instance) and having that
> local in this fashion should make that a cleaner integration as well.
>
> Thoughts?
>
>
>


Re: [GitHub] [cassandra-analytics] frankgh opened a new pull request, #1: Provide a SecretsProvider interface to abstract the secret provisioning

2023-05-24 Thread Mick Semb Wever
Francisco, can you please put the appropriate .asf.yaml file in place so
notifications are sent to the correct MLs.

On Tue, 23 May 2023 at 22:56, frankgh (via GitHub)  wrote:

>
> frankgh opened a new pull request, #1:
> URL: https://github.com/apache/cassandra-analytics/pull/1
>
>This commit introduces the SecretsProvider interface that abstracts the
> secrets provisioning. This way different implementations of the
> SecretsProvider can be used to provide SSL secrets for the Analytics job.
> We provide an implementation, SslConficSecretsProvider, which provides
> secrets based on the configuration for the job.
>


Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-24 Thread Mick Semb Wever
>
> So looking at accord trunk, we needed 12 votes for a release, and each
> vote is a min of 3 days, so 36 days of overhead vs 5 hours of work?
>


That's apples and oranges (wait time vs effort time).  I was most
interested in (and supportive of) your qualified opinion :-)




> One thing that can be annoying is for people who don’t use work trees and
> switch between trunk and cassandra-4.x in the same directory… I am not sure
> if the issues here are my scripts, or git getting confused…. If you use
> work trees (I strongly recommend regardless of submodules or not) you don’t
> have these issues (my disk layout is below [1]).
>


I'm wondering if we should wait until our first git submodule is in trunk,
so the process gets more exposure, before adding our second?
It's a bit of a pita to get rid of warnings/errors from bad hooks.


> Submodules do have their own overhead and edge cases, so I am mostly in
> favor of using for cases where the code must live outside of tree (such as
> jvm-dtest that lives out of tree as all branches need the same interfaces)
>
>

Agree. If it makes sense, it would be better to just bring the code in.


[VOTE] Release Apache Cassandra 4.0.10

2023-05-25 Thread Mick Semb Wever
Proposing the test build of Cassandra 4.0.10 for release.

sha1: da77d3f729160e84fbab37666de99550be794265
Git: https://github.com/apache/cassandra/tree/4.0.10-tentative
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1299/org/apache/cassandra/cassandra-all/4.0.10/

The Source and Build Artifacts, and the Debian and RPM packages and
repositories, are available here:
https://dist.apache.org/repos/dist/dev/cassandra/4.0.10/

The vote will be open for 72 hours (longer if needed). Everyone who has
tested the build is invited to vote. Votes by PMC members are considered
binding. A vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt:
https://github.com/apache/cassandra/blob/4.0.10-tentative/CHANGES.txt
[2]: NEWS.txt:
https://github.com/apache/cassandra/blob/4.0.10-tentative/NEWS.txt


[VOTE] Release Apache Cassandra 4.1.2

2023-05-25 Thread Mick Semb Wever
Proposing the test build of Cassandra 4.1.2 for release.

sha1: c5c075f0080f3f499d2b01ffb155f89723076285
Git: https://github.com/apache/cassandra/tree/4.1.2-tentative
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1302/org/apache/cassandra/cassandra-all/4.1.2/

The Source and Build Artifacts, and the Debian and RPM packages and
repositories, are available here:
https://dist.apache.org/repos/dist/dev/cassandra/4.1.2/

The vote will be open for 72 hours (longer if needed). Everyone who has
tested the build is invited to vote. Votes by PMC members are considered
binding. A vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt:
https://github.com/apache/cassandra/blob/4.1.2-tentative/CHANGES.txt
[2]: NEWS.txt:
https://github.com/apache/cassandra/blob/4.1.2-tentative/NEWS.txt


Re: [VOTE] Release Apache Cassandra 4.0.10

2023-05-25 Thread Mick Semb Wever
>
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>


+1


Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11) ***


***) Installing from the yum repo appears to fail due to a legacy (SHA-1)
third-party signature in our KEYS file. This would impact all RHEL 9+ users.
The workaround is:
```
# run this before `yum install cassandra`
update-crypto-policies --set LEGACY
```
ref:
https://www.redhat.com/en/blog/rhel-security-sha-1-package-signatures-distrusted-rhel-9

