Re: [DISCUSS] CEP-8 Drivers Donation - take 2

2023-05-30 Thread Mick Semb Wever
Thank you so much Jeremy and Greg (+others) for all the hard work on this.



> At this point, we'd like to propose CEP-8 for consideration, starting the
> process to accept the DataStax Java driver as an official ASF project.
>


Is the vote for the CEP to be for all drivers, but we will introduce each
driver one by one?  What determines when we are comfortable with one driver
subproject and can move on to accepting the next?

Are there key committers and contributors on each driver that want to be
involved?  Should they be listed before the vote?
We also need three PMC members for the new subproject.  Are we to assign these
before the vote?


Re: [DISCUSS] CEP-8 Drivers Donation - take 2

2023-05-30 Thread Josh McKenzie
> Is the vote for the CEP to be for all drivers, but we will introduce each 
> driver one by one?  What determines when we are comfortable with one driver 
> subproject and can move on to accepting the next ? 
Curious to hear on this as well. There are 2 implications from the CEP as written:

1. The Java and Python drivers hold special importance due to their language 
proximity and/or project's dependence upon them 
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation#CEP8:DatastaxDriversDonation-Scope)
2. Datastax is explicitly offering all 7 drivers for donation 
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation#CEP8:DatastaxDriversDonation-Goals)

This is the most complex contribution via CEP thus far from a governance 
perspective; I suggest we chart a bespoke path to navigate this. Having a top 
level indication of "the CEP is approved" logically separate from a 
per-language indication of "the project is ready to absorb this language driver 
now" makes sense to me. This could look like:

* Vote on the CEP itself
* Per language (processing one at a time):
  * Identify 3 PMC members willing to take on the governance role for the 
    language driver
  * Identify 2 contributors who are active on a given driver and stepping 
    forward for a committer role on the driver
  * Vote on inclusion of that language driver in the project + commit bits
  * Integrate that driver into the project ecosystem (build, ci, docs, etc)

Not sure how else we could handle committers / contributors / PMC members other 
than on a per-driver basis.

On Tue, May 30, 2023, at 5:36 AM, Mick Semb Wever wrote:
> 
> Thank you so much Jeremy and Greg (+others) for all the hard work on this.
>  
>> 
>> At this point, we'd like to propose CEP-8 for consideration, starting the 
>> process to accept the DataStax Java driver as an official ASF project.
> 
> 
> Is the vote for the CEP to be for all drivers, but we will introduce each 
> driver one by one?  What determines when we are comfortable with one driver 
> subproject and can move on to accepting the next ? 
> 
> Are there key committers and contributors on each driver that want to be 
> involved?  Should they be listed before the vote?
> We also need three PMC for the new subproject.  Are we to assign these before 
> the vote?  
> 
> 


Re: [DISCUSS] CEP-8 Drivers Donation - take 2

2023-05-30 Thread Benjamin Lerer
The idea was to have a single driver sub-project. Even if the code bases
are different we believe that it is important to keep the drivers together
to retain cohesive API semantics and make sure they have similar
functionality and feature support.
In this scenario we would need only 3 PMC members for the governance. I am
willing to be one of them.

For the committers, my understanding, based on subproject governance
procedures, was that they should be proposed directly to the PMC members.

> Is the vote for the CEP to be for all drivers, but we will introduce each
> driver one by one?  What determines when we are comfortable with one driver
> subproject and can move on to accepting the next ?
>

The goal of the CEP is simply to ensure that the community is in favor of
the donation. Nothing more.
The plan is to introduce the drivers one by one. Each driver donation will
need to be accepted first by the PMC members, as is the case for any
donation. Therefore the PMC should have full control over the pace at which
new drivers are accepted.


On Tue, May 30, 2023 at 12:22, Josh McKenzie wrote:

> Is the vote for the CEP to be for all drivers, but we will introduce each
> driver one by one?  What determines when we are comfortable with one driver
> subproject and can move on to accepting the next ?
>
> Curious to hear on this as well. There are 2 implications from the CEP as
> written:
>
> 1. The Java and Python drivers hold special importance due to their
> language proximity and/or project's dependence upon them (
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation#CEP8:DatastaxDriversDonation-Scope
> )
> 2. Datastax is explicitly offering all 7 drivers for donation (
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation#CEP8:DatastaxDriversDonation-Goals
> )
>
> This is the most complex contribution via CEP thus far from a governance
> perspective; I suggest we chart a bespoke path to navigate this. Having a
> top level indication of "the CEP is approved" logically separate from a
> per-language indication of "the project is ready to absorb this language
> driver now" makes sense to me. This could look like:
>
> * Vote on the CEP itself
> * Per language (processing one at a time):
>   * Identify 3 PMC members willing to take on the governance role for
>     the language driver
>   * Identify 2 contributors who are active on a given driver and
>     stepping forward for a committer role on the driver
>   * Vote on inclusion of that language driver in the project + commit
>     bits
>   * Integrate that driver into the project ecosystem (build, ci, docs,
>     etc)
>
> Not sure how else we could handle committers / contributors / PMC members
> other than on a per-driver basis.
>
> On Tue, May 30, 2023, at 5:36 AM, Mick Semb Wever wrote:
>
>
> Thank you so much Jeremy and Greg (+others) for all the hard work on this.
>
>
>
> At this point, we'd like to propose CEP-8 for consideration, starting the
> process to accept the DataStax Java driver as an official ASF project.
>
>
>
> Is the vote for the CEP to be for all drivers, but we will introduce each
> driver one by one?  What determines when we are comfortable with one driver
> subproject and can move on to accepting the next ?
>
> Are there key committers and contributors on each driver that want to be
> involved?  Should they be listed before the vote?
> We also need three PMC for the new subproject.  Are we to assign these
> before the vote?
>
>
>
>


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-30 Thread Jonathan Ellis
Thanks, all.  Closing the vote as accepted with 8 binding +1 (including me)
and 11 non-binding votes.

On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:

> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-30 Thread Jonathan Ellis
Thanks to Benjamin for pointing out to me that committer votes count as
binding for CEPs.

That makes the updated tally 15 binding and 4 non-binding.
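
The arithmetic behind the corrected tally can be sketched as follows. This is only an illustration: the role labels and the `tally` helper are hypothetical, and the 8 / 7 / 4 split is inferred from the two tallies in this thread (19 votes in total), assuming the 8 votes originally counted as binding came from PMC members.

```python
from collections import Counter

# Hypothetical role labels; the 8 / 7 / 4 split is inferred from the two
# tallies in this thread (8 binding of 19, then 15 binding of 19).
votes = ["pmc"] * 8 + ["committer"] * 7 + ["contributor"] * 4


def tally(votes, binding_roles):
    """Return (binding, non_binding) counts for a list of voter roles."""
    counts = Counter(role in binding_roles for role in votes)
    return counts[True], counts[False]


first = tally(votes, binding_roles={"pmc"})
corrected = tally(votes, binding_roles={"pmc", "committer"})
print(first)      # (8, 11)
print(corrected)  # (15, 4)
```

The total stays at 19 in both cases; only the classification of the committer votes changes.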

On Tue, May 30, 2023 at 8:44 AM Jonathan Ellis  wrote:

> Thanks, all.  Closing the vote as accepted with 8 binding +1 (including
> me) and 11 non-binding votes.
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
>
>> Let's make this official.
>>
>> CEP:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>>
>> POC that demonstrates all the big rocks, including distributed queries:
>> https://github.com/datastax/cassandra/tree/cep-vsearch
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Cassandra Contributor Meeting May 30

2023-05-30 Thread Melissa Logan
The Cassandra Contributor Meeting starts soon at 10am PT. Dinesh Joshi will
be discussing CEP-28: Reading and Writing Cassandra Data with Spark Bulk
Analytics. See you then!

Details:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting

How to join:

Join Zoom Meeting:
https://us02web.zoom.us/j/85996789692?pwd=eFVjWE44VXVmZzIwejhFSk43emFZUT09
Meeting ID: 859 9678 9692
Passcode: 193585


On Mon, May 8, 2023 at 9:31 AM Melissa Logan  wrote:

> Hi folks,
>
> The Cassandra community will be hosting monthly Contributor Meetings the
> last Tuesday of each month at 10:00 PT / 13:00 ET / 17:00 UTC / 22:30 IST.
> The purpose of these meetings is to enable real-time collaboration for
> contributors to discuss CEPs and other issues, and ask questions.
>
>
> The May 30 meeting has one topic to discuss so far, which is CEP-28:
> Reading and Writing Cassandra Data with Spark Bulk Analytics facilitated
> by Dinesh Joshi.
>
>
> If you have an item to discuss, add it to the Confluence page. Details
> and how to participate are here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting
>
>
> Copy the invite from the public Cassandra Community Meeting Calendar:
> https://calendar.google.com/calendar/b/1?cid=a2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>
> See you then!
>
> --
> Melissa Logan (she/her)
> Member, Apache Software Foundation
> CEO/Founder, Constantia.io
>


-- 
Melissa Logan (she/her)
Member, Apache Software Foundation
CEO/Founder, Constantia.io


Cassandra project status, 2023-05-30

2023-05-30 Thread Josh McKenzie
Been a bit over a month; let's check in and see how things are looking.

We released the following:
- 3.11.15
- 3.0.29
- 4.0.10
- 4.1.2

Thanks to all the release managers who worked on getting these out the door.


[New Contributors Getting Started]
First off, come hang out with us in the #cassandra-dev channel on 
https://the-asf.slack.com (reply to me on this email if you need an invite for 
your account), and reach out to the @cassandra_mentors alias with any questions 
about the code. We have a list of hand-curated "starter tickets" available 
here: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160&quickFilter=2162
Anything in the "ToDo" column is a great candidate to pick up if you want to 
get your feet wet with the project. Some other useful links:

Getting Started with Development on C*: 
https://cassandra.apache.org/_/development/gettingstarted.html
Building and IDE integration (worktrees are your friend): 
https://cassandra.apache.org/_/development/ide.html
Code Style: https://cassandra.apache.org/_/development/code_style.html


[Dev mailing list]
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-4-25|dto=2023-5-31:

52 threads since last email. I have made a mistake in waiting this long. ;)

Jonathan Ellis' thread on vector search reached a conclusion, a follow-up 
discussion about APIs took place, and a CEP was proposed, voted upon, and 
passed! Phew.

Thread: https://lists.apache.org/thread/16lc6d02xsfvlvqgn3ooy53pgfddyglc
Proposal on adding a new type for vector search: 
https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0
Poll on syntax: https://lists.apache.org/thread/lkowo1qkxjb5wc3n8v6ov4f0r538h13c
CEP-30 proposal: 
https://lists.apache.org/thread/v32tgofo0w47bl7stbb9141obfbg5r0x
CEP-30 vote: https://lists.apache.org/thread/7s581j65wtst6968c86hzncbnzrr09oj

Congratulations to Jonathan and everyone else involved for getting that up and 
running and driving to consensus that rapidly.

Speaking of CEPs, the patch for CEP-28 (Spark bulk writer / reader) via the 
sidecar was posted: 
https://lists.apache.org/thread/7pwvlwkg49qm72xnlf0m322fy4fmvxk3. Doug Rohrer 
also called a vote for CEP-28 and it passed as well! 
https://lists.apache.org/thread/7kndoo6rjchrlk41hbl8v7sclkvdzkgt Congrats to 
Doug and everyone else who collaborated on that effort as well. Quite a month 
for us as a project!

Maxim Muzafarov keeps fighting the good fight on vtables and updating running 
configurations: 
https://lists.apache.org/thread/gdtr3vp375d3nyj6h8xo7owth1s556lz.

Jakub's working on getting the ant target for generate-idea-files to behave 
with JDK17: https://lists.apache.org/thread/o2fdkyv2skvf9ngy9jhpnhvo92qvr17m. 
Looks like he has a few reviewers on the ticket but if you're curious you can 
find that here: https://issues.apache.org/jira/browse/CASSANDRA-18467

Discussion around CEP-29 (CQL NOT operator) continued: 
https://lists.apache.org/thread/cl4d7yo9q6ygnqstk8hhgm597ywg69d1
And was voted upon: 
https://lists.apache.org/thread/rwxc8y0c8johrhqcpxsdkns85rop0fxg and passed! 
Congratulations Piotr and crew on that; that's a feature I'm sure a lot of our 
users will appreciate.

Claude Warren has a PR open working with an SSTableDowngrader tool: 
https://lists.apache.org/thread/wvb8c5svvyvny0b61ybbw0jvxxflog4p. The PR can be 
found here: https://github.com/apache/cassandra/pull/2045, and this is in 
relation to the C* JIRA issue 
https://issues.apache.org/jira/browse/CASSANDRA-8928.

A new release of the in-jvm dtest API went out: 
https://lists.apache.org/thread/tsn70ox1th1x2vcsc7kfky9jsv1foq61

Maxim Muzafarov reached out to let everyone know about the migration of 
properties into the CassandraRelevantProperties class: 
https://lists.apache.org/thread/3g5g5kmk64m54qlyhpmdvxcw8m2vsytz. I'm very 
happy SonarLint will stop yelling at me about this class of warnings going 
forward. :)

With SAI appearing as well as ANN Vector search, the topic of how we handle our 
CREATE INDEX DDL came up courtesy of Caleb Rackliffe: 
https://lists.apache.org/thread/4jxq1tghvb10f848q5vkq241w39lyw57. Looks like 
we've managed to distill things down to something we can wrangle to consensus: 
https://lists.apache.org/thread/oswfj6rsq298dfffw3yzy12q82ybczn7

Our usage of FixVersion continues to evolve: 
https://lists.apache.org/thread/5ompnd3l76kpwc831h80o1jd1g87dcgy. This thread 
came up around what FixVersion we apply to tickets that are sub-tasks of epics 
for approved CEPs that may or may not land in a major. Since we don't know if 
they're going to be done by the hard cutoff for 5.0, for instance, 5.0 as a 
release version would be incorrect. And since 5.X is historically reserved for 
"5.0-targeting but not yet merged", we end up in a bind there.

Benedict definitely brought me around to the approach of having: FIXVERSION = 
5.0-target, and upon merge of the parent epic we can update all children 
tickets to whatever the parent has. No real strong conse
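
A minimal sketch of the placeholder approach mentioned above, assuming a hypothetical in-memory ticket model. The `Ticket` class, the `merge_epic` helper, and the ticket keys are illustrative only and are not Jira's API: sub-tasks carry `5.0-target` until the parent epic merges, at which point they inherit the epic's actual FixVersion.

```python
from dataclasses import dataclass, field

PLACEHOLDER = "5.0-target"  # placeholder value named in the thread


@dataclass
class Ticket:
    key: str  # hypothetical ticket key
    fix_version: str = PLACEHOLDER
    children: list = field(default_factory=list)


def merge_epic(epic, released_in):
    """On merge, set the epic's FixVersion and copy it to every child."""
    epic.fix_version = released_in
    for child in epic.children:
        child.fix_version = epic.fix_version


epic = Ticket("EPIC-1", children=[Ticket("SUB-1"), Ticket("SUB-2")])
print([t.fix_version for t in epic.children])  # ['5.0-target', '5.0-target']
merge_epic(epic, "5.0")
print([t.fix_version for t in epic.children])  # ['5.0', '5.0']
```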

Re: Cassandra Contributor Meeting May 30

2023-05-30 Thread Melissa Logan
Video from today's meeting is here: https://youtu.be/gImCuUCwb0Q

Learn more and start testing/using it today:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics

This is a great project for first-time contributors as it doesn't require
deep expertise in Cassandra or Spark.

On Tue, May 30, 2023 at 9:38 AM Melissa Logan  wrote:

> The Cassandra Contributor Meeting starts soon at 10am PT. Dinesh Joshi
> will be discussing CEP-28: Reading and Writing Cassandra Data with Spark
> Bulk Analytics. See you then!
>
> Details:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting
>
> How to join:
>
> Join Zoom Meeting:
> https://us02web.zoom.us/j/85996789692?pwd=eFVjWE44VXVmZzIwejhFSk43emFZUT09
> Meeting ID: 859 9678 9692
> Passcode: 193585
>
>
> On Mon, May 8, 2023 at 9:31 AM Melissa Logan 
> wrote:
>
>> Hi folks,
>>
>> The Cassandra community will be hosting monthly Contributor Meetings the
>> last Tuesday of each month at 10:00 PT / 13:00 ET / 17:00 UTC / 22:30 IST.
>> The purpose of these meetings is to enable real-time collaboration for
>> contributors to discuss CEPs and other issues, and ask questions.
>>
>>
>> The May 30 meeting has one topic to discuss so far, which is CEP-28:
>> Reading and Writing Cassandra Data with Spark Bulk Analytics facilitated
>> by Dinesh Joshi.
>>
>>
>> If you have an item to discuss, add it to the Confluence page. Details
>> and how to participate are here:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting
>>
>>
>> Copy the invite from the public Cassandra Community Meeting Calendar:
>> https://calendar.google.com/calendar/b/1?cid=a2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>>
>> See you then!
>>
>> --
>> Melissa Logan (she/her)
>> Member, Apache Software Foundation
>> CEO/Founder, Constantia.io
>>
>
>
> --
> Melissa Logan (she/her)
> Member, Apache Software Foundation
> CEO/Founder, Constantia.io
>


-- 
Melissa Logan (she/her)
Member, Apache Software Foundation
CEO/Founder, Constantia.io


Re: Is simplenative in cassandra-stress still relevant?

2023-05-30 Thread Brad
+1 on removing it from cassandra-stress

If you're performing stress testing, why would you not want to use the
official driver?  I've spoken to several people who all have said they've
never used simplenative mode.

On Sat, May 27, 2023 at 3:57 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> I am doing some fixes for cassandra-stress and I stumbled upon this
>
> https://issues.apache.org/jira/browse/CASSANDRA-18529
>
> There is
>
> Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?]
> [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?]
> [protocolVersion=?]
>  OR
> Usage: -mode simplenative [prepared] cql3 [port=?]
>
> "-mode simplenative prepared cql3" throws: (it works without "prepared").
>
> java.lang.ClassCastException: [B cannot be cast to
> org.apache.cassandra.transport.messages.ResultMessage$Prepared
> java.io.IOException: Operation x10 on key(s) [373038504b3436363830]: Error
> executing: (ClassCastException): [B cannot be cast to
> org.apache.cassandra.transport.messages.ResultMessage$Prepared
>
> at org.apache.cassandra.stress.Operation.error(Operation.java:127)
> at
> org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:105)
> at
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:91)
> at
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:99)
> at
> org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:242)
> at
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:467)
> java.io.IOException: Operation x10 on key(s) [4e334f364c4c4b373530]: Error
> executing: (ClassCastException): [B cannot be cast to
> org.apache.cassandra.transport.messages.ResultMessage$Prepared
>
>
> I want to ask if this "simplenative" is still relevant and people are
> still using it. It seems to me that nobody is actually using this / I've
> never heard of anybody doing that but I may be wrong and people are using
> it all day and night ...
>
> simplenative uses SimpleClient, which is used throughout the code base, e.g.
> in CQLTester, so we are not going to get rid of that for sure.
>
> If simplenative in stress is not relevant, the whole -mode option is
> questionable. If we get rid of simplenative, we would end up having "-mode
> native cql3", and since there is nothing but "native" now that Thrift is
> gone, "native" is a constant which can go away. If we end up having
> "-mode cql3" as the only mode possible, the whole -mode can go away and we
> can rename it to "-cql3".
>
> Thoughts?


Re: Is simplenative in cassandra-stress still relevant?

2023-05-30 Thread Brandon Williams
On Tue, May 30, 2023 at 7:15 PM Brad  wrote:
> If you're performing stress testing, why would you not want to use the 
> official driver?  I've spoken to several people who all have said they've 
> never used simplenative mode.

I agree that it shouldn't be used normally, but I'm not sure we should
remove it, because we can't remove it fully: SimpleClient is still
used in many tests, and I think that should continue.

If you suspect any kind of native proto or driver issue it may be
useful to have another implementation easily accessible to aid in
debugging the problem, and the maintenance cost of keeping it in
stress is roughly zero in my opinion.  We can make it clear that it's
not recommended for use and is intended only as a debugging tool,
though.

Kind Regards,
Brandon


Re: Is simplenative in cassandra-stress still relevant?

2023-05-30 Thread Miklosovic, Stefan
Interesting point about the debuggability.

Yes, I agree that SimpleClient (as a class) should not be removed because we 
are using it in tests. I already mentioned in my original e-mail that, for 
this reason, that class is not going anywhere and we still need to use it.

The cost of keeping it there is not big, sure, but we clearly see that e.g. the 
usage of "prepared" is buggy and does not work. That indicates to me that it 
has atrophied without anybody noticing, which further supports my case that it 
is not used much, given that the bug went undetected for so long.

Anyway, I think that we might just look at that bug with "prepared", fix it, 
and keep it all there. I do not see any tests which exercise the 
cassandra-stress command, similar to what we have for nodetool in JUnit. We 
could cover cassandra-stress in the same way, just to be sure that its 
invocation with the most important commands does not start failing over time.
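
A rough sketch of what an invocation smoke test for cassandra-stress could look like, written in Python for brevity rather than in the JUnit form the project uses for nodetool. The `run_smoke` and `main` helpers are hypothetical. Only the `-mode native cql3` and `-mode simplenative prepared cql3` arguments come from the usage text quoted in this thread; the rest of each command line (`write n=10`, the binary name on the PATH) is an assumption, as is the idea that the tool signals failure through a non-zero exit code.

```python
import subprocess
import sys


def run_smoke(commands, timeout=60):
    """Run each command; return a list of (argv, returncode) for failures.

    A returncode of None means the process could not be started or did
    not finish within the timeout.
    """
    failures = []
    for argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout)
            code = result.returncode
        except (OSError, subprocess.TimeoutExpired):
            code = None
        if code != 0:
            failures.append((argv, code))
    return failures


# Assumed invocations; adjust the binary path and arguments to the real tool.
STRESS_COMMANDS = [
    ["cassandra-stress", "write", "n=10", "-mode", "native", "cql3"],
    ["cassandra-stress", "write", "n=10",
     "-mode", "simplenative", "prepared", "cql3"],
]


def main():
    """Report every failing invocation on stderr; return a process exit code."""
    failures = run_smoke(STRESS_COMMANDS)
    for argv, code in failures:
        print("FAILED", code, " ".join(argv), file=sys.stderr)
    return 1 if failures else 0
```

Calling `main()` against a running test cluster would flag a regression such as the "prepared" failure reported above, provided that failure surfaces as a non-zero exit code.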



From: Brandon Williams 
Sent: Wednesday, May 31, 2023 2:33
To: dev@cassandra.apache.org
Subject: Re: Is simplenative in cassandra-stress still relevant?

On Tue, May 30, 2023 at 7:15 PM Brad  wrote:
> If you're performing stress testing, why would you not want to use the 
> official driver?  I've spoken to several people who all have said they've 
> never used simplenative mode.

I agree that it shouldn't be used normally, but I'm not sure we should
remove it, because we can't remove it fully: SimpleClient is still
used in many tests, and I think that should continue.

If you suspect any kind of native proto or driver issue it may be
useful to have another implementation easily accessible to aid in
debugging the problem, and the maintenance cost of keeping it in
stress is roughly zero in my opinion.  We can make it clear that it's
not recommended for use and is intended only as a debugging tool,
though.

Kind Regards,
Brandon