Please grant Wiki access for CEP

2023-03-21 Thread Doug Rohrer
Hi folks:

I’d like to post a CEP, but given it’s the first time I’m trying to contribute 
to the wiki, I don’t have access.

If someone with access could please grant user drohrer access to post, I’d 
greatly appreciate it.

Thanks,

Doug Rohrer

[DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-23 Thread Doug Rohrer
Hi everyone,

Wiki: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics

We’d like to propose this CEP for adoption by the community.

It is common for teams using Cassandra to find themselves looking for a way to 
interact with large amounts of data for analytics workloads. However, 
Cassandra’s standard APIs and native read/write paths weren’t designed for 
large-scale data egress/ingest or bulk analytics.

We’re proposing this CEP for this exact purpose. It enables the implementation 
of custom Spark (or similar) applications that can either read or write large 
amounts of Cassandra data at line rates, by accessing the persistent storage of 
nodes in the cluster via the Cassandra Sidecar.

This CEP proposes new APIs in the Cassandra Sidecar and a companion library 
that integrates deeply with Apache Spark, allowing its users to bulk import or 
export data from a running Cassandra cluster with minimal to no impact on the 
read/write traffic.

We will shortly publish a branch with code that will accompany this CEP to help 
readers understand it better.

As a reminder, please keep the discussion here on the dev list vs. in the wiki, 
as we’ve found it easier to manage via email.

Sincerely,

Doug Rohrer & James Berragan

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-24 Thread Doug Rohrer
I agree that the analytics library will need to support vnodes. To be clear, 
there’s nothing preventing the solution from working with vnodes right now, and 
no assumptions about a 1:1 topology between a token and a node. However, we 
don’t, today, have the ability to test vnode support end-to-end. We are working 
towards that, and should be able to remove the caveat from the released 
analytics library once we can properly test vnode support.
If it helps, I can update the CEP to say something more like “Caveat: Currently 
untested with vnodes - work is ongoing to remove this limitation”.

Doug

> On Mar 24, 2023, at 11:43 AM, Brandon Williams  wrote:
> 
> On Fri, Mar 24, 2023 at 10:39 AM Jeremiah D Jordan
>  wrote:
>> 
>> I have concerns with the majority of this being in the sidecar and not in 
>> the database itself.  I think it would make sense for the server side of 
>> this to be a new service exposed by the database, not in the sidecar.  That 
>> way it can be able to properly integrate with the authentication and 
>> authorization apis, and to make it a first class citizen in terms of having 
>> unit/integration tests in the main DB ensuring no one breaks it.
> 
> I don't think this can/should happen until it supports the database's
> default configuration with vnodes.



Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-04-05 Thread Doug Rohrer
Sorry for the delay in responding here - yes, we can add some diagrams to the 
CEP - I’ll try to get that done by end-of-week.

Thanks,

Doug

> On Mar 28, 2023, at 1:14 PM, J. D. Jordan  wrote:
> 
> Maybe some data flow diagrams could be added to the cep showing some example 
> operations for read/write?
> 
>> On Mar 28, 2023, at 11:35 AM, Yifan Cai  wrote:
>> 
>> 
>> A lot of great discussions! 
>> 
>> On the sidecar front, especially what the role sidecar plays in terms of 
>> this CEP, I feel there might be some confusion. Once the code is published, 
>> we should have clarity.
>> Sidecar does not read sstables nor do any coordination for analytics 
>> queries. It is local to the companion Cassandra instance. For bulk read, it 
>> takes snapshots and streams sstables to spark workers to read. For bulk 
>> write, it imports the sstables uploaded from spark workers. All commands are 
>> existing jmx/nodetool functionalities from Cassandra. Sidecar adds the http 
>> interface to them. It might be an over simplified description. The complex 
>> computation is performed in spark clusters only.
>> 
>> In the long run, Cassandra might evolve into a database that does both OLTP 
>> and OLAP. (Not what this thread aims for) 
>> At the current stage, Spark is very suited for analytic purposes. 
>> 
>> On Tue, Mar 28, 2023 at 9:06 AM Benedict wrote:
>>> I disagree with the first claim, as the process has all the information it 
>>> chooses to utilise about which resources it’s using and what it’s using 
>>> those resources for.
>>> 
>>> The inability to isolate GC domains is something we cannot address, but 
>>> also probably not a problem if we were doing everything with memory 
>>> management as well as we could be.
>>> 
>>> But, not worth derailing this thread for. Today we do very little well on 
>>> this front within the process, and a separate process is well justified 
>>> given the state of play.
>>> 
>>>> On 28 Mar 2023, at 16:38, Derek Chen-Becker wrote:
>>>> 
>>>> On Tue, Mar 28, 2023 at 9:03 AM Joseph Lynch wrote:
>>>> ...
>>>> 
>>>>> I think we might be underselling how valuable JVM isolation is,
>>>>> especially for analytics queries that are going to pass the entire
>>>>> dataset through heap somewhat constantly.
>>>> 
>>>> Big +1 here. The JVM simply does not have significant granularity of 
>>>> control for resource utilization, but this is explicitly a feature of 
>>>> separate processes. Add in being able to separate GC domains and you can 
>>>> avoid a lot of noisy neighbor in-VM behavior for the disparate workloads.
>>>> 
>>>> Cheers,
>>>> 
>>>> Derek
>>>> 
>>>> -- 
>>>> +---+
>>>> | Derek Chen-Becker |
>>>> | GPG Key available at https://keybase.io/dchenbecker and   |
>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>> +---+
>>>> 



Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-04-10 Thread Doug Rohrer
I’ve updated the CEP with two overview diagrams of the interactions between 
Sidecar, Cassandra, and the Bulk Analytics library.  Hope this helps folks 
better understand how things work, and thanks for the patience as it took a bit 
longer than expected for me to find the time for this.

Doug

> On Apr 5, 2023, at 11:18 AM, Doug Rohrer  wrote:
> 
> Sorry for the delay in responding here - yes, we can add some diagrams to the 
> CEP - I’ll try to get that done by end-of-week.
> 
> Thanks,
> 
> Doug
> 
>> On Mar 28, 2023, at 1:14 PM, J. D. Jordan  wrote:
>> 
>> Maybe some data flow diagrams could be added to the cep showing some example 
>> operations for read/write?
>> 
>>> On Mar 28, 2023, at 11:35 AM, Yifan Cai  wrote:
>>> 
>>> 
>>> A lot of great discussions! 
>>> 
>>> On the sidecar front, especially what the role sidecar plays in terms of 
>>> this CEP, I feel there might be some confusion. Once the code is published, 
>>> we should have clarity.
>>> Sidecar does not read sstables nor do any coordination for analytics 
>>> queries. It is local to the companion Cassandra instance. For bulk read, it 
>>> takes snapshots and streams sstables to spark workers to read. For bulk 
>>> write, it imports the sstables uploaded from spark workers. All commands 
>>> are existing jmx/nodetool functionalities from Cassandra. Sidecar adds the 
>>> http interface to them. It might be an over simplified description. The 
>>> complex computation is performed in spark clusters only.
>>> 
>>> In the long run, Cassandra might evolve into a database that does both OLTP 
>>> and OLAP. (Not what this thread aims for) 
>>> At the current stage, Spark is very suited for analytic purposes. 
>>> 
>>> On Tue, Mar 28, 2023 at 9:06 AM Benedict <bened...@apache.org> wrote:
>>>> I disagree with the first claim, as the process has all the information it 
>>>> chooses to utilise about which resources it’s using and what it’s using 
>>>> those resources for.
>>>> 
>>>> The inability to isolate GC domains is something we cannot address, but 
>>>> also probably not a problem if we were doing everything with memory 
>>>> management as well as we could be.
>>>> 
>>>> But, not worth derailing this thread for. Today we do very little well on 
>>>> this front within the process, and a separate process is well justified 
>>>> given the state of play.
>>>> 
>>>>> On 28 Mar 2023, at 16:38, Derek Chen-Becker <de...@chen-becker.org> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Mar 28, 2023 at 9:03 AM Joseph Lynch <joe.e.ly...@gmail.com> wrote:
>>>>> ...
>>>>> 
>>>>>> I think we might be underselling how valuable JVM isolation is,
>>>>>> especially for analytics queries that are going to pass the entire
>>>>>> dataset through heap somewhat constantly. 
>>>>> 
>>>>> Big +1 here. The JVM simply does not have significant granularity of 
>>>>> control for resource utilization, but this is explicitly a feature of 
>>>>> separate processes. Add in being able to separate GC domains and you can 
>>>>> avoid a lot of noisy neighbor in-VM behavior for the disparate workloads.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Derek
>>>>> 
>>>>> 
>>>>> -- 
>>>>> +---+
>>>>> | Derek Chen-Becker |
>>>>> | GPG Key available at https://keybase.io/dchenbecker and   |
>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>> +---+
>>>>> 
> 



[VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Doug Rohrer
Hello all,

I’d like to put CEP-28 to a vote.

Proposal:

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics

Jira:
https://issues.apache.org/jira/browse/CASSANDRA-16222

Draft implementation:

- Apache Cassandra Spark Analytics source code: 
https://github.com/frankgh/cassandra-analytics
- Changes required for Sidecar: 
https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis

Discussion:
https://lists.apache.org/thread/lrww4d7cdxgtg8o3gt8b8foymzpvq7z3

The vote will be open for 72 hours. 
A vote passes if there are at least three binding +1s and no binding vetoes. 


Thanks,

Doug Rohrer




Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-07 Thread Doug Rohrer
The vote passes with 12 +1s (8 binding) and no -1.

Thank you all for taking the time to consider CEP-28. This has been a 
years-long effort by a bunch of people, and we’re really excited to be able to 
share the Cassandra Analytics library with the community and work together to 
continue improving it.

Doug Rohrer

> On May 6, 2023, at 1:52 PM, Dinesh Joshi  wrote:
> 
> +1
> 
>> On May 4, 2023, at 9:46 AM, Doug Rohrer  wrote:
>> 
>> Hello all,
>> 
>> I’d like to put CEP-28 to a vote.
>> 
>> Proposal:
>> 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
>> 
>> Jira:
>> https://issues.apache.org/jira/browse/CASSANDRA-16222
>> 
>> Draft implementation:
>> 
>> - Apache Cassandra Spark Analytics source code: 
>> https://github.com/frankgh/cassandra-analytics
>> - Changes required for Sidecar: 
>> https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis
>> 
>> Discussion:
>> https://lists.apache.org/thread/lrww4d7cdxgtg8o3gt8b8foymzpvq7z3
>> 
>> The vote will be open for 72 hours. 
>> A vote passes if there are at least three binding +1s and no binding vetoes. 
>> 
>> 
>> Thanks,
>> 
>> Doug Rohrer
>> 
>> 
> 



Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-12 Thread Doug Rohrer
+1 (nb)

> On May 8, 2023, at 4:52 AM, Piotr Kołaczkowski  wrote:
> 
> Let's vote.
> 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
> 
> Piotr Kołaczkowski
> e. pkola...@datastax.com
> w. www.datastax.com



Re: [VOTE] Release dtest-api 0.0.14

2023-05-15 Thread Doug Rohrer
+1 (nb)

Doug Rohrer

> On May 15, 2023, at 7:17 PM, Brandon Williams  wrote:
> 
> +1
> 
> Kind Regards,
> Brandon
> 
>> On Mon, May 15, 2023 at 5:12 PM Dinesh Joshi  wrote:
>> 
>> Proposing the test build of in-jvm dtest API 0.0.14 for release.
>> 
>> Repository:
>> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
>> 
>> Candidate SHA:
>> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/ea4b44e0ed0a4f0bbe9b18fb40ad927b49a73a32
>> tagged with 0.0.14
>> 
>> Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1289/org/apache/cassandra/dtest-api/0.0.14/
>> 
>> Key signature: 53371F9B1B425A336988B6A03B6042413D323470
>> 
>> Changes since last release:
>> 
>> * CASSANDRA-18511: Add support for JMX in jvm-dtest
>> 
>> The vote will be open for 24 hours. Everyone who has tested the build
>> is invited to vote. Votes by PMC members are considered binding. A
>> vote passes if there are at least three binding +1s.


Re: [VOTE] Release dtest-api 0.0.15

2023-05-24 Thread Doug Rohrer
+1 (nb)

> On May 24, 2023, at 11:32 AM, Brandon Williams  wrote:
> 
> +1
> 
> Kind Regards,
> Brandon
> 
> On Wed, May 24, 2023 at 10:31 AM Dinesh Joshi  wrote:
>> 
>> Proposing the test build of in-jvm dtest API 0.0.15 for release.
>> 
>> Repository:
>> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
>> 
>> Candidate SHA:
>> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/48af78d1d4b5f285d3dd4991afd4df3101e3983a
>> tagged with 0.0.15
>> 
>> Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1290/org/apache/cassandra/dtest-api/0.0.15/
>> 
>> Key signature: 53371F9B1B425A336988B6A03B6042413D323470
>> 
>> Changes since last release:
>> 
>> * CASSANDRA-18537: Add JMX utility class to in-jvm dtest to ease
>> development of new tests using JMX
>> 
>> The vote will be open for 24 hours. Everyone who has tested the build
>> is invited to vote. Votes by PMC members are considered binding. A
>> vote passes if there are at least three binding +1s.



Re: [VOTE] Release dtest-api 0.0.16

2023-08-16 Thread Doug Rohrer
+1 (nb) - Thanks Dinesh!

Doug

> On Aug 16, 2023, at 5:34 PM, Dinesh Joshi  wrote:
> 
> Proposing the test build of in-jvm dtest API 0.0.16 for release.
> 
> Repository:
> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
> 
> Candidate SHA:
> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/1ba6ef93d0721741b5f6d6d72cba3da03fe78438
> tagged with 0.0.16
> 
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1307/org/apache/cassandra/dtest-api/0.0.16/
> 
> Key signature: 53371F9B1B425A336988B6A03B6042413D323470
> 
> Changes since last release:
> 
> * CASSANDRA-18727 - JMXUtil.getJmxConnector should retry connection attempts
> 
> The vote will be open for 24 hours. Everyone who has tested the build
> is invited to vote. Votes by PMC members are considered binding. A
> vote passes if there are at least three binding +1s.
> 



Re: [DISCUSS] CASSANDRA-18743 Deprecation of metrics-reporter-config

2023-08-16 Thread Doug Rohrer
My only concern about removal in 5.1 would be that removing it in a “minor” 
release would really be a breaking change, and semver says that should happen 
in a major version.

If we really want to be semver compliant, it shouldn’t be removed until 6.0 
(or, if we remove it in the next release, we should call that 6.0, but that 
conflicts with the idea of a “yearly major” so I’m not sure where we land at 
the end of the day).

Doug

> On Aug 16, 2023, at 4:14 PM, Abe Ratnofsky  wrote:
> 
> There's consensus here to deprecate metrics-reporter-config in 5.0.
> 
> Is there any objection to removing it in 5.1?
> 
>> On Aug 11, 2023, at 10:01 AM, Maxim Muzafarov  wrote:
>> 
>> +1
>> 
>> The rationale for deprecating/removing this library is not just that
>> it is obsolete and doesn't get updates. In fact, when the
>> metrics-reporter-config [1] was added the dropwizard metrics library
>> (formerly com.yammer.metrics [2]) didn't support exporting metrics to
>> files like csv, so it made sense at that time. Now it is fully covered
>> by the dropwizard reporters [3], so users can achieve the same
>> behaviour without the need for metrics-reporter-config. And that's why
>> I have a lot of doubts about it being used by anyone, but deprecation
>> is friendlier because there's no rush to remove it. :-)
>> 
>> 
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-4430
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-5838
>> [3] https://metrics.dropwizard.io/4.2.0/getting-started.html#other-reporting
>> 
>> On Fri, 11 Aug 2023 at 16:50, Caleb Rackliffe  
>> wrote:
>>> 
>>> +1
>>> 
>>>> On Aug 11, 2023, at 8:10 AM, Brandon Williams wrote:
>>>> 
>>>> +1
>>>> 
>>>> Kind Regards,
>>>> Brandon
>>>> 
>>>>> On Fri, Aug 11, 2023 at 8:08 AM Ekaterina Dimitrova wrote:
>>>>> 
>>>>> “The rationale for this proposed deprecation is that the upcoming 5.0 
>>>>> release is a good time to evaluate dependencies that are no longer 
>>>>> receiving updates and will become risks in the future.”
>>>>> 
>>>>> Thank you for raising it, I support your proposal for deprecation
>>>>> 
>>>>>> On Fri, 11 Aug 2023 at 8:55, Abe Ratnofsky wrote:
>>>>>> 
>>>>>> Hey folks,
>>>>>> 
>>>>>> Opening a thread to get input on a proposed dependency deprecation in 
>>>>>> 5.0: metrics-reporter-config has been archived for 3 years and not 
>>>>>> updated in nearly 6 years.
>>>>>> 
>>>>>> This project has a minor security issue with its usage of unsafe YAML 
>>>>>> loading via snakeyaml’s unprotected Constructor: 
>>>>>> https://nvd.nist.gov/vuln/detail/CVE-2022-1471
>>>>>> 
>>>>>> This CVE is reasonable to suppress, since operators should be able to 
>>>>>> trust their YAML configuration files.
>>>>>> 
>>>>>> The rationale for this proposed deprecation is that the upcoming 5.0 
>>>>>> release is a good time to evaluate dependencies that are no longer 
>>>>>> receiving updates and will become risks in the future.
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18743
>>>>>> 
>>>>>> —
>>>>>> Abe
>>>>>> 
>>>>> 



Re: [Discuss] Enabling JMX in in-jvm dtests (by default)

2023-08-25 Thread Doug Rohrer
I’d agree that anywhere we’re calling `nodetoolResult` or `nodetool` in a test, 
it would be better to enable JMX and use it rather than the older mocks we set 
up to enable calling the mbeans directly. I don’t think enabling JMX by default 
is the right way to go mostly due to the added resources/time required to run 
the tests (it’s only a few seconds of additional startup/shutdown time, but 
when running lots of tests every second counts).  Also, all other features are 
only enabled when requested, so making JMX on by default would require us to 
change the general pattern and have a `without` method to turn off a feature?

Better, I think, just to require it to be explicitly turned on and then have 
the methods that call into nodetool on Instance just throw a clear exception if 
jmx is disabled.
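
To make that last suggestion concrete, here is a minimal sketch of what I mean. It is illustrative only: `SketchFeature` and `SketchInstance` are hypothetical stand-ins, not the real in-jvm dtest classes, and the "nodetool" call is faked. The point is just that features stay opt-in and a missing JMX feature fails fast with a clear message.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the dtest feature flags; names are assumptions.
enum SketchFeature { NETWORK, GOSSIP, JMX }

// Hypothetical stand-in for an instance whose features are opt-in.
class SketchInstance {
    private final Set<SketchFeature> features;

    SketchInstance(SketchFeature... enabled) {
        EnumSet<SketchFeature> set = EnumSet.noneOf(SketchFeature.class);
        for (SketchFeature f : enabled) {
            set.add(f);
        }
        this.features = set;
    }

    // Fails fast with a descriptive message when JMX was not requested,
    // instead of silently falling back to mocked MBeans.
    String nodetool(String... args) {
        if (!features.contains(SketchFeature.JMX)) {
            throw new IllegalStateException(
                "nodetool requires the JMX feature; enable it explicitly for this test");
        }
        // A real implementation would run the command over the JMX connection.
        return "ran: " + String.join(" ", args);
    }
}
```

With this shape, a test that forgets to request JMX gets an `IllegalStateException` on its first nodetool call, and tests that never touch nodetool pay no extra startup cost.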

Doug

> On Aug 25, 2023, at 6:35 AM, Brandon Williams  wrote:
> 
> I would prefer to have one standard way to do it, and given the
> options I would prefer it be proper JMX instead of mocking.
> 
> Kind Regards,
> Brandon
> 
> On Fri, Aug 25, 2023 at 4:20 AM Miklosovic, Stefan
>  wrote:
>> 
>> Hi list,
>> 
>> I want to gather feedback on this comment (1).
>> 
>> Long story short, until the JMX feature was introduced, we kind of hacked / 
>> mocked the calls to MBeans from IInstance, like this (2). If you notice, 
>> there are a lot of methods throwing UnsupportedOperationException because we 
>> had no proper JMX connection in place. That in turn means that tests which 
>> call nodetool commands which use these MBeans / operations are not 
>> possible.
>> 
>> The fix I made in CASSANDRA-18572 will use JMX feature and it will hook 
>> nodetool to a proper JMX connection where we are not mocking anything etc 
>> ... It will use same stuff as in production.
>> 
>> However, this is happening only if one uses JMX feature. So all existing 
>> tests calling nodetool without this feature will still use it like it was. 
>> The patch I made takes care of both scenarios.
>> 
>> My question is if we should not make JMX feature turned on by default. That 
>> way we might further simplify the code base and get rid of the hacks.
>> 
>> Another possibility is to not turn it on by default but we would add JMX 
>> feature to each test which is using nodetool. That would also mean that any 
>> future test which will use nodetool will fail if it does not have JMX 
>> feature enabled.
>> 
>> What would you like to see - dual solution (proper JMX connection if such 
>> feature is used as well as the legacy way) or only one solution with a 
>> proper JMX? (enabled by default or not).
>> 
>> Regards
>> 
>> (1) 
>> https://issues.apache.org/jira/browse/CASSANDRA-18572?focusedCommentId=17758920&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17758920
>> (2) 
>> https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/mock/nodetool/InternalNodeProbe.java



Re: [VOTE] Accept java-driver

2023-10-03 Thread Doug Rohrer
+1 (nb)

> On Oct 3, 2023, at 10:37 AM, C. Scott Andreas  wrote:
> 
> +1 (nb)
> 
> Accepting this donation would mark a huge milestone for the project.
> 
>> On Oct 3, 2023, at 4:25 AM, Josh McKenzie  wrote:
>> 
>> 
>>> I see now this will likely be instead apache/cassandra-java-driver
>> I was wondering about that. apache/java-driver seemed pretty broad. :)
>> 
>> From the linked page:
>> Check that all active committers have a signed CLA on record. TODO – attach 
>> list
>> I've been part of these discussions and work so am familiar with the status 
>> of it (as well as guidance and clearance from the foundation re: folks we 
>> couldn't reach) - but might be worthwhile to link to the sheet or perhaps 
>> instead provide a summary of the 49 java contributors, their CLA signing 
>> status, attempts to reach out, etc for other PMC members that weren't 
>> actively involved back when we were working through it.
>> 
>> As for my vote: +1
>> 
>> Thanks everyone for the hard work getting to this point. This really is a 
>> significant contribution to the project.
>> 
>> On Tue, Oct 3, 2023, at 6:48 AM, Brandon Williams wrote:
>>> +1
>>> 
>>> Kind Regards,
>>> Brandon
>>> 
>>> On Mon, Oct 2, 2023 at 11:53 PM Mick Semb Wever wrote:
>>> >
>>> > The donation of the java-driver is ready for its IP Clearance vote.
>>> > https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
>>> >
>>> > The SGA has been sent to the ASF.  This does not require acknowledgement 
>>> > before the vote.
>>> >
>>> > Once the vote passes, and the SGA has been filed by the ASF Secretary, we 
>>> > will request ASF Infra to move the datastax/java-driver as-is to 
>>> > apache/java-driver
>>> >
>>> > This means all branches and tags, with all their history, will be kept.  
>>> > A cleaning effort has already cleaned up anything deemed not needed.
>>> >
>>> > Background for the donation is found in CEP-8: 
>>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>>> >
>>> > PMC members, please take note of (and check) the IP Clearance 
>>> > requirements when voting.
>>> >
>>> > The vote will be open for 72 hours (or longer). Votes by PMC members are 
>>> > considered binding. A vote passes if there are at least three binding +1s 
>>> > and no -1's.
>>> >
>>> > regards,
>>> > Mick
>>> 
>> 
> 
> 



Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-06 Thread Doug Rohrer
+1 on reason string, especially some way to indicate what replaces a method if 
it’s being moved into some other class/new method with more parameters/etc. 
I’ve found lots of cases (in code bases in general, not C* in particular) where 
something is marked as Deprecated but there’s no mention of a replacement even 
when there is one.

As someone who has spent a bunch of time using parts of Cassandra as a library, 
this would be hugely beneficial, but it would also clearly be useful for 
maintainers of the core codebase.
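
As a rough illustration of the reason-string idea, here is a sketch. The JDK's `@Deprecated` only has `since` and `forRemoval`, so the reason would have to live somewhere else; the `DeprecationReason` annotation, `LegacyApi`, and `DeprecationReport` below are hypothetical names I made up for the example, not anything that exists in Cassandra today.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical companion annotation: not part of the JDK or of Cassandra.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE, ElementType.FIELD, ElementType.CONSTRUCTOR})
@interface DeprecationReason {
    String value();                  // why it is deprecated
    String replacedBy() default "";  // what to use instead, if anything
}

// Hypothetical API with an old method and its replacement.
class LegacyApi {
    @Deprecated(since = "5.0", forRemoval = true)
    @DeprecationReason(value = "Needs an explicit timeout",
                       replacedBy = "LegacyApi.fetch(String, long)")
    static String fetch(String key) {
        return fetch(key, 1000L);
    }

    static String fetch(String key, long timeoutMillis) {
        return key + "@" + timeoutMillis;
    }
}

// Reads both annotations via reflection to build a one-line summary.
class DeprecationReport {
    static String describe(Class<?> owner, String name, Class<?>... params) {
        Method m;
        try {
            m = owner.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + name, e);
        }
        Deprecated d = m.getAnnotation(Deprecated.class);
        if (d == null) {
            return name + ": not deprecated";
        }
        StringBuilder sb = new StringBuilder(name)
            .append(": deprecated since ").append(d.since())
            .append(d.forRemoval() ? ", for removal" : ", kept for compatibility");
        DeprecationReason r = m.getAnnotation(DeprecationReason.class);
        if (r != null) {
            sb.append(", reason: ").append(r.value());
            if (!r.replacedBy().isEmpty()) {
                sb.append(", use ").append(r.replacedBy());
            }
        }
        return sb.toString();
    }
}
```

Keeping the reason in an annotation rather than only in Javadoc means it can be grepped or reported on alongside `since` and `forRemoval`, which is the same benefit Stefan describes for the version field.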

Doug

> On Oct 6, 2023, at 7:49 AM, Josh McKenzie  wrote:
> 
> Might be nice to support a 3rd param that's a String for the reason it's 
> deprecated. i.e. "Replaced by X",  "Unmaintained", "Obsolete", "See 
> CASSANDRA-N", link to a dev ML thread on pony mail, etc. That way if 
> someone comes across it in the codebase they have some context to follow up 
> on if it's the shape of a thing they need w/out having to go full-bore w/git 
> blame and JQL.
> 
> On Fri, Oct 6, 2023, at 4:43 AM, Miklosovic, Stefan wrote:
>> Hi list,
>> 
>> I have a ticket to discuss (1). 
>> 
>> When we deprecate APIs / methods etc, what I want to suggest is that we 
>> might start to explicitly add the version when that happened. For example, 
>> if you deprecated something which goes to 5.0, would you be so nice to do 
>> this?
>> 
>> @Deprecated(since = "5.0") 
>> 
>> Similarly, that annotation offers one more field - forRemoval, so using it 
>> like this: 
>> 
>> @Deprecated(since = "5.0", forRemoval = true) 
>> 
>> means that this is eligible to be deleted in Cassandra 6.0. 
>> 
>> With this information, it is way more comfortable to just "grep" where we 
>> are at when it comes to deprecations eligible to be deleted in the next 
>> version. Currently, we basically have to go one by one and figure out if it 
>> is not old enough to remove. I believe this would bring more transparency 
>> into what is planned to be removed and when as well it will be clearly 
>> visible what should be removed in the next version and it is not. 
>> 
>> Tangential question to this is if everything we deprecated is eligible for 
>> removal? In other words, are there any cases when forRemoval would be false? 
>> Could you elaborate on that and give such examples or do you all think that 
>> everything which is deprecated will be eventually removed?
>> 
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-18912
>> 
>> Thanks and regards



Re: CASSANDRA-18941 produce size bounded SSTables from CQLSSTableWriter

2023-10-25 Thread Doug Rohrer
+1 (nb) - will be nice for the analytics writer to be able to size SSTables 
appropriately and efficiently.

Doug

> On Oct 24, 2023, at 10:36 PM, guo Maxwell  wrote:
> 
> 😄
> 
> Chris Lohfink <clohfin...@gmail.com> wrote on Wed, Oct 25, 2023 at 05:02:
>> +1
>> 
>> On Tue, Oct 24, 2023 at 11:24 AM Brandon Williams wrote:
>>> +1
>>> 
>>> Kind Regards,
>>> Brandon
>>> 
>>> On Mon, Oct 23, 2023 at 6:22 PM Yifan Cai wrote:
>>> >
>>> > Hi,
>>> >
>>> > I want to propose merging the patch in CASSANDRA-18941 to 4.0 and up to 
>>> > trunk and hope we are all OK with it.
>>> >
>>> > In CASSANDRA-18941, I am adding the capability to produce size-bounded 
>>> > SSTables in CQLSSTableWriter for sorted data. It can greatly benefit 
>>> > Cassandra Analytics (https://github.com/apache/cassandra-analytics) for 
>>> > bulk writing SSTables, since it avoids buffering and sorting on flush, 
>>> > given the data source is sorted already in the bulk write process. 
>>> > Cassandra Analytics supports Cassandra 4.0 and depends on the 
>>> > cassandra-all 4.0.x library. Therefore, we are mostly interested in using 
>>> > the new capability in 4.0.
>>> >
>>> > CQLSSTableWriter is only used in offline tools and never in the code path 
>>> > of Cassandra server.
>>> >
>>> > Any objections to merging the patch to 4.0 and up to trunk?
>>> >
>>> > - Yifan



Re: [DISCUSS] CASSANDRA-19113: Publishing dtest-shaded JARs on release

2023-11-28 Thread Doug Rohrer
+1 (nb, but not a vote, so ¯\_(ツ)_/¯ ) - would be lovely to not have to deal 
with this individually for each project in which we use the in-jvm dtest 
framework. As Francisco noted, we’re using this in the sidecar and Analytics 
projects now and I’ve had to jump through a lot of hoops to get everything 
building consistently.

I’ve got some minor modifications to the way in which the existing shading 
works that I can contribute back to the core Cassandra project (mostly, a few 
additional relocations and not using the user’s default Maven cache as the 
temporary installation location as it was difficult to make sure you had the 
correct dtest jar with a bunch of them in the `.m2` directory).

Doug

> On Nov 28, 2023, at 2:51 PM, Josh McKenzie  wrote:
> 
> Building these jars every time we run every CI job is just silly.
> 
> +1.
> 
> On Tue, Nov 28, 2023, at 2:08 PM, Francisco Guerrero wrote:
>> Hi Abe,
>> 
>> I'm +1 on this. Several Cassandra-ecosystem projects build the dtest jar in 
>> CI. We'd very much prefer to just consume shaded dtest jars from Cassandra 
>> releases for testing purposes.
>> 
>> Best,
>> - Francisco
>> 
>> On 2023/11/28 19:02:17 Abe Ratnofsky wrote:
>> > Hey folks - wanted to raise a separate thread to discuss publishing of 
>> > dtest-shaded JARs on release.
>> > 
>> > Currently, adjacent projects that want to use the jvm-dtest framework need 
>> > to build the shaded JARs themselves. This is a decent amount of work, and 
>> > is duplicated across each project. This is mainly relevant for projects 
>> > like Sidecar and Driver. Currently, those projects need to clone and build 
>> > apache/cassandra themselves, run ant dtest-jar, and move the JAR into the 
>> > appropriate place. Different build systems treat local JARs differently, 
>> > and the whole process can be a bit complicated. Would be great to be able 
>> > to treat these as normal dependencies.
>> > 
>> > https://issues.apache.org/jira/browse/CASSANDRA-19113
>> > 
>> > Any objections?
>> > 
>> > --
>> > Abe



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Doug Rohrer
To me, the difference between system-level guardrails and table-level 
constraints is the difference between operational concerns (guardrails) and 
business concerns (table-level constraints). The two things are only related to 
one another because they both may limit the value of a field in some way, and 
there are some limited interactions between the two, but otherwise are 
essentially unrelated, and both have independent value.

I absolutely agree that making configuration somehow 
transactional/cluster-wide, rather than depending on operators to properly 
configure yaml files on each node, would be a useful feature. It is, however, a 
much broader conversation than just guardrails / constraints, and I don’t think 
the lack of a solution to the “operator misconfigured node X and it has 
different settings than node Y” problem in any way decreases the value of a 
table-level constraint system for enforcing use-case-specific constraints on 
the data in a table.

The CEP does say:
Interaction with Guardrails Framework
As we mentioned in the motivation section, we currently have some guardrails 
for columns size in place which can be extended for other data types.
Those guardrails will take preference over the defined constraints in the 
schema, and a SCHEMA ALTER adding constraints that break the limits defined by 
the guardrails framework will fail.
If the guardrails themselves are modified, operator should get a warning 
mentioning that there are schemas with offending constraints.

Other than throwing warnings in the logs, I’m not sure how exactly you’d warn 
the operator that there are schemas w/ offending constraints… but I suppose 
that would be enough. Given they are settable via JMX, I suppose any time you 
set one of them it’ll have to scan every constraint definition to make sure it 
doesn’t somehow violate the new guardrail value, which may require some 
additional interaction between the two systems, but again, it seems like this 
would be unrelated to where the configuration comes from, and we should be able 
to isolate it in the initial CEP-42 implementation.

In summary, I'm not seeing how the new constraint framework would require 
significant changes if the guardrails system, and configuration more generally, 
was rewritten to somehow provide a consistent view of the configuration across 
the cluster. In fact, the implementation of Guardrails already has a 
“configuration provider” that, by default, happens to wrap the Yaml, but 
otherwise could pull configuration from other sources, so it’s already fairly 
insulated from configuration storage, which should make changing the underlying 
storage to something cluster-wide a fairly isolated change.
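
To make the isolation argument a bit more concrete, here is a minimal Python sketch of the two checks the CEP text describes. It is not Cassandra's actual code: `ConfigProvider`, `InMemoryConfigProvider`, `validate_constraint` and `offending_constraints` are hypothetical names, and a single "max value size" guardrail is assumed. The point is only that neither check cares where the guardrail value is stored.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class ConfigProvider(Protocol):
    """Hypothetical source of guardrail values (yaml, JMX, cluster-wide store)."""

    def max_value_size(self) -> int: ...


@dataclass
class InMemoryConfigProvider:
    """Stand-in for a yaml-backed provider; the limit can change at runtime."""

    limit: int

    def max_value_size(self) -> int:
        return self.limit


class ConstraintViolatesGuardrail(Exception):
    """Raised when a schema change asks for more than the guardrail allows."""


def validate_constraint(provider: ConfigProvider, constraint_max: int) -> None:
    # Schema ALTER path: a constraint looser than the guardrail is rejected.
    limit = provider.max_value_size()
    if constraint_max > limit:
        raise ConstraintViolatesGuardrail(
            f"constraint allows {constraint_max}, guardrail allows {limit}"
        )


def offending_constraints(
    provider: ConfigProvider, constraints: Dict[str, int]
) -> List[str]:
    # Guardrail-change path: report (do not fail) constraints that now
    # exceed the guardrail, so the operator can be warned about them.
    limit = provider.max_value_size()
    return sorted(name for name, size in constraints.items() if size > limit)
```

Swapping `InMemoryConfigProvider` for a provider backed by some cluster-wide store would leave both functions untouched, which is the sense in which the change should be fairly isolated.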


Doug

> On Jun 6, 2024, at 2:12 PM, Štefan Miklošovič  
> wrote:
> 
> OK so let's modify that example like this:
> 
> T0 - a node is started with no guardrails set
> T1 - guardrail is set via JMX to not allow anything bigger than size of 10 
> (whatever size means)
> T2 - a user creates a table with a constraint that anything bigger than size 
> of 8 is forbidden
> T3 - a user inserts a mutation with size of 7
> T4 - node is restarted and guardrail in cassandra.yaml is set to forbid sizes 
> bigger than 5
> T5 - mutation with size of 7 is replayed from FQL and it will fail to replay 
> it because of "global guardrail" in yaml
> 
> In general, the problem I see with this CEP is that I feel like we clearly 
> see that it is a little bit hairy around the configuration and it _can_ be 
> broken or misconfigured etc but the feedback I see is that "yeah but ... it 
> is possible to break it already, so what?"
> 
> I do not follow this logic. If we see that it "leaks", why is the leakage an 
> excuse to put more features on top of that? Should not we fix the leakage in 
> the first place? Why is that an excuse? I don't get that ... It is like "yeah 
> it is broken so by putting more stuff on top of that it can't be worse".
> 
> What if we focused our effort to make configuration transactional etc or at 
> least tried to fix this problem so it does not happen? If we do not do that 
> before we introduce this, then we will have more work to do once we go to 
> address that but it might be probably too late because we will need to live 
> with all our decisions made earlier, whatever ineffective they might be.
> 
> 
> 
> On Thu, Jun 6, 2024 at 7:33 PM Yifan Cai  > wrote:
>> Hi Stefan, 
>> 
>> Thanks for putting the FQL example! However, it seems to be incorrect. FQL 
>> only records the _successful_ queries. The query at T4 fails, and it will 
>> not be included in FQL log. 
>> I do agree that changing guardrails on the fly can cause confusion when FQL 
>> is enabled on the node. Operator should probably avoid doing so. But it 
>> seems unrelated to constraints. Besides, there are value size guardrails, 
>> i.e. columnValueSize and collectionSize, available in Cassandra already. 
>> 
>> On extensibility, I agree that the CEP should make it cle

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-07 Thread Doug Rohrer
There’s a difference between the two though. Constraints are part of the table 
schema, and (independent of the interaction with Guardrails), have no 
dependency on yaml files being perfectly in sync across the cluster. Therefore, 
the feature (Constraints) on its own doesn’t depend on configuration files to 
be correct in its own right. The only place where this isn’t true is its 
interaction with Guardrails, which happen to be yaml-file based and cause 
issues. 

CEP-24’s password length requirements, however, are intended to be implemented 
by adding a new guardrail, which is totally dependent on YAML files today (and 
thus the concerns around a single misconfigured server allowing someone to use 
an insecure password). If CEP-24 fixes guardrails’ dependence on yaml files, it 
would also fix the problematic interaction between guardrails and constraints.

I agree that it would be incredibly valuable to find a solution to the “yaml 
files need to be correct everywhere or something breaks” problem, and I think 
CEP-24, being security-focused, is more likely to be problematic without a 
solution to this issue. That said, I think Dinesh is right in that, at the end 
of the day, CEP-24 could be implemented without fixing the yaml config issue.

I do wonder if the “Guardrails should be transactional” should really be 
“configuration should be transactional”, or at least as much config as possible 
should be, but that would blow up CEP-24 fairly dramatically (maybe?). Maybe 
“cluster-wide configuration should be read from a distributed source on 
startup/joining the cluster” or something would make sense, so the yaml file 
works as the source of truth on startup, but as soon as possible it’s read from 
a TCM-backed data source, and anything the node can get from other nodes it 
would… but now I’m designing a different CEP in a discuss thread, which is 
probably a bad idea...

Regardless, I hope that I’m explaining why I see a difference between 
constraints and guardrails, and why I think it makes sense that constraints can 
move forward without a solution to the misconfiguration problem, while I also 
think you were right in calling it out in CEP-24 (even if we eventually move 
forward on CEP-24 without the solution in place).

Doug



> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi  wrote:
> 
> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič  > wrote:
>> It is interesting to see this feedback. When I look at CEP-24 where I am 
>> obsessing about a user being able to misconfigure the password validation 
>> strength so if a user hits a "weak" node then she would be able to bypass 
>> it, and I see what is our approach here, then I am not sure what I was 
>> waiting so long for and I should probably be just more aggressive with the 
>> CEP and all the "caveats" could be just overlooked and deferred to 
>> "sometimes later".
> 
> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread. Had 
> I paid attention I would have suggested waiting on TCM doesn't make the 
> feature any different. The feature is less likely to be misconfigured in a 
> cluster. CEP-24 is valuable and password compliance with policies is a super 
> useful feature which IMO shouldn't have been held back due to lack of TCM.
>  



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Doug Rohrer
To your point about Guardrails vs. Constraints, I do think the distinct roles 
of “cluster operator” and “application developer” help show how these two 
frameworks are both valuable. I don’t think I’d expect a cluster operator to be 
involved in every table design decision, but being able to set warning and 
error-level guardrails allows an operator to set absolute limits on what the 
database itself accepts. Table-level constraints allow application developers 
(hopefully in concert with operators, where they are two distinct 
people/groups) to add additional, application-layer constraints that are likely 
to be app specific. To restate what I think you were getting at, your example 
of a production issue caused by the development team missing a key verbal 
agreement probably helps illustrate why both table-level constraints and 
guardrails are valuable. 

Imagine that, as an operator, you are generally comfortable with individual 
values in rows being, say, 256k, but because of the way in which this 
particular use case works, 64k chunks needed to be enforced. Your cluster-level 
guardrails could be set at 256k, but the table-level constraints could have 
enforced this 64k chunk size rule.
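
As a rough illustration of how the two layers would stack in that scenario, here is a small Python sketch. The function name `check_write` and the idea of passing both limits as plain integers are assumptions for the example, not anything from the CEP or the Cassandra codebase.

```python
KIB = 1024


def check_write(value_size, guardrail_max, constraint_max=None):
    """Return 'ok' or the name of the layer that rejects the write."""
    # The operator's cluster-level guardrail is the absolute ceiling.
    if value_size > guardrail_max:
        return "rejected by guardrail"
    # The table-level constraint, if any, can only tighten that ceiling.
    if constraint_max is not None and value_size > constraint_max:
        return "rejected by table constraint"
    return "ok"
```

With a 256k guardrail and a 64k constraint, a 100k chunk passes the guardrail but is stopped by the constraint, whereas on a table without the constraint the same write would go through.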

Doug

> On Jun 23, 2024, at 5:38 PM, Jordan West  wrote:
> 
> I am generally for this CEP, particularly the sizeOf guardrail. For example, 
> we recently had an incident caused by a client who wrote outside of the 
> contract we had verbally established. The constraint would have let us encode 
> that contract into the database. In this case, clients are writing large 
> blobs at the application layer and internally the client performs chunking.  
> We had established a chunk size of 64k, for example. However, the application 
> team wanted to use a different programming language than the ones we provide 
> clients for so they wrote their own. The new client had a bug that did not 
> honor the agreed upon chunk size and wrote chunks that were MBs in size. This 
> eventually led to a production incident and the issue was discovered as a 
> result of a bunch of analysis (dumping sstables, etc). Had we had the sizeOf 
> guardrail it would have turned a production incident with hours of 
> investigation into a bug found immediately during development. Could this be 
> done with a node-level guardrail? Likely. But config has the issues described 
> above and it's possible to have two tables with different constraints around 
> similar fields (for example, two different chunk size configs due to data 
> shape). Could it be done at the client layer? Yes that's what we are doing 
> now, but this incident highlights the weakness with that approach (having to 
> implement the contract everywhere and having disjoint features across 
> clients).
>  
> I also think there is benefit to application owners. Encoding constraints in 
> the database ensures continuity as ownership and contributors change and 
> reduces the need for comments or documentation as the means to enforce or 
> share this knowledge. 
> 
> I think enforcing them at write time makes sense. Thinking about it in the 
> scope of compaction for example reminds me of a data loss incident where 
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch of 
> 4 byte ints were thrown away because the field expected an 8 byte long. 
> 
> My primary concern would be ensuring that we don't implement constraints that 
> require a read before write (not inList comes to mind as an example of one 
> that could imply reading before writing and could confuse a user if it 
> doesn't). 
> 
> Regarding the conflict with existing guardrails, I do think that is tougher. 
> On one hand I find this feature to be more evolved than those guardrails and 
> would be fine to see them be replaced by it. On the other, the guardrails 
> provide sole control to the operator which is nice but adds some complexity 
> that has been rightly called out.  But I don't see that as a reason not to go 
> forward with this feature. We should pick a path and accept the tradeoffs. 
>   
> Jordan
> 
> 
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella 
> mailto:conta...@bernardobotella.com>> wrote:
>> Thanks a lot for your comments Abe!
>> 
>> I do agree that the Constraint clause should be as simple as possible. I 
>> will add a note on the CEP along with some specifics about the proposed 
>> constraints (removing the ones that are contentious, and adding them to a 
>> possible future additions section). And yeah, I also think that these 
>> constraints will help different Cassandra operating paradigms (multi-tenant 
>> clusters and diverse workflows).
>> 
>> Besides that, I hope that I’ve addressed all the potential concerns and 
>> feedback on the thread. Let’s let a bit more time for others to chime in 
>> (any further feedback will be more than welcome), but I’d like to move 
>> forward with a voting soon if no other concerns are pointed out.
>> 
>> All and all, thanks a lot to everyo

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Doug Rohrer
On the Analytics side, as long as the CQLSSTableWriter understands and enforces 
the constraints (which it should be able to, given we provide the table 
schema) we should be good to go. We should try hard to avoid scanning the data 
on import, as the Analytics library does a bunch of things to push that kind of 
logic and CPU + I/O work off to the Spark executors that write the sstables, 
and reading the whole SSTable on import can drastically slow down that process.

I agree warning the users in docs that we don’t scan the existing data for data 
that violates constraints if the table wasn’t created with them is important, 
but I don’t think it would be feasible to do scan-on-DDL change.

Could we only support collection-level constraints on frozen lists/sets/maps, 
as that way the end user would have to be aware of the current size of the 
collection?
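
A toy Python model of why I'm suggesting frozen-only, under heavy simplification: collections are modeled as Python sets, the constraint is a hypothetical "at most 3 elements" rule, and writes are assumed to be passed in timestamp order. None of these names exist in Cassandra.

```python
MAX_ELEMENTS = 3  # hypothetical collection-size constraint


def write_time_check(elements):
    """Check applied to a single mutation, without reading existing data."""
    return len(elements) <= MAX_ELEMENTS


def reconcile_non_frozen(*appends):
    """Non-frozen collections merge element-wise on read or compaction."""
    merged = set()
    for elements in appends:
        merged |= set(elements)
    return merged


def reconcile_frozen(*writes):
    """Frozen collections are replaced as a whole; the newest write wins.

    Assumes `writes` is ordered oldest to newest.
    """
    return set(writes[-1])
```

Two appends of two elements each both pass the write-time check, yet the reconciled non-frozen collection ends up with four elements. With a frozen collection the writer always supplies the full value, so the write-time check sees the real size.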

Doug

> On Jun 25, 2024, at 2:27 PM, Abe Ratnofsky  wrote:
> 
> If we're going to introduce a feature that looks like SQL constraints, we 
> should make sure it's "reasonably" compliant. In particular, we should avoid 
> situations where a user creates a constraint, writes some data, then reads 
> data that violates that constraint, unless they've expressed that violations 
> on read would be acceptable.
> 
> For Postgres, when adding a new constraint you can specify NOT VALID to avoid 
> scanning all existing relevant data[1]. If we want to avoid scan-on-DDL, this 
> tradeoff needs to be made clear to a user.
> 
> As we've already discussed, constraints must deal with operations that appear 
> within limits on the write path, but once reconciled on read or during 
> compaction can lead to a violation. Adding to non-frozen collections is one 
> example. Expecting users to understand the write path for collections feels 
> unrealistic to me; I wonder if we should express in the constraint itself 
> that it only applies during write.
> 
> Anything that uses "nodetool import" (including cassandra-analytics) could 
> theoretically push constraint-violating mutations to a table. We could update 
> import to scan table contents first, or add a flag to trust the data in 
> imported SSTables and make cassandra-analytics executors aware of table-level 
> constraints.
> 
> Some client implementations read the system_schema tables to build their 
> object mappers, I'd like to confirm that nothing will require clients to be 
> aware of these new schema constructs.
> 
> Overall, I'm supportive of the distinctions discussed between constraints and 
> guardrails and like the direction this is heading; I'd just like to make sure 
> the more detailed semantics aren't confusing or misleading for our users, and 
> semantics are much harder to change in the future.
> 
> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
> 



Re: [VOTE] CEP-42: Constraints Framework

2024-07-01 Thread Doug Rohrer
+1 (nb) - Thanks for all of the suggestions and Bernardo for wrangling the CEP 
into shape!

Doug

> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi  wrote:
> 
> +1
> 
> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg  > wrote:
>> Hi,
>> 
>> I am +1 on CEP-42 with the latest updates to the CEP to clarify syntax, 
>> error messages, constraint naming and generated naming, alter/drop, describe 
>> etc.
>> 
>> I think this now tracks very closely to how other SQL databases define 
>> constraints and the syntax is easily extensible to multi-column and 
>> multi-table constraints.
>> 
>> Ariel
>> 
>> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
>>> With all the feedback that came in the discussion thread after the call for 
>>> votes, I’d like to extend the period another 72 hours starting today.
>>> 
>>> As before, a vote passes if there are at least 3 binding +1s and no binding 
>>> vetoes.
>>> 
>>> Thanks,
>>> Bernardo Botella
>>> 
>>>> On Jun 24, 2024, at 7:17 AM, Bernardo Botella 
>>>> mailto:conta...@bernardobotella.com>> wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> I would like to start the voting for CEP-42.
>>>> 
>>>> Proposal: 
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>>>> Discussion: 
>>>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>>>> 
>>>> The vote will be open for 72 hours. A vote passes if there are at least 3 
>>>> binding +1s and no binding vetoes.
>>>> 
>>>> Thanks,
>>>> Bernardo Botella
>> 



Re: [VOTE] Backport CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0

2024-08-05 Thread Doug Rohrer
+1 (nb)

> On Aug 4, 2024, at 2:18 AM, Yifan Cai  wrote:
> 
> Hi,
> 
> I am proposing backporting CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0. 
> 
> There is a discussion thread 
>  on the 
> topic. In summary, the backport would benefit Cassandra Analytics by 
> providing a unified solution, and the patch is considered low-risk. While 
> there are concerns about adding features to 4.0 and 4.1, there is generally 
> support for 5.0.
> 
> The vote will be open for 72 hours (longer if needed). Votes by PMC members 
> are considered binding. A vote passes if there are at least three binding +1s 
> and no -1's.
> 
> Kind regards,
> Yifan Cai



Re: Welcome Doug Rohrer as Cassandra Committer

2024-08-23 Thread Doug Rohrer
Thanks Dinesh (and everyone else!).

Doug

> On Aug 23, 2024, at 2:55 PM, Dinesh Joshi  wrote:
> 
> The Apache Cassandra PMC is thrilled to announce that Doug Rohrer has
> accepted the invitation to become a committer!
> 
> Doug has worked on several aspects of Cassandra, Sidecar, and
> Analytics. Congratulations and welcome!
> 
> The Apache Cassandra PMC members



Re: Welcome Jordan West and Stefan Miklosovic as Cassandra PMC members!

2024-09-03 Thread Doug Rohrer
Congrats folks - well deserved.

Doug

> On Aug 30, 2024, at 4:18 PM, Jon Haddad  wrote:
> 
> The PMC's members are pleased to announce that Jordan West and Stefan 
> Miklosovic have accepted invitations to become PMC members.
> 
> Thanks a lot, Jordan and Stefan, for everything you have done for the project 
> all these years.
> 
> Congratulations and welcome!!
> 
> The Apache Cassandra PMC



Re: [VOTE] Release dtest-api 0.0.17

2024-09-11 Thread Doug Rohrer
As it stands, the vote passes with 7 binding and 3 non-binding +1s. Dinesh 
pointed out that we usually allow 72 hours for votes (and the Apache Release 
Policy says that they “SHOULD” last 72 hours), but all previous dtest-api votes 
I saw were only 24, which is why I left this one at 24 as well.

Given the standard is 72 hours (even though it is a “SHOULD” not a “MUST”), I’m 
inclined to extend this vote for another 48 hours (which would mean it would 
close Friday of this week). Any objections? Also, do we have any historical 
reason why these are only 24 hours vs. 72?
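
For anyone checking the dates, a quick Python sketch of the arithmetic. It assumes the vote opened at the timestamp quoted below (2024/09/10 14:34:48) and ignores time zones, since the archive does not state one.

```python
from datetime import datetime, timedelta

# Timestamp of the original vote email as quoted in this thread.
vote_opened = datetime(2024, 9, 10, 14, 34, 48)
initial_window = timedelta(hours=24)
extension = timedelta(hours=48)

# 24 + 48 hours gives the usual 72-hour voting window.
closes = vote_opened + initial_window + extension
print(closes.isoformat(), "weekday index:", closes.weekday())  # Monday is 0
```

Under those assumptions the extended window ends on 2024-09-13, which is the Friday mentioned above.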

Thanks,

Doug

> On Sep 10, 2024, at 5:16 PM, Doug Rohrer  wrote:
> 
> It was pointed out that auto-correct removed the “d” from “dtest” so just 
> responding to this with a correct title in case “test-api” wasn’t clear. Vote 
> will still close at the initial 24-hour period as I don’t think this has 
> prevented folks from finding and voting so far.
> 
> Doug
> 
>> On Sep 10, 2024, at 1:33 PM, Francisco Guerrero  wrote:
>> 
>> +1 (nb)
>> 
>> On 2024/09/10 14:34:48 Doug Rohrer wrote:
>>> Proposing the test build of in-jvm dtest API 0.0.17 for release
>>> 
>>> Repository:
>>> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
>>> 
>>> Candidate SHA:
>>> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/85b538ca8259dedc2aded8a633cf3174f551f664
>>> Tagged with 0.0.17
>>> 
>>> Artifacts:
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1343/org/apache/cassandra/dtest-api/0.0.17/
>>> 
>>> Key signature: 9A648E3DEDA36EE374C4277B602ED2C52277
>>> 
>>> Changes since last release:
>>> * CASSANDRA-19783 - In-jvm dtest to detect InstanceClassLoaderLeaks
>>> * CASSANDRA-19239 - jvm-dtests crash on java 17
>>> 
>>> The vote will be open for 24 hours. Everyone who has tested the build
>>> is invited to vote. Votes by PMC members are considered binding. A
>>> vote passes if there are at least three binding +1s.
> 



Re: [DISCUSS] CASSANDRA-13704 Safer handling of out of range tokens

2024-09-12 Thread Doug Rohrer
+1 on rejection-by-default, for several reasons:

1) Jordan’s point on the fact that recovery from this kind of data misplacement 
is very difficult.
2) Without any sort of warning or error in existing Cassandra installations, 
how many operators/users would actually know that they have been hit by this 
particular issue in the past? My guess is that the folks who have actually 
identified instances of this kind of data loss are ones who have a significant 
amount of experience running Cassandra and a team that has the ability to track 
down these kinds of issues, where most users may have never even known this was 
happening.

If we only warn/log, most folks will likely either not even see the issue (we 
log a lot) or not know what to do when it happens.
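
To spell out the difference between the two behaviours, here is a simplified Python sketch. It assumes integer tokens and ownership expressed as (start, end] ranges that may wrap around the ring. `in_range`, `owns` and `handle_write` are hypothetical names for illustration, not the CASSANDRA-13704 patch.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end]: start exclusive, end inclusive


def in_range(token: int, rng: Range) -> bool:
    start, end = rng
    if start < end:
        return start < token <= end
    # start >= end means the range wraps around the end of the ring.
    return token > start or token <= end


def owns(token: int, owned: List[Range]) -> bool:
    return any(in_range(token, rng) for rng in owned)


def handle_write(token: int, owned: List[Range], reject_out_of_range: bool = True) -> str:
    if owns(token, owned):
        return "applied"
    if reject_out_of_range:
        # Rejection-by-default: the coordinator sees a failure and can react.
        return "rejected"
    # Warn-only: the misplaced write lands anyway and only a log line records it.
    return "applied, warning logged"
```

In the warn-only branch the data is already in the wrong place by the time anyone reads the log, which is why I'd rather the default be the rejecting one.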

Doug

> On Sep 12, 2024, at 2:19 PM, Brandon Williams  wrote:
> 
> On Thu, Sep 12, 2024 at 1:13 PM Caleb Rackliffe
>  wrote:
>> 
>> I think I can count at least 4 people on this thread who literally have lost 
>> sleep over this.
> 
> Probably good examples of not being the majority though, heh.
> 
> If we are counting on users to read NEWS.txt, can we not count on them
> to enable rejection if this is important to them?
> 
> Kind Regards,
> Brandon



Re: [VOTE] Release dtest-api 0.0.17

2024-09-16 Thread Doug Rohrer
Vote passes with 6 binding and 4 non-binding (accidentally added myself to the 
“binding” count before).

Thanks all. I’ll get the release out ASAP.

Doug

> On Sep 11, 2024, at 5:32 PM, Doug Rohrer  wrote:
> 
> As it stands, the vote passes with 7 binding and 3 non-binding +1s. Dinesh 
> pointed out that we usually allow 72 hours for votes (and the Apache Release 
> Policy says that they “SHOULD” last 72 hours), but all previous dtest-api 
> votes I saw were only 24, which is why I left this one at 24 as well.
> 
> Given the standard is 72 hours (even though it is a “SHOULD” not a “MUST”), 
> I’m inclined to extend this vote for another 48 hours (which would mean it 
> would close Friday of this week). Any objections? Also, do we have any 
> historical reason why these are only 24 hours vs. 72?
> 
> Thanks,
> 
> Doug
> 
>> On Sep 10, 2024, at 5:16 PM, Doug Rohrer  wrote:
>> 
>> It was pointed out that auto-correct removed the “d” from “dtest” so just 
>> responding to this with a correct title in case “test-api” wasn’t clear. 
>> Vote will still close at the initial 24-hour period as I don’t think this 
>> has prevented folks from finding and voting so far.
>> 
>> Doug
>> 
>>> On Sep 10, 2024, at 1:33 PM, Francisco Guerrero  wrote:
>>> 
>>> +1 (nb)
>>> 
>>> On 2024/09/10 14:34:48 Doug Rohrer wrote:
>>>> Proposing the test build of in-jvm dtest API 0.0.17 for release
>>>> 
>>>> Repository:
>>>> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
>>>> 
>>>> Candidate SHA:
>>>> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/85b538ca8259dedc2aded8a633cf3174f551f664
>>>> Tagged with 0.0.17
>>>> 
>>>> Artifacts:
>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1343/org/apache/cassandra/dtest-api/0.0.17/
>>>> 
>>>> Key signature: 9A648E3DEDA36EE374C4277B602ED2C52277
>>>> 
>>>> Changes since last release:
>>>> * CASSANDRA-19783 - In-jvm dtest to detect InstanceClassLoaderLeaks
>>>> * CASSANDRA-19239 - jvm-dtests crash on java 17
>>>> 
>>>> The vote will be open for 24 hours. Everyone who has tested the build
>>>> is invited to vote. Votes by PMC members are considered binding. A
>>>> vote passes if there are at least three binding +1s.
>> 
> 



Re: [DISCUSS] Donating easy-cass-stress to the project

2024-10-08 Thread Doug Rohrer
Clarification - there would be some real value in donating easy-cass-stress (as 
the subject says), not lab… The demo was about easy-cass-lab, which uses 
easy-cass-stress.

Thanks,

Doug

> On Oct 8, 2024, at 1:51 PM, Doug Rohrer  wrote:
> 
> Hey folks,
> 
> I just wanted to resurface this conversation, especially after Jon and 
> Jordan’s talk at Community over Code this week. I think there would be some 
> real value in getting easy-cass-lab donated and part of the ecosystem.
> 
> To try to summarize:
> 
> - Jon would like to donate if his active development of the project isn’t 
> negatively affected.
> 
> - It seems a separate repo/subproject is the right way to go rather than 
> bringing it in-tree
> 
> - Several other folks have stepped up to be co-maintainers (thanks!)
> 
> - Some form of IP clearance would need to be done if this were to move 
> forward.
> 
> It seems the major concerns other than IP clearance were taken care of in the 
> thread. Is there an appetite to bring easy-cass-stress into the Apache 
> umbrella and, if so, how would we move forward from here?
> 
> Doug Rohrer
> 
>> On May 3, 2024, at 1:16 PM, Alexander DEJANOVSKI  
>> wrote:
>> 
>> 
>> Hi folks,
>> 
>> I'm familiar with the codebase and can help with the maintenance and 
>> evolution.
>> I already have some additional profiles that I can push there which were 
>> never merged in the main branch of tlp-cluster.
>> 
>> I love this tool (I know I'm biased) and hope it gets the attention it 
>> deserves.
>> 
>> Le mar. 30 avr. 2024, 23:17, Jordan West > <mailto:jw...@apache.org>> a écrit :
>>> I would likely commit to it as well
>>> 
>>> Jordan 
>>> 
>>> On Mon, Apr 29, 2024 at 10:55 David Capwell >> <mailto:dcapw...@apple.com>> wrote:
>>>>> So: besides Jon, who in the community expects/desires to maintain this 
>>>>> going forward? 
>>>> 
>>>> I have been maintaining a fork for years, so don’t mind helping maintain 
>>>> this project.
>>>> 
>>>>> On Apr 28, 2024, at 4:08 AM, Mick Semb Wever >>>> <mailto:m...@apache.org>> wrote:
>>>>> 
>>>>>> A separate subproject like dtest and the Java driver would maybe help 
>>>>>> address concerns with introducing a gradle build system and Kotlin.
>>>>> 
>>>>> 
>>>>> 
>>>>> Nit, dtest is a separate repository, not a subproject.  The Java driver 
>>>>> is one repository to be in the Drivers subproject.  Esoteric maybe, but 
>>>>> ASF terminology we need to get right :-) 
>>>>> 
>>>>> To your actual point (IIUC), it can be a separate repository and not a 
>>>>> separate subproject.  This permits it to be kotlin+gradle, while not 
>>>>> having the formal subproject procedures.  It still needs 3 responsible 
>>>>> committers from the get-go to show sustainability.  Would 
>>>>> easy-cass-stress have releases, or always be a codebase users work 
>>>>> directly with ?
>>>>> 
>>>>> Can/Should we first demote cassandra-stress by moving it out to a 
>>>>> separate repo ? 
>>>>>  ( Can its imports work off non-snapshot dependencies ? )
>>>>> It might feel like an extra prerequisite step to introduce, but maybe it 
>>>>> helps move the needle forward and make this conversation a bit 
>>>>> easier/obvious.
>>>>> 
>>>> 



Re: [DISCUSS] Donating easy-cass-stress to the project

2024-10-08 Thread Doug Rohrer
Hey folks,

I just wanted to resurface this conversation, especially after Jon and Jordan’s 
awesome talk/“live demo” at Community over Code this week. I think there would 
be some real value in getting easy-cass-lab donated and part of the ecosystem.

To try to summarize (please correct me if I’m wrong on any point here):

- Jon would like to donate if his active development of the project isn’t 
negatively affected.

- It seems a separate repo/subproject is the right way to go rather than 
bringing it in-tree

- Several other folks have stepped up to be co-maintainers/committers (thanks!)

- Some form of IP clearance would need to be done if this were to move forward.

It seems the major concerns other than IP clearance were taken care of in the 
thread. Is there an appetite to bring easy-cass-stress into the Apache umbrella 
and, if so, how would we move forward from here?

Doug Rohrer


> On May 3, 2024, at 1:14 PM, Alexander DEJANOVSKI  
> wrote:
> 
> Hi folks,
> 
> I'm familiar with the codebase and can help with the maintenance and 
> evolution.
> I already have some additional profiles that I can push there which were 
> never merged in the main branch of tlp-cluster.
> 
> I love this tool (I know I'm biased) and hope it gets the attention it 
> deserves.
> 
> Le mar. 30 avr. 2024, 23:17, Jordan West  <mailto:jw...@apache.org>> a écrit :
>> I would likely commit to it as well
>> 
>> Jordan 
>> 
>> On Mon, Apr 29, 2024 at 10:55 David Capwell > <mailto:dcapw...@apple.com>> wrote:
>>>> So: besides Jon, who in the community expects/desires to maintain this 
>>>> going forward? 
>>> 
>>> I have been maintaining a fork for years, so don’t mind helping maintain 
>>> this project.
>>> 
>>>> On Apr 28, 2024, at 4:08 AM, Mick Semb Wever >>> <mailto:m...@apache.org>> wrote:
>>>> 
>>>>> A separate subproject like dtest and the Java driver would maybe help 
>>>>> address concerns with introducing a gradle build system and Kotlin.
>>>> 
>>>> 
>>>> 
>>>> Nit, dtest is a separate repository, not a subproject.  The Java driver is 
>>>> one repository to be in the Drivers subproject.  Esoteric maybe, but ASF 
>>>> terminology we need to get right :-) 
>>>> 
>>>> To your actual point (IIUC), it can be a separate repository and not a 
>>>> separate subproject.  This permits it to be kotlin+gradle, while not 
>>>> having the formal subproject procedures.  It still needs 3 responsible 
>>>> committers from the get-go to show sustainability.  Would easy-cass-stress 
>>>> have releases, or always be a codebase users work directly with ?
>>>> 
>>>> Can/Should we first demote cassandra-stress by moving it out to a separate 
>>>> repo ? 
>>>>  ( Can its imports work off non-snapshot dependencies ? )
>>>> It might feel like an extra prerequisite step to introduce, but maybe it 
>>>> helps move the needle forward and make this conversation a bit 
>>>> easier/obvious.
>>>> 
>>> 



Re: Request for review for CASSANDRA-18505 - Moving credentials and settings from cassandra-env.sh to cassandra.yaml for JMX configuration

2024-11-13 Thread Doug Rohrer
I should be able to pick this up and do a second-pass review tomorrow (since I 
already provided some feedback).

Doug

> On Nov 12, 2024, at 8:41 PM, Maulin Vasavada  
> wrote:
> 
> Yay! Thank you Stefan for the reviews. Looking forward to other reviews on 
> the PR as well from this group. Every time I learn new things :) Hence I'll 
> keep doing this and I feel great support from the community so why not!
> 
> On Fri, Nov 8, 2024 at 1:01 AM Štefan Miklošovič wrote:
>> Hi,
>> 
>> I want to highlight this ticket (1) and its older variant here (2) which is 
>> about putting (sensitive) JMX information (passwords etc) to cassandra.yaml 
>> to a new section instead of having them in cassandra-env.sh (branch here 
>> (3)).
>> 
>> I think it is a good initiative to have all these settings consolidated in 
>> one place in cassandra.yaml. The ticket also raises the risk of leaking 
>> sensitive information, and in general having all settings (JMX, client and 
>> server options) in one place is a good idea.
>> 
>> If settings are still found in cassandra-env.sh, they take precedence over 
>> those in cassandra.yaml, so old deployments will work without any change. 
>> The settings are taken from cassandra.yaml only when there is nothing in 
>> cassandra-env.sh.
>> 
>> Based on the number of watchers this ticket has, (10 and 15 respectively), 
>> it seems to me this is quite a valuable piece of work a lot of people wish 
>> to be delivered.
>> 
>> Many thanks to Maulin Vasavada, who did this work based on the initial 
>> attempts of others and I provided the first round of reviews for that. The 
>> CI is green and we are comfortable to invite other reviewers in order to get 
>> another binding +1 so we can deliver this (to trunk).
>> 
>> Regards
>> 
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-18508
>> (2) https://issues.apache.org/jira/browse/CASSANDRA-11695
>> (3) https://github.com/apache/cassandra/pull/3638
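
The precedence rule described in the quoted message (settings found in 
cassandra-env.sh win, and cassandra.yaml is consulted only when 
cassandra-env.sh supplies nothing) can be sketched roughly as below. This is 
only an illustration under assumptions: the function name and the dictionary 
keys are hypothetical and are not taken from the patch or from Cassandra's 
configuration schema.

```python
from typing import Mapping, Optional


def resolve_jmx_settings(
    env_settings: Optional[Mapping[str, str]],
    yaml_settings: Optional[Mapping[str, str]],
) -> dict:
    """Return the effective JMX settings (hypothetical helper).

    env_settings  -- settings discovered in cassandra-env.sh, if any
    yaml_settings -- settings from the new cassandra.yaml section, if any
    """
    # Legacy path: anything configured in cassandra-env.sh is used as-is,
    # so existing deployments keep working without changes.
    if env_settings:
        return dict(env_settings)
    # New path: fall back to cassandra.yaml only when cassandra-env.sh
    # provides nothing at all.
    return dict(yaml_settings or {})


if __name__ == "__main__":
    # "jmx_port" is an illustrative key, not a real option name.
    print(resolve_jmx_settings({"jmx_port": "7199"}, {"jmx_port": "7200"}))
    print(resolve_jmx_settings({}, {"jmx_port": "7200"}))
```

The sketch treats the choice as all-or-nothing, which matches the wording of 
the message; whether the actual patch merges individual keys is not stated in 
this thread.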



Re: [VOTE] CEP-45: Mutation Tracking

2025-02-04 Thread Doug Rohrer
+1

Thanks!

Doug

> On Feb 3, 2025, at 1:33 PM, Blake Eggleston  wrote:
> 
> Hi dev@,
> 
> I’d like to start the voting for CEP-45: Mutation Tracking
> 
> Proposal: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45:+Mutation+Tracking
> Discussion: https://lists.apache.org/thread/0rstj4bzbb2596o5vw1m863ofggdjc81
> 
> The vote will be open for 72 hours. A vote passes if there are at least 3 
> binding +1s and no binding vetoes.
> 
> Thanks,
> Blake Eggleston
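
For readers unfamiliar with the rule quoted above (at least 3 binding +1s and 
no binding vetoes), a minimal tally might look like the sketch below. The 
`Vote` type and `vote_passes` function are hypothetical names used only for 
illustration; they do not correspond to any ASF tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0, or -1
    binding: bool  # True when the vote counts as binding


def vote_passes(votes: Iterable[Vote], min_binding_plus_ones: int = 3) -> bool:
    """Apply the rule: enough binding +1s and no binding -1 (veto)."""
    votes = list(votes)
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    has_binding_veto = any(v.binding and v.value == -1 for v in votes)
    return binding_plus_ones >= min_binding_plus_ones and not has_binding_veto


if __name__ == "__main__":
    sample = [
        Vote("a", 1, True),
        Vote("b", 1, True),
        Vote("c", 1, True),
        Vote("d", 1, False),  # non-binding votes do not affect the count
    ]
    print(vote_passes(sample))
```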



Re: Supporting 2.2 -> 5.0 upgrades

2024-12-12 Thread Doug Rohrer
+1 on moving the read/write logic into its own jar.

Doug

> On Dec 11, 2024, at 7:21 PM, David Capwell  wrote:
> 
> From a disk format point of view the only thing I remember was the disk type 
> bug with UDTs.  Bringing that logic back was hard as the type system (in 5.0) 
> tries to avoid allowing construction of invalid states, and we would need to 
> weaken that in order to enable the migration. Assuming the user migrated from 
> 3.x to 4.x then the sstable metadata should have been rewritten to fix this 
> bug.
> 
> One thought (though I know it's a ton of effort): we have talked for a 
> long time about moving the reading/writing logic into its own jar (so tools 
> don’t need cassandra-all and can limit the dependencies). If we did that we 
> could try to solve this as an out-of-process migration: have the 2.2 reader 
> then write using the 6.0 writer (ignoring compact storage…).
> 
>> On Dec 11, 2024, at 4:59 AM, Benedict  wrote:
>> 
>> I think 3.11 supported upgrade from 2.2, but I haven’t checked. I am fairly 
>> sure 4.x supported upgrade from 3.0.x also.
>> 
>> 
>>> On 11 Dec 2024, at 12:53, Miklosovic, Stefan via dev 
>>>  wrote:
>>> 
>>> I see. That makes sense. I think that by 3.x you meant basically the 
>>> latest 3.11, right? I guess 2.2 -> 3.0 already works, we would just try to 
>>> support 2.2 -> 3.11 straight away. I need to check where we are at in that 
>>> area.
>>> 
>>> 
>>> From: Benedict 
>>> Sent: Wednesday, December 11, 2024 13:09
>>> To: dev@cassandra.apache.org
>>> Cc: Miklosovic, Stefan; dev@cassandra.apache.org; Miklosovic, Stefan
>>> Subject: Re: Supporting 2.2 -> 5.0 upgrades
>>> 
>>> 2.2 is particularly hard because of the major storage format changes that 
>>> took place.
>>> 
>>> I think if we want to retain (restore) upgrade support from 3.x I would 
>>> support that, but 2.x is probably too burdensome and likely to have too 
>>> many hard edges.
>>> 
>>> I think if users only had to upgrade 2.2->3.x then eg 3.x->6.0 that would 
>>> be a pretty friendly upgrade path all things considered.
>>> 
>>>> On 11 Dec 2024, at 12:03, Miklosovic, Stefan via dev wrote:
>>>> 
>>>> Hey,
>>>> 
>>>> I want to fork the thread where we are mentioning that 2.2 -> 5.0 would be 
>>>> cool to support.
>>>> 
>>>> I was involved in checking that offline upgrades from 3.0 to 5.0 work and 
>>>> fixed few issues along the way (1), hence I can imagine that supporting 
>>>> 2.2 -> 5.0 would be basically the same thing just on steroids and more 
>>>> involved? Anyway, having a stab into this is not useless at all, I will at 
>>>> least go deep into the upgrade stuff I have never given a lot of thought 
>>>> to which is good learning experience.
>>>> 
>>>> Any tips where to start? Was any progress done by anybody already in this 
>>>> matter to not start from zero?
>>>> 
>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19002
>>>> 
>>>> Regards
>>> 
>> 
> 



Re: Capabilities

2024-12-19 Thread Doug Rohrer
+1 (nb) and will be happy to help, especially providing input from the 
Analytics side.

Thanks Jordan!

> On Dec 19, 2024, at 12:00 PM, Paulo Motta  wrote:
> 
> Nice stuff! I support this proposal and would be happy to help on this.
> 
> On Wed, Dec 18, 2024 at 6:00 PM Jordan West wrote:
>> In a recent discussion on the pains of upgrading one topic that came up is a 
>> feature that Riak had called Capabilities [1]. A major pain with upgrades is 
>> that each node independently decides when to start using new or modified 
>> functionality. Even when we put this behind a config (like storage 
>> compatibility mode) each node immediately enables the feature when the 
>> config is changed and the node is restarted. This causes various types of 
>> upgrade pain such as failed streams and schema disagreement. A recent 
>> example of this is CASSANDRA-20118 [2]. In some cases operators can prevent 
>> this from happening through careful coordination (e.g. ensuring upgrade 
>> sstables only runs after the whole cluster is upgraded) but typically 
>> requires custom code in whatever control plane the operator is using. A 
>> capabilities framework would distribute the state of what features each node 
>> has (and their status e.g. enabled or not) so that the cluster can choose to 
>> opt in to new features once the whole cluster has them available. From 
>> experience, having this in Riak made upgrades a significantly less risky 
>> process and also paved a path towards repeatable downgrades. I think 
>> Cassandra would benefit from it as well.
>>   
>> Further, other tools like analytics could benefit from having this 
>> information since currently it's up to the operator to manually determine 
>> the state of the cluster in some cases. 
>> 
>> I am considering drafting a CEP proposal for this feature but wanted to take 
>> the general temperature of the community and get some early thoughts while 
>> working on the draft. 
>> 
>> Looking forward to hearing y'alls thoughts,
>> Jordan
>> 
>> [1] 
>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>> 
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
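
The core idea in the quoted proposal, that the cluster opts in to a feature 
only once every node reports having it, can be sketched as a set intersection 
over what each node advertises. This is a rough illustration under 
assumptions, not the Riak implementation and not a proposed Cassandra API: 
the function names, node names, and capability strings are all hypothetical.

```python
from typing import Dict, Set


def cluster_capabilities(node_capabilities: Dict[str, Set[str]]) -> Set[str]:
    """Return the capabilities that every known node advertises."""
    if not node_capabilities:
        return set()
    # A feature is safe to enable cluster-wide only if no node lacks it.
    return set.intersection(*(set(caps) for caps in node_capabilities.values()))


def can_enable(feature: str, node_capabilities: Dict[str, Set[str]]) -> bool:
    """True when the whole cluster has the feature available."""
    return feature in cluster_capabilities(node_capabilities)


if __name__ == "__main__":
    # Mid-upgrade cluster: node3 has not yet been upgraded.
    mixed = {
        "node1": {"sstable_format_v2", "legacy_streaming"},
        "node2": {"sstable_format_v2", "legacy_streaming"},
        "node3": {"legacy_streaming"},
    }
    print(can_enable("sstable_format_v2", mixed))
    print(can_enable("legacy_streaming", mixed))
```

In this sketch an upgraded node would keep using the older behaviour until 
the last node advertises the new capability, which is the coordination the 
message says operators currently have to script themselves.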



Re: Merging compaction improvements to 5.0

2025-02-13 Thread Doug Rohrer
+1 - Thanks for doing the work to figure this out and find a good fix.

Doug

> On Feb 13, 2025, at 11:28 AM, Patrick McFadin  wrote:
> 
> I’ve been following this for a while and I think it’s just some solid 
> engineering based on real-world challenges. Probably one of the best types of 
> contributions to have. I’m +1 on adding it to 5
> 
> Patrick
> 
> On Thu, Feb 13, 2025 at 7:31 AM Dmitry Konstantinov wrote:
>> +1 (nb) from my side, I raised a few comments for CASSANDRA-15452 some time 
>> ago and Jordan addressed them.
>> I have also backported CASSANDRA-15452 changes to my internal 4.1 fork and 
>> got about 15% reduction in compaction time even for a node with a local SSD.
>> 
>> On Thu, 13 Feb 2025 at 13:22, Jordan West wrote:
>>> For 15452 that’s correct (and I believe also for 20092). For 15452, the 
>>> trunk and 5.0 patch are basically identical. 
>>> 
>>> Jordan 
>>> 
>>> On Thu, Feb 13, 2025 at 01:06 C. Scott Andreas wrote:
>>>> Checking to confirm the specific patches proposed for backport – is it the 
>>>> trunk commit for C-20092 and the open GitHub PR against the 5.0 branch for 
>>>> C-15452 linked below?
>>>> 
>>>> CASSANDRA-20092: Introduce SSTableSimpleScanner for compaction (committed 
>>>> to trunk) 
>>>> https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
>>>> 
>>>> CASSANDRA-15452: Improve disk access patterns during compaction and range 
>>>> reads (PR available) https://github.com/apache/cassandra/pull/3606
>>>> 
>>>> Thanks,
>>>> 
>>>> – Scott
>>>> 
>>>>> On Feb 12, 2025, at 9:45 PM, guo Maxwell wrote:
>>>>> 
>>>>> 
>>>>> Of course, I definitely hope to see it merged into 5.0.x as soon as 
>>>>> possible
>>>>> 
>>>>> Jordan West <jw...@apache.org> wrote on Thu, Feb 13, 2025 at 10:48:
>>>>>> Regarding the buffer size, it is configurable. My personal take is that 
>>>>>> we’ve tested this on a variety of hardware (from laptops to large 
>>>>>> instance sizes) already, as well as a few different disk configs (it’s 
>>>>>> also been run internally, in test, at a few places) and that it has been 
>>>>>> reviewed by four committers and another contributor. Always love to see 
>>>>>> more numbers. If folks want to take it for a spin on Alibaba Cloud, 
>>>>>> Azure, etc and determine the best buffer size that’s awesome. We could 
>>>>>> document which is suggested for the community. I don’t think it’s 
>>>>>> necessary to block on that however. 
>>>>>> 
>>>>>> Also I am of course +1 to including this in 5.0. 
>>>>>> 
>>>>>> Jordan 
>>>>>> 
>>>>>> On Wed, Feb 12, 2025 at 19:50 guo Maxwell wrote:
>>>>>>> What I understand is that there will be some differences in block 
>>>>>>> storage among various cloud platforms. More intuitively, the default 
>>>>>>> read-ahead size will be the same. For example, AWS EBS seems to be 
>>>>>>> 256K, and Alibaba Cloud seems to be 512K (if I remember correctly).
>>>>>>> 
>>>>>>> Just like 19488: provide the test method, see who can assist with the 
>>>>>>> test, and share the results.
>>>>>>> 
>>>>>>> Jon Haddad <j...@rustyrazorblade.com> wrote on Thu, Feb 13, 2025 at 
>>>>>>> 08:30:
>>>>>>>> Can you elaborate why?  This would be several hundred hours of work 
>>>>>>>> and would cost me thousands of $$ to perform.
>>>>>>>> 
>>>>>>>> Filesystems and block devices are well understood.  Could you give me 
>>>>>>>> an example of what you think might be different here?  This is already 
>>>>>>>> one of the most well tested and documented performance patches ever 
>>>>>>>> contributed to the project.
>>>>>>>> 
>>>>>>>> On Wed, Feb 12, 2025 at 4:26 PM guo Maxwell wrote:
>>>>>>>>> I think it should be tested on most cloud platforms (at least 
>>>>>>>>> AWS, Azure, GCP) before being merged into 5.0, just like 
>>>>>>>>> CASSANDRA-19488.
>>>>>>>>> 
>>>>>>>>> Paulo Motta <pa...@apache.org> wrote on Thu, Feb 13, 2025 at 
>>>>>>>>> 6:10 AM:
>>>>>>>>>> I'm looking forward to these improvements, compaction needs tlc. :-)
>>>>>>>>>> A couple of questions:
>>>>>>>>>> 
>>>>>>>>>> Has this been tested only on EBS, or also EC2/bare-metal/Azure/etc? 
>>>>>>>>>> My only concern is if this is an optimization for EBS that can be a 
>>>>>>>>>> deoptimization for other environments.
>>>>>>>>>> 
>>>>>>>>>> Are there reproducible scripts that anyone can run to verify the 
>>>>>>>>>> improvements in their own environments? This could help alleviate 
>>>>>>>>>> any concerns and gain confidence to introduce a perf. improvement 
>>>>>>>>>> in a patch release.
>>>>>>>>>> 
>>>>>>>>>> I have not read the ticket in detail, so apologies if this was 
>>>>>>>>>> already discussed there or elsewhere.
>>>>>>>>>> 
>>>>>>>>>> On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hey folks,

Re: [VOTE] Release Apache Cassandra Sidecar 0.1.0

2025-03-02 Thread Doug Rohrer
+1 (nb)

Thanks for putting in the work to get this ready to go!

Doug

> On Feb 28, 2025, at 7:46 AM, Brandon Williams  wrote:
> 
> +1, verified sigs/checksums, tested packaging.
> 
> Minor note: the packages do not declare any deps (like java.)  This is
> probably not an issue in practice since nobody will run a dedicated
> 'sidecar machine' but still could be improved.
> 
> Kind Regards,
> Brandon
> 
> On Thu, Feb 27, 2025 at 4:15 PM Francisco Guerrero  wrote:
>> 
>> Proposing the test build of Cassandra Sidecar 0.1.0 for release.
>> 
>> sha1: a2c19e8ccf04bd3ddbdf8ac4d792d2d55f2e497f
>> Git: https://github.com/apache/cassandra-sidecar/tree/0.1.0-tentative
>> Maven Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-server/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-client/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-adapters-cassandra41/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-adapters-base/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-client/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-client-common/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-client-all/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-server-common/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-auth-mtls/0.1.0/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-client/0.1.0-jdk8/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-vertx-client/0.1.0-jdk8/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-client-common/0.1.0-jdk8/
>> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-vertx-client-all/0.1.0-jdk8/
>> 
>> The Source and Build Artifacts, and the Debian and RPM packages and 
>> repositories, are available here:
>> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-sidecar/0.1.0/
>> 
>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>> tested the build is invited to vote. Votes by PMC members are considered 
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>> 
>> [1]: CHANGES.txt: 
>> https://github.com/apache/cassandra-sidecar/blob/0.1.0-tentative/CHANGES.txt
>> [2]: NEWS.txt: 
>> https://github.com/apache/cassandra-sidecar/blob/0.1.0-tentative/NEWS.txt



Re: Welcome Caleb Rackliffe to the PMC

2025-03-01 Thread Doug Rohrer
Congrats Caleb!

> On Feb 26, 2025, at 10:14 PM, Jordan West  wrote:
> 
> Congrats Caleb!!
> 
> Jordan 
> On Wed, Feb 26, 2025 at 13:01 Mick Semb Wever wrote:
>>   .
>>
>>> 
>>> Please join us in welcoming Caleb to his new role!
>> 
>> 
>> 
>> Congratulations Caleb !!
>> 



A Roadmap to Cassandra Analytics 1.0

2025-04-22 Thread Doug Rohrer
Hello folks,

As many of you on the ASF Slack may have noticed, I’ve been creating a bunch of 
new tickets for the Cassandra Analytics project related to a 1.0 release. Since 
it was initially contributed, there have been many enhancements and fixes to 
the library, but there are still some gaps that need to be addressed. We’re 
putting together a plan to close those gaps, and would love to enlist more 
folks from the community in making the analytics library more useful. The gaps 
we see today include:
- vnode support (and optimizations to the existing code if necessary to make 
  it work more efficiently with clusters using vnodes) (CASSANALYTICS-10 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
- Cassandra 5.0 support (this is an epic with lots of subtasks, some of which 
  are already being worked on by a variety of folks) (CASSANALYTICS-23 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
- Documentation, including both docs on cassandra.apache.org 
  <http://cassandra.apache.org/> and updated/improved developer docs in the 
  repository itself (CASSANALYTICS-6 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
- Build scripts for release (CASSANALYTICS-22 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
- Miscellaneous bug fixes of known issues/improvements
  - Analytics writer should support all valid partition/clustering key types 
    (CASSANALYTICS-35 <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
  - CassandraDataLayer uses configuration list of IPs instead of the full 
    ring/datacenter (CASSANALYTICS-20 
    <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
  - Bulk Reader should dynamically calculate number of cores to use to better 
    utilize resources for smaller tables (CASSANALYTICS-36 
    <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)

Beyond 1.0, there’s a lot of improvements and enhancements on the roadmap to 
date:
- Cassandra 6.0 Support (CASSANALYTICS-37 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
- Spark 4.0 support (CASSANALYTICS-34 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
- JDK Support Matrix (CASSANALYTICS-38 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
- Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
- Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
- Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
- Bulk reads via S3 (CASSANALYTICS-42 
  <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)

We’re also looking for input on what others think should be in the 1.0 release, 
or the long-term roadmap. If you’ve got ideas, don’t hesitate to respond to 
this thread. I’ll also be checking the existing JIRAs and making sure they are 
incorporated into the plan, which I believe most are already.

I want to thank the folks who have, so far, contributed most of the code for 
the Analytics library, and those in the community who have already started to 
use and improve it. We’re looking forward to getting more community members 
involved. If any of these items sounds interesting, please feel free to reach 
out to folks on Slack or reply on the dev list.

Thanks,

Doug Rohrer


Re: A Roadmap to Cassandra Analytics 1.0

2025-04-23 Thread Doug Rohrer
That’s great - thanks Štefan - please feel free to reach out in slack or via 
email if you’ve got any questions.

Doug

> On Apr 23, 2025, at 2:04 AM, Štefan Miklošovič  wrote:
> 
> Hi Doug,
> 
> I would love to help you with some of that. Spark 4.0 support seems appealing 
> to me. Let me check with my "backend" if there is any capacity doing so and 
> connecting privately to hash out the details.
> 
> Regards
> 
> On Tue, Apr 22, 2025 at 7:53 PM Doug Rohrer <droh...@apple.com> wrote:
>> Hello folks,
>> 
>> As many of you on the ASF Slack may have noticed, I’ve been creating a bunch 
>> of new tickets for the Cassandra Analytics project related to a 1.0 release. 
>> Since it was initially contributed, there have been many enhancements and 
>> fixes to the library, but there are still some gaps that need to be 
>> addressed. We’re putting together a plan to close those gaps, and would love 
>> to enlist more folks from the community in making the analytics library more 
>> useful. The gaps we see today include:
>> vnode support (and optimizations to the existing code if necessary to make it 
>> work more efficiently with clusters using vnodes) (CASSANALYTICS-10 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>> Cassandra 5.0 support (this is an epic with lots of subtasks, some of which 
>> are already being worked on by a variety of folks) (CASSANALYTICS-23 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>> Documentation, including both docs on cassandra.apache.org 
>> <http://cassandra.apache.org/> and updated/improved developer docs in the 
>> repository itself (CASSANALYTICS-6 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>> Build scripts for release (CASSANALYTICS-22 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>> Miscellaneous bug fixes of known issues/improvements
>> Analytics writer should support all valid partition/clustering key types 
>> (CASSANALYTICS-35 <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>> CassandraDataLayer uses configuration list of IPs instead of the full 
>> ring/datacenter (CASSANALYTICS-20 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>> Bulk Reader should dynamically calculate number of cores to use to better 
>> utilize resources for smaller tables (CASSANALYTICS-36 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>> 
>> Beyond 1.0, there’s a lot of improvements and enhancements on the roadmap to 
>> date:
>> Cassandra 6.0 Support (CASSANALYTICS-37 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
>> Spark 4.0 support (CASSANALYTICS-34 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
>> JDK Support Matrix (CASSANALYTICS-38 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
>> Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
>> Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
>> Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
>> Bulk reads via S3 (CASSANALYTICS-42 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)
>> 
>> We’re also looking for input on what others think should be in the 1.0 
>> release, or the long-term roadmap. If you’ve got ideas, don’t hesitate to 
>> respond to this thread. I’ll also be checking the existing JIRAs and making 
>> sure they are incorporated into the plan, which I believe most are already.
>> 
>> I want to thank the folks who have, so far, contributed most of the code for 
>> the Analytics library, and those in the community who have already started 
>> to use and improve it. We’re looking forward to getting more community 
>> members involved. If any of these items sounds interesting, please feel free 
>> to reach out to folks on Slack or reply on the dev list.
>> 
>> Thanks,
>> 
>> Doug Rohrer



Re: A Roadmap to Cassandra Analytics 1.0

2025-04-23 Thread Doug Rohrer
I put everything into Jira directly - there are two epics, one for the 
“Analytics 1.0” release 
<https://issues.apache.org/jira/browse/CASSANALYTICS-21> and one for 
“Cassandra 5.0 support” 
<https://issues.apache.org/jira/browse/CASSANALYTICS-23>, figuring that once 
we started work on these things (which some folks actually have) a Confluence 
page would quickly become out of date.

If folks feel like there’s some value in putting something up there we could do 
that, but I think epics in Jira capture the plan fairly well.

Thanks,

Doug

> On Apr 22, 2025, at 6:15 PM, Patrick McFadin  wrote:
> 
> Is the current roadmap published somewhere? I went to Confluence and couldn't 
> find anything.
> 
> Patrick
> 
> On Tue, Apr 22, 2025 at 10:53 AM Doug Rohrer <droh...@apple.com> wrote:
>> Hello folks,
>> 
>> As many of you on the ASF Slack may have noticed, I’ve been creating a bunch 
>> of new tickets for the Cassandra Analytics project related to a 1.0 release. 
>> Since it was initially contributed, there have been many enhancements and 
>> fixes to the library, but there are still some gaps that need to be 
>> addressed. We’re putting together a plan to close those gaps, and would love 
>> to enlist more folks from the community in making the analytics library more 
>> useful. The gaps we see today include:
>> vnode support (and optimizations to the existing code if necessary to make it 
>> work more efficiently with clusters using vnodes) (CASSANALYTICS-10 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>> Cassandra 5.0 support (this is an epic with lots of subtasks, some of which 
>> are already being worked on by a variety of folks) (CASSANALYTICS-23 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>> Documentation, including both docs on cassandra.apache.org 
>> <http://cassandra.apache.org/> and updated/improved developer docs in the 
>> repository itself (CASSANALYTICS-6 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>> Build scripts for release (CASSANALYTICS-22 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>> Miscellaneous bug fixes of known issues/improvements
>> Analytics writer should support all valid partition/clustering key types 
>> (CASSANALYTICS-35 <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>> CassandraDataLayer uses configuration list of IPs instead of the full 
>> ring/datacenter (CASSANALYTICS-20 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>> Bulk Reader should dynamically calculate number of cores to use to better 
>> utilize resources for smaller tables (CASSANALYTICS-36 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>> 
>> Beyond 1.0, there’s a lot of improvements and enhancements on the roadmap to 
>> date:
>> Cassandra 6.0 Support (CASSANALYTICS-37 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
>> Spark 4.0 support (CASSANALYTICS-34 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
>> JDK Support Matrix (CASSANALYTICS-38 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
>> Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
>> Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
>> Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
>> Bulk reads via S3 (CASSANALYTICS-42 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)
>> 
>> We’re also looking for input on what others think should be in the 1.0 
>> release, or the long-term roadmap. If you’ve got ideas, don’t hesitate to 
>> respond to this thread. I’ll also be checking the existing JIRAs and making 
>> sure they are incorporated into the plan, which I believe most are already.
>> 
>> I want to thank the folks who have, so far, contributed most of the code for 
>> the Analytics library, and those in the community who have already started 
>> to use and improve it. We’re looking forward to getting more community 
>> members involved. If any of these items sounds interesting, please feel free 
>> to reach out to folks on Slack or reply on the dev list.
>> 
>> Thanks,
>> 
>> Doug Rohrer



Re: [VOTE][IP CLEARANCE] easy-cass-stress

2025-04-30 Thread Doug Rohrer
+1 (nb) - glad to see this moving forward.

Doug

> On Apr 30, 2025, at 11:33 AM, Patrick McFadin  wrote:
> 
> +1
> 
> On Wed, Apr 30, 2025 at 8:30 AM Yifan Cai wrote:
>> +1 (nb)
>> 
>> From: Jon Haddad <j...@rustyrazorblade.com>
>> Sent: Wednesday, April 30, 2025 8:24:44 AM
>> To: dev@cassandra.apache.org
>> Cc: gene...@incubator.apache.org
>> Subject: Re: [VOTE][IP CLEARANCE] easy-cass-stress
>>  
>> +1
>> 
>> On Wed, Apr 30, 2025 at 8:18 AM Jordan West wrote:
>> +1
>> 
>> On Wed, Apr 30, 2025 at 8:15 AM Jordan West wrote:
>> (general@incubator cc'd)
>> 
>> Please vote on the acceptance of the easy-cass-stress (to be renamed 
>> cassandra-stress) and its IP Clearance:
>> 
>> https://incubator.apache.org/ip-clearance/cassandra-easy-cass-stress.html
>> 
>> All consent from original authors of the donation, and tracking of collected 
>> CLAs, is found in
>> 
>> https://github.com/rustyrazorblade/easy-cass-stress/pull/41/files and  
>> https://delicate-tail-8c0.notion.site/easy-cass-stress-submission-141ac849cc9d80a4972cc8623aa54667
>> 
>> These do not all require acknowledgement before the vote.
>> 
>> The code is prepared for donation at 
>> https://github.com/rustyrazorblade/easy-cass-stress
>> 
>> Once this vote passes we will request ASF Infra to move the 
>> rustyrazorblade/easy-cass-stress as-is to apache/cassandra-stress. The main 
>> branch and gh-pages branches, all tags, and all history, will be kept.  The 
>> main branch will continue to be named main.
>> 
>> PMC members, please check carefully the IP Clearance requirements before 
>> voting.
>> 
>> The vote will be open for 72 hours (or longer). Votes by PMC members are 
>> considered binding. A vote passes if there are at least three binding +1s 
>> and no -1's.
>> 
>> Thanks,
>> 
>> Jordan
>> 



Re: A Roadmap to Cassandra Analytics 1.0

2025-05-01 Thread Doug Rohrer
Patrick,

Thanks for the clarification - makes sense. I can put the contents here up on 
Confluence and we can work together to tweak it if necessary.

> On Apr 30, 2025, at 11:15 AM, Patrick McFadin  wrote:
> 
> I'm not thinking that the Confluence page would be a status page or try to 
> get too close to being a tracker. 
> 
> My motivation here is for the millions of users not watching the project 
> intently and completely missing that this is happening. Case in point. I was 
> recently in a Reddit thread with a guy trying to build his own CDC mechanism 
> for Kafka topics. I pointed out that not only did sidecar exist, but maybe he 
> would like to contribute? It's this kind of non-coding activity that has an 
> awesome downstream effect on our project codebase by finding more 
> contributors/users. What I have in mind for the Confluence page is a 
> semi-dynamic page that explains what the project does, what's being worked on 
> and potential areas of contribution. The latter being the most dynamic. If 
> you have time, I can get on a zoom with you, take some notes and put it up. 
> Doesn't have to be a big effort. 
> 
> Patrick
> 
> On Wed, Apr 23, 2025 at 6:52 AM Doug Rohrer <droh...@apple.com> wrote:
>> I put everything into Jira directly - there are two epics, one for the 
>> “Analytics 1.0” release 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-21> and one for 
>> “Cassandra 5.0 support” 
>> <https://issues.apache.org/jira/browse/CASSANALYTICS-23>, figuring that 
>> once we started work on these things (which some folks actually have) a 
>> Confluence page would quickly become out of date.
>> 
>> If folks feel like there’s some value in putting something up there we could 
>> do that, but I think epics in Jira capture the plan fairly well.
>> 
>> Thanks,
>> 
>> Doug
>> 
>>> On Apr 22, 2025, at 6:15 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>> 
>>> Is the current roadmap published somewhere? I went to Confluence and 
>>> couldn't find anything.
>>> 
>>> Patrick
>>> 
>>> On Tue, Apr 22, 2025 at 10:53 AM Doug Rohrer <droh...@apple.com> wrote:
>>>> Hello folks,
>>>> 
>>>> As many of you on the ASF Slack may have noticed, I’ve been creating a 
>>>> bunch of new tickets for the Cassandra Analytics project related to a 1.0 
>>>> release. Since it was initially contributed, there have been many 
>>>> enhancements and fixes to the library, but there are still some gaps that 
>>>> need to be addressed. We’re putting together a plan to close those gaps, 
>>>> and would love to enlist more folks from the community in making the 
>>>> analytics library more useful. The gaps we see today include:
>>>> - vnode support (and optimizations to the existing code if necessary to make
>>>>   it work more efficiently with clusters using vnodes) (CASSANALYTICS-10
>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>>>> - Cassandra 5.0 support (this is an epic with lots of subtasks, some of
>>>>   which are already being worked on by a variety of folks) (CASSANALYTICS-23
>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>>>> - Documentation, including both docs on cassandra.apache.org
>>>>   <http://cassandra.apache.org/> and updated/improved developer docs in the
>>>>   repository itself (CASSANALYTICS-6
>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>>>> - Build scripts for release (CASSANALYTICS-22
>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>>>> - Miscellaneous bug fixes of known issues/improvements
>>>>   - Analytics writer should support all valid partition/clustering key types
>>>>     (CASSANALYTICS-35 <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>>>>   - CassandraDataLayer uses configuration list of IPs instead of the full
>>>>     ring/datacenter (CASSANALYTICS-20
>>>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>>>>   - Bulk Reader should dynamically calculate number of cores to use to better
>>>>     utilize resources for smaller tables (CASSANALYTICS-36
>>>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>>>> 
>>>> Beyond 1.0, there are many improvements and enhancements on the roadmap 
>>>> to date:
>>>> Cassandra 6.0 Support (CASSANALYTICS-37 
>>>> 

Re: A Roadmap to Cassandra Analytics 1.0

2025-05-06 Thread Doug Rohrer
Posted to 
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Analytics+Roadmap
 - happy to discuss/edit further as well.

Doug

> On May 1, 2025, at 9:39 AM, Doug Rohrer wrote:
> 
> Patrick,
> 
> Thanks for the clarification - makes sense. I can put the contents here up on 
> Confluence and we can work together to tweak it if necessary.
> 
>> On Apr 30, 2025, at 11:15 AM, Patrick McFadin wrote:
>> 
>> I'm not thinking that the Confluence page would be a status page or try to 
>> get too close to being a tracker. 
>> 
>> My motivation here is for the millions of users not watching the project 
>> intently and completely missing that this is happening. Case in point. I was 
>> recently in a Reddit thread with a guy trying to build his own CDC mechanism 
>> for Kafka topics. I pointed out that not only did sidecar exist, but maybe 
>> he would like to contribute? It's this kind of non-coding activity that has 
>> an awesome downstream effect on our project codebase by finding more 
>> contributors/users. My thought is that this Confluence page would be a 
>> semi-dynamic page that explains what the project does, what's being worked 
>> on, and potential areas of contribution, the last being the most dynamic. 
>> If you have time, I can get on a zoom with you, take some notes and put it 
>> up. Doesn't have to be a big effort. 
>> 
>> Patrick
>> 
>> On Wed, Apr 23, 2025 at 6:52 AM Doug Rohrer <droh...@apple.com> wrote:
>>> I put everything into Jira directly - there are two epics, one for the 
>>> “Analytics 1.0” (<https://issues.apache.org/jira/browse/CASSANALYTICS-21>) 
>>> release and one for “Cassandra 5.0 support” 
>>> (<https://issues.apache.org/jira/browse/CASSANALYTICS-23>), figuring that 
>>> once we started work on these things (which some folks actually have) a 
>>> Confluence page would quickly become out of date.
>>> 
>>> If folks feel like there’s some value in putting something up there we 
>>> could do that, but I think epics in Jira capture the plan fairly well.
>>> 
>>> Thanks,
>>> 
>>> Doug
>>> 
>>>> On Apr 22, 2025, at 6:15 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>> 
>>>> Is the current roadmap published somewhere? I went to Confluence and 
>>>> couldn't find anything.
>>>> 
>>>> Patrick
>>>> 
>>>> On Tue, Apr 22, 2025 at 10:53 AM Doug Rohrer <droh...@apple.com> wrote:
>>>>> Hello folks,
>>>>> 
>>>>> As many of you on the ASF Slack may have noticed, I’ve been creating a 
>>>>> bunch of new tickets for the Cassandra Analytics project related to a 1.0 
>>>>> release. Since it was initially contributed, there have been many 
>>>>> enhancements and fixes to the library, but there are still some gaps that 
>>>>> need to be addressed. We’re putting together a plan to close those gaps, 
>>>>> and would love to enlist more folks from the community in making the 
>>>>> analytics library more useful. The gaps we see today include:
>>>>> - vnode support (and optimizations to the existing code if necessary to make
>>>>>   it work more efficiently with clusters using vnodes) (CASSANALYTICS-10
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>>>>> - Cassandra 5.0 support (this is an epic with lots of subtasks, some of
>>>>>   which are already being worked on by a variety of folks)
>>>>>   (CASSANALYTICS-23
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>>>>> - Documentation, including both docs on cassandra.apache.org
>>>>>   <http://cassandra.apache.org/> and updated/improved developer docs in the
>>>>>   repository itself (CASSANALYTICS-6
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>>>>> - Build scripts for release (CASSANALYTICS-22
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>>>>> - Miscellaneous bug fixes of known issues/improvements
>>>>>   - Analytics writer should support all valid partition/clustering key types
>>>>>     (CASSANALYTICS-35
>>>>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>>>>>   - CassandraDataLayer uses configuration list of IPs instead of the full
>>>>> ring/datacente