Re: [VOTE][IP CLEARANCE] Cassandra Cluster Manager (CCM)

2025-03-10 Thread Joseph Lynch
+1

On Mon, Mar 10, 2025 at 1:28 PM Patrick McFadin wrote:

> +1
>
> On Mon, Mar 10, 2025 at 9:28 AM Dinesh Joshi wrote:
>
>> +1
>>
>> On Sun, Mar 9, 2025 at 5:18 AM Mick Semb Wever wrote:
>>
>>> Please vote on the acceptance of the Cassandra Cluster Manager (CCM)
>>> and its IP Clearance:
>>> https://incubator.apache.org/ip-clearance/cassandra-ccm.html
>>>
>>> All consent from original authors of the donation, and tracking of
>>> collected CLAs, is found in:
>>>  - https://github.com/riptano/ccm/issues/773
>>>  - https://docs.google.com/spreadsheets/d/1lXDK3c7_-TZh845knVZ8zvJf65x2o03ACqY3pfdXZR8
>>>
>>> These do not require acknowledgement before the vote.
>>>
>>> The code is prepared for donation at https://github.com/riptano/ccm
>>> (Only `master` and `cassandra-test` refs will be brought over.)
>>>
>>> Once this vote passes we will request ASF Infra to move
>>> riptano/ccm as-is to apache/cassandra-ccm. The master branch and the
>>> cassandra-test tag, with all their history, will be kept. Because
>>> consent and CLAs were not received from all original authors, the
>>> NOTICE file keeps an additional reference to these earlier copyright
>>> authors.
>>>
>>> PMC members, please check carefully the IP Clearance requirements before
>>> voting.
>>>
>>> The vote will be open for 72 hours (or longer). Votes by PMC members
>>> are considered binding. A vote passes if there are at least three
>>> binding +1s and no -1's.
>>>
>>> regards,
>>> Mick
>>>
>>


Re: CEP-15 Update

2025-03-10 Thread Benedict Elliott Smith
My understanding then is that we are free to merge once we are ready? We will 
be directing our resources on this basis, so please pipe up promptly if you 
disagree. We will update the list once we have our final patches merged (which 
should be a good time to kick the tyres for those inclined), and once the 
rebase has completed.

Jordan, I am sorry to hear of your disappointment. Of course, we have plans to 
address each of the caveats - as promised in the opening email. Hopefully 
others will have the wherewithal to reply to your other queries.


> On 7 Mar 2025, at 19:38, Caleb Rackliffe wrote:
> 
> Just a quick reminder that CASSANDRA-18196 is where we had been 
> tracking this previously with respect to things like the feature flag 
> (CASSANDRA-18195), etc. I'm not sure if we want to officially tie 
> up/resolve the subtasks there for Jira hygiene...



Re: Cassandra Java Driver and OpenJDK CRaC

2025-03-10 Thread Radim Vansa

Hello Josh,

thanks for reaching back; answers inline:

On 10. 03. 25 13:03, Josh McKenzie wrote:

> From skimming the PR on the Spring side and the conversation there, it 
> looks like the argument is to have this live inside the java driver 
> for Cassandra instead of in the spring-boot lib which I can see the 
> argument for.

Yes; for us it does not really matter where the fix lives as long as 
it's available for the end users. Pushing it towards Cassandra has the 
advantage of providing the greatest fan-out to users, even those not 
consuming the driver through frameworks.

> If we distill this to speak to precisely the problem we're trying to 
> address or improvement we're going for here, how would you phrase 
> that? i.e. "Take application startup from Nms down to Mms"?

Yes, optimizing startup time is the most common use-case for CRaC. It's 
rather hard to provide such general numbers: the improvement should be 
order(s) of magnitude. If we speak about a hello-world style Spring Boot 
application booting, CRaC improves the startup from seconds to tens of 
milliseconds. That shouldn't differ too much from the expected times for 
a small micro-service, improving latency in scale-from-zero situations. 
This is not limited to microservices, though; we've been experimenting 
with real applications consuming hundreds of GB of memory. In that case 
the application boot can be rather complex, loading and pre-processing 
data from a DB etc., where the boot takes minutes or more. CRaC can 
restore such an instance in a few seconds.

> I ask because that's the "pro" we'll need to weigh against updating 
> the driver's topology map of the cluster, resource handling and 
> potential leaks on shutdown/startup, and the complexity of taking an 
> implementation like this into the driver code. Nothing insurmountable 
> of course, just worth weighing the two.

Can you elaborate on other use cases where the nodes are forced down, 
and what risk that brings to the overall stability? Is there a 
difference between marking only a subset of nodes down and taking all of 
the nodes down? When we force-close the control connection (as the first 
step), is it possible to get a topology update at all and race on the 
cluster members?


Thank you!

Radim




On Thu, Mar 6, 2025, at 3:34 PM, Radim Vansa wrote:

Hi all,

I would like to make applications using the Cassandra Java Driver,
particularly those built with Spring Boot, Quarkus or similar
frameworks, work with the OpenJDK CRaC project [1]. I've already created a
patch for Spring Boot [2] but the Spring folks think that these changes are
too dependent on driver internals, and suggest contributing the support to
Cassandra directly.

The patch involves closing all connections before checkpoint, and
re-establishing these after restore. I have implemented that through
sending a `NodeStateEvent -> FORCED_DOWN` on the bus for all connected
nodes. As a follow-up I could develop some way to inform the session
about a new topology e.g. if the cluster addresses change.

Before jumping into implementing a PR I would like to ask what you think
is the best approach to do this. I can think of two ways:

1) Native CRaC support

The driver would have a dependency on `org.crac:crac` [3]; this is a
small (13kB) library that provides the interfaces and a dummy noop
implementation if the target JVM does not support CRaC. Then
`DefaultSession` would register a `org.crac.Resource` implementation
that would handle the checkpoint. This has the advantage of providing
best fan-out into any project consuming the driver without any 
further work.
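
A minimal sketch of the shape option 1 could take. To stay self-contained it does not import `org.crac`: `CheckpointResource` and `CheckpointContext` are hypothetical stand-ins for `org.crac.Resource` and the context a resource registers with, and `SketchSession` is a toy class, not the driver's `DefaultSession`. Connections are modelled as plain strings; none of this is taken from the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.crac.Resource. The real callbacks also take a
// context argument and declare `throws Exception`; both are omitted here.
interface CheckpointResource {
    void beforeCheckpoint();
    void afterRestore();
}

// Hypothetical stand-in for the context that resources register with.
final class CheckpointContext {
    private final List<CheckpointResource> resources = new ArrayList<>();

    void register(CheckpointResource resource) {
        resources.add(resource);
    }

    // Would run just before the process image is written.
    void notifyBeforeCheckpoint() {
        for (CheckpointResource r : resources) {
            r.beforeCheckpoint();
        }
    }

    // Would run right after the process has been restored.
    void notifyAfterRestore() {
        for (CheckpointResource r : resources) {
            r.afterRestore();
        }
    }
}

// Toy session: one "connection" per node, registered for checkpoint callbacks.
final class SketchSession implements CheckpointResource {
    private final List<String> nodes;
    private final List<String> openConnections = new ArrayList<>();

    SketchSession(List<String> nodes, CheckpointContext context) {
        this.nodes = new ArrayList<>(nodes);
        openConnections.addAll(this.nodes); // initial connect
        context.register(this);             // the application does nothing extra
    }

    @Override
    public void beforeCheckpoint() {
        openConnections.clear();            // close every connection
    }

    @Override
    public void afterRestore() {
        openConnections.clear();
        openConnections.addAll(nodes);      // reconnect to the known nodes
    }

    int openConnectionCount() {
        return openConnections.size();
    }
}

final class CracSketchDemo {
    public static void main(String[] args) {
        CheckpointContext context = new CheckpointContext();
        SketchSession session =
                new SketchSession(Arrays.asList("node1", "node2"), context);
        context.notifyBeforeCheckpoint();
        System.out.println("open at checkpoint: " + session.openConnectionCount());
        context.notifyAfterRestore();
        System.out.println("open after restore: " + session.openConnectionCount());
    }
}
```

The point of this shape is that registration happens inside the session, so an application that only upgrades the driver gets the behaviour without code changes.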


2) Exposing neutral methods

To save frameworks from relying on internals, `DefaultSession` would
expose `.suspend()` and `.resume()` methods that would implement the
connection cut-off without importing any dependency. After upgrading to
the latest release, frameworks could use these methods in a way that suits
them. I wouldn't add those methods to the `CqlSession` interface (as
that would be a breaking change) but only to `DefaultSession`.
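
A minimal sketch of option 2, using a toy class rather than the real `DefaultSession`. `SuspendableSession` and `FrameworkCheckpointHook` are hypothetical names; only `suspend()` and `resume()` come from the proposal. Rejecting requests while suspended is an assumption of this sketch, since the proposal does not say how requests are handled in that state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy session exposing the two neutral methods; no checkpoint library involved.
final class SuspendableSession {
    private final List<String> nodes;
    private final List<String> openConnections = new ArrayList<>();
    private boolean suspended;

    SuspendableSession(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
        openConnections.addAll(this.nodes); // initial connect
    }

    // Cut all connections; calling it twice is harmless.
    synchronized void suspend() {
        if (suspended) {
            return;
        }
        openConnections.clear();
        suspended = true;
    }

    // Re-establish connections to the known nodes.
    synchronized void resume() {
        if (!suspended) {
            return;
        }
        openConnections.addAll(nodes);
        suspended = false;
    }

    synchronized int openConnectionCount() {
        return openConnections.size();
    }

    // Assumption of this sketch: requests fail fast while suspended.
    synchronized String execute(String query) {
        if (suspended) {
            throw new IllegalStateException("session is suspended");
        }
        return "executed: " + query;
    }
}

// Framework-side glue (hypothetical): the framework owns the checkpoint
// integration and calls the neutral methods when it sees fit.
final class FrameworkCheckpointHook {
    private final SuspendableSession session;

    FrameworkCheckpointHook(SuspendableSession session) {
        this.session = session;
    }

    void onBeforeCheckpoint() {
        session.suspend();
    }

    void onAfterRestore() {
        session.resume();
    }
}

final class SuspendResumeDemo {
    public static void main(String[] args) {
        SuspendableSession session =
                new SuspendableSession(Arrays.asList("node1", "node2"));
        FrameworkCheckpointHook hook = new FrameworkCheckpointHook(session);
        hook.onBeforeCheckpoint();
        System.out.println("open while suspended: " + session.openConnectionCount());
        hook.onAfterRestore();
        System.out.println(session.execute("SELECT release_version FROM system.local"));
    }
}
```

Compared with option 1, the driver stays free of any extra dependency, at the cost of each framework having to wire the two calls into its own lifecycle.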

Would Cassandra accept either of these, to let people checkpoint
(snapshot) their applications and restore them within tens of
milliseconds? Naturally it is possible to close the session object
completely and create a new one, but the ideal solution would require no
application changes beyond a dependency upgrade.

Btw. I am aware that there is an inherent race between possible topology
change and shutdown of current nodes (and I am listening for hints that
would let us prevent that), but it is reasonable to expect that users
will checkpoint the application in a quiescent state. And if the
topology update breaks the checkpoint, it is always possible to try it
again.

Thank you for your opinions and ideas!

Radim Vansa


[1] https://wiki.openjdk.org/display/crac

[2] https://github.com/spring-projects/spring-boot/pull/44505

[3] https://mvnrepository.com/artifact/org.crac/crac/1.5.0




Re: [VOTE][IP CLEARANCE] Cassandra Cluster Manager (CCM)

2025-03-10 Thread Josh McKenzie
+1

On Sun, Mar 9, 2025, at 8:01 PM, Blake Eggleston wrote:
> +1
> 
> On Sun, Mar 9, 2025, at 5:17 AM, Mick Semb Wever wrote:
>> Please vote on the acceptance of the Cassandra Cluster Manager (CCM)
>> and its IP Clearance:
>> https://incubator.apache.org/ip-clearance/cassandra-ccm.html
>> 
>> All consent from original authors of the donation, and tracking of
>> collected CLAs, is found in:
>> - https://github.com/riptano/ccm/issues/773
>> - https://docs.google.com/spreadsheets/d/1lXDK3c7_-TZh845knVZ8zvJf65x2o03ACqY3pfdXZR8
>> 
>> These do not require acknowledgement before the vote.
>> 
>> The code is prepared for donation at https://github.com/riptano/ccm
>> (Only `master` and `cassandra-test` refs will be brought over.)
>> 
>> Once this vote passes we will request ASF Infra to move
>> riptano/ccm as-is to apache/cassandra-ccm. The master branch and the
>> cassandra-test tag, with all their history, will be kept. Because
>> consent and CLAs were not received from all original authors, the
>> NOTICE file keeps an additional reference to these earlier copyright
>> authors.
>> 
>> PMC members, please check carefully the IP Clearance requirements before 
>> voting.
>> 
>> The vote will be open for 72 hours (or longer). Votes by PMC members
>> are considered binding. A vote passes if there are at least three
>> binding +1s and no -1's.
>> 
>> regards,
>> Mick
>> 

Re: Cassandra Java Driver and OpenJDK CRaC

2025-03-10 Thread Josh McKenzie
Thanks for reaching out to the list Radim; this is interesting stuff.

From skimming the PR on the Spring side and the conversation there, it looks 
like the argument is to have this live inside the java driver for Cassandra 
instead of in the spring-boot lib which I can see the argument for.

If we distill this to speak to precisely the problem we're trying to address or 
improvement we're going for here, how would you phrase that? i.e. "Take 
application startup from Nms down to Mms"?

I ask because that's the "pro" we'll need to weigh against updating the 
driver's topology map of the cluster, resource handling and potential leaks on 
shutdown/startup, and the complexity of taking an implementation like this into 
the driver code. Nothing insurmountable of course, just worth weighing the two.

On Thu, Mar 6, 2025, at 3:34 PM, Radim Vansa wrote:
> Hi all,
> 
> I would like to make applications using the Cassandra Java Driver, 
> particularly those built with Spring Boot, Quarkus or similar 
> frameworks, work with the OpenJDK CRaC project [1]. I've already created a 
> patch for Spring Boot [2] but the Spring folks think that these changes are 
> too dependent on driver internals, and suggest contributing the support to 
> Cassandra directly.
> 
> The patch involves closing all connections before checkpoint, and 
> re-establishing these after restore. I have implemented that through 
> sending a `NodeStateEvent -> FORCED_DOWN` on the bus for all connected 
> nodes. As a follow-up I could develop some way to inform the session 
> about a new topology e.g. if the cluster addresses change.
> 
> Before jumping into implementing a PR I would like to ask what you think 
> is the best approach to do this. I can think of two ways:
> 
> 1) Native CRaC support
> 
> The driver would have a dependency on `org.crac:crac` [3]; this is a 
> small (13kB) library that provides the interfaces and a dummy noop 
> implementation if the target JVM does not support CRaC. Then 
> `DefaultSession` would register a `org.crac.Resource` implementation 
> that would handle the checkpoint. This has the advantage of providing 
> best fan-out into any project consuming the driver without any further work.
> 
> 2) Exposing neutral methods
> 
> To save frameworks from relying on internals, `DefaultSession` would 
> expose `.suspend()` and `.resume()` methods that would implement the 
> connection cut-off without importing any dependency. After upgrading to 
> the latest release, frameworks could use these methods in a way that suits 
> them. I wouldn't add those methods to the `CqlSession` interface (as 
> that would be a breaking change) but only to `DefaultSession`.
> 
> Would Cassandra accept either of these, to let people checkpoint 
> (snapshot) their applications and restore them within tens of 
> milliseconds? Naturally it is possible to close the session object 
> completely and create a new one, but the ideal solution would require no 
> application changes beyond a dependency upgrade.
> 
> Btw. I am aware that there is an inherent race between possible topology 
> change and shutdown of current nodes (and I am listening for hints that 
> would let us prevent that), but it is reasonable to expect that users 
> will checkpoint the application in a quiescent state. And if the 
> topology update breaks the checkpoint, it is always possible to try it 
> again.
> 
> Thank you for your opinions and ideas!
> 
> Radim Vansa
> 
> 
> [1] https://wiki.openjdk.org/display/crac
> 
> [2] https://github.com/spring-projects/spring-boot/pull/44505
> 
> [3] https://mvnrepository.com/artifact/org.crac/crac/1.5.0
> 
>