Re: Cassandra Java Driver and OpenJDK CRaC

2025-03-11 Thread Radim Vansa

Hi Abe,

I would expect that if the control connection is terminated, on the next 
request it would be re-established including the handshake you mention. 
This shouldn't be different from a connection being broken due to 
network error. In the POC PR I am calling just `DriverChannel.close()`, 
though as far as I can see this also sends a graceful termination on the wire.


> Would it be possible to have a separate CqlSession implementation 
that includes CRaC's checkpoint and restore hooks[2] to close and open 
the session at the appropriate times


That is an option, though how convenient would that be to use? In the 
Spring Boot case, we could add a Spring Boot-specific module that 
would 'override' the type that should be used, with a dependency on a 
'generic' module implementing the session. I think that this could be 
transparent enough, but if both are managed as third-party modules, 
it would be almost easier to just keep the `CassandraSessionLifecycle` 
managing the connections 'from the outside'. The fragility problem 
remains (modifications that break the third-party module are not 
observed immediately, e.g. in a testsuite). The advantage is being 
able to eventually take the unmodified class into the driver.


For non-framework use-cases, I think this is a bigger issue: since 
CqlSession.builder().withWhatever(...).build() will always return the 
`DefaultSession`, the code would have to be modified to use a different 
builder. There's no way to 'configure' this (reflection, service 
loader...); the third-party module would have a `CracableSessionBuilder` with a 
non-trivial override of `SessionBuilder.buildDefaultSessionAsync`, and 
the application would have to change its code to use it.
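For illustration, a rough sketch of what that could look like, assuming the 4.x builder API can be subclassed; `CracableSessionBuilder` is the hypothetical class named above and the override body is elided:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.CqlSessionBuilder;
import java.util.concurrent.CompletionStage;

// Hypothetical builder from the third-party module; not existing driver code.
class CracableSessionBuilder extends CqlSessionBuilder {
    @Override
    protected CompletionStage<CqlSession> buildDefaultSessionAsync() {
        // ...would construct a CRaC-aware session instead of DefaultSession...
        return super.buildDefaultSessionAsync();
    }
}

class Application {
    public static void main(String[] args) {
        // The application-side change: opt in to the new builder instead of
        // CqlSession.builder(); everything else stays the same.
        try (CqlSession session = new CracableSessionBuilder().build()) {
            session.execute("SELECT release_version FROM system.local");
        }
    }
}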


Thanks for your thoughts!

Radim

[1] 
https://github.com/spring-projects/spring-boot/pull/44505/files#diff-780d2cdd9f860039f0a5a198303af2fe04ea05991ec15e63b09d3f094e3ea8a9R92


On 10. 03. 25 17:28, Abe Ratnofsky wrote:



Hey Radim, thanks for bringing this to the list.

In general, I'm supportive of the second option you shared ("Exposing 
neutral methods") but want to make sure I understand how CRaC would 
work in practice.


Could you clarify this part:

> Naturally it is possible to close the session object completely and 
create a new one, but the ideal solution would require no application 
changes beyond dependency upgrade.


CRaC doesn't support checkpointing with open sockets, and the 
Cassandra client protocol requires a few roundtrips after connection 
establishment before a session can be used[1]. Would it be possible to 
have a separate CqlSession implementation that includes CRaC's 
checkpoint and restore hooks[2] to close and open the session at the 
appropriate times? This CRaC implementation could live as a 
third-party module initially as it is proven out.


[1]: 
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L176
[2]: 
https://docs.azul.com/core/crac/crac-guidelines#implementing-crac-resource
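
For concreteness, a minimal sketch of such a hook, assuming the org.crac API; the wrapper class below is hypothetical glue, not existing driver code:

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import com.datastax.oss.driver.api.core.CqlSession;

// Hypothetical wrapper: close the CqlSession before a CRaC checkpoint so no
// sockets remain open, then rebuild it after restore so the protocol handshake
// is redone against the (possibly changed) cluster.
final class CracManagedSession implements Resource {

    private volatile CqlSession session;

    CracManagedSession() {
        this.session = CqlSession.builder().build();
        // Keep a reference to this object for the session's lifetime; the
        // global context does not own it.
        Core.getGlobalContext().register(this);
    }

    CqlSession session() {
        return session;
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        session.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        session = CqlSession.builder().build();
    }
}

The cost of this style is the full re-handshake on restore mentioned above, which is part of why a suspend/resume style API on the session itself is also being considered.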


Re: Cassandra Java Driver and OpenJDK CRaC

2025-03-11 Thread Radim Vansa

Hi Patrick,

> attacking some of the same requirements that Graal and Quarkus are 
trying to solve


thanks for the support! Yes, Graal (and Leyden) are kind of competing 
solutions for the startup problem. We're trying to hit the sweet spot 
between not requiring significant redesign (as is sometimes the case 
with Graal AOT) and having more bang than what a fully transparent 
solution can give. Quarkus is also known for fast startup, but it is largely 
orthogonal to CRaC - in fact, Quarkus already has some support for CRaC, 
and the two can be combined for even better performance.


> Topology information shouldn't be assumed

Is there already an automatic process that will update the topology 
information on reconnect? I guess that what we should prevent is the 
'manual' update (forcing nodes back up) from overriding a fresh topology 
update. Also, if there's a process invoking the driver concurrently with the 
checkpoint, we might get the control connection established too early; 
that's not a big problem, since the checkpoint will fail and we can retry.


Thank you!

Radim

On 10. 03. 25 18:17, Patrick McFadin wrote:



Just speaking up as a supporter for considering this change. From a 
userland perspective, I've been reading up on CRaC, and I see this 
attacking some of the same requirements that Graal and Quarkus are 
trying to solve. This is a worthwhile direction to pursue.


The CqlSession will need to re-connect, and I think that's worth 
testing. Topology information shouldn't be assumed, especially with 
something like Token-Aware Routing. Some shortcuts could speed it up, 
but I can't think of any right now. I like the idea of making it 
optional and putting it through some scenarios.


Patrick

On Mon, Mar 10, 2025 at 8:03 AM Radim Vansa  wrote:

Hello Josh,

thanks for reaching back; answers inline:
On 10. 03. 25 13:03, Josh McKenzie wrote:


From skimming the PR on the Spring side and the conversation
there, it looks like the argument is to have this live inside the
java driver for Cassandra instead of in the spring-boot lib which
I can see the argument for.



Yes; for us it does not really matter where the fix lives as long
as it's available for the end users. Pushing it towards Cassandra
has the advantage of providing the greatest fan-out to users, even
those not consuming through frameworks.



If we distill this to speak to precisely the problem we're trying
to address or improvement we're going for here, how would you
phrase that? i.e. "Take application startup from Nms down to Mms"?



Yes, optimizing startup time is the most common use-case for CRaC.
It's rather hard to provide such general numbers: it should be
order(s) of magnitude. If we speak about hello-world style Spring
Boot application booting, CRaC improves the startup from seconds
to tens of milliseconds. That shouldn't differ too much from the
expected times for a small micro-service, improving latency in
scale-from-zero situations. This is not limited to microservices,
though; we've been experimenting with real applications consuming
hundreds of GB of memory. In that case the application boot can be
rather complex, loading and pre-processing data from DB etc. where
the boot takes minutes or more. CRaC can restore such instance in
a few seconds.



I ask because that's the "pro" we'll need to weigh against
updating the driver's topology map of the cluster, resource
handling and potential leaks on shutdown/startup, and the
complexity of taking an implementation like this into the driver
code. Nothing insurmountable of course, just worth weighing the two.


Can you elaborate about other use cases where the nodes are forced
down, and what risk does that bring to the overall stability? Is
there a difference between marking only a subset of nodes down and
taking all of the nodes down? When we force-close the control
connection (as the first step), is it possible to get a topology
update at all and race on the cluster members?

Thank you!

Radim




On Thu, Mar 6, 2025, at 3:34 PM, Radim Vansa wrote:

Hi all,

I would like to make applications using Cassandra Java Driver,
particularly those built with Spring Boot, Quarkus or similar
frameworks, work with OpenJDK CRaC project [1]. I've already
created a
patch for Spring Boot [2] but Spring folks think that these
changes are
too dependent on driver internals, suggesting to contribute
support to
Cassandra directly.

The patch involves closing all connections before checkpoint, and
re-establishing these after restore. I have implemented that through
sending a `NodeStateEvent -> FORCED_DOWN` on the bus for all

Re: [VOTE][IP CLEARANCE] Cassandra Cluster Manager (CCM)

2025-03-11 Thread Francisco Guerrero
+1 (nb)

On 2025/03/09 12:17:34 Mick Semb Wever wrote:
> Please vote on the acceptance of the Cassandra Cluster Manager (CCM)
> and its IP Clearance:
> https://incubator.apache.org/ip-clearance/cassandra-ccm.html
> 
> All consent from original authors of the donation, and tracking of
> collected CLAs, is found in:
>  - https://github.com/riptano/ccm/issues/773
>  - 
> https://docs.google.com/spreadsheets/d/1lXDK3c7_-TZh845knVZ8zvJf65x2o03ACqY3pfdXZR8
> 
> These do not require acknowledgement before the vote.
> 
> The code is prepared for donation at https://github.com/riptano/ccm
> (Only `master` and `cassandra-test` refs will be brought over.)
> 
> Once this vote passes we will request ASF Infra to move the
> riptano/ccm as-is to apache/cassandra-ccm  . The master branch and the
> cassandra-test tag, with all its history, will be kept.  Because
> consent and CLAs were not received from all original authors the
> NOTICE file keeps additional reference to these earlier copyright
> authors.
> 
> PMC members, please check carefully the IP Clearance requirements before 
> voting.
> 
> The vote will be open for 72 hours (or longer). Votes by PMC members
> are considered binding. A vote passes if there are at least three
> binding +1s and no -1's.
> 
> regards,
> Mick
> 


Re: CEP-15 Update

2025-03-11 Thread Benedict Elliott Smith
Because we want to validate against the latest code in trunk, else we are 
validating stale behaviours. The cost of rebasing is high, so we do not do it 
frequently. That means we will likely stop developing OSS-first, as the focus 
will have to move to our internal branch that satisfies these criteria.

Exactly what this might be for upstreaming I cannot say. Personally, I aim to 
work exclusively on the branch we are stabilising. If that is not trunk, the 
latency for my contributions being made public might be high, as I have a huge 
imbalance of over-investment to recoup, and anything unnecessary will be 
deferred.

Since the feature is disabled, and the code is almost entirely isolated, I 
cannot imagine the cost to the community to removing this work would be very 
high. But, I do not intend to argue Accord’s case here. I will let you all 
decide.

Please decide soon though, as it shapes our work planning. The positive 
reception so far had led me to consider prioritising a move to trunk-first 
development within the next week or two, and the associated work that entails. 
However, if that was optimistic we will have to shift our plans.



> On 6 Mar 2025, at 20:16, Jordan West  wrote:
> 
> The work and effort in accord has been amazing. And I’m sure it sets a new 
> standard for code quality and correctness testing which I’m also entirely 
> behind. I also trust the folks working on it want to take it to a fully 
> production-ready solution. But I’m worried about circumstances out of our 
> control leaving us with a very complex feature that isn’t complete. 
> 
> I do have some questions. Could folks help me better understand why testing 
> real workloads necessitates a merge (my understanding from the original 
> reason is this is the impetus for why we would merge now)? Also I think the 
> performance and schema change caveats are rather large ones. One of Accord's 
> promises was better performance and I think making schema changes with nodes 
> down not being supported is a big gap. Could we have some criteria like 
> “supports all the operations PaxosV2 supports” or “performs as well or better 
> than PaxosV2 on [workload(s)]”? 
> 
> I understand waiting asks a lot of the authors in terms of bearing the burden 
> of a more complex merge. But I think we also need to consider what merging is 
> asking the community to bear if the worst happens and we are unable to take 
> the feature from its current state to something that can be widely used in 
> production.
> 
> 
> Jordan 
> 
> 
> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston wrote:
>> +1 to merging it
>> 
>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:
>>> You have my +1
>>> 
>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict wrote:
>>> >
>>> > Correct, these caveats should only apply to tables that have opted-in to 
>>> > accord.
>>> >
>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan wrote:
>>> >
>>> > 
>>> > So great to see all this hard work about to pay off!
>>> >
>>> > On the questions/concerns front, the only concern I would have towards 
>>> > merging this to trunk is if any of the caveats apply when someone is not 
>>> > using Accord.  Assuming they only apply when the feature flag is enabled, 
>>> > I see no reason not to get this merged into trunk once everyone involved 
>>> > is happy with the state of it.
>>> >
>>> > -Jeremiah
>>> >
>>> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith wrote:
>>> >>
>>> >> That depends on all of you lovely people :D
>>> >>
>>> >> I think we should have finished merging everything we want before QA by 
>>> >> ~Monday; certainly not much later.
>>> >>
>>> >> I think we have some upgrade and python dtest failures to address as 
>>> >> well.
>>> >>
>>> >> So it could be pretty soon if the community is supportive.
>>> >>
>>> >> On 5 Mar 2025, at 17:22, Patrick McFadin >> >> > wrote:
>>> >>
>>> >>
>>> >> What is the timing for starting the merge process? I'm asking because
>>> >>
>>> >> I have (yet another) presentation and this would be a cool update.
>>> >>
>>> >>
>>> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith wrote:
>>> >>
>>> >> >
>>> >>
>>> >> > Thanks everyone.
>>> >>
>>> >> >
>>> >>
>>> >> > Jon - your help will be greatly appreciated. We’ll let you know when 
>>> >> > we’ve got the cycles to invest in performance work (hopefully fairly 
>>> >> > soon). I expect the first step will be improving visibility so we can 
>>> >> > better understand what the system is doing (particularly the caching 
>>> >> > layers), but we can dig in together when ready.
>>> >>
>>> >> >
>>> >>
>>> >> > On 4 Mar 2025, at 18:15, Jon Haddad wrote:
>>> >> >
>>> >>
>>> >> > Very exciting!
>>> >>
>>> >> >
>>> >>
>>> >> > I have a client that's ver

Re: [VOTE][IP CLEARANCE] Cassandra Cluster Manager (CCM)

2025-03-11 Thread Patrick McFadin
+1

On Mon, Mar 10, 2025 at 9:28 AM Dinesh Joshi  wrote:

> +1
>
> On Sun, Mar 9, 2025 at 5:18 AM Mick Semb Wever  wrote:
>
>> Please vote on the acceptance of the Cassandra Cluster Manager (CCM)
>> and its IP Clearance:
>> https://incubator.apache.org/ip-clearance/cassandra-ccm.html
>>
>> All consent from original authors of the donation, and tracking of
>> collected CLAs, is found in:
>>  - https://github.com/riptano/ccm/issues/773
>>  -
>> https://docs.google.com/spreadsheets/d/1lXDK3c7_-TZh845knVZ8zvJf65x2o03ACqY3pfdXZR8
>>
>> These do not require acknowledgement before the vote.
>>
>> The code is prepared for donation at https://github.com/riptano/ccm
>> (Only `master` and `cassandra-test` refs will be brought over.)
>>
>> Once this vote passes we will request ASF Infra to move the
>> riptano/ccm as-is to apache/cassandra-ccm  . The master branch and the
>> cassandra-test tag, with all its history, will be kept.  Because
>> consent and CLAs were not received from all original authors the
>> NOTICE file keeps additional reference to these earlier copyright
>> authors.
>>
>> PMC members, please check carefully the IP Clearance requirements before
>> voting.
>>
>> The vote will be open for 72 hours (or longer). Votes by PMC members
>> are considered binding. A vote passes if there are at least three
>> binding +1s and no -1's.
>>
>> regards,
>> Mick
>>
>


Re: Cassandra Java Driver and OpenJDK CRaC

2025-03-11 Thread Abe Ratnofsky
Hey Radim, thanks for bringing this to the list.

In general, I'm supportive of the second option you shared ("Exposing neutral 
methods") but want to make sure I understand how CRaC would work in practice.

Could you clarify this part:

> Naturally it is possible to close the session object completely and create a 
> new one, but the ideal solution would require no application changes beyond 
> dependency upgrade.

CRaC doesn't support checkpointing with open sockets, and the Cassandra client 
protocol requires a few roundtrips after connection establishment before a 
session can be used[1]. Would it be possible to have a separate CqlSession 
implementation that includes CRaC's checkpoint and restore hooks[2] to close 
and open the session at the appropriate times? This CRaC implementation could 
live as a third-party module initially as it is proven out.

[1]: 
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L176
[2]: https://docs.azul.com/core/crac/crac-guidelines#implementing-crac-resource



Fwd: [CFP] Community Over Code NA 2025

2025-03-11 Thread Paulo Motta
Hi,

Please see message below with instructions on submitting talk proposals for
Community Over Code 2025.

*The deadline for submissions is April 21st 2025.*

Please note there is usually no deadline extension for this conference, so
I'd highly recommend submitting your proposals on time.

You can submit draft proposals and update title and abstract after the
deadline, in case your proposal is accepted.

Feel free to contact me or Brian if you have any questions.

Cheers,

Paulo

-- Forwarded message -
From: Brian Proffitt 
Date: Mon, 10 Mar 2025 at 10:26
Subject: [CFP] Community Over Code NA 2025
To:


All:

The call for presentations for the Community Over Code NA 2025 event is now
open[1]! Please submit proposals by 23:59 UTC on April 21, 2025.

Community Over Code NA is accepting presentation proposals for any topic
that is related to the ASF mission of producing free software for the
public good. These include:

   - AI Plumbers
   - Data (compute, engineering, and storage)
   - Cassandra
   - Community
   - Developer experience
   - Fintech: Building Secure Solutions
   - Geospatial
   - Groovy
   - Incubator
   - Industrial Internet of Things
   - Infrastructure
   - OpenLakehouse
   - Performance engineering
   - Search
   - Security
   - Streaming
   - Web servers/Tomcat

On behalf of the C/C NA planners, we look forward to receiving your ideas
for amazing content to present in Minneapolis this year!

Peace,
BKP

[1] https://sessionize.com/community-over-code-na-2025

Brian Proffitt
VP, Marketing & Publicity
VP, Conferences


Re: [VOTE] Release Apache Sidecar Cassandra 0.1.0

2025-03-11 Thread Francisco Guerrero
The vote has passed with three binding +1s and no vetoes

On 2025/02/28 20:30:38 Bernardo Botella wrote:
> +1 (nb)
> 
> Awesome milestone
> 
> On Fri, Feb 28, 2025 at 11:06 Josh McKenzie  wrote:
> 
> > +1 - great work everyone!
> >
> > On Fri, Feb 28, 2025, at 1:58 PM, Dinesh Joshi wrote:
> >
> > +1, thanks to everyone who worked towards this milestone.
> >
> > On Fri, Feb 28, 2025 at 10:47 AM Doug Rohrer  wrote:
> >
> > +1 (nb)
> >
> > Thanks for putting in the work to get this ready to go!
> >
> > Doug
> >
> > > On Feb 28, 2025, at 7:46 AM, Brandon Williams  wrote:
> > >
> > > +1, verified sigs/checksums, tested packaging.
> > >
> > > Minor note: the packages do not declare any deps (like java.)  This is
> > > probably not an issue in practice since nobody will run a dedicated
> > > 'sidecar machine' but still could be improved.
> > >
> > > Kind Regards,
> > > Brandon
> > >
> > > On Thu, Feb 27, 2025 at 4:15 PM Francisco Guerrero 
> > wrote:
> > >>
> > >> Proposing the test build of Cassandra Sidecar 0.1.0 for release.
> > >>
> > >> sha1: a2c19e8ccf04bd3ddbdf8ac4d792d2d55f2e497f
> > >> Git: https://github.com/apache/cassandra-sidecar/tree/0.1.0-tentative
> > >> Maven Artifacts:
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-server/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-client/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-adapters-cassandra41/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-adapters-base/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-client/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-client-common/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-client-all/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-server-common/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-auth-mtls/0.1.0/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-client/0.1.0-jdk8/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-vertx-client/0.1.0-jdk8/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-client-common/0.1.0-jdk8/
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-vertx-client-all/0.1.0-jdk8/
> > >>
> > >> The Source and Build Artifacts, and the Debian and RPM packages and
> > repositories, are available here:
> > >>
> > https://dist.apache.org/repos/dist/dev/cassandra/cassandra-sidecar/0.1.0/
> > >>
> > >> The vote will be open for 72 hours (longer if needed). Everyone who has
> > tested the build is invited to vote. Votes by PMC members are considered
> > binding. A vote passes if there are at least three binding +1s and no -1's.
> > >>
> > >> [1]: CHANGES.txt:
> > https://github.com/apache/cassandra-sidecar/blob/0.1.0-tentative/CHANGES.txt
> > >> [2]: NEWS.txt:
> > https://github.com/apache/cassandra-sidecar/blob/0.1.0-tentative/NEWS.txt
> >
> >
> >
> 


Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-11 Thread Josh McKenzie
> Having something like a registry and standardizing/enforcing all metric types 
> is something we should be sure to maintain.
A registry w/documentation on each metric indicating *what it's actually 
measuring and what it means* would be great for our users.

On Mon, Mar 10, 2025, at 3:46 PM, Chris Lohfink wrote:
> Just something to be mindful about what we had *before* codahale in Cassandra 
> and avoid that again. Pre 1.1 it was pretty much impossible to collect 
> metrics without looking at code (there were efficient custom made things, but 
> each metric was reported differently) and that stuck through until 2.2 days. 
> Having something like a registry and standardizing/enforcing all metric types 
> is something we should be sure to maintain.
> 
> Chris
> 
> On Fri, Mar 7, 2025 at 1:33 PM Jon Haddad  wrote:
>> As long as operators are able to use all the OTel tooling, I'm happy.  I'm 
>> not looking to try to decide what the metrics API looks like, although I 
>> think trying to plan for 15 years out is a bit unnecessary. A lot of the DB 
>> will be replaced by then.  That said, I'm mostly hands off on code and you 
>> guys are more than capable of making the smart decision here.
>> 
>> Regarding virtual tables, I'm looking at writing a custom OTel receiver [1] 
>> to ingest them.  I was really impressed with the performance work you did 
>> there and it got my wheels turning on how to best make use of it.  I am 
>> planning on using it with easy-cass-lab to pull DB metrics and logs down to 
>> my local machine along with kernel metrics via eBPF. 
>> 
>> Jon
>> 
>> [1] https://opentelemetry.io/docs/collector/building/receiver/
>> 
>> 
>> 
>> On Wed, Mar 5, 2025 at 1:06 PM Maxim Muzafarov  wrote:
>>> If we do swap, we may run into the same issues with third-party
>>> metrics libraries in the next 10-15 years that we are discussing now
>>> with the Codahale we added ~10-15 years ago, and given the fact that a
>>> proposed new API is quite small my personal feeling is that it would
>>> be our best choice for the metrics.
>>> 
>>> Having our own API also doesn't prevent us from having all the
>>> integrations with new 3-rd party libraries the world will develop in
>>> future, just by writing custom adapters to our own -- this will be
>>> possible for the Codahale (with some suboptimal considerations), where
>>> we have to support backwards compatibility, and for the OpenTelemetry
>>> as well. We already have the CEP-32[1] proposal to instrument metrics;
>>> in this sense, it doesn't change much for us.
>>> 
>>> Another point of having our own API is the virtual tables we have --
>>> it gives us enough flexibility and latitude to export the metrics
>>> efficiently via the virtual tables by implementing the access patterns
>>> we consider important.
>>> 
>>> [1] 
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071749#CEP32:(DRAFT)OpenTelemetryintegration-ExportingMetricsthroughOpenTelemetry
>>> [2 https://opentelemetry.io/docs/languages/java/instrumentation/
>>> 
>>> On Wed, 5 Mar 2025 at 21:35, Jeff Jirsa  wrote:
>>> >
>>> > I think it's widely accepted that otel in general has won this stage of 
>>> > observability, as most metrics systems allow it and most saas providers 
>>> > support it. So Jon’s point there is important.
>>> >
>>> > The promise of unifying logs/traces/metrics usually (aka wide events) is 
>>> > far more important in the tracing side of our observability than in the 
>>> > areas we use Codahale/DropWizard.
>>> >
>>> > Scott: if we swap, we can (probably should) deprecate like everything 
>>> > else, and run both side by side for a release so people don’t lose 
>>> > metrics entirely on bounce? FF both, to control double cost during the 
>>> > transition.
>>> >
>>> >
>>> >
>>> >
>>> > On Mar 5, 2025, at 8:21 PM, C. Scott Andreas  wrote:
>>> >
>>> > No strong opinion on particular choice of metrics library.
>>> >
>>> > My primary feedback is that if we swap metrics implementations and the 
>>> > new values are *different*, we can anticipate broad user 
>>> > confusion/interest.
>>> >
>>> > In particular if latency stats are reported higher post-upgrade, we 
>>> > should expect users to interpret this as a performance regression, 
>>> > dedicating significant resources to investigating the change, and 
>>> > expending credibility with stakeholders in their systems.
>>> >
>>> > - Scott
>>> >
>>> > On Mar 5, 2025, at 11:57 AM, Benedict  wrote:
>>> >
>>> > 
>>> > I really like the idea of integrating tracing, metrics and logging 
>>> > frameworks.
>>> >
>>> > I would like to have the time to look closely at the API before we decide 
>>> > to adopt it though. I agree that a widely deployed API has inherent 
>>> > benefits, but any API we adopt also shapes future evolution of our 
>>> > capabilities. Hopefully this is also a good API that allows us plenty of 
>>> > evolutionary headroom.
>>> >
>>> >
>>> > On 5 Mar 2025, at 19:45, Josh McKenzie  wrote:
>>> >
>>> > 
>>> >

Re: [VOTE][IP CLEARANCE] Cassandra Cluster Manager (CCM)

2025-03-11 Thread Dinesh Joshi
+1

On Sun, Mar 9, 2025 at 5:18 AM Mick Semb Wever  wrote:

> Please vote on the acceptance of the Cassandra Cluster Manager (CCM)
> and its IP Clearance:
> https://incubator.apache.org/ip-clearance/cassandra-ccm.html
>
> All consent from original authors of the donation, and tracking of
> collected CLAs, is found in:
>  - https://github.com/riptano/ccm/issues/773
>  -
> https://docs.google.com/spreadsheets/d/1lXDK3c7_-TZh845knVZ8zvJf65x2o03ACqY3pfdXZR8
>
> These do not require acknowledgement before the vote.
>
> The code is prepared for donation at https://github.com/riptano/ccm
> (Only `master` and `cassandra-test` refs will be brought over.)
>
> Once this vote passes we will request ASF Infra to move the
> riptano/ccm as-is to apache/cassandra-ccm  . The master branch and the
> cassandra-test tag, with all its history, will be kept.  Because
> consent and CLAs were not received from all original authors the
> NOTICE file keeps additional reference to these earlier copyright
> authors.
>
> PMC members, please check carefully the IP Clearance requirements before
> voting.
>
> The vote will be open for 72 hours (or longer). Votes by PMC members
> are considered binding. A vote passes if there are at least three
> binding +1s and no -1's.
>
> regards,
> Mick
>


Re: Cassandra Java Driver and OpenJDK CRaC

2025-03-11 Thread Patrick McFadin
Just speaking up as a supporter for considering this change. From a
userland perspective, I've been reading up on CRaC, and I see this
attacking some of the same requirements that Graal and Quarkus are trying
to solve. This is a worthwhile direction to pursue.

The CqlSession will need to re-connect, and I think that's worth testing.
Topology information shouldn't be assumed, especially with something like
Token-Aware Routing. Some shortcuts could speed it up, but I can't think of
any right now. I like the idea of making it optional and putting it through
some scenarios.

Patrick

On Mon, Mar 10, 2025 at 8:03 AM Radim Vansa  wrote:

> Hello Josh,
> thanks for reaching back; answers inline:
>
> On 10. 03. 25 13:03, Josh McKenzie wrote:
>
>
> From skimming the PR on the Spring side and the conversation there, it
> looks like the argument is to have this live inside the java driver for
> Cassandra instead of in the spring-boot lib which I can see the argument
> for.
>
>
> Yes; for us it does not really matter where the fix lives as long as it's
> available for the end users. Pushing it towards Cassandra has the advantage
> of providing the greatest fan-out to users, even those not consuming through
> frameworks.
>
>
> If we distill this to speak to precisely the problem we're trying to
> address or improvement we're going for here, how would you phrase that?
> i.e. "Take application startup from Nms down to Mms"?
>
>
> Yes, optimizing startup time is the most common use-case for CRaC. It's
> rather hard to provide such general numbers: it should be order(s) of
> magnitude. If we speak about hello-world style Spring Boot application
> booting, CRaC improves the startup from seconds to tens of milliseconds.
> That shouldn't differ too much from the expected times for a small
> micro-service, improving latency in scale-from-zero situations. This is not
> limited to microservices, though; we've been experimenting with real
> applications consuming hundreds of GB of memory. In that case the
> application boot can be rather complex, loading and pre-processing data
> from DB etc. where the boot takes minutes or more. CRaC can restore such
> instance in a few seconds.
>
>
> I ask because that's the "pro" we'll need to weigh against updating the
> driver's topology map of the cluster, resource handling and potential leaks
> on shutdown/startup, and the complexity of taking an implementation like
> this into the driver code. Nothing insurmountable of course, just worth
> weighing the two.
>
>
> Can you elaborate about other use cases where the nodes are forced down,
> and what risk does that bring to the overall stability? Is there a
> difference between marking only a subset of nodes down and taking all of
> the nodes down? When we force-close the control connection (as the first
> step), is it possible to get a topology update at all and race on the
> cluster members?
>
> Thank you!
>
> Radim
>
>
>
> On Thu, Mar 6, 2025, at 3:34 PM, Radim Vansa wrote:
>
> Hi all,
>
> I would like to make applications using Cassandra Java Driver,
> particularly those built with Spring Boot, Quarkus or similar
> frameworks, work with OpenJDK CRaC project [1]. I've already created a
> patch for Spring Boot [2] but Spring folks think that these changes are
> too dependent on driver internals, suggesting to contribute support to
> Cassandra directly.
>
> The patch involves closing all connections before checkpoint, and
> re-establishing these after restore. I have implemented that through
> sending a `NodeStateEvent -> FORCED_DOWN` on the bus for all connected
> nodes. As a follow-up I could develop some way to inform the session
> about a new topology e.g. if the cluster addresses change.
>
> Before jumping onto implementing a PR I would like to ask what you think
> is the best approach to do this. I can think of two ways:
>
> 1) Native CRaC support
>
> The driver would have a dependency on `org.crac:crac` [3]; this is a
> small (13kB) library that provides the interfaces and a dummy noop
> implementation if the target JVM does not support CRaC. Then
> `DefaultSession` would register a `org.crac.Resource` implementation
> that would handle the checkpoint. This has the advantage of providing
> best fan-out into any project consuming the driver without any further
> work.
>
> 2) Exposing neutral methods
>
> To save frameworks of relying on internals, `DefaultSession` would
> expose `.suspend()` and `.resume()` methods that would implement the
> connection cut-off without importing any dependency. After upgrade to
> latest release, frameworks could use these methods in a way that suits
> them. I wouldn't add those methods to the `CqlSession` interface (as
> that would be breaking change) but only to `DefaultSession`.
>
> Would Cassandra accept either of these, to let people checkpoint
> (snapshot) their applications and restore them within tens of
> milliseconds? Naturally it is possible to close the session object
> completely and create a new o

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-11 Thread Chris Lohfink
Just something to be mindful about what we had *before* codahale in
Cassandra and avoid that again. Pre 1.1 it was pretty much impossible to
collect metrics without looking at code (there were efficient custom made
things, but each metric was reported differently) and that stuck through
until 2.2 days. Having something like a registry and
standardizing/enforcing all metric types is something we should be sure to
maintain.

Chris

On Fri, Mar 7, 2025 at 1:33 PM Jon Haddad  wrote:

> As long as operators are able to use all the OTel tooling, I'm happy.  I'm
> not looking to try to decide what the metrics API looks like, although I
> think trying to plan for 15 years out is a bit unnecessary. A lot of the DB
> will be replaced by then.  That said, I'm mostly hands off on code and you
> guys are more than capable of making the smart decision here.
>
> Regarding virtual tables, I'm looking at writing a custom OTel receiver
> [1] to ingest them.  I was really impressed with the performance work you
> did there and it got my wheels turning on how to best make use of it.  I am
> planning on using it with easy-cass-lab to pull DB metrics and logs down to
> my local machine along with kernel metrics via eBPF.
>
> Jon
>
> [1] https://opentelemetry.io/docs/collector/building/receiver/
>
>
>
> On Wed, Mar 5, 2025 at 1:06 PM Maxim Muzafarov  wrote:
>
>> If we do swap, we may run into the same issues with third-party
>> metrics libraries in the next 10-15 years that we are discussing now
>> with the Codahale we added ~10-15 years ago, and given the fact that a
>> proposed new API is quite small my personal feeling is that it would
>> be our best choice for the metrics.
>>
>> Having our own API also doesn't prevent us from having all the
>> integrations with new 3-rd party libraries the world will develop in
>> future, just by writing custom adapters to our own -- this will be
>> possible for the Codahale (with some suboptimal considerations), where
>> we have to support backwards compatibility, and for the OpenTelemetry
>> as well. We already have the CEP-32[1] proposal to instrument metrics;
>> in this sense, it doesn't change much for us.
>>
>> Another point of having our own API is the virtual tables we have --
>> it gives us enough flexibility and latitude to export the metrics
>> efficiently via the virtual tables by implementing the access patterns
>> we consider important.
>>
>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071749#CEP32:(DRAFT)OpenTelemetryintegration-ExportingMetricsthroughOpenTelemetry
>> [2 https://opentelemetry.io/docs/languages/java/instrumentation/
>>
>> On Wed, 5 Mar 2025 at 21:35, Jeff Jirsa  wrote:
>> >
>> > I think it's widely accepted that otel in general has won this stage of
>> observability, as most metrics systems allow it and most saas providers
>> support it. So Jon’s point there is important.
>> >
>> > The promise of unifying logs/traces/metrics usually (aka wide events)
>> is far more important in the tracing side of our observability than in the
>> areas we use Codahale/DropWizard.
>> >
>> > Scott: if we swap, we can (probably should) deprecate like everything
>> else, and run both side by side for a release so people don’t lose metrics
>> entirely on bounce? FF both, to control double cost during the transition.
>> >
>> >
>> >
>> >
>> > On Mar 5, 2025, at 8:21 PM, C. Scott Andreas 
>> wrote:
>> >
>> > No strong opinion on particular choice of metrics library.
>> >
>> > My primary feedback is that if we swap metrics implementations and the
>> new values are *different*, we can anticipate broad user confusion/interest.
>> >
>> > In particular if latency stats are reported higher post-upgrade, we
>> should expect users to interpret this as a performance regression,
>> dedicating significant resources to investigating the change, and expending
>> credibility with stakeholders in their systems.
>> >
>> > - Scott
>> >
>> > On Mar 5, 2025, at 11:57 AM, Benedict  wrote:
>> >
>> > 
>> > I really like the idea of integrating tracing, metrics and logging
>> frameworks.
>> >
>> > I would like to have the time to look closely at the API before we
>> decide to adopt it though. I agree that a widely deployed API has inherent
>> benefits, but any API we adopt also shapes future evolution of our
>> capabilities. Hopefully this is also a good API that allows us plenty of
>> evolutionary headroom.
>> >
>> >
>> > On 5 Mar 2025, at 19:45, Josh McKenzie  wrote:
>> >
>> > 
>> >
>> > if the plan is to rip out something old and unmaintained and replace
>> with something new, I think there's a huge win to be had by implementing
>> the standard that everyone's using now.
>> >
>> > Strong +1 on anything that's an ecosystem integration inflection point.
>> The added benefit here is that if we architect ourselves to gracefully
>> integrate with whatever system's are ubiquitous today, we'll inherit the
>> migration work that any new industry-wide replacement system would need to
>

Re: CEP-15 Update

2025-03-11 Thread Jordan West
Merging is certainly not blocked on my account. Benedict, I wouldn’t
describe myself as disappointed. It’s awesome work and I’ve tried to
acknowledge the amazing correctness testing that’s been done. I think we
should have a high bar for big changes like this and I was curious about
how we will address some of the issues that concern me. I would’ve
personally liked to see a bit more written on how, but we don’t currently
have a good structure for my ask and I recognize that.

Jordan

On Mon, Mar 10, 2025 at 06:01 Alex Petrov  wrote:

> While I agree that time spent working on a feature is not necessarily a
> clear indicator of maturity, one can judge the scope of work and thought
> that went into Accord by both its separate repository, and the working
> branch.
>
> I think that merging/accepting SASI was not a mistake. There were several
> efforts to make it work, and back in 2016 we could've made it quite viable
> with just CASSANDRA-11990 and a lot of testing. It did get superseded by
> SAI, but I can imagine a universe where SASI would have been developed into
> a stable feature.
>
> > is there a known path forward to fix the drop schema w nodes down issue
> and anything written on it?
> Yes, there is a clear known path for fixing schema changes, and gladly
> they do not require a protocol change, just a slightly deeper integration
> with TCM.
>
>
> On Fri, Mar 7, 2025, at 4:44 PM, Jordan West wrote:
>
> I would love to have my questions answered and see some graphs I don’t
> think those are unreasonable asks nor do they take away from the awesome
> work done. I was suggesting 1-2 weeks for folks to have the opportunity to
> produce that data if the original authors didn’t have time. I also don’t
> think that’s unreasonable. but to be clear I’m not blocking anything. If
> folks want to merge I am not objecting.
>
> I do think we should hold features to a high standard and personally “time
> worked on a feature” is not a criteria for me when considering why we
> should merge. It is absolutely worth recognizing and celebrating the
> massive invest and effort made here. It’s just an orthogonal point to me.
> As a contrived example: If 15452 was not as impactful performance wise
> after a year of on and off work I would’ve happily continued to address it
> or take a different approach. SASI took a year and a half or more and I
> still regret that we merged it into 3.x in the form we did using the same
> early contribution model. That was an example of an extreme, and out of our
> control case, of an entire team disbanding right after merge.
>
> Jordan
>
> On Fri, Mar 7, 2025 at 06:28 Jon Haddad  wrote:
>
> I defer to the judgement of the folks that are most impacted by it - ones
> that are in the code, working on the next release.  If you all think it's
> good to merge, then I am 100% in support of it.  I suspect merging will
> help get it out faster, and I don't see any future in which we don't ship
> this in the next release.
>
> I will be happy to help answer the "how does it compare to paxos v2"
> question post-merge.
>
> Jon
>
>
>
> On Fri, Mar 7, 2025 at 5:52 AM Josh McKenzie  wrote:
>
>
> 3.5 years is an incredible amount of time and work; it really is
> significant and thanks to everyone involved for the investment of time and
> energy.
>
> We have a rocky history with large, disruptive contributions in the past
> that have either blocked forward progress post-merge (CASSANDRA-8099), or
> lingered in the code-base increasing maintenance burden on other
> contributors for minimal or no user benefit (early open post SSD
> transition, witness replicas, materialized views). I'm sympathetic to where
> Jordan's questions stem from, as our history of leaving things in the
> codebase long after they've become vestigial or abandoned has slowed down
> our collective momentum maintaining the project on actively used features.
>
> That said, I don't think Accord will run afoul of some of those same
> patterns. Aside from the degree of investment already in it and sheer
> number of pmc members and committers involved, I believe it's a feature
> that's universally impactful and that if we had a metaphorical bus-factor
> change (entire group of people working on it disappeared the day after
> merge or decided to go on vacation for 5 years), others in the community
> would be willing to pick things up and keep it moving given its proximity
> to release readiness.
>
> The 2 questions Jordan asked resonate with me: 1) do we have line of sight
> to a fix on the schema issues, and I'll take the liberty of reframing 2) do
> we have line of sight to improvement on the performance front to be usable
> for multi-key transactions? (subtle: I don't think "parity with PaxosV2" is
> the right target, but rather "fast enough to be usable for multi-key
> transactions" since it's a new query paradigm).
>
> Given the context on contributor backing and if the answer is yes to those
> 2 questions (which I believe it is), I think we sho

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-11 Thread Jon Haddad
Definitely +1 on registry + docs.  I believe that's part of the OTel Java
SDK [1][2]

I did some performance testing yesterday and was able to replicate the
findings where the codahale code path took 7-10% of CPU time.  The only
caveat is that it only happens with compaction disabled.  Once compaction
is enabled, it's in the 2-3% realm.  Allocations related to codahale were <
1%.

I'm not discouraging anyone from pursuing performance optimizations, just
looking to set expectations on what the real world benefits will be.  This
will likely yield a ~ 2% improvement in throughput based on the earlier
discussion.

For comparison, eliminating a single byte buffer allocation in
ByteArrayAccessor.read in the BTree code path would reduce heap allocations
by 40%, with default compaction throughput of 64MB/s.  Addressing this, in
conjunction with the recently merged CASSANDRA-15452 + Branimir's
CASSANDRA-20092 patch, would allow for much faster compaction which in
turn, would improve density, and significantly reduce latency.  If you're
chasing perf issues, this is one of the top problems in the codebase.



Jon

[1] https://opentelemetry.io/docs/languages/java/api/#meterprovider
[2] https://opentelemetry.io/docs/specs/semconv/attributes-registry/
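
As a small illustration of the registry-plus-descriptions point, with the OTel Java API every instrument is created from a named Meter and declares its description and unit up front (the metric name below is made up, not an existing Cassandra metric):

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

// Sketch only: instruments carry their own description and unit at registration
// time, which is what makes the resulting registry self-documenting.
public class MetricsSketch {
    public static void main(String[] args) {
        Meter meter = GlobalOpenTelemetry.getMeter("org.apache.cassandra");
        LongCounter requests = meter.counterBuilder("coordinator.client.requests")
                .setDescription("Client requests handled by this coordinator")
                .setUnit("{request}")
                .build();
        requests.add(1);
    }
}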



On Tue, Mar 11, 2025 at 8:02 AM Josh McKenzie  wrote:

> Having something like a registry and standardizing/enforcing all metric
> types is something we should be sure to maintain.
>
> A registry w/documentation on each metric indicating *what it's actually
> measuring and what it means* would be great for our users.
>
> On Mon, Mar 10, 2025, at 3:46 PM, Chris Lohfink wrote:
>
> Just something to be mindful about what we had *before* codahale in
> Cassandra and avoid that again. Pre 1.1 it was pretty much impossible to
> collect metrics without looking at code (there were efficient custom made
> things, but each metric was reported differently) and that stuck through
> until 2.2 days. Having something like a registry and
> standardizing/enforcing all metric types is something we should be sure to
> maintain.
>
> Chris
>
> On Fri, Mar 7, 2025 at 1:33 PM Jon Haddad  wrote:
>
> As long as operators are able to use all the OTel tooling, I'm happy.  I'm
> not looking to try to decide what the metrics API looks like, although I
> think trying to plan for 15 years out is a bit unnecessary. A lot of the DB
> will be replaced by then.  That said, I'm mostly hands off on code and you
> guys are more than capable of making the smart decision here.
>
> Regarding virtual tables, I'm looking at writing a custom OTel receiver
> [1] to ingest them.  I was really impressed with the performance work you
> did there and it got my wheels turning on how to best make use of it.  I am
> planning on using it with easy-cass-lab to pull DB metrics and logs down to
> my local machine along with kernel metrics via eBPF.
>
> Jon
>
> [1] https://opentelemetry.io/docs/collector/building/receiver/
>
>
>
> On Wed, Mar 5, 2025 at 1:06 PM Maxim Muzafarov  wrote:
>
> If we do swap, we may run into the same issues with third-party
> metrics libraries in the next 10-15 years that we are discussing now
> with the Codahale we added ~10-15 years ago, and given the fact that a
> proposed new API is quite small my personal feeling is that it would
> be our best choice for the metrics.
>
> Having our own API also doesn't prevent us from having all the
> integrations with new 3-rd party libraries the world will develop in
> future, just by writing custom adapters to our own -- this will be
> possible for the Codahale (with some suboptimal considerations), where
> we have to support backwards compatibility, and for the OpenTelemetry
> as well. We already have the CEP-32[1] proposal to instrument metrics;
> in this sense, it doesn't change much for us.
>
> Another point of having our own API is the virtual tables we have --
> it gives us enough flexibility and latitude to export the metrics
> efficiently via the virtual tables by implementing the access patterns
> we consider important.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071749#CEP32:(DRAFT)OpenTelemetryintegration-ExportingMetricsthroughOpenTelemetry
> [2 https://opentelemetry.io/docs/languages/java/instrumentation/
>
> On Wed, 5 Mar 2025 at 21:35, Jeff Jirsa  wrote:
> >
> > I think it's widely accepted that otel in general has won this stage of
> observability, as most metrics systems allow it and most saas providers
> support it. So Jon’s point there is important.
> >
> > The promise of unifying logs/traces/metrics usually (aka wide events) is
> far more important in the tracing side of our observability than in the
> areas we use Codahale/DropWizard.
> >
> > Scott: if we swap, we can (probably should) deprecate like everything
> else, and run both side by side for a release so people don’t lose metrics
> entirely on bounce? FF both, to control double cost during the transition.
> >
> >
> >
> >
> > On Mar 5, 2025, at 8:21 PM,

Re: CEP-15 Update

2025-03-11 Thread Nate McCall
It sounds like we are all pretty interested in seeing this feature land and
the branch maintenance is causing overhead that could be spent on
finalisation. +1 on merging, particularly given the feature flag work.

Once more unto the breach 💪

On Fri, 7 Mar 2025 at 6:56 PM, Benedict  wrote:

> There are essentially three possible timelines to choose from here:
>
> 1) We agree in the next few days to merge to trunk. We will then
> prioritise rebasing onto trunk and resolving any pre-merge items starting
> next week.
> 2) There’s some more debate and agreement to merge to trunk in a week or
> two. In the meantime we will shift to internal-first development but we’ll
> likely prioritise the above work as soon as we can, which may be in a few
> weeks, so we can shift to trunk first development.
> 3) We don’t agree to merge accord anytime soon, so we shift to
> internal-first development for the time being. I’m not sure when we will
> prioritise any of the above.
>
> Our resources are finite and we’ve exhausted them (literally), so it’s
> pretty much pick one of the above. I don’t really mind which you pick, but
> I won’t personally be prioritising merge after this third attempt.
>
> On 6 Mar 2025, at 22:01, Jon Haddad  wrote:
>
> 
>
> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like
> it's several hundred commits behind trunk.  Since you'll need to rebase
> again before merge *anyways*, would it make sense to do it once more, and I
> can publish easy-cass-lab with the latest branch?  If folks have concerns,
> it's easy to fire up a cluster (I do it constantly) and try it out.
>
> I think if we were to do this, out of consideration we should time box the
> amount of time for an evaluation and unless someone raises an objection,
> consider lazy consensus achieved.
>
> Jon
>
>
>
> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Because we want to validate against the latest code in trunk, else we are
>> validating stale behaviours. The cost of rebasing is high, so we do not do
>> it frequently. That means we will likely stop developing OSS-first, as the
>> focus will have to move to our internal branch that satisfies these
>> criteria.
>>
>> Exactly what this might be for upstreaming I cannot say. Personally, I
>> aim to work exclusively on the branch we are stabilising. If that is not
>> trunk, the latency for my contributions being made public might be high, as
>> I have a huge imbalance of over-investment to recoup, and anything
>> unnecessary will be deferred.
>>
>> Since the feature is disabled, and the code is almost entirely isolated,
>> I cannot imagine the cost to the community to removing this work would be
>> very high. But, I do not intend to argue Accord’s case here. I will let you
>> all decide.
>>
>> Please decide soon though, as it shapes our work planning. The positive
>> reception so far had led me to consider prioritising a move to trunk-first
>> development within the next week or two, and the associated work that
>> entails. However, if that was optimistic we will have to shift our plans.
>>
>>
>>
>> On 6 Mar 2025, at 20:16, Jordan West  wrote:
>>
>> The work and effort in accord has been amazing. And I’m sure it sets a
>> new standard for code quality and correctness testing which I’m also
>> entirely behind. I also trust the folks working on it want to take it to
>> a fully production-ready solution. But I’m worried about circumstances
>> out of our control leaving us with a very complex feature that isn’t
>> complete.
>>
>> I do have some questions. Could folks help me better understand why
>> testing real workloads necessitates a merge (my understanding from the
>> original reason is this is the impetus for why we would merge now)? Also I
>> think the performance and schema change caveats are rather large ones. One
>> of Accord's promises was better performance and I think making schema changes
>> with nodes down not being supported is a big gap. Could we have some
>> criteria like “supports all the operations PaxosV2 supports” or “performs
>> as well or better than PaxosV2 on [workload(s)]”?
>>
>> I understand waiting asks a lot of the authors in terms of bearing the
>> burden of a more complex merge. But I think we also need to consider what
>> merging is asking the community to bear if the worst happens and we are
>> unable to take the feature from its current state to something that can be
>> widely used in production.
>>
>>
>> Jordan
>>
>>
>> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston 
>> wrote:
>>
>>> +1 to merging it
>>>
>>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:
>>>
>>> You have my +1
>>>
>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict  wrote:
>>> >
>>> > Correct, these caveats should only apply to tables that have opted-in
>>> to accord.
>>> >
>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan  wrote:
>>> >
>>> > 
>>> > So great to see all this hard work about to pay off!
>>> >
>>> > On the quest

Re: CEP-15 Update

2025-03-11 Thread C. Scott Andreas
I’m also supportive of proceeding to merge.The CEP contributors have worked incredibly hard on the protocol and feature and I’m proud it’s reached this point.If the concern is trunk stability - we have been deploying trunk-derived builds for over six months and will continue to do so post-merge. Maintaining a stable trunk is incredibly important to me and will remain so. I’m not sure if others have taken the step of deploying trunk. But it’s helped us immensely in maintaining continuous and incremental confidence in what’s landing in top of tree, and enabled us to identify and resolve issues much more quickly. The CEP authors have a huge interest in maintaining a stable trunk.Most databases would have a build engineering team the size of the CEP team focused on rebases and integration. I can attest to the challenge of continuing to maintain separate source trees.I believe the work has reached a level of completeness that integration into trunk is the best next step for continued development, performance, and polish.I also believe there is more value generated for the project and its users by proceeding to merge than delaying to a potential future merge window which may be some time away.– ScottOn Mar 6, 2025, at 9:54 PM, Benedict  wrote:There are essentially three possible timelines to choose from here: 1) We agree in the next few days to merge to trunk. We will then prioritise rebasing onto trunk and resolving any pre-merge items starting next week.2) There’s some more debate and agreement to merge to trunk in a week or two. In the meantime we will shift to internal-first development but we’ll likely prioritise the above work as soon as we can, which may be in a few weeks, so we can shift to trunk first development.3) We don’t agree to merge accord anytime soon, so we shift to internal-first development for the time being. I’m not sure when we will prioritise any of the above.Our resources are finite and we’ve exhausted them (literally), so it’s pretty much pick one of the above. I don’t really mind which you pick, but I won’t personally be prioritising merge after this third attempt.On 6 Mar 2025, at 22:01, Jon Haddad  wrote:Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like it's several hundred commits behind trunk.  Since you'll need to rebase again before merge *anyways*, would it make sense to do it once more, and I can publish easy-cass-lab with the latest branch?  If folks have concerns, it's easy to fire up a cluster (I do it constantly) and try it out.I think if we were to do this, out of consideration we should time box the amount of time for an evaluation and unless someone raises an objection, consider lazy consensus achieved.JonOn Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith  wrote:Because we want to validate against the latest code in trunk, else we are validating stale behaviours. The cost of rebasing is high, so we do not do it frequently. That means we will likely stop developing OSS-first, as the focus will have to move to our internal branch that satisfies these criteria.Exactly what this might be for upstreaming I cannot say. Personally, I aim to work exclusively on the branch we are stabilising. If that is not trunk, the latency for my contributions being made public might be high, as I have a huge imbalance of over-investment to recoup, and anything unnecessary will be deferred.Since the feature is disabled, and the code is almost entirely isolated, I cannot imagine the cost to the community to removing this work would be very high. 
But, I do not intend to argue Accord’s case here. I will let you all decide. Please decide soon though, as it shapes our work planning. The positive reception so far had led me to consider prioritising a move to trunk-first development within the next week or two, and the associated work that entails. However, if that was optimistic we will have to shift our plans.

On 6 Mar 2025, at 20:16, Jordan West  wrote:

The work and effort in accord has been amazing. And I’m sure it sets a new standard for code quality and correctness testing, which I’m also entirely behind. I also trust the folks working on it want to take it to a fully production ready solution. But I’m worried about circumstances out of our control leaving us with a very complex feature that isn’t complete.

I do have some questions. Could folks help me better understand why testing real workloads necessitates a merge (my understanding from the original reasoning is that this is the impetus for why we would merge now)? Also, I think the performance and schema change caveats are rather large ones. One of Accord’s promises was better performance, and I think not supporting schema changes with nodes down is a big gap. Could we have some criteria like “supports all the operations PaxosV2 supports” or “performs as well or better than PaxosV2 on [workload(s)]”? I understand waiting asks a lot of the authors in terms of bearing the bu

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-11 Thread Jon Haddad
Absolutely, happy to share.  All tests were done using easy-cass-stress v9
and easy-cass-lab, with the latest released 5.0 (not including 15452 or
20092).  Instructions at the end.

> Regarding allocation rate vs throughput: unfortunately they are not
connected linearly,

Yes, agreed, they're not linearly related.  However, allocation rate does
correlate linearly with GC pause frequency, and it does increase GC pause time.
When you increase your write throughput, you put more pressure on
compaction.  In order to keep up, you need to increase compaction
throughput.  This leads to excess allocation and longer pauses.  For
teams with a low SLO (say 10ms p99), compaction allocation becomes one of
the factors that prevent them from increasing node density due to its
effect on GC pause times.  Reducing the allocation rate will allow for much
faster compaction with less impact on GC.
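
(A rough back-of-the-envelope illustration of that linearity, with made-up
numbers: if the young generation is ~15 GB and the workload allocates a steady
10 GB/s, a young collection fires roughly every 1.5s; double the allocation
rate to 20 GB/s and it fires roughly every 0.75s.  Pause frequency therefore
scales roughly linearly with allocation rate, before even considering any
effect on pause duration.)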

> So, while I agree that the mentioned compaction logic (cell deserialization)
is a candidate for improvement from an allocation point of view, I am not sure
we would get dramatic improvements in throughput just from reducing it.

I am _quite_ confident that by reducing the total allocation in Cassandra by
almost 50% we will see a _significant_ performance improvement, but
obviously we need hard numbers, not just my gut feeling and unbridled
confidence.

I'll have to dig up the profile; I'm switching between a bunch of tests and
sadly I didn't label all of them, and I've collected quite a few.  The % number
I referenced earlier in the thread was from a different load test that I looked
up several days ago, and I have several hundred of them hanging around.

Here's the process of setting up the cluster with easy-cass-lab (I have ecl
aliased to easy-cass-lab on my laptop):

mkdir test
cd test
ecl init -i r5d.2xlarge -c 3 -s 1 test
ecl up
ecl use 5.0
cat <<'EOF' >> cassandra.patch.yaml

memtable:
  configurations:
    skiplist:
      class_name: SkipListMemtable
    trie:
      class_name: TrieMemtable
    default:
      inherits: trie

memtable_offheap_space: 8GiB
memtable_allocation_type: offheap_objects
EOF



Then apply these JVM settings to the jvm.options file in the local dir:

### G1 Settings
## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:MaxTenuringThreshold=2
-XX:G1HeapRegionSize=16m
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=50
-Xms30G
-Xmx30G

#
## Have the JVM do less remembered set work during STW, instead
## preferring concurrent GC. Reduces p99.9 latency.
-XX:G1RSetUpdatingPauseTimePercent=5
#
## Main G1GC tunable: lowering the pause target will lower throughput and
## vice versa.
## 200ms is the JVM default and lowest viable setting.
## 1000ms increases throughput. Keep it smaller than the timeouts in
## cassandra.yaml.
-XX:MaxGCPauseMillis=200


Then have it update the configs and start the cluster:

ecl uc
ecl start
source env.sh

You can disable compaction on one node:

c0 nodetool disableautocompaction

Connect to the stress instance using the shortcut defined in env.sh:

s0

Running the stress workload is best done with Shenandoah and Java 17 to
avoid long pauses:

sudo update-java-alternatives -s java-1.17.0-openjdk-amd64
export EASY_CASS_STRESS_OPTS="-XX:+UseShenandoahGC"

Here's a write-only workload with very small values:

easy-cass-stress run KeyValue -d 1h --field.keyvalue.value='random(4,8)'
--maxwlat 50 --rate 200k -r 0

Let that ramp up for a bit.

Then, back in your local dir (make sure you source env.sh first):

cflame cassandra0

It'll take a profile and run for a minute.

You can also get an allocation profile by doing this:

cflame cassandra0 -e alloc

Feel free to ping me directly with questions.
Jon

On Tue, Mar 11, 2025 at 3:20 PM Dmitry Konstantinov 
wrote:

> Jon, thank you for testing! Can you share your CPU profile and test load
> details? Have you tested it with the CASSANDRA-20092 changes included?
>
> >> Allocations related to codahale were < 1%.
> Just to clarify: in the initial mail, by memory footprint I meant the static
> amount of memory used to store metric objects, not dynamic allocation
> during request processing (which should be almost zero and is not a target
> to optimize).
>
> >> Once compaction is enabled, it's in the 2-3% realm
> What percentage of the CPU profile is spent on compaction in your load?
> (To dilute 7-8% to 2-3% it should be around 50%, because compaction does
> not change the ratio between the total effort spent on request
> processing and the metrics part of it.)
>
> Regarding allocation rate vs throughput: unfortunately they are not
> connected linearly. For example, in
> https://issues.apache.org/jira/browse/CASSANDRA-20165 I reduced
> allocation almost 2x and got about an 8% improvement in throughput (which
> is still a good result).
> So, while I agree that the mentioned compaction logic (cell deserialization)
> is a candidate for improvement from an allocation point of view, I

Re: CEP-15 Update

2025-03-11 Thread Alex Petrov
While I agree that time spent working on a feature is not necessarily a clear 
indicator of maturity, one can judge the scope of work and thought that went 
into Accord by both its separate repository, and the working branch. 

I think that merging/accepting SASI was not a mistake. There were several 
efforts to make it work, and back in 2016 we could've made it quite viable with 
just CASSANDRA-11990 and a lot of testing. It did get superseded by SAI, but I 
can imagine a universe where SASI would have been developed into a stable 
feature. 

> is there a known path forward to fix the drop schema w nodes down issue and 
> anything written on it?
Yes, there is a clear known path for fixing schema changes, and thankfully it 
does not require a protocol change, just a slightly deeper integration with TCM.


On Fri, Mar 7, 2025, at 4:44 PM, Jordan West wrote:
> I would love to have my questions answered and see some graphs I don’t think 
> those are unreasonable asks nor do they take away from the awesome work done. 
> I was suggesting 1-2 weeks for folks to have the opportunity to produce that 
> data if the original authors didn’t have time. I also don’t think that’s 
> unreasonable. but to be clear I’m not blocking anything. If folks want to 
> merge I am not objecting.
> 
> I do think we should hold features to a high standard, and personally “time 
> worked on a feature” is not a criterion for me when considering why we should 
> merge. It is absolutely worth recognizing and celebrating the massive investment 
> and effort made here. It’s just an orthogonal point to me. As a contrived 
> example: if 15452 had not been as impactful performance-wise after a year of on 
> and off work, I would’ve happily continued to address it or taken a different 
> approach. SASI took a year and a half or more, and I still regret that we 
> merged it into 3.x in the form we did, using the same early contribution 
> model. That was an extreme and out-of-our-control example of an 
> entire team disbanding right after merge. 
> 
> Jordan 
> 
> On Fri, Mar 7, 2025 at 06:28 Jon Haddad  wrote:
>> I defer to the judgement of the folks that are most impacted by it - ones 
>> that are in the code, working on the next release.  If you all think it's 
>> good to merge, then I am 100% in support of it.  I suspect merging will help 
>> get it out faster, and I don't see any future in which we don't ship this in 
>> the next release.
>> 
>> I will be happy to help answer the "how does it compare to paxos v2" 
>> question post-merge.
>> 
>> Jon
>> 
>> 
>> 
>> On Fri, Mar 7, 2025 at 5:52 AM Josh McKenzie  wrote:
>>> __
>>> 3.5 years is an incredible amount of time and work; it really is 
>>> significant and thanks to everyone involved for the investment of time and 
>>> energy.
>>> 
>>> We have a rocky history with large, disruptive contributions in the past 
>>> that have either blocked forward progress post-merge (CASSANDRA-8099), or 
>>> lingered in the code-base increasing maintenance burden on other 
>>> contributors for minimal or no user benefit (early open post SSD 
>>> transition, witness replicas, materialized views). I'm sympathetic to where 
>>> Jordan's questions stem from, as our history of leaving things in the 
>>> codebase long after they've become vestigial or abandoned has slowed down 
>>> our collective momentum maintaining the project on actively used features.
>>> 
>>> That said, I don't think Accord will run afoul of some of those same 
>>> patterns. Aside from the degree of investment already in it and sheer 
>>> number of pmc members and committers involved, I believe it's a feature 
>>> that's universally impactful and that if we had a metaphorical bus-factor 
>>> change (entire group of people working on it disappeared the day after 
>>> merge or decided to go on vacation for 5 years), others in the community 
>>> would be willing to pick things up and keep it moving given its proximity 
>>> to release readiness.
>>> 
>>> The 2 questions Jordan asked resonate with me: 1) do we have line of sight 
>>> to a fix on the schema issues, and I'll take the liberty of reframing 2) do 
>>> we have line of sight to improvement on the performance front to be usable 
>>> for multi-key transactions? (subtle: I don't think "parity with PaxosV2" is 
>>> the right target, but rather "fast enough to be usable for multi-key 
>>> transactions" since it's a new query paradigm).
>>> 
>>> Given the context on contributor backing and if the answer is yes to those 
>>> 2 questions (which I believe it is), I think we should generally be 
>>> comfortable with merging the feature as experimental at this time.
>>> 
>>> On Fri, Mar 7, 2025, at 12:54 AM, Benedict wrote:
 
 There are essentially three possible timelines to choose from here: 
 
 1) We agree in the next few days to merge to trunk. We will then 
 prioritise rebasing onto trunk and resolving any pre-merge items starting 
 next week.
 2) There’s some more deb

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-11 Thread Dmitry Konstantinov
Jon, thank you for testing! Can you share your CPU profile and test load
details? Have you tested it with the CASSANDRA-20092 changes included?

>> Allocations related to codahale were < 1%.
Just to clarify: in the initial mail, by memory footprint I meant the static
amount of memory used to store metric objects, not dynamic allocation
during request processing (which should be almost zero and is not a target
to optimize).

>> Once compaction is enabled, it's in the 2-3% realm
What percentage of the CPU profile is spent on compaction in your load?
(To dilute 7-8% to 2-3% it should be around 50%, because compaction does
not change the ratio between the total effort spent on request
processing and the metrics part of it.)

Regarding allocation rate vs throughput: unfortunately they are not connected
linearly. For example, in
https://issues.apache.org/jira/browse/CASSANDRA-20165 I reduced allocation
almost 2x and got about an 8% improvement in throughput (which is still a
good result).
So, while I agree that the mentioned compaction logic (cell deserialization)
is a candidate for improvement from an allocation point of view, I am not sure
we would get dramatic improvements in throughput just from reducing it.

Regarding the metric registry - yes, I do not see a reason to move away
from it; in any case we need a common place to access metrics, to provide
the corresponding virtual tables at least.
Regarding docs - I like this. I actually did something similar in one of my
non-open-source projects by adding a description to each metric, to be able
to render docs and to validate during the build that added metrics are
properly documented.
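
A minimal sketch of that idea, purely illustrative (the class below is not an
existing Cassandra or Dropwizard API; only MetricRegistry itself is real): a
thin wrapper that refuses to register a metric without a description, so docs
can be rendered from the registry and a build-time check can fail on
undocumented metrics.

import com.codahale.metrics.Metric;
import com.codahale.metrics.MetricRegistry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: every metric must carry a human-readable description.
public final class DocumentedMetricRegistry
{
    private final MetricRegistry delegate = new MetricRegistry();
    private final Map<String, String> descriptions = new ConcurrentHashMap<>();

    public <T extends Metric> T register(String name, String description, T metric)
    {
        if (description == null || description.isBlank())
            throw new IllegalArgumentException("Metric " + name + " must be documented");
        descriptions.put(name, description);
        return delegate.register(name, metric);
    }

    // Consumed by a docs generator or a build-time completeness check.
    public Map<String, String> documentation()
    {
        return Map.copyOf(descriptions);
    }

    // Access to the underlying registry for reporters / virtual tables.
    public MetricRegistry codahale()
    {
        return delegate;
    }
}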



On Tue, 11 Mar 2025 at 17:56, Jon Haddad  wrote:

> Definitely +1 on registry + docs.  I believe that's part of the OTel Java
> SDK [1][2]
>
> I did some performance testing yesterday and was able to replicate the
> findings where the codahale code path took 7-10% of CPU time.  The only
> caveat is that it only happens with compaction disabled.  Once compaction
> is enabled, it's in the 2-3% realm.  Allocations related to codahale were <
> 1%.
>
> I'm not discouraging anyone from pursuing performance optimizations, just
> looking to set expectations on what the real world benefits will be.  This
> will likely yield a ~ 2% improvement in throughput based on the earlier
> discussion.
>
> For comparison, eliminating a single byte buffer allocation in
> ByteArrayAccessor.read in the BTree code path would reduce heap allocations
> by 40%, with default compaction throughput of 64MB/s.  Addressing this, in
> conjunction with the recently merged CASSANDRA-15452 + Branimir's
> CASSANDRA-20092 patch, would allow for much faster compaction which in
> turn, would improve density, and significantly reduce latency.  If you're
> chasing perf issues, this is one of the top problems in the codebase.
>
> 
>
> Jon
>
> [1] https://opentelemetry.io/docs/languages/java/api/#meterprovider
> [2] https://opentelemetry.io/docs/specs/semconv/attributes-registry/
>
>
>
> On Tue, Mar 11, 2025 at 8:02 AM Josh McKenzie 
> wrote:
>
>> Having something like a registry and standardizing/enforcing all metric
>> types is something we should be sure to maintain.
>>
>> A registry w/documentation on each metric indicating *what it's actually
>> measuring and what it means* would be great for our users.
>>
>> On Mon, Mar 10, 2025, at 3:46 PM, Chris Lohfink wrote:
>>
>> Just something to be mindful about what we had *before* codahale in
>> Cassandra and avoid that again. Pre 1.1 it was pretty much impossible to
>> collect metrics without looking at code (there were efficient custom made
>> things, but each metric was reported differently) and that stuck through
>> until 2.2 days. Having something like a registry and
>> standardizing/enforcing all metric types is something we should be sure to
>> maintain.
>>
>> Chris
>>
>> On Fri, Mar 7, 2025 at 1:33 PM Jon Haddad 
>> wrote:
>>
>> As long as operators are able to use all the OTel tooling, I'm happy.
>> I'm not looking to try to decide what the metrics API looks like, although
>> I think trying to plan for 15 years out is a bit unnecessary. A lot of the
>> DB will be replaced by then.  That said, I'm mostly hands off on code and
>> you guys are more than capable of making the smart decision here.
>>
>> Regarding virtual tables, I'm looking at writing a custom OTel receiver
>> [1] to ingest them.  I was really impressed with the performance work you
>> did there and it got my wheels turning on how to best make use of it.  I am
>> planning on using it with easy-cass-lab to pull DB metrics and logs down to
>> my local machine along with kernel metrics via eBPF.
>>
>> Jon
>>
>> [1] https://opentelemetry.io/docs/collector/building/receiver/
>>
>>
>>
>> On Wed, Mar 5, 2025 at 1:06 PM Maxim Muzafarov  wrote:
>>
>> If we do swap, we may run into the same issues with third-party
>> metrics libraries in the next 10-15 years that we are discussing now
>> with th