Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Jordan West
Congratulations!!!
On Wed, Mar 5, 2025 at 07:01 Abe Ratnofsky  wrote:

> Congratulations Ekaterina! 🎉
>


Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Tolbert, Andy
Congratulations Ekaterina!!

On Wed, Mar 5, 2025 at 8:52 PM Jordan West  wrote:

> Congratulations!!!
> On Wed, Mar 5, 2025 at 07:01 Abe Ratnofsky  wrote:
>
>> Congratulations Ekaterina! 🎉
>>
>


Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Benedict Elliott Smith
Some quick thoughts of my own…

=== Performance ===
- I have seen heap dumps with > 1GiB dedicated to metric counters. This patch 
should improve this, while opening up room to cut it further, steeply.
- The performance improvement in relative terms for the metrics being replaced 
is rather dramatic - about 80%. We can also improve this further.
- Cheaper metrics (in terms of both cpu and memory) means we can readily have 
more of them, exposing finer-grained details. The value of this is hard to 
overstate.

=== Reporting ===
- We’re already non-standard for our most important metrics, because we had to 
replace the Codahale histogram years ago
- We can continue implementing the Codahale interfaces, so that exporting 
libraries have minimal work to support us (see the sketch below)
- We can probably push patches upstream to a couple of selected libraries we 
consider important
- I would anyway also support picking a new reporting framework to support, but 
I would like us to do this with great care to avoid repeating our mistakes. I 
won’t have cycles to actually implement this, so it would be down to others to 
decide if they are willing to undertake this work
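
To illustrate the compatibility bullet above: Counter in metrics-core is a
non-final class, so a cheaper implementation can hide behind it, and exporters
that walk a MetricRegistry keep working unmodified. This is a minimal sketch
only; CheapCounter is a hypothetical stand-in for whatever CASSANDRA-20250
actually introduces:

import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical cheaper counter - a stand-in for whatever the patch introduces.
final class CheapCounter {
    private final AtomicLong value = new AtomicLong();
    void add(long delta) { value.addAndGet(delta); }
    long sum() { return value.get(); }
}

// Codahale-compatible facade: exporters that walk a MetricRegistry and call
// getCount() keep working, while the hot path writes to the cheap counter.
// (The base-class constructor still allocates its own LongAdder, which a real
// patch would want to avoid - hence the fallback discussed below.)
final class FacadeCounter extends Counter {
    private final CheapCounter delegate = new CheapCounter();
    @Override public void inc() { delegate.add(1); }
    @Override public void inc(long n) { delegate.add(n); }
    @Override public void dec() { delegate.add(-1); }
    @Override public void dec(long n) { delegate.add(-n); }
    @Override public long getCount() { return delegate.sum(); }
}

class Registration {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.register("table.writes", new FacadeCounter());
        registry.counter("table.writes").inc();  // returns the registered facade
        System.out.println(registry.counter("table.writes").getCount()); // 1
    }
}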

I think the fallback option for now, however, is to abuse unsafe to allow us to 
override the implementation details of Codahale metrics. So we can decouple the 
performance discussion for now from the deprecation discussion, but I think we 
should have a target of deprecating Codahale/DropWizard for the reasons Dmitry 
outlines, however we decide to do it.
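
To make that fallback concrete, a hedged sketch only: it assumes metrics-core
4.x, where Counter keeps its value in a private LongAdder field named count
(an internal detail that may differ across versions). Plain reflection is
shown; Unsafe.objectFieldOffset/putObject would be the lower-level equivalent:

import com.codahale.metrics.Counter;
import java.lang.reflect.Field;
import java.util.concurrent.atomic.LongAdder;

public final class CounterSurgery {
    // Assumption: metrics-core 4.x, where Counter has `private final LongAdder count`.
    private static final Field COUNT;
    static {
        try {
            COUNT = Counter.class.getDeclaredField("count");
            // Non-static final fields may be written once setAccessible succeeds.
            COUNT.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Point `counter` at `shared`, so updates to `shared` show via getCount(). */
    public static void graft(Counter counter, LongAdder shared) {
        try {
            COUNT.set(counter, shared);
        } catch (IllegalAccessException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        LongAdder fastPath = new LongAdder();
        graft(c, fastPath);
        fastPath.add(42);                 // hot path updates the adder directly
        System.out.println(c.getCount()); // 42, read through the Codahale API
    }
}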

> On 4 Mar 2025, at 21:17, Jon Haddad  wrote:
> 
> I've got a few thoughts...
> 
> On the performance side, I took a look at a few CPU profiles from past 
> benchmarks and I'm seeing DropWizard taking ~ 3% of CPU time.  Is there a 
> specific workload you're running where you're seeing it take up a significant 
> % of CPU time?  Could you share some metrics, profile data, or a workload so 
> I can try to reproduce your findings?  In my testing I've found the majority 
> of the overhead from metrics to come from JMX, not DropWizard.
> 
> On the operator side, inventing our own metrics lib risks making it 
> harder to instrument Cassandra.  There are libraries out there that allow you 
> to tap into DropWizard metrics directly.  For example, Sarma Pydipally did a 
> presentation on this last year [1] based on some code I threw together.
> 
> If you're planning on making it easier to instrument C* by supporting sending 
> metrics to the OTel collector [2], then I could see the change being a net 
> win as long as the perf is no worse than the status quo.
> 
> It's hard to know the full extent of what you're planning and the impact, so 
> I'll save any opinions till I know more about the plan.
> 
> Thanks for bringing this up!
> Jon
> 
> [1] 
> https://planetcassandra.org/leaf/apache-cassandra-lunch-62-grafana-dashboard-for-apache-cassandra-business-platform-team/
> [2] https://opentelemetry.io/docs/collector/
> 
> On Tue, Mar 4, 2025 at 12:40 PM Dmitry Konstantinov wrote:
>> Hi all,
>> 
>> After a long conversation with Benedict and Maxim in CASSANDRA-20250 
>> I would like to raise and discuss a proposal to deprecate 
>> Dropwizard/Codahale metrics usage in the next major release of the 
>> Cassandra server and drop it in the following major release.
>> In its place, our own Java API and implementation would be introduced. For 
>> the next major release the Dropwizard/Codahale API is still planned to be 
>> supported, by extending the Codahale implementations, to give potential 
>> users of this API enough time to transition.
>> The proposal does not affect the JMX API for metrics; it only concerns 
>> local Java API changes within the Cassandra server classpath. So it is 
>> about the cases where somebody outside of the Cassandra server code relies 
>> on the Codahale API in some kind of extensions or agents.
>> 
>> Reasons:
>> 1) The Codahale metrics implementation is not very efficient from a CPU 
>> and memory usage point of view. In the past we already replaced the 
>> default Codahale implementation of Reservoir with our custom one, and now 
>> in CASSANDRA-20250 we (Benedict and I) want to add a more efficient 
>> implementation of the Counter and Meter logic. So, in total, we do not 
>> have much logic left from the original library (mostly MetricRegistry as 
>> a container for metrics) and the majority of the logic is implemented by 
>> ourselves.
>> We use metrics a lot along the read and write paths and they contribute a 
>> visible overhead (for example, for a plain write load it is about 9-11% 
>> according to an async-profiler CPU profile), so we want them to be highly 
>> optimized.
>> From a memory perspective, Counter and Meter are built on LongAdder, and 
>> they are quite heavy for the quantities in which we create and use them 
>> (see the sketch below).
>> 
>> 2) Codahale metrics does not provide any way to replace Counter and Meter 
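
To make the memory argument in (1) concrete (the sketch referenced above): a
Codahale Meter carries a LongAdder for its count plus three EWMA rate
trackers, each wrapping a further LongAdder, and every LongAdder can inflate
to multiple striped cells under contention. Below is a minimal sketch of the
opposite end of the trade-off, a fixed-footprint meter; it is illustrative
only, not the CASSANDRA-20250 design:

import java.util.concurrent.atomic.AtomicLong;

// A fixed-footprint meter: one AtomicLong and a start timestamp. It trades
// stripe-level write scalability and m1/m5/m15 decaying rates for a small,
// predictable memory footprint per metric.
public final class SlimMeter {
    private final AtomicLong count = new AtomicLong();
    private final long startNanos = System.nanoTime();

    public void mark() { count.incrementAndGet(); }
    public void mark(long n) { count.addAndGet(n); }
    public long getCount() { return count.get(); }

    /** Mean events/second since creation; no exponentially decaying rates. */
    public double meanRate() {
        double elapsedSeconds = (System.nanoTime() - startNanos) / 1e9;
        return elapsedSeconds <= 0 ? 0.0 : getCount() / elapsedSeconds;
    }
}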

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-05 Thread Štefan Miklošovič
Scott,

what you wrote is all correct, but I have a feeling that both you and Jeff
are talking about something different, some other aspect of using this.

It seems that I still need to keep explaining that I don't consider object
storage to be useless; it is as if everybody has to argue the opposite. I
agree with you already.

It seems to me that the use cases you want to use s3 for (or any object
storage for that matter) are actively reading / writing to satisfy queries,
offloading some data, etc.

What I am talking about, when it comes to s3 mounted locally, is just using
it to copy SSTables there which were taken by a snapshot. In (1) I never
wanted to use s3 for anything other than literally copying SSTables there as
part of snapshotting and being done with it. To exaggerate to make a point:
where do I need to "hurry", so that I should care about the speed in this
particular case? I have never said I consider it to be important.

What I consider important is that it is super easy to use - this approach
is cloud-agnostic, we do not need to implement anything in Cassandra, no
messing with dependencies, etc. It is "future-proof" in the sense that
whatever cloud somebody wants to use for storing snapshots, all it takes is
to _somehow_ mount it locally and everything will work out of the box.

You want to leverage object storage for more "involved" use cases.

I do not see how mounting a dir and copying files there would "clash" with
your way of looking at it. Why can't we have both?

I keep repeating this but I still don't know how it is actually going to be
done. If you want to, e.g., support object storage like, I don't know, GCP
(I do not have any clue whether that is even code-able), then there would
need to be _somebody_ who integrates with it. With mounting a dir for
snapshotting purposes, we do not need to deal with that. This aspect of
additional complexity when it comes to coding, integrating, deploying etc.
seems to be repeatedly overlooked, and I would really appreciate it if we
spent a little bit more time expanding on that area.

I do not have a problem with using object storage for the things you want,
but I do not get why it should automatically disqualify scenarios where a
user figures out that mounting that storage locally is sufficient.

(1) https://lists.apache.org/thread/8cz5fh835ojnxwtn1479q31smm5x7nxt

On Wed, Mar 5, 2025 at 6:22 AM C. Scott Andreas 
wrote:

> To Jeff’s point on tactical vs. strategic, here’s the big picture for me
> on object storage:
>
> *– Object storage is 70% cheaper:*
> Replicated flash block storage is extremely expensive, and more so with
> compute resources constantly attached. If one were to build a storage
> platform on top of a cloud provider’s compute and storage infrastructure,
> selective use of object storage is essential to even being in the ballpark
> of managed offerings on price. EBS is 8¢/GB. S3 is 2.3¢/GB. It’s over 70%
> cheaper.
>
> *– Local/block storage is priced on storage *provisioned*. Object storage
> is priced on storage *consumed*:*
> It’s actually better than 70%. Local/block storage is priced based on the
> size of disks/volumes provisioned. While they may be resizable, resizing is
> generally inelastic. This typically produces a large gap between storage
> consumed vs. storage provisioned - and poor utilization. Object storage is
> typically priced on storage that is actually consumed.
>
> *– Object storage integration is the simplest path to complete decoupling
> of CPU and storage:*
> Block volumes are more fungible than local disk, but aren’t even close in
> flexibility to an SSTable that can be accessed by any function. Object is
> also the only sensible path to implementing a serverless database whose
> query facilities can be deployed on a function-as-a-service platform. That
> enables one to reach an idle compute cost of zero and an idle storage cost
> of 2.3¢/GB/month (S3).
>
> *– Object storage enables scale-to-zero:*
> Object storage integration is the only path for most databases to provide
> a scale-to-zero offering that doesn’t rely on keeping hot NVMe or block
> storage attached 24/7 while a database receives zero queries per second.
>
> *– Scale to zero is one of the easiest paths to zero marginal cost (the
> other is multitenancy - and not mutually exclusive):*
> Database platforms operated in a cluster-as-a-service model incur a
> constant fixed cost of provisioned resources regardless of whether they are
> in use. That’s fine for platforms that pass the full cost of resources
> consumed back to someone — but it produces poor economics and resource
> waste. Ability to scale to zero dramatically reduces the cost of
> provisioning and maintaining an un/underutilized database.
>
> *– It’s not all or nothing:*
> There are super sensible ways to pair local/block storage and object
> storage. One might be to store upper-level SSTable data components in
> object storage; and all other SSTable components (TOC, Compres

Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Piotr Kołaczkowski
Congratulations Ekaterina! Well deserved!

> On 05.03.2025, at 11:41, Andrés de la Peña wrote:
> 
> Congratulations Ekaterina!
> 
> On Wed, 5 Mar 2025 at 08:20, Jacek Lewandowski wrote:
>> Congratulations Ekaterina!!! That is awesome news!!! 🎉
>> 
>> 
>> - - -- --- -  -
>> Jacek Lewandowski
>> 
>> 
>> On Wed, 5 Mar 2025 at 09:17, Enrico Olivelli wrote:
>>> Congratulations!
>>> 
>>> Enrico
>>> 
>>> 
>>> On Wed, 5 Mar 2025 at 07:54, Bernardo Botella wrote:
 Congratulations!!
 
 On Tue, Mar 4, 2025 at 22:17 Berenguer Blasi wrote:
> Congrats Ekaterina!
> On 5/3/25 2:03, Jasonstack Zhao Yang wrote:
>> Congratulations Ekaterina!
>> 
>> On Wed, 5 Mar 2025 at 08:18, Josh McKenzie wrote:
>>> Welcome Ekaterina!  \o/
>>> 
>>> On Tue, Mar 4, 2025, at 7:07 PM, Francisco Guerrero wrote:
 Congratulations Ekaterina! Well deserved!
 
 On 2025/03/04 20:25:08 Paulo Motta wrote:
 > Aloha,
 > 
 > The Project Management Committee (PMC) for Apache Cassandra is 
 > delighted to
 > announce that Ekaterina Dimitrova has joined the PMC!
 > 
 > Thanks a lot, Ekaterina, for everything you have done for the 
 > project all
 > these years.
 > 
 > The PMC - Project Management Committee - manages and guides the 
 > direction
 > of the project, and is responsible for inviting new committers and 
 > PMC
 > members to steward the longevity of the project.
 > 
 > See https://community.apache.org/pmc/responsibilities.html if you're
 > interested in learning more about the rights and responsibilities of 
 > PMC
 > members.
 > 
 > Please join us in welcoming Ekaterina Dimitrova to her new role in 
 > our
 > project!
 > 
 > Paulo, on behalf of the Apache Cassandra PMC
 > 
 
>>> 



Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Andrés de la Peña
Congratulations Ekaterina!

On Wed, 5 Mar 2025 at 08:20, Jacek Lewandowski 
wrote:

> Congratulations Ekaterina!!! That is awesome news!!! 🎉
>
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> On Wed, 5 Mar 2025 at 09:17, Enrico Olivelli wrote:
>
>> Congratulations!
>>
>> Enrico
>>
>> On Wed, 5 Mar 2025 at 07:54, Bernardo Botella wrote:
>>
>>> Congratulations!!
>>>
>>> On Tue, Mar 4, 2025 at 22:17 Berenguer Blasi 
>>> wrote:
>>>
 Congrats Ekaterina!
 On 5/3/25 2:03, Jasonstack Zhao Yang wrote:

 Congratulations Ekaterina!

 On Wed, 5 Mar 2025 at 08:18, Josh McKenzie 
 wrote:

> Welcome Ekaterina!  \o/
>
> On Tue, Mar 4, 2025, at 7:07 PM, Francisco Guerrero wrote:
>
> Congratulations Ekaterina! Well deserved!
>
> On 2025/03/04 20:25:08 Paulo Motta wrote:
> > Aloha,
> >
> > The Project Management Committee (PMC) for Apache Cassandra is
> delighted to
> > announce that Ekaterina Dimitrova has joined the PMC!
> >
> > Thanks a lot, Ekaterina, for everything you have done for the
> project all
> > these years.
> >
> > The PMC - Project Management Committee - manages and guides the
> direction
> > of the project, and is responsible for inviting new committers and
> PMC
> > members to steward the longevity of the project.
> >
> > See https://community.apache.org/pmc/responsibilities.html if you're
> > interested in learning more about the rights and responsibilities of
> PMC
> > members.
> >
> > Please join us in welcoming Ekaterina Dimitrova to her new role in
> our
> > project!
> >
> > Paulo, on behalf of the Apache Cassandra PMC
> >
>
>
>


Re: CEP-15 Update

2025-03-05 Thread Benedict Elliott Smith
That depends on all of you lovely people :D

I think we should have finished merging everything we want before QA by 
~Monday; certainly not much later.

I think we have some upgrade and python dtest failures to address as well.

So it could be pretty soon if the community is supportive.

> On 5 Mar 2025, at 17:22, Patrick McFadin  wrote:
> 
> What is the timing for starting the merge process? I'm asking because
> I have (yet another) presentation and this would be a cool update.
> 
> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
>  wrote:
>> 
>> Thanks everyone.
>> 
>> Jon - your help will be greatly appreciated. We’ll let you know when we’ve 
>> got the cycles to invest in performance work (hopefully fairly soon). I 
>> expect the first step will be improving visibility so we can better 
>> understand what the system is doing (particularly the caching layers), but 
>> we can dig in together when ready.
>> 
>> On 4 Mar 2025, at 18:15, Jon Haddad  wrote:
>> 
>> Very exciting!
>> 
>> I have a client that's very interested in Accord, so I should have budget to 
>> dig into it, especially on the performance side of things.
>> 
>> Jon
>> 
>> On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov  
>> wrote:
>>> 
>>> Thank you to all Accord and TCM contributors, it is really exciting to see 
>>> a development of such huge and wonderful features moving forward and 
>>> opening the door to the new Cassandra epoch!
>>> 
>>> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston  wrote:
 
 Thanks Benedict!
 
 I’m really excited to see accord reach this milestone, even with these 
 caveats. You seem to have left yourself off the list of contributors 
 though, even though you’ve been a central figure in its development :) So 
 thanks to all accord & tcm contributors, including Benedict, for making 
 this possible!
 
 On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
 
 Hi everyone,
 
 It’s been exactly 3.5 years since the first commit to cassandra-accord. 
 Yes, really, it’s been that long.
 
 We will be starting to validate the feature against real workloads in the 
 near future, so we can’t sensibly push off merging much longer. The 
 following is a brief run-down of the state of play. There are no known 
 bugs, but there remain a number of caveats we will be incrementally 
 addressing in the run-up to a full release:
 
 [1] Accord is likely to be SLOW until further optimisations are implemented
 [2] Schema changes have a number of hard edges
 [3] Validation is ongoing, so there are likely still a number of bugs to 
 shake out
 [4] Many operator visibility/tooling/documentation improvements are pending
 
 To expand a little:
 
 [1] As of the last experiment we conducted, accord’s throughput was poor - 
 also leading to higher LAN latencies. We have done no WAN experiments to 
 date, but the protocol guarantees should already achieve better round-trip 
 performance, in particular under contention. Improving throughput will be 
 the main focus of attention once we are satisfied the protocol is 
 otherwise stable, but our focus remains validation for the moment.
 [2] Schema changes have not yet been well integrated with TCM. Dropping a 
 table for instance will currently cause problems if nodes are offline.
 [3] We have a range of validations we are already performing against 
 cassandra-accord directly, and against its integration with Cassandra in 
 cep-15-accord. We have run hundreds of billions of simulated transactions, 
 and are still discovering some minor fault every few billion simulated 
 transactions or so. There remains a lot more simulated validation to 
 explore, as well as with real clusters serving real workloads.
 [4] There are already a range of virtual tables for exploring internal 
 state in Accord, and reasonably good metric support. However, tracing is 
 not yet supported, and our metric and virtual table integrations need some 
 further development.
 [5] There are also other edge cases to address such as ensuring we do not 
 reuse HLCs after restart, supporting ByteOrderPartitioner, and live 
 migration from/to Paxos is undergoing fine-tuning and validation; probably 
 there are some other things I am forgetting.
 
 Altogether the feature is fairly mature, despite these caveats. This is 
 the fruit of the labour of a long list of contributors, including Aleksey 
 Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe 
 and David Capwell, and represents a huge undertaking. It also wouldn’t 
 have been possible without the work of Alex Petrov, Marcus Eriksson and 
 Sam Tunnicliffe on delivering transactional cluster metadata. I hope you 
 will join me in thanking them all for their contributions.
 
 Alex has also kindly produced some initial overview documentation for 
 developers, that can be found here: 
 https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
 This will be expanded as time permits.
 
 Does anyone have any questions or concerns?

Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Abe Ratnofsky
Congratulations Ekaterina! 🎉


Re: Welcome Bernardo Botella as Cassandra Committer

2025-03-05 Thread Abe Ratnofsky
Congratulations Bernardo! Great news.


Re: CEP-15 Update

2025-03-05 Thread Patrick McFadin
What is the timing for starting the merge process? I'm asking because
I have (yet another) presentation and this would be a cool update.

On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
 wrote:
>
> Thanks everyone.
>
> Jon - your help will be greatly appreciated. We’ll let you know when we’ve 
> got the cycles to invest in performance work (hopefully fairly soon). I 
> expect the first step will be improving visibility so we can better 
> understand what the system is doing (particularly the caching layers), but we 
> can dig in together when ready.
>
> On 4 Mar 2025, at 18:15, Jon Haddad  wrote:
>
> Very exciting!
>
> I have a client that's very interested in Accord, so I should have budget to 
> dig into it, especially on the performance side of things.
>
> Jon
>
> On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov  wrote:
>>
>> Thank you to all Accord and TCM contributors, it is really exciting to see a 
>> development of such huge and wonderful features moving forward and opening 
>> the door to the new Cassandra epoch!
>>
>> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston  wrote:
>>>
>>> Thanks Benedict!
>>>
>>> I’m really excited to see accord reach this milestone, even with these 
>>> caveats. You seem to have left yourself off the list of contributors 
>>> though, even though you’ve been a central figure in its development :) So 
>>> thanks to all accord & tcm contributors, including Benedict, for making 
>>> this possible!
>>>
>>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>>>
>>> Hi everyone,
>>>
>>> It’s been exactly 3.5 years since the first commit to cassandra-accord. 
>>> Yes, really, it’s been that long.
>>>
>>> We will be starting to validate the feature against real workloads in the 
>>> near future, so we can’t sensibly push off merging much longer. The 
>>> following is a brief run-down of the state of play. There are no known 
>>> bugs, but there remain a number of caveats we will be incrementally 
>>> addressing in the run-up to a full release:
>>>
>>> [1] Accord is likely to be SLOW until further optimisations are implemented
>>> [2] Schema changes have a number of hard edges
>>> [3] Validation is ongoing, so there are likely still a number of bugs to 
>>> shake out
>>> [4] Many operator visibility/tooling/documentation improvements are pending
>>>
>>> To expand a little:
>>>
>>> [1] As of the last experiment we conducted, accord’s throughput was poor - 
>>> also leading to higher LAN latencies. We have done no WAN experiments to 
>>> date, but the protocol guarantees should already achieve better round-trip 
>>> performance, in particular under contention. Improving throughput will be 
>>> the main focus of attention once we are satisfied the protocol is otherwise 
>>> stable, but our focus remains validation for the moment.
>>> [2] Schema changes have not yet been well integrated with TCM. Dropping a 
>>> table for instance will currently cause problems if nodes are offline.
>>> [3] We have a range of validations we are already performing against 
>>> cassandra-accord directly, and against its integration with Cassandra in 
>>> cep-15-accord. We have run hundreds of billions of simulated transactions, 
>>> and are still discovering some minor fault every few billion simulated 
>>> transactions or so. There remains a lot more simulated validation to 
>>> explore, as well as with real clusters serving real workloads.
>>> [4] There are already a range of virtual tables for exploring internal 
>>> state in Accord, and reasonably good metric support. However, tracing is 
>>> not yet supported, and our metric and virtual table integrations need some 
>>> further development.
>>> [5] There are also other edge cases to address such as ensuring we do not 
>>> reuse HLCs after restart, supporting ByteOrderPartitioner, and live 
>>> migration from/to Paxos is undergoing fine-tuning and validation; probably 
>>> there are some other things I am forgetting.
>>>
>>> Altogether the feature is fairly mature, despite these caveats. This is the 
>>> fruit of the labour of a long list of contributors, including Aleksey 
>>> Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe 
>>> and David Capwell, and represents a huge undertaking. It also wouldn’t have 
>>> been possible without the work of Alex Petrov, Marcus Eriksson and Sam 
>>> Tunnicliffe on delivering transactional cluster metadata. I hope you will 
>>> join me in thanking them all for their contributions.
>>>
>>> Alex has also kindly produced some initial overview documentation for 
>>> developers, that can be found here: 
>>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
>>>  This will be expanded as time permits.
>>>
>>> Does anyone have any questions or concerns?
>>>
>>>
>>
>>
>> --
>> Dmitry Konstantinov
>
>


Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Mick Semb Wever
   .


> The Project Management Committee (PMC) for Apache Cassandra is delighted to 
> announce that Ekaterina Dimitrova has joined the PMC!
>
> Thanks a lot, Ekaterina, for everything you have done for the project all 
> these years.



Congrats Ekaterina !


Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Maxim Muzafarov
Congratulations Ekaterina!

On Wed, 5 Mar 2025 at 15:00, Mick Semb Wever  wrote:
>
>.
>
>
> > The Project Management Committee (PMC) for Apache Cassandra is delighted to 
> > announce that Ekaterina Dimitrova has joined the PMC!
> >
> > Thanks a lot, Ekaterina, for everything you have done for the project all 
> > these years.
>
>
>
> Congrats Ekaterina !


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-05 Thread Mick Semb Wever
   .


> It’s not an area where I can currently dedicate engineering effort. But if
> others are interested in contributing a feature like this, I’d see it as
> valuable for the project and would be happy to collaborate on
> design/architecture/goals.
>


Jake mentioned 17 months ago a custom FileSystemProvider we could offer.

None of us at DataStax has gotten around to providing that, but to quickly
throw something over the wall this is it:

https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java

  (with a few friend classes under o.a.c.io.util)

We then have a RemoteStorageProvider, private in another repo, that
implements that and also provides the RemoteFileSystemProvider that Jake
refers to.

Hopefully that's a start to get people thinking about CEP-level details,
while we get a cleaned-up abstraction of RemoteStorageProvider and friends to
offer.
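
For anyone trying to picture the integration surface being discussed: the JDK
already lets a custom java.nio.file.spi.FileSystemProvider back ordinary
Path/Files calls, which is the seam a StorageProvider/RemoteStorageProvider
builds on. A small sketch under assumptions: it presumes some S3
FileSystemProvider implementation is on the classpath (the JDK ships none),
and the bucket and paths are made up:

import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;

public final class SnapshotOffload {
    public static void main(String[] args) throws Exception {
        // Assumes a third-party S3 FileSystemProvider on the classpath.
        try (FileSystem s3 = FileSystems.newFileSystem(
                URI.create("s3://my-snapshot-bucket/"), Map.of())) {
            Path local = Paths.get(
                "/var/lib/cassandra/data/ks/tbl/snapshots/s1/nb-1-big-Data.db");
            Path remote = s3.getPath("/snapshots/s1/nb-1-big-Data.db");
            // Plain Files.copy works against any provider, which is what makes
            // a pluggable StorageProvider attractive: SSTable code keeps using
            // the java.nio.file API regardless of where the bytes land.
            Files.copy(local, remote, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}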


Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Joseph Lynch
Congratulations Ekaterina - very well deserved!

On Tue, Mar 4, 2025 at 3:25 PM Paulo Motta  wrote:

> Aloha,
>
> The Project Management Committee (PMC) for Apache Cassandra is delighted
> to announce that Ekaterina Dimitrova has joined the PMC!
>
> Thanks a lot, Ekaterina, for everything you have done for the project all
> these years.
>
> The PMC - Project Management Committee - manages and guides the direction
> of the project, and is responsible for inviting new committers and PMC
> members to steward the longevity of the project.
>
> See https://community.apache.org/pmc/responsibilities.html if you're
> interested in learning more about the rights and responsibilities of PMC
> members.
>
> Please join us in welcoming Ekaterina Dimitrova to her new role in our
> project!
>
> Paulo, on behalf of the Apache Cassandra PMC
>


Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Enrico Olivelli
Congratulations!

Enrico

On Wed, 5 Mar 2025 at 07:54, Bernardo Botella wrote:

> Congratulations!!
>
> On Tue, Mar 4, 2025 at 22:17 Berenguer Blasi 
> wrote:
>
>> Congrats Ekaterina!
>> On 5/3/25 2:03, Jasonstack Zhao Yang wrote:
>>
>> Congratulations Ekaterina!
>>
>> On Wed, 5 Mar 2025 at 08:18, Josh McKenzie  wrote:
>>
>>> Welcome Ekaterina!  \o/
>>>
>>> On Tue, Mar 4, 2025, at 7:07 PM, Francisco Guerrero wrote:
>>>
>>> Congratulations Ekaterina! Well deserved!
>>>
>>> On 2025/03/04 20:25:08 Paulo Motta wrote:
>>> > Aloha,
>>> >
>>> > The Project Management Committee (PMC) for Apache Cassandra is
>>> delighted to
>>> > announce that Ekaterina Dimitrova has joined the PMC!
>>> >
>>> > Thanks a lot, Ekaterina, for everything you have done for the project
>>> all
>>> > these years.
>>> >
>>> > The PMC - Project Management Committee - manages and guides the
>>> direction
>>> > of the project, and is responsible for inviting new committers and PMC
>>> > members to steward the longevity of the project.
>>> >
>>> > See https://community.apache.org/pmc/responsibilities.html if you're
>>> > interested in learning more about the rights and responsibilities of
>>> PMC
>>> > members.
>>> >
>>> > Please join us in welcoming Ekaterina Dimitrova to her new role in our
>>> > project!
>>> >
>>> > Paulo, on behalf of the Apache Cassandra PMC
>>> >
>>>
>>>
>>>


Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-05 Thread Jacek Lewandowski
Congratulations Ekaterina!!! That is awesome news!!! 🎉


- - -- --- -  -
Jacek Lewandowski


On Wed, 5 Mar 2025 at 09:17, Enrico Olivelli wrote:

> Congratulations!
>
> Enrico
>
> On Wed, 5 Mar 2025 at 07:54, Bernardo Botella wrote:
>
>> Congratulations!!
>>
>> On Tue, Mar 4, 2025 at 22:17 Berenguer Blasi 
>> wrote:
>>
>>> Congrats Ekaterina!
>>> On 5/3/25 2:03, Jasonstack Zhao Yang wrote:
>>>
>>> Congratulations Ekaterina!
>>>
>>> On Wed, 5 Mar 2025 at 08:18, Josh McKenzie  wrote:
>>>
 Welcome Ekaterina!  \o/

 On Tue, Mar 4, 2025, at 7:07 PM, Francisco Guerrero wrote:

 Congratulations Ekaterina! Well deserved!

 On 2025/03/04 20:25:08 Paulo Motta wrote:
 > Aloha,
 >
 > The Project Management Committee (PMC) for Apache Cassandra is
 delighted to
 > announce that Ekaterina Dimitrova has joined the PMC!
 >
 > Thanks a lot, Ekaterina, for everything you have done for the project
 all
 > these years.
 >
 > The PMC - Project Management Committee - manages and guides the
 direction
 > of the project, and is responsible for inviting new committers and PMC
 > members to steward the longevity of the project.
 >
 > See https://community.apache.org/pmc/responsibilities.html if you're
 > interested in learning more about the rights and responsibilities of
 PMC
 > members.
 >
 > Please join us in welcoming Ekaterina Dimitrova to her new role in our
 > project!
 >
 > Paulo, on behalf of the Apache Cassandra PMC
 >





Re: CEP-15 Update

2025-03-05 Thread Benedict Elliott Smith
Thanks everyone. 

Jon - your help will be greatly appreciated. We’ll let you know when we’ve got 
the cycles to invest in performance work (hopefully fairly soon). I expect the 
first step will be improving visibility so we can better understand what the 
system is doing (particularly the caching layers), but we can dig in together 
when ready.

> On 4 Mar 2025, at 18:15, Jon Haddad  wrote:
> 
> Very exciting!  
> 
> I have a client that's very interested in Accord, so I should have budget to 
> dig into it, especially on the performance side of things.
> 
> Jon
> 
> On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov wrote:
>> Thank you to all Accord and TCM contributors, it is really exciting to see a 
>> development of such huge and wonderful features moving forward and opening 
>> the door to the new Cassandra epoch!
>> 
>> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston wrote:
>>> Thanks Benedict!
>>> 
>>> I’m really excited to see accord reach this milestone, even with these 
>>> caveats. You seem to have left yourself off the list of contributors 
>>> though, even though you’ve been a central figure in its development :) So 
>>> thanks to all accord & tcm contributors, including Benedict, for making 
>>> this possible!
>>> 
>>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
 Hi everyone,
 
 It’s been exactly 3.5 years since the first commit to cassandra-accord. 
 Yes, really, it’s been that long.
 
 We will be starting to validate the feature against real workloads in the 
 near future, so we can’t sensibly push off merging much longer. The 
 following is a brief run-down of the state of play. There are no known 
 bugs, but there remain a number of caveats we will be incrementally 
 addressing in the run-up to a full release:
 
 [1] Accord is likely to be SLOW until further optimisations are implemented
 [2] Schema changes have a number of hard edges
 [3] Validation is ongoing, so there are likely still a number of bugs to 
 shake out
 [4] Many operator visibility/tooling/documentation improvements are pending
 
 To expand a little: 
 
 [1] As of the last experiment we conducted, accord’s throughput was poor - 
 also leading to higher LAN latencies. We have done no WAN experiments to 
 date, but the protocol guarantees should already achieve better round-trip 
 performance, in particular under contention. Improving throughput will be 
 the main focus of attention once we are satisfied the protocol is 
 otherwise stable, but our focus remains validation for the moment.
 [2] Schema changes have not yet been well integrated with TCM. Dropping a 
 table for instance will currently cause problems if nodes are offline.
 [3] We have a range of validations we are already performing against 
 cassandra-accord directly, and against its integration with Cassandra in 
 cep-15-accord. We have run hundreds of billions of simulated transactions, 
 and are still discovering some minor fault every few billion simulated 
 transactions or so. There remains a lot more simulated validation to 
 explore, as well as with real clusters serving real workloads.
 [4] There are already a range of virtual tables for exploring internal 
 state in Accord, and reasonably good metric support. However, tracing is 
 not yet supported, and our metric and virtual table integrations need some 
 further development.
 [5] There are also other edge cases to address such as ensuring we do not 
 reuse HLCs after restart, supporting ByteOrderPartitioner, and live 
 migration from/to Paxos is undergoing fine-tuning and validation; probably 
 there are some other things I am forgetting.
 
 Altogether the feature is fairly mature, despite these caveats. This is 
 the fruit of the labour of a long list of contributors, including Aleksey 
 Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe 
 and David Capwell, and represents a huge undertaking. It also wouldn’t 
 have been possible without the work of Alex Petrov, Marcus Eriksson and 
 Sam Tunnicliffe on delivering transactional cluster metadata. I hope you 
 will join me in thanking them all for their contributions.
 
 Alex has also kindly produced some initial overview documentation for 
 developers, that can be found here: 
 https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
  This will be expanded as time permits.
 
 Does anyone have any questions or concerns?
>>> 
>> 
>> 
>> 
>> --
>> Dmitry Konstantinov



Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Dmitry Konstantinov
Hi Jon

>> Is there a specific workload you're running where you're seeing it take
>> up a significant % of CPU time? Could you share some metrics, profile
>> data, or a workload so I can try to reproduce your findings?

Yes, I have shared the workload generation command (sorry, it is in
cassandra-stress, I have not yet adopted your tool but want to do it soon
:-) ), setup details and the async-profiler CPU profile in CASSANDRA-20250.

A summary:

   - it is a plain insert-only workload to assess the max throughput capacity
   for a single node: ./tools/bin/cassandra-stress "write n=10m" -rate
   threads=100 -node myhost
   - a small amount of data per row is inserted and local SSD disks are used,
   so CPU is the primary bottleneck in this scenario (it is quite synthetic,
   but in my real business cases CPU is the primary bottleneck as well)
   - I used the 5.1 trunk version (I saw similar results for the 5.0 version
   while checking CASSANDRA-20165)
   - I enabled trie memtables + offheap objects mode
   - I disabled compaction
   - a recent nightly build of async-profiler is used
   - my hardware is quite old: on-premise VM, Linux 4.18.0-240.el8.x86_64,
   OpenJdk-11.0.26+4, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 16 cores
   - the CPU profile is linked in the ticket ("codahale" code: 8.65%)
   - the -XX:+DebugNonSafepoints option is enabled to improve the profile
   precision


On Wed, 5 Mar 2025 at 12:38, Benedict Elliott Smith 
wrote:

> Some quick thoughts of my own…
>
> === Performance ===
> - I have seen heap dumps with > 1GiB dedicated to metric counters. This
> patch should improve this, while opening up room to cut it further, steeply.
> - The performance improvement in relative terms for the metrics being
> replaced is rather dramatic - about 80%. We can also improve this further.
> - Cheaper metrics (in terms of both cpu and memory) means we can readily
> have more of them, exposing finer-grained details. The value of this is
> hard to overstate.
>
> === Reporting ===
> - We’re already non-standard for our most important metrics, because we
> had to replace the Codahale histogram years ago
> - We can continue implementing the Codahale interfaces, so that exporting
> libraries have minimal work to support us
> - We can probably push patches upstream to a couple of selected libraries
> we consider important
> - I would anyway also support picking a new reporting framework to
> support, but I would like us to do this with great care to avoid repeating
> our mistakes. I won’t have cycles to actually implement this, so it would
> be down to others to decide if they are willing to undertake this work
>
> I think the fallback option for now, however, is to abuse unsafe to allow
> us to override the implementation details of Codahale metrics. So we can
> decouple the performance discussion for now from the deprecation
> discussion, but I think we should have a target of deprecating
> Codahale/DropWizard for the reasons Dmitry outlines, however we decide to
> do it.
>
> On 4 Mar 2025, at 21:17, Jon Haddad  wrote:
>
> I've got a few thoughts...
>
> On the performance side, I took a look at a few CPU profiles from past
> benchmarks and I'm seeing DropWizard taking ~ 3% of CPU time.  Is there a
> specific workload you're running where you're seeing it take up a
> significant % of CPU time?  Could you share some metrics, profile data, or
> a workload so I can try to reproduce your findings?  In my testing I've
> found the majority of the overhead from metrics to come from JMX, not
> DropWizard.
>
> On the operator side, inventing our own metrics lib risks making it
> harder to instrument Cassandra.  There are libraries out there that allow
> you to tap into DropWizard metrics directly.  For example, Sarma Pydipally
> did a presentation on this last year [1] based on some code I threw
> together.
>
> If you're planning on making it easier to instrument C* by supporting
> sending metrics to the OTel collector [2], then I could see the change
> being a net win as long as the perf is no worse than the status quo.
>
> It's hard to know the full extent of what you're planning and the impact,
> so I'll save any opinions till I know more about the plan.
>
> Thanks for bringing this up!
> Jon
>
> [1]
> https://planetcassandra.org/leaf/apache-cassandra-lunch-62-grafana-dashboard-for-apache-cassandra-business-platform-team/
> [2] https://opentelemetry.io/docs/collector/
>
> On Tue, Mar 4, 2025 at 12:40 PM Dmitry Konstantinov 
> wrote:
>
>> Hi all,
>>
>> After a long conversation with Benedict and Maxim in CASSANDRA-20250
>> I would like to
>> raise and discuss a proposal to deprecate Dropwizard/Codahale metrics usage
>> in the next major release of Cassandra server and drop it in the

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Benedict
I really like the idea of integrating tracing, metrics and logging frameworks.

I would like to have the time to look closely at the API before we decide to
adopt it though. I agree that a widely deployed API has inherent benefits, but
any API we adopt also shapes the future evolution of our capabilities.
Hopefully this is also a good API that allows us plenty of evolutionary
headroom.

> On 5 Mar 2025, at 19:45, Josh McKenzie wrote:
>
>> if the plan is to rip out something old and unmaintained and replace with
>> something new, I think there's a huge win to be had by implementing the
>> standard that everyone's using now.
>
> Strong +1 on anything that's an ecosystem integration inflection point. The
> added benefit here is that if we architect ourselves to gracefully integrate
> with whatever systems are ubiquitous today, we'll inherit the migration work
> that any new industry-wide replacement system would need to do to become the
> new de facto standard.
>
> On Wed, Mar 5, 2025, at 2:23 PM, Jon Haddad wrote:
>> Thank you for the replies.
>>
>> Dmitry: Based on some other patches you've worked on and your explanation
>> here, it looks like you're optimizing the front-door portion of the write
>> path - very cool. Testing it in isolation with those settings makes sense
>> if your goal is to push write throughput as far as you can, something I'm
>> very much on board with, and is a key component of pushing density and
>> reducing cost. I'm spinning up a 5.0 cluster now to run a test, so I'll
>> run a load test similar to what you've done and try to reproduce your
>> results. I'll also review the JIRA to get more familiar with what you're
>> working on.
>>
>> Benedict: I agree with your line of thinking around optimizing the cost of
>> metrics. As we push both density and multi-tenancy, there's going to be
>> more and more demand for clusters with hundreds or thousands of tables.
>> Maybe tens of thousands. Reducing overhead for something that's O(N * M)
>> (multiple counters per table) will definitely be a welcome improvement.
>> There's always more stuff that's going to get in the way, but it's an
>> elephant and I appreciate every bite.
>>
>> My main concern with metrics isn't really compatibility, and I don't have
>> any real investment in DropWizard. I don't know if there's any real value
>> in putting in effort to maintain compatibility, but I'm just one sample,
>> so I won't make a strong statement here.
>>
>> It would be *very nice* if we moved to metrics which implement the Open
>> Telemetry Metrics API [1], which I think solves multiple issues at once:
>>
>> * We can use either one of the existing implementations (OTel SDK) or our
>>   own
>> * We get a "free" upgrade that lets people tap into the OTel ecosystem
>> * It paves the way for OTel traces with ZipKin [2] / Jaeger [3]
>> * We can use the ubiquitous OTel instrumentation agent to send metrics to
>>   the OTel collector, meaning people can collect at a much higher
>>   frequency than today
>> * OTel logging is a significant improvement over logback; you can
>>   correlate metrics + traces + logs together.
>>
>> Anyways, if the plan is to rip out something old and unmaintained and
>> replace with something new, I think there's a huge win to be had by
>> implementing the standard that everyone's using now.
>>
>> All this is very exciting and I appreciate the discussion!
>>
>> Jon
>>
>> [1] https://opentelemetry.io/docs/languages/java/api/
>> [2] https://zipkin.io/
>> [3] https://www.jaegertracing.io/
>>
>> On Wed, Mar 5, 2025 at 2:58 AM Dmitry Konstantinov wrote:
>>> […]
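
For reference, Jon's OTel suggestion translates to very little code at the
call site. A minimal sketch against the OpenTelemetry Metrics API (the
opentelemetry-api artifact); the meter and metric names here are
illustrative, not from any Cassandra patch:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public final class OTelMetricsSketch {
    // Hypothetical instrumentation scope name.
    private static final Meter METER =
            GlobalOpenTelemetry.getMeter("org.apache.cassandra");

    private static final LongCounter WRITES = METER
            .counterBuilder("cassandra.table.writes") // illustrative name
            .setDescription("Local writes per table")
            .setUnit("1")
            .build();

    public static void recordWrite(String keyspace, String table) {
        // Attributes replace the one-metric-object-per-table pattern:
        // one instrument, many label combinations.
        WRITES.add(1, Attributes.of(
                AttributeKey.stringKey("keyspace"), keyspace,
                AttributeKey.stringKey("table"), table));
    }
}

Without an SDK installed, GlobalOpenTelemetry returns a no-op implementation,
so instrumented code runs unchanged; with the OTel agent or SDK attached, the
same instruments are exported to a collector.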

Re: CEP-15 Update

2025-03-05 Thread Jeremiah Jordan
 So great to see all this hard work about to pay off!

On the questions/concerns front, the only concern I would have towards
merging this to trunk is if any of the caveats apply when someone is not
using Accord.  Assuming they only apply when the feature flag is enabled, I
see no reason not to get this merged into trunk once everyone involved is
happy with the state of it.

-Jeremiah

On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith 
wrote:

> That depends on all of you lovely people :D
>
> I think we should have finished merging everything we want before QA by
> ~Monday; certainly not much later.
>
> I think we have some upgrade and python dtest failures to address as well.
>
> So it could be pretty soon if the community is supportive.
>
> On 5 Mar 2025, at 17:22, Patrick McFadin  wrote:
>
>
> What is the timing for starting the merge process? I'm asking because
>
> I have (yet another) presentation and this would be a cool update.
>
>
> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
>
>  wrote:
>
> >
>
> > Thanks everyone.
>
> >
>
> > Jon - your help will be greatly appreciated. We’ll let you know when
> we’ve got the cycles to invest in performance work (hopefully fairly soon).
> I expect the first step will be improving visibility so we can better
> understand what the system is doing (particularly the caching layers), but
> we can dig in together when ready.
>
> >
>
> > On 4 Mar 2025, at 18:15, Jon Haddad  wrote:
>
> >
>
> > Very exciting!
>
> >
>
> > I have a client that's very interested in Accord, so I should have
> budget to dig into it, especially on the performance side of things.
>
> >
>
> > Jon
>
> >
>
> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov 
> wrote:
>
> >>
>
> >> Thank you to all Accord and TCM contributors, it is really exciting to
> see a development of such huge and wonderful features moving forward and
> opening the door to the new Cassandra epoch!
>
> >>
>
> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston 
> wrote:
>
> >>>
>
> >>> Thanks Benedict!
>
> >>>
>
> >>> I’m really excited to see accord reach this milestone, even with these
> caveats. You seem to have left yourself off the list of contributors
> though, even though you’ve been a central figure in its development :) So
> thanks to all accord & tcm contributors, including Benedict, for making
> this possible!
>
> >>>
>
> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>
> >>>
>
> >>> Hi everyone,
>
> >>>
>
> >>> It’s been exactly 3.5 years since the first commit to
> cassandra-accord. Yes, really, it’s been that long.
>
> >>>
>
> >>> We will be starting to validate the feature against real workloads in
> the near future, so we can’t sensibly push off merging much longer. The
> following is a brief run-down of the state of play. There are no known
> bugs, but there remain a number of caveats we will be incrementally
> addressing in the run-up to a full release:
>
> >>>
>
> >>> [1] Accord is likely to be SLOW until further optimisations are
> implemented
>
> >>> [2] Schema changes have a number of hard edges
>
> >>> [3] Validation is ongoing, so there are likely still a number of bugs
> to shake out
>
> >>> [4] Many operator visibility/tooling/documentation improvements are
> pending
>
> >>>
>
> >>> To expand a little:
>
> >>>
>
> >>> [1] As of the last experiment we conducted, accord’s throughput was
> poor - also leading to higher LAN latencies. We have done no WAN
> experiments to date, but the protocol guarantees should already achieve
> better round-trip performance, in particular under contention. Improving
> throughput will be the main focus of attention once we are satisfied the
> protocol is otherwise stable, but our focus remains validation for the
> moment.
>
> >>> [2] Schema changes have not yet been well integrated with TCM.
> Dropping a table for instance will currently cause problems if nodes are
> offline.
>
> >>> [3] We have a range of validations we are already performing against
> cassandra-accord directly, and against its integration with Cassandra in
> cep-15-accord. We have run hundreds of billions of simulated transactions,
> and are still discovering some minor fault every few billion simulated
> transactions or so. There remains a lot more simulated validation to
> explore, as well as with real clusters serving real workloads.
>
> >>> [4] There are already a range of virtual tables for exploring internal
> state in Accord, and reasonably good metric support. However, tracing is
> not yet supported, and our metric and virtual table integrations need some
> further development.
>
> >>> [5] There are also other edge cases to address such as ensuring we do
> not reuse HLCs after restart, supporting ByteOrderPartitioner, and live
> migration from/to Paxos is undergoing fine-tuning and validation; probably
> there are some other things I am forgetting.
>
> >>>
>
> >>> Altogether the feature is fairly mature, despite these caveats. This
> is the fruit of the labour of

Re: CEP-15 Update

2025-03-05 Thread Patrick McFadin
You have my +1

On Wed, Mar 5, 2025 at 12:16 PM Benedict  wrote:
>
> Correct, these caveats should only apply to tables that have opted-in to 
> accord.
>
> On 5 Mar 2025, at 20:08, Jeremiah Jordan  wrote:
>
> 
> So great to see all this hard work about to pay off!
>
> On the questions/concerns front, the only concern I would have towards 
> merging this to trunk is if any of the caveats apply when someone is not 
> using Accord.  Assuming they only apply when the feature flag is enabled, I 
> see no reason not to get this merged into trunk once everyone involved is 
> happy with the state of it.
>
> -Jeremiah
>
> On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith  
> wrote:
>>
>> That depends on all of you lovely people :D
>>
>> I think we should have finished merging everything we want before QA by 
>> ~Monday; certainly not much later.
>>
>> I think we have some upgrade and python dtest failures to address as well.
>>
>> So it could be pretty soon if the community is supportive.
>>
>> On 5 Mar 2025, at 17:22, Patrick McFadin  wrote:
>>
>>
>> What is the timing for starting the merge process? I'm asking because
>>
>> I have (yet another) presentation and this would be a cool update.
>>
>>
>> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
>>
>>  wrote:
>>
>> >
>>
>> > Thanks everyone.
>>
>> >
>>
>> > Jon - your help will be greatly appreciated. We’ll let you know when we’ve 
>> > got the cycles to invest in performance work (hopefully fairly soon). I 
>> > expect the first step will be improving visibility so we can better 
>> > understand what the system is doing (particularly the caching layers), but 
>> > we can dig in together when ready.
>>
>> >
>>
>> > On 4 Mar 2025, at 18:15, Jon Haddad  wrote:
>>
>> >
>>
>> > Very exciting!
>>
>> >
>>
>> > I have a client that's very interested in Accord, so I should have budget 
>> > to dig into it, especially on the performance side of things.
>>
>> >
>>
>> > Jon
>>
>> >
>>
>> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov  
>> > wrote:
>>
>> >>
>>
>> >> Thank you to all Accord and TCM contributors, it is really exciting to 
>> >> see a development of such huge and wonderful features moving forward and 
>> >> opening the door to the new Cassandra epoch!
>>
>> >>
>>
>> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston  wrote:
>>
>> >>>
>>
>> >>> Thanks Benedict!
>>
>> >>>
>>
>> >>> I’m really excited to see accord reach this milestone, even with these 
>> >>> caveats. You seem to have left yourself off the list of contributors 
>> >>> though, even though you’ve been a central figure in its development :) 
>> >>> So thanks to all accord & tcm contributors, including Benedict, for 
>> >>> making this possible!
>>
>> >>>
>>
>> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>>
>> >>>
>>
>> >>> Hi everyone,
>>
>> >>>
>>
>> >>> It’s been exactly 3.5 years since the first commit to cassandra-accord. 
>> >>> Yes, really, it’s been that long.
>>
>> >>>
>>
>> >>> We will be starting to validate the feature against real workloads in 
>> >>> the near future, so we can’t sensibly push off merging much longer. The 
>> >>> following is a brief run-down of the state of play. There are no known 
>> >>> bugs, but there remain a number of caveats we will be incrementally 
>> >>> addressing in the run-up to a full release:
>>
>> >>>
>>
>> >>> [1] Accord is likely to be SLOW until further optimisations are 
>> >>> implemented
>>
>> >>> [2] Schema changes have a number of hard edges
>>
>> >>> [3] Validation is ongoing, so there are likely still a number of bugs to 
>> >>> shake out
>>
>> >>> [4] Many operator visibility/tooling/documentation improvements are 
>> >>> pending
>>
>> >>>
>>
>> >>> To expand a little:
>>
>> >>>
>>
>> >>> [1] As of the last experiment we conducted, accord’s throughput was poor 
>> >>> - also leading to higher LAN latencies. We have done no WAN experiments 
>> >>> to date, but the protocol guarantees should already achieve better 
>> >>> round-trip performance, in particular under contention. Improving 
>> >>> throughput will be the main focus of attention once we are satisfied the 
>> >>> protocol is otherwise stable, but our focus remains validation for the 
>> >>> moment.
>>
>> >>> [2] Schema changes have not yet been well integrated with TCM. Dropping 
>> >>> a table for instance will currently cause problems if nodes are offline.
>>
>> >>> [3] We have a range of validations we are already performing against 
>> >>> cassandra-accord directly, and against its integration with Cassandra in 
>> >>> cep-15-accord. We have run hundreds of billions of simulated 
>> >>> transactions, and are still discovering some minor fault every few 
>> >>> billion simulated transactions or so. There remains a lot more simulated 
>> >>> validation to explore, as well as with real clusters serving real 
>> >>> workloads.
>>
>> >>> [4] There are already a range of virtual tables for exploring internal 
>> >>> state in Accord, and reasonably g

Re: CEP-15 Update

2025-03-05 Thread Benedict
Correct, these caveats should only apply to tables that have opted-in to
accord.

> On 5 Mar 2025, at 20:08, Jeremiah Jordan wrote:
>
> So great to see all this hard work about to pay off!
>
> On the questions/concerns front, the only concern I would have towards
> merging this to trunk is if any of the caveats apply when someone is not
> using Accord. Assuming they only apply when the feature flag is enabled, I
> see no reason not to get this merged into trunk once everyone involved is
> happy with the state of it.
>
> -Jeremiah
>
> On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith wrote:
>> That depends on all of you lovely people :D
>>
>> I think we should have finished merging everything we want before QA by
>> ~Monday; certainly not much later.
>>
>> I think we have some upgrade and python dtest failures to address as well.
>>
>> So it could be pretty soon if the community is supportive.
>>
>> On 5 Mar 2025, at 17:22, Patrick McFadin wrote:
>>> […]

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread C. Scott Andreas
No strong opinion on particular choice of metrics library.

My primary feedback is that if we swap metrics implementations and the new values are *different*, we can anticipate broad user confusion/interest.

In particular if latency stats are reported higher post-upgrade, we should expect users to interpret this as a performance regression, dedicating significant resources to investigating the change, and expending credibility with stakeholders in their systems.

- Scott

On Mar 5, 2025, at 11:57 AM, Benedict  wrote:

I really like the idea of integrating tracing, metrics and logging frameworks.

I would like to have the time to look closely at the API before we decide to adopt it though. I agree that a widely deployed API has inherent benefits, but any API we adopt also shapes future evolution of our capabilities. Hopefully this is also a good API that allows us plenty of evolutionary headroom.

On 5 Mar 2025, at 19:45, Josh McKenzie  wrote:

if the plan is to rip out something old and unmaintained and replace with something new, I think there's a huge win to be had by implementing the standard that everyone's using now.

Strong +1 on anything that's an ecosystem integration inflection point. The added benefit here is that if we architect ourselves to gracefully integrate with whatever systems are ubiquitous today, we'll inherit the migration work that any new industry-wide replacement system would need to do to become the new de facto standard.

On Wed, Mar 5, 2025, at 2:23 PM, Jon Haddad wrote:

Thank you for the replies.

Dmitry: Based on some other patches you've worked on and your explanation here, it looks like you're optimizing the front door portion of write path - very cool. Testing it in isolation with those settings makes sense if your goal is to push write throughput as far as you can, something I'm very much on board with, and is a key component to pushing density and reducing cost. I'm spinning up a 5.0 cluster now to run a test, so I'll run a load test similar to what you've done and try to reproduce your results. I'll also review the JIRA to get more familiar with what you're working on.

Benedict: I agree with your line of thinking around optimizing the cost of metrics. As we push both density and multi-tenancy, there's going to be more and more demand for clusters with hundreds or thousands of tables. Maybe tens of thousands. Reducing overhead for something that's O(N * M) (multiple counters per table) will definitely be a welcome improvement. There's always more stuff that's going to get in the way, but it's an elephant and I appreciate every bite.

My main concern with metrics isn't really compatibility, and I don't have any real investment in DropWizard. I don't know if there's any real value in putting in effort to maintain compatibility, but I'm just one sample, so I won't make a strong statement here.

It would be *very nice* if we moved to metrics which implement the Open Telemetry Metrics API [1], which I think solves multiple issues at once:

* We can use either one of the existing implementations (OTel SDK) or our own
* We get a "free" upgrade that lets people tap into the OTel ecosystem
* It paves the way for OTel traces with ZipKin [2] / Jaeger [3]
* We can use the ubiquitous OTel instrumentation agent to send metrics to the OTel collector, meaning people can collect at a much higher frequency than today
* OTel logging is a significant improvement over logback, you can correlate metrics + traces + logs together.

Anyways, if the plan is to rip out something old and unmaintained and replace with something new, I think there's a huge win to be had by implementing the standard that everyone's using now.

All this is very exciting and I appreciate the discussion!

Jon

[1] https://opentelemetry.io/docs/languages/java/api/
[2] https://zipkin.io/
[3] https://www.jaegertracing.io/

On Wed, Mar 5, 2025 at 2:58 AM Dmitry Konstantinov  wrote:

Hi Jon

>> Is there a specific workload you're running where you're seeing it take up a significant % of CPU time? Could you share some metrics, profile data, or a workload so I can try to reproduce your findings?

Yes, I have shared the workload generation command (sorry, it is in cassandra-stress, I have not yet adopted your tool but want to do it soon :-) ), setup details and async profiler CPU profile in CASSANDRA-20250

A summary:
- it is a plain insert-only workload to assert a max throughput capacity for a single node: ./tools/bin/cassandra-stress "write n=10m" -rate threads=100 -node myhost
- small amount of data per row is inserted, local SSD disks are used, so CPU is a primary bottleneck in this scenario (while it is quite synthetic, in my real business cases CPU is a primary bottleneck as well)
- I used the 5.1 trunk version (I have similar results for the 5.0 version from when I was checking CASSANDRA-20165)
- I enabled trie memtables + offheap objects mode
- I disabled compaction
- a recent nightly build is used for async-profiler
- my hardware is quite old: on-premise VM, Linux 4.18.0-2
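Scott's concern is easy to demonstrate offline before any swap: feed identical samples into two reservoir implementations and compare the percentiles they report. A minimal parity-check sketch, using the Codahale classes already on the classpath (the class name is invented for the example):

    import com.codahale.metrics.ExponentiallyDecayingReservoir;
    import com.codahale.metrics.Histogram;
    import com.codahale.metrics.UniformReservoir;

    public class ReservoirParityCheck
    {
        public static void main(String[] args)
        {
            Histogram decaying = new Histogram(new ExponentiallyDecayingReservoir());
            Histogram uniform = new Histogram(new UniformReservoir());

            // Identical synthetic "latencies" into both histograms.
            for (int i = 0; i < 100_000; i++)
            {
                long sample = i % 1_000;
                decaying.update(sample);
                uniform.update(sample);
            }

            // Same inputs, but the reported percentiles can differ between
            // implementations -- the post-upgrade confusion described above.
            System.out.printf("decaying p99=%.1f uniform p99=%.1f%n",
                              decaying.getSnapshot().get99thPercentile(),
                              uniform.getSnapshot().get99thPercentile());
        }
    }

Running the same check against a candidate replacement implementation would quantify any drift worth calling out in upgrade notes.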

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Josh McKenzie
> if the plan is to rip out something old and unmaintained and replace with 
> something new, I think there's a huge win to be had by implementing the 
> standard that everyone's using now.
Strong +1 on anything that's an ecosystem integration inflection point. The 
added benefit here is that if we architect ourselves to gracefully integrate 
with whatever systems are ubiquitous today, we'll inherit the migration work 
that any new industry-wide replacement system would need to do to become the 
new de facto standard.

On Wed, Mar 5, 2025, at 2:23 PM, Jon Haddad wrote:
> Thank you for the replies.
> 
> Dmitry: Based on some other patches you've worked on and your explanation 
> here, it looks like you're optimizing the front door portion of write path - 
> very cool.  Testing it in isolation with those settings makes sense if your 
> goal is to push write throughput as far as you can, something I'm very much 
> on board with, and is a key component to pushing density and reducing cost.  
> I'm spinning up a 5.0 cluster now to run a test, so I'll run a load test 
> similar to what you've done and try to reproduce your results.  I'll also 
> review the JIRA to get more familiar with what you're working on.
> 
> Benedict: I agree with your line of thinking around optimizing the cost of 
> metrics.  As we push both density and multi-tenancy, there's going to be more 
> and more demand for clusters with hundreds or thousands of tables.  Maybe 
> tens of thousands.  Reducing overhead for something that's O(N * M) (multiple 
> counters per table) will definitely be a welcome improvement.  There's always 
> more stuff that's going to get in the way, but it's an elephant and I 
> appreciate every bite.
> 
> My main concern with metrics isn't really compatibility, and I don't have any 
> real investment in DropWizard.  I don't know if there's any real value in 
> putting in effort to maintain compatibility, but I'm just one sample, so I 
> won't make a strong statement here.
> 
> It would be *very nice* if we moved to metrics which implement the Open 
> Telemetry Metrics API [1],  which I think solves multiple issues at once:
> 
> * We can use either one of the existing implementations (OTel SDK) or our own
> * We get a "free" upgrade that lets people tap into the OTel ecosystem
> * It paves the way for OTel traces with ZipKin [2] / Jaeger [3]
> * We can use the ubiquitous OTel instrumentation agent to send metrics to the 
> OTel collector, meaning people can collect at a much higher frequency than 
> today
> * OTel logging is a significant improvement over logback, you can correlate 
> metrics + traces + logs together.
> 
> Anyways, if the plan is to rip out something old and unmaintained and replace 
> with something new, I think there's a huge win to be had by implementing the 
> standard that everyone's using now.
> 
> All this is very exciting and I appreciate the discussion!
> 
> Jon
> 
> [1] https://opentelemetry.io/docs/languages/java/api/
> [2] https://zipkin.io/
> [3] https://www.jaegertracing.io/
> 
> 
> 
> 
> On Wed, Mar 5, 2025 at 2:58 AM Dmitry Konstantinov  wrote:
>> Hi Jon
>> 
>> >>  Is there a specific workload you're running where you're seeing it take 
>> >> up a significant % of CPU time?  Could you share some metrics, profile 
>> >> data, or a workload so I can try to reproduce your findings? 
>> Yes, I have shared the workload generation command (sorry, it is in 
>> cassandra-stress, I have not yet adopted your tool but want to do it soon 
>> :-) ), setup details and async profiler CPU profile in CASSANDRA-20250 
>>  
>> A summary:
>>  • it is a plain insert-only workload to assert a max throughput capacity 
>> for a single node: ./tools/bin/cassandra-stress "write n=10m" -rate 
>> threads=100 -node myhost
>>  • small amount of data per row is inserted, local SSD disks are used, so 
>> CPU is a primary bottleneck in this scenario (while it is quite synthetic, 
>> in my real business cases CPU is a primary bottleneck as well)
>>  • I used the 5.1 trunk version (I have similar results for the 5.0 version 
>> from when I was checking CASSANDRA-20165)
>>  • I enabled trie memtables + offheap objects mode
>>  • I disabled compaction
>>  • a recent nightly build is used for async-profiler
>>  • my hardware is quite old: on-premise VM, Linux 4.18.0-240.el8.x86_64, 
>> OpenJdk-11.0.26+4, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 16 cores
>>  • link to CPU profile 
>> 
>>  ("codahale" code: 8.65%)
>>  • -XX:+DebugNonSafepoints option is enabled to improve the profile precision
>> 
>> On Wed, 5 Mar 2025 at 12:38, Benedict Elliott Smith  
>> wrote:
>>> Some quick thoughts of my own…
>>> 
>>> === Performance ===
>>> - I have seen heap dumps with > 1GiB dedicated to metric counters. This 
>>> patch should improve this, 

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Jon Haddad
Thank you for the replies.

Dmitry: Based on some other patches you've worked on and your explanation
here, it looks like you're optimizing the front door portion of write path
- very cool.  Testing it in isolation with those settings makes sense if
your goal is to push write throughput as far as you can, something I'm very
much on board with, and is a key component to pushing density and reducing
cost.  I'm spinning up a 5.0 cluster now to run a test, so I'll run a load
test similar to what you've done and try to reproduce your results.  I'll
also review the JIRA to get more familiar with what you're working on.

Benedict: I agree with your line of thinking around optimizing the cost of
metrics.  As we push both density and multi-tenancy, there's going to be
more and more demand for clusters with hundreds or thousands of tables.
Maybe tens of thousands.  Reducing overhead for something that's O(N * M)
(multiple counters per table) will definitely be a welcome improvement.
There's always more stuff that's going to get in the way, but it's an
elephant and I appreciate every bite.

My main concern with metrics isn't really compatibility, and I don't have
any real investment in DropWizard.  I don't know if there's any real value
in putting in effort to maintain compatibility, but I'm just one sample, so
I won't make a strong statement here.

It would be *very nice* if we moved to metrics which implement the Open
Telemetry Metrics API [1],  which I think solves multiple issues at once:

* We can use either one of the existing implementations (OTel SDK) or our
own
* We get a "free" upgrade that lets people tap into the OTel ecosystem
* It paves the way for OTel traces with ZipKin [2] / Jaeger [3]
* We can use the ubiquitous OTel instrumentation agent to send metrics to
the OTel collector, meaning people can collect at a much higher frequency
than today
* OTel logging is a significant improvement over logback, you can correlate
metrics + traces + logs together.
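For a sense of the API shape being discussed: a minimal sketch against the io.opentelemetry.api artifact (instrument and attribute names below are invented for illustration). The API is a facade, so with no SDK installed these calls are no-ops, and either the OTel SDK or a home-grown implementation can back them:

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.common.AttributeKey;
    import io.opentelemetry.api.common.Attributes;
    import io.opentelemetry.api.metrics.LongCounter;
    import io.opentelemetry.api.metrics.LongHistogram;
    import io.opentelemetry.api.metrics.Meter;

    public final class OTelMetricsSketch
    {
        private static final Meter METER =
                GlobalOpenTelemetry.getMeter("org.apache.cassandra");

        // Invented instrument names, for illustration only.
        private static final LongCounter WRITES =
                METER.counterBuilder("cassandra.table.writes").setUnit("1").build();
        private static final LongHistogram WRITE_LATENCY =
                METER.histogramBuilder("cassandra.table.write_latency").ofLongs().setUnit("us").build();

        public static void recordWrite(String table, long latencyMicros)
        {
            // Attributes play the role our per-table metric names play today.
            Attributes attrs = Attributes.of(AttributeKey.stringKey("table"), table);
            WRITES.add(1, attrs);
            WRITE_LATENCY.record(latencyMicros, attrs);
        }
    }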

Anyways, if the plan is to rip out something old and unmaintained and
replace with something new, I think there's a huge win to be had by
implementing the standard that everyone's using now.

All this is very exciting and I appreciate the discussion!

Jon

[1] https://opentelemetry.io/docs/languages/java/api/
[2] https://zipkin.io/
[3] https://www.jaegertracing.io/




On Wed, Mar 5, 2025 at 2:58 AM Dmitry Konstantinov 
wrote:

> Hi Jon
>
> >>  Is there a specific workload you're running where you're seeing it
> take up a significant % of CPU time?  Could you share some metrics, profile
> data, or a workload so I can try to reproduce your findings?
> Yes, I have shared the workload generation command (sorry, it is in
> cassandra-stress, I have not yet adopted your tool but want to do it soon
> :-) ), setup details and async profiler CPU profile in CASSANDRA-20250
> 
> A summary:
>
>- it is a plain insert-only workload to assert a max
>throughput capacity for a single node: ./tools/bin/cassandra-stress "write
>n=10m" -rate threads=100 -node myhost
>    - small amount of data per row is inserted, local SSD disks are used,
>    so CPU is a primary bottleneck in this scenario (while it is quite
>    synthetic, in my real business cases CPU is a primary bottleneck as well)
>    - I used the 5.1 trunk version (I have similar results for the 5.0
>    version from when I was checking CASSANDRA-20165)
>    - I enabled trie memtables + offheap objects mode
>    - I disabled compaction
>- a recent nightly build is used for async-profiler
>- my hardware is quite old: on-premise VM, Linux
>4.18.0-240.el8.x86_64, OpenJdk-11.0.26+4, Intel(R) Xeon(R) CPU E5-2680 v4 @
>2.40GHz, 16 cores
>- link to CPU profile
>
> 
>  ("codahale"
>code: 8.65%)
>- -XX:+DebugNonSafepoints option is enabled to improve the profile
>precision
>
>
> On Wed, 5 Mar 2025 at 12:38, Benedict Elliott Smith 
> wrote:
>
>> Some quick thoughts of my own…
>>
>> === Performance ===
>> - I have seen heap dumps with > 1GiB dedicated to metric counters. This
>> patch should improve this, while opening up room to cut it further, steeply.
>> - The performance improvement in relative terms for the metrics being
>> replaced is rather dramatic - about 80%.. We can also improve this further.
>> - Cheaper metrics (in terms of both cpu and memory) means we can readily
>> have more of them, exposing finer-grained details. This is hard to
>> understate the value of.
>>
>> === Reporting ===
>> - We’re already non-standard for our most important metrics, because we
>> had to replace the Codahale histogram years ago
>> - We can continue implementing the Codahale interfaces, so that exporting
>> libraries have minimal work to support us
>> - We can probably push patches upstream to a coup

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Jeff Jirsa
I think it's widely accepted that OTel in general has won this stage of observability, as most metrics systems allow it and most SaaS providers support it. So Jon’s point there is important.

The promise of unifying logs/traces/metrics (aka wide events) is far more important on the tracing side of our observability than in the areas where we use Codahale/DropWizard.

Scott: if we swap, we can (probably should) deprecate like everything else, and run both side by side for a release so people don’t lose metrics entirely on bounce? FF both, to control double cost during the transition.

On Mar 5, 2025, at 8:21 PM, C. Scott Andreas  wrote:

No strong opinion on particular choice of metrics library.

My primary feedback is that if we swap metrics implementations and the new values are *different*, we can anticipate broad user confusion/interest.

In particular if latency stats are reported higher post-upgrade, we should expect users to interpret this as a performance regression, dedicating significant resources to investigating the change, and expending credibility with stakeholders in their systems.

- Scott

On Mar 5, 2025, at 11:57 AM, Benedict  wrote:

I really like the idea of integrating tracing, metrics and logging frameworks.

I would like to have the time to look closely at the API before we decide to adopt it though. I agree that a widely deployed API has inherent benefits, but any API we adopt also shapes future evolution of our capabilities. Hopefully this is also a good API that allows us plenty of evolutionary headroom.

On 5 Mar 2025, at 19:45, Josh McKenzie  wrote:

if the plan is to rip out something old and unmaintained and replace with something new, I think there's a huge win to be had by implementing the standard that everyone's using now.

Strong +1 on anything that's an ecosystem integration inflection point. The added benefit here is that if we architect ourselves to gracefully integrate with whatever systems are ubiquitous today, we'll inherit the migration work that any new industry-wide replacement system would need to do to become the new de facto standard.

On Wed, Mar 5, 2025, at 2:23 PM, Jon Haddad wrote:

Thank you for the replies.

Dmitry: Based on some other patches you've worked on and your explanation here, it looks like you're optimizing the front door portion of write path - very cool. Testing it in isolation with those settings makes sense if your goal is to push write throughput as far as you can, something I'm very much on board with, and is a key component to pushing density and reducing cost. I'm spinning up a 5.0 cluster now to run a test, so I'll run a load test similar to what you've done and try to reproduce your results. I'll also review the JIRA to get more familiar with what you're working on.

Benedict: I agree with your line of thinking around optimizing the cost of metrics. As we push both density and multi-tenancy, there's going to be more and more demand for clusters with hundreds or thousands of tables. Maybe tens of thousands. Reducing overhead for something that's O(N * M) (multiple counters per table) will definitely be a welcome improvement. There's always more stuff that's going to get in the way, but it's an elephant and I appreciate every bite.

My main concern with metrics isn't really compatibility, and I don't have any real investment in DropWizard. I don't know if there's any real value in putting in effort to maintain compatibility, but I'm just one sample, so I won't make a strong statement here.

It would be *very nice* if we moved to metrics which implement the Open Telemetry Metrics API [1], which I think solves multiple issues at once:

* We can use either one of the existing implementations (OTel SDK) or our own
* We get a "free" upgrade that lets people tap into the OTel ecosystem
* It paves the way for OTel traces with ZipKin [2] / Jaeger [3]
* We can use the ubiquitous OTel instrumentation agent to send metrics to the OTel collector, meaning people can collect at a much higher frequency than today
* OTel logging is a significant improvement over logback, you can correlate metrics + traces + logs together.

Anyways, if the plan is to rip out something old and unmaintained and replace with something new, I think there's a huge win to be had by implementing the standard that everyone's using now.

All this is very exciting and I appreciate the discussion!

Jon

[1] https://opentelemetry.io/docs/languages/java/api/
[2] https://zipkin.io/
[3] https://www.jaegertracing.io/

On Wed, Mar 5, 2025 at 2:58 AM Dmitry Konstantinov  wrote:

Hi Jon

>> Is there a specific workload you're running where you're seeing it take up a significant % of CPU time? Could you share some metrics, profile data, or a workload so I can try to reproduce your findings?

Yes, I have shared the workload generation command (sorry, it is in cassandra-stress, I have not yet adopted your tool but want to do it soon :-) ), setup details and async profiler CPU profile in CASSANDRA-20250

A summary:
- it is a
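To make Jeff's side-by-side suggestion concrete: metrics would be published to both implementations during the transition release, each behind its own flag. A rough sketch (the flag names and the native counter map below are hypothetical, not existing Cassandra configuration):

    import com.codahale.metrics.MetricRegistry;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    public final class DualMetricsSketch
    {
        // Hypothetical flags: either sink can be turned off independently.
        private static final boolean PUBLISH_CODAHALE =
                Boolean.parseBoolean(System.getProperty("cassandra.metrics.codahale", "true"));
        private static final boolean PUBLISH_NATIVE =
                Boolean.parseBoolean(System.getProperty("cassandra.metrics.native", "false"));

        private final MetricRegistry codahale = new MetricRegistry();
        private final ConcurrentHashMap<String, LongAdder> nativeCounters = new ConcurrentHashMap<>();

        public void onWrite(String table)
        {
            if (PUBLISH_CODAHALE)
                codahale.counter("Table." + table + ".Writes").inc();
            if (PUBLISH_NATIVE)
                nativeCounters.computeIfAbsent(table, t -> new LongAdder()).increment();
        }
    }

Publishing to both doubles the bookkeeping cost for the duration, which is exactly why flagging each side independently matters.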

Re: Welcome Aaron Ploetz as Cassandra Committer

2025-03-05 Thread Mick Semb Wever
Well deserved Aaron !  🎉

On Tue, 4 Mar 2025 at 19:27, Jon Haddad  wrote:
>
> Congrats Aaron!
>
> On Tue, Mar 4, 2025 at 10:26 AM Jordan West  wrote:
>>
>> Congratulations!!
>> On Tue, Mar 4, 2025 at 09:57 Tolbert, Andy  wrote:
>>>
>>> Congrats Aaron!
>>>
>>> On Tue, Mar 4, 2025 at 11:24 AM Francisco Guerrero  
>>> wrote:

 Congratulations Aaron!

 On 2025/03/04 00:23:49 Patrick McFadin wrote:
 > The Apache Cassandra PMC is very happy to announce that Aaron Ploetz has
 > accepted the invitation to become a committer!
 >
 > Aaron has been tireless in his mission to help every single Cassandra
 > operator on planet Earth. If you don't believe me, check out his Stack
 > Overflow profile page: https://stackoverflow.com/users/1054558/aaron
 > He's been a continuous speaker on Cassandra topics and is one of the
 > coordinators for the Planet Cassandra meetup. Those are just the
 > recent highlights.
 >
 > Please join us in congratulating and welcoming Aaron.
 >
 > The Apache Cassandra PMC members
 >


Re: Welcome Bernardo Botella as Cassandra Committer

2025-03-05 Thread Mick Semb Wever
Congrats Bernardo!

On Wed, 5 Mar 2025 at 16:03, Abe Ratnofsky  wrote:

> Congratulations Bernardo! Great news.
>


Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Maxim Muzafarov
If we do swap, in 10-15 years we may run into the same issues with
third-party metrics libraries that we are discussing now with the
Codahale library we added ~10-15 years ago. Given that the proposed
new API is quite small, my personal feeling is that an API of our own
would be our best choice for the metrics.

Having our own API also doesn't prevent us from integrating with any
new third-party libraries the world develops in the future: we just
write custom adapters from our own API. This will be possible for
Codahale (with some suboptimal trade-offs, since we have to support
backwards compatibility) and for OpenTelemetry as well. We already
have the CEP-32 [1] proposal to instrument metrics; in this sense, it
doesn't change much for us.
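As an example of such an adapter: Codahale's Counter is a plain class, so a Cassandra-native counter can be wrapped and handed to any exporter that understands Codahale. A minimal sketch (NativeCounter is an invented stand-in for whatever our own API becomes):

    import com.codahale.metrics.Counter;
    import java.util.concurrent.atomic.LongAdder;

    // Invented stand-in for a future Cassandra-native metric type.
    final class NativeCounter
    {
        private final LongAdder value = new LongAdder();
        void add(long n) { value.add(n); }
        long sum() { return value.sum(); }
    }

    // Adapter: subclass Codahale's Counter and delegate, so existing
    // Codahale-aware exporters can read the native metric unchanged.
    final class CodahaleCounterAdapter extends Counter
    {
        private final NativeCounter delegate;

        CodahaleCounterAdapter(NativeCounter delegate) { this.delegate = delegate; }

        @Override public void inc()       { inc(1); }
        @Override public void inc(long n) { delegate.add(n); }
        @Override public void dec()       { dec(1); }
        @Override public void dec(long n) { delegate.add(-n); }
        @Override public long getCount()  { return delegate.sum(); }
    }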

Another point in favour of our own API is the virtual tables we have --
it gives us enough flexibility and latitude to export the metrics
efficiently via virtual tables by implementing the access patterns we
consider important.
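For comparison, this is the access pattern virtual tables already give operators today (system_views.thread_pools ships with Cassandra 4.0+; column names are per that version and worth verifying against yours). A metrics-native virtual table could expose counters the same way:

    import com.datastax.oss.driver.api.core.CqlSession;

    public class VirtualTableMetricsSketch
    {
        public static void main(String[] args)
        {
            try (CqlSession session = CqlSession.builder().build())
            {
                // Virtual tables read in-memory state on the coordinator;
                // no SSTables are involved.
                session.execute("SELECT name, active_tasks, completed_tasks FROM system_views.thread_pools")
                       .forEach(row -> System.out.printf("%s active=%d completed=%d%n",
                                                         row.getString("name"),
                                                         row.getInt("active_tasks"),
                                                         row.getLong("completed_tasks")));
            }
        }
    }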

[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071749#CEP32:(DRAFT)OpenTelemetryintegration-ExportingMetricsthroughOpenTelemetry
[2] https://opentelemetry.io/docs/languages/java/instrumentation/

On Wed, 5 Mar 2025 at 21:35, Jeff Jirsa  wrote:
>
> I think it's widely accepted that OTel in general has won this stage of 
> observability, as most metrics systems allow it and most SaaS providers 
> support it. So Jon’s point there is important.
>
> The promise of unifying logs/traces/metrics (aka wide events) is far more 
> important on the tracing side of our observability than in the areas where 
> we use Codahale/DropWizard.
>
> Scott: if we swap, we can (probably should) deprecate like everything else, 
> and run both side by side for a release so people don’t lose metrics entirely 
> on bounce? FF both, to control double cost during the transition.
>
>
>
>
> On Mar 5, 2025, at 8:21 PM, C. Scott Andreas  wrote:
>
> No strong opinion on particular choice of metrics library.
>
> My primary feedback is that if we swap metrics implementations and the new 
> values are *different*, we can anticipate broad user confusion/interest.
>
> In particular if latency stats are reported higher post-upgrade, we should 
> expect users to interpret this as a performance regression, dedicating 
> significant resources to investigating the change, and expending credibility 
> with stakeholders in their systems.
>
> - Scott
>
> On Mar 5, 2025, at 11:57 AM, Benedict  wrote:
>
> 
> I really like the idea of integrating tracing, metrics and logging frameworks.
>
> I would like to have the time to look closely at the API before we decide to 
> adopt it though. I agree that a widely deployed API has inherent benefits, 
> but any API we adopt also shapes future evolution of our capabilities. 
> Hopefully this is also a good API that allows us plenty of evolutionary 
> headroom.
>
>
> On 5 Mar 2025, at 19:45, Josh McKenzie  wrote:
>
> 
>
> if the plan is to rip out something old and unmaintained and replace with 
> something new, I think there's a huge win to be had by implementing the 
> standard that everyone's using now.
>
> Strong +1 on anything that's an ecosystem integration inflection point. The 
> added benefit here is that if we architect ourselves to gracefully integrate 
> with whatever system's are ubiquitous today, we'll inherit the migration work 
> that any new industry-wide replacement system would need to do to become the 
> new de facto standard.
>
> On Wed, Mar 5, 2025, at 2:23 PM, Jon Haddad wrote:
>
> Thank you for the replies.
>
> Dmitry: Based on some other patches you've worked on and your explanation 
> here, it looks like you're optimizing the front door portion of write path - 
> very cool.  Testing it in isolation with those settings makes sense if your 
> goal is to push write throughput as far as you can, something I'm very much 
> on board with, and is a key component to pushing density and reducing cost.  
> I'm spinning up a 5.0 cluster now to run a test, so I'll run a load test 
> similar to what you've done and try to reproduce your results.  I'll also 
> review the JIRA to get more familiar with what you're working on.
>
> Benedict: I agree with your line of thinking around optimizing the cost of 
> metrics.  As we push both density and multi-tenancy, there's going to be more 
> and more demand for clusters with hundreds or thousands of tables.  Maybe 
> tens of thousands.  Reducing overhead for something that's O(N * M) (multiple 
> counters per table) will definitely be a welcome improvement.  There's always 
> more stuff that's going to get in the way, but it's an elephant and I 
> appreciate every bite.
>
> My main concern with metrics isn't really compatibility, and I don't have any 
> real investment in DropWizard.  I don't know if there's any real value in 
> putting in effort to maintain compatibility, but I'm just one sample, so I 
> won't make a strong statement here.
>
> It would be *

Re: Dropwizard/Codahale metrics deprecation in Cassandra server

2025-03-05 Thread Patrick McFadin
We can also do an education campaign to get people to migrate. There
will be good reasons to do it.

On Wed, Mar 5, 2025 at 12:33 PM Jeff Jirsa  wrote:
>
> I think it's widely accepted that OTel in general has won this stage of 
> observability, as most metrics systems allow it and most SaaS providers 
> support it. So Jon’s point there is important.
>
> The promise of unifying logs/traces/metrics (aka wide events) is far more 
> important on the tracing side of our observability than in the areas where 
> we use Codahale/DropWizard.
>
> Scott: if we swap, we can (probably should) deprecate like everything else, 
> and run both side by side for a release so people don’t lose metrics entirely 
> on bounce? FF both, to control double cost during the transition.
>
>
>
>
> On Mar 5, 2025, at 8:21 PM, C. Scott Andreas  wrote:
>
> No strong opinion on particular choice of metrics library.
>
> My primary feedback is that if we swap metrics implementations and the new 
> values are *different*, we can anticipate broad user confusion/interest.
>
> In particular if latency stats are reported higher post-upgrade, we should 
> expect users to interpret this as a performance regression, dedicating 
> significant resources to investigating the change, and expending credibility 
> with stakeholders in their systems.
>
> - Scott
>
> On Mar 5, 2025, at 11:57 AM, Benedict  wrote:
>
> 
> I really like the idea of integrating tracing, metrics and logging frameworks.
>
> I would like to have the time to look closely at the API before we decide to 
> adopt it though. I agree that a widely deployed API has inherent benefits, 
> but any API we adopt also shapes future evolution of our capabilities. 
> Hopefully this is also a good API that allows us plenty of evolutionary 
> headroom.
>
>
> On 5 Mar 2025, at 19:45, Josh McKenzie  wrote:
>
> 
>
> if the plan is to rip out something old and unmaintained and replace with 
> something new, I think there's a huge win to be had by implementing the 
> standard that everyone's using now.
>
> Strong +1 on anything that's an ecosystem integration inflection point. The 
> added benefit here is that if we architect ourselves to gracefully integrate 
> with whatever system's are ubiquitous today, we'll inherit the migration work 
> that any new industry-wide replacement system would need to do to become the 
> new de facto standard.
>
> On Wed, Mar 5, 2025, at 2:23 PM, Jon Haddad wrote:
>
> Thank you for the replies.
>
> Dmitry: Based on some other patches you've worked on and your explanation 
> here, it looks like you're optimizing the front door portion of write path - 
> very cool.  Testing it in isolation with those settings makes sense if your 
> goal is to push write throughput as far as you can, something I'm very much 
> on board with, and is a key component to pushing density and reducing cost.  
> I'm spinning up a 5.0 cluster now to run a test, so I'll run a load test 
> similar to what you've done and try to reproduce your results.  I'll also 
> review the JIRA to get more familiar with what you're working on.
>
> Benedict: I agree with your line of thinking around optimizing the cost of 
> metrics.  As we push both density and multi-tenancy, there's going to be more 
> and more demand for clusters with hundreds or thousands of tables.  Maybe 
> tens of thousands.  Reducing overhead for something that's O(N * M) (multiple 
> counters per table) will definitely be a welcome improvement.  There's always 
> more stuff that's going to get in the way, but it's an elephant and I 
> appreciate every bite.
>
> My main concern with metrics isn't really compatibility, and I don't have any 
> real investment in DropWizard.  I don't know if there's any real value in 
> putting in effort to maintain compatibility, but I'm just one sample, so I 
> won't make a strong statement here.
>
> It would be *very nice* if we moved to metrics which implement the Open 
> Telemetry Metrics API [1],  which I think solves multiple issues at once:
>
> * We can use either one of the existing implementations (OTel SDK) or our own
> * We get a "free" upgrade that lets people tap into the OTel ecosystem
> * It paves the way for OTel traces with ZipKin [2] / Jaeger [3]
> * We can use the ubiquitous OTel instrumentation agent to send metrics to the 
> OTel collector, meaning people can collect at a much higher frequency than 
> today
> * OTel logging is a significant improvement over logback, you can correlate 
> metrics + traces + logs together.
>
> Anyways, if the plan is to rip out something old and unmaintained and replace 
> with something new, I think there's a huge win to be had by implementing the 
> standard that everyone's using now.
>
> All this is very exciting and I appreciate the discussion!
>
> Jon
>
> [1] https://opentelemetry.io/docs/languages/java/api/
> [2] https://zipkin.io/
> [3] https://www.jaegertracing.io/
>
>
>
>
> On Wed, Mar 5, 2025 at 2:58 AM Dmitry Konstantinov  wrote:
>
> Hi Jon
>
> >>  

Re: CEP-15 Update

2025-03-05 Thread Blake Eggleston
+1 to merging it

On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:
> You have my +1
> 
> On Wed, Mar 5, 2025 at 12:16 PM Benedict  wrote:
> >
> > Correct, these caveats should only apply to tables that have opted-in to 
> > accord.
> >
> > On 5 Mar 2025, at 20:08, Jeremiah Jordan  wrote:
> >
> > 
> > So great to see all this hard work about to pay off!
> >
> > On the questions/concerns front, the only concern I would have towards 
> > merging this to trunk is if any of the caveats apply when someone is not 
> > using Accord.  Assuming they only apply when the feature flag is enabled, I 
> > see no reason not to get this merged into trunk once everyone involved is 
> > happy with the state of it.
> >
> > -Jeremiah
> >
> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith  
> > wrote:
> >>
> >> That depends on all of you lovely people :D
> >>
> >> I think we should have finished merging everything we want before QA by 
> >> ~Monday; certainly not much later.
> >>
> >> I think we have some upgrade and python dtest failures to address as well.
> >>
> >> So it could be pretty soon if the community is supportive.
> >>
> >> On 5 Mar 2025, at 17:22, Patrick McFadin  wrote:
> >>
> >>
> >> What is the timing for starting the merge process? I'm asking because
> >>
> >> I have (yet another) presentation and this would be a cool update.
> >>
> >>
> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
> >>
> >>  wrote:
> >>
> >> >
> >>
> >> > Thanks everyone.
> >>
> >> >
> >>
> >> > Jon - your help will be greatly appreciated. We’ll let you know when 
> >> > we’ve got the cycles to invest in performance work (hopefully fairly 
> >> > soon). I expect the first step will be improving visibility so we can 
> >> > better understand what the system is doing (particularly the caching 
> >> > layers), but we can dig in together when ready.
> >>
> >> >
> >>
> >> > On 4 Mar 2025, at 18:15, Jon Haddad  wrote:
> >>
> >> >
> >>
> >> > Very exciting!
> >>
> >> >
> >>
> >> > I have a client that's very interested in Accord, so I should have 
> >> > budget to dig into it, especially on the performance side of things.
> >>
> >> >
> >>
> >> > Jon
> >>
> >> >
> >>
> >> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov  
> >> > wrote:
> >>
> >> >>
> >>
> >> >> Thank you to all Accord and TCM contributors, it is really exciting to 
> >> >> see a development of such huge and wonderful features moving forward 
> >> >> and opening the door to the new Cassandra epoch!
> >>
> >> >>
> >>
> >> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston  
> >> >> wrote:
> >>
> >> >>>
> >>
> >> >>> Thanks Benedict!
> >>
> >> >>>
> >>
> >> >>> I’m really excited to see accord reach this milestone, even with these 
> >> >>> caveats. You seem to have left yourself off the list of contributors 
> >> >>> though, even though you’ve been a central figure in its development :) 
> >> >>> So thanks to all accord & tcm contributors, including Benedict, for 
> >> >>> making this possible!
> >>
> >> >>>
> >>
> >> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
> >>
> >> >>>
> >>
> >> >>> Hi everyone,
> >>
> >> >>>
> >>
> >> >>> It’s been exactly 3.5 years since the first commit to 
> >> >>> cassandra-accord. Yes, really, it’s been that long.
> >>
> >> >>>
> >>
> >> >>> We will be starting to validate the feature against real workloads in 
> >> >>> the near future, so we can’t sensibly push off merging much longer. 
> >> >>> The following is a brief run-down of the state of play. There are no 
> >> >>> known bugs, but there remain a number of caveats we will be 
> >> >>> incrementally addressing in the run-up to a full release:
> >>
> >> >>>
> >>
> >> >>> [1] Accord is likely to be SLOW until further optimisations are 
> >> >>> implemented
> >>
> >> >>> [2] Schema changes have a number of hard edges
> >>
> >> >>> [3] Validation is ongoing, so there are likely still a number of bugs 
> >> >>> to shake out
> >>
> >> >>> [4] Many operator visibility/tooling/documentation improvements are 
> >> >>> pending
> >>
> >> >>>
> >>
> >> >>> To expand a little:
> >>
> >> >>>
> >>
> >> >>> [1] As of the last experiment we conducted, accord’s throughput was 
> >> >>> poor - also leading to higher LAN latencies. We have done no WAN 
> >> >>> experiments to date, but the protocol guarantees should already 
> >> >>> achieve better round-trip performance, in particular under contention. 
> >> >>> Improving throughput will be the main focus of attention once we are 
> >> >>> satisfied the protocol is otherwise stable, but our focus remains 
> >> >>> validation for the moment.
> >>
> >> >>> [2] Schema changes have not yet been well integrated with TCM. 
> >> >>> Dropping a table for instance will currently cause problems if nodes 
> >> >>> are offline.
> >>
> >> >>> [3] We have a range of validations we are already performing against 
> >> >>> cassandra-accord directly, and against its integration with Cassandra 
> >> >>> in cep-15-accord. We have run hun
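One practical note for anyone trying this once merged: per the discussion above, the caveats apply only to tables that opt in to Accord. A hypothetical opt-in sketch via the Java driver -- the transactional_mode table option reflects the cep-15-accord branch at the time of writing and should be treated as an assumption, not a settled API:

    import com.datastax.oss.driver.api.core.CqlSession;

    public class AccordOptInSketch
    {
        public static void main(String[] args)
        {
            try (CqlSession session = CqlSession.builder().build())
            {
                // Hypothetical DDL: opts a single table into Accord, scoping
                // the merge caveats to that table only.
                session.execute("ALTER TABLE ks.accounts WITH transactional_mode = 'full'");
            }
        }
    }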

Re: Welcome Aaron Ploetz as Cassandra Committer

2025-03-05 Thread Anthony Grasso
Congratulations Aaron!

On Thu, 6 Mar 2025 at 07:37, Mick Semb Wever  wrote:

> Well deserved Aaron !  🎉
>
> On Tue, 4 Mar 2025 at 19:27, Jon Haddad  wrote:
> >
> > Congrats Aaron!
> >
> > On Tue, Mar 4, 2025 at 10:26 AM Jordan West  wrote:
> >>
> >> Congratulations!!
> >> On Tue, Mar 4, 2025 at 09:57 Tolbert, Andy  wrote:
> >>>
> >>> Congrats Aaron!
> >>>
> >>> On Tue, Mar 4, 2025 at 11:24 AM Francisco Guerrero 
> wrote:
> 
>  Congratulations Aaron!
> 
>  On 2025/03/04 00:23:49 Patrick McFadin wrote:
>  > The Apache Cassandra PMC is very happy to announce that Aaron
> Ploetz has
>  > accepted the invitation to become a committer!
>  >
>  > Aaron has been tireless in his mission to help every single
> Cassandra
>  > operator on planet Earth. If you don't believe me, check out his
> Stack
>  > Overflow profile page:
> https://stackoverflow.com/users/1054558/aaron
>  > He's been a continuous speaker on Cassandra topics and is one of the
>  > coordinators for the Planet Cassandra meetup. Those are just the
>  > recent highlights.
>  >
>  > Please join us in congratulating and welcoming Aaron.
>  >
>  > The Apache Cassandra PMC members
>  >
>