It sounds like for the original query we have a broad consensus:
1) Deprecate Codahale, but for the next major version publish compatible metrics
2) After the next release, move to a codahale-like registry that allows us to be efficient without abusing unsafe, and continue publishing metrics that imp
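For illustration only, a minimal sketch of what "keep publishing Codahale-compatible metrics over a new registry" could look like; the class and method names here are hypothetical, not part of the proposal:

    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.MetricRegistry;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    public final class CodahaleCompatShim {
        // Hypothetical compact registry: one LongAdder per metric name,
        // no per-metric Codahale wrapper objects on the hot path.
        private static final ConcurrentHashMap<String, LongAdder> COMPACT = new ConcurrentHashMap<>();

        // Hot-path updates touch only the compact registry.
        public static void increment(String name) {
            COMPACT.computeIfAbsent(name, k -> new LongAdder()).increment();
        }

        // Existing JMX/reporter plumbing still sees a Codahale metric for the same value.
        public static void exposeAsGauge(MetricRegistry legacy, String name) {
            legacy.register(name, (Gauge<Long>) () ->
                    COMPACT.computeIfAbsent(name, k -> new LongAdder()).longValue());
        }
    }

The point is that the deprecation and the internal swap can be decoupled: reporters keep their Codahale view while the storage underneath changes.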
Absolutely, happy to share. All tests were done using easy-cass-stress v9
and easy-cass-lab, with the latest released 5.0 (not including 15452 or
20092). Instructions at the end.
> Regarding allocation rate vs throughput, unfortunately allocation rate vs
throughput are not connected linearly,
Jon, thank you for testing! Can you share your CPU profile and test load
details? Have you tested it with the CASSANDRA-20092 changes included?
>> Allocations related to codahale were < 1%.
Just to clarify: in the initial mail, by memory footprint I mean the static
amount of memory used to store metrics
Definitely +1 on registry + docs. I believe that's part of the OTel Java
SDK [1][2]
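As a concrete illustration of the "registry + docs" idea, the OTel Java API attaches a unit and a human-readable description to every instrument at registration time. A small hedged example (the instrument name below is made up):

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.metrics.LongCounter;
    import io.opentelemetry.api.metrics.Meter;

    public class OtelMetricExample {
        public static void main(String[] args) {
            // Scope name identifies the instrumenting library/module.
            Meter meter = GlobalOpenTelemetry.getMeter("org.apache.cassandra");

            // Name, unit, and description are all part of the registration,
            // so documentation can be derived from the instruments themselves.
            LongCounter writes = meter.counterBuilder("cassandra.coordinator.write_requests")
                    .setDescription("Write requests received by this node acting as coordinator")
                    .setUnit("{request}")
                    .build();

            writes.add(1);
        }
    }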
I did some performance testing yesterday and was able to replicate the
findings where the codahale code path took 7-10% of CPU time. The only
caveat is that it only happens with compaction disabled. Once compact
Just something to be mindful of: what we had *before* codahale in
Cassandra, so we can avoid that again. Pre-1.1 it was pretty much impossible to
collect metrics without looking at code (there were efficient custom-made
things, but each metric was reported differently), and that stuck through
until 2.2 d
> Having something like a registry and standardizing/enforcing all metric types
> is something we should be sure to maintain.
A registry w/documentation on each metric indicating *what it's actually
measuring and what it means* would be great for our users.
On Mon, Mar 10, 2025, at 3:46 PM, Chri
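To make the registry-with-docs idea concrete, one hedged sketch of a registry that carries per-metric documentation and could generate user-facing docs; names and fields here are illustrative only, not an actual proposal:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public final class DocumentedMetricRegistry {
        public enum Type { COUNTER, GAUGE, HISTOGRAM, TIMER }

        // Each metric is declared once with what it measures and what it means.
        public record Descriptor(String name, Type type, String unit, String description) {}

        private static final Map<String, Descriptor> REGISTRY = new ConcurrentHashMap<>();

        public static Descriptor register(String name, Type type, String unit, String description) {
            return REGISTRY.computeIfAbsent(name, n -> new Descriptor(n, type, unit, description));
        }

        // User-facing documentation can be generated straight from the declarations.
        public static String toDocTable() {
            StringBuilder sb = new StringBuilder(
                    String.format("%-45s %-10s %-12s %s%n", "Metric", "Type", "Unit", "Meaning"));
            REGISTRY.values().forEach(d -> sb.append(
                    String.format("%-45s %-10s %-12s %s%n", d.name(), d.type(), d.unit(), d.description())));
            return sb.toString();
        }
    }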
As long as operators are able to use all the OTel tooling, I'm happy. I'm
not looking to try to decide what the metrics API looks like, although I
think trying to plan for 15 years out is a bit unnecessary. A lot of the DB
will be replaced by then. That said, I'm mostly hands off on code and you
We can also do an education campaign to get people to migrate. There
will be good reasons to do it.
On Wed, Mar 5, 2025 at 12:33 PM Jeff Jirsa wrote:
>
> I think it's widely accepted that OTel in general has won this stage of
> observability, as most metrics systems allow it and most SaaS providers
If we do swap, we may run into the same issues with third-party
metrics libraries in the next 10-15 years that we are discussing now
with the Codahale we added ~10-15 years ago, and given that the
proposed new API is quite small, my personal feeling is that it would
be our best choice for the
I think it's widely accepted that OTel in general has won this stage of observability, as most metrics systems allow it and most SaaS providers support it. So Jon's point there is important. The promise of unifying logs/traces/metrics (aka wide events) is usually far more important on the tracing side o
No strong opinion on the particular choice of metrics library. My primary feedback is that if we swap metrics implementations and the new values are *different*, we can anticipate broad user confusion/interest. In particular, if latency stats are reported higher post-upgrade, we should expect users to int
I really like the idea of integrating tracing, metrics and logging frameworks. I would like to have the time to look closely at the API before we decide to adopt it, though. I agree that a widely deployed API has inherent benefits, but any API we adopt also shapes future evolution of our capabilities
Thank you for the replies.
Dmitry: Based on some other patches you've worked on and your explanation
here, it looks like you're optimizing the front-door portion of the write path
- very cool. Testing it in isolation with those settings makes sense if
your goal is to push write throughput as far as y
> if the plan is to rip out something old and unmaintained and replace with
> something new, I think there's a huge win to be had by implementing the
> standard that everyone's using now.
Strong +1 on anything that's an ecosystem integration inflection point. The
added benefit here is that if we
Hi Jon
>> Is there a specific workload you're running where you're seeing it take
up a significant % of CPU time? Could you share some metrics, profile
data, or a workload so I can try to reproduce your findings?
Yes, I have shared the workload generation command (sorry, it is in
cassandra-stress
Some quick thoughts of my own…
=== Performance ===
- I have seen heap dumps with > 1GiB dedicated to metric counters. This patch
should improve that, while opening up room to cut it much further still.
- The performance improvement in relative terms for the metrics being replaced
is rather dramatic
I've got a few thoughts...
On the performance side, I took a look at a few CPU profiles from past
benchmarks and I'm seeing DropWizard taking ~3% of CPU time. Is there a
specific workload you're running where you're seeing it take up a
significant % of CPU time? Could you share some metrics, profile data, or a
workload so I can try to reproduce your findings?