Re: [DISCUSS] CASSSIDECAR-254 - Enabling sidecar to collect async profiles

Yaman Ziadeh (BLOOMBERG/ 919 3RD A) Mon, 16 Jun 2025 17:05:58 -0700

Thanks everyone for your input!

I'm looking to work on this, and will circle back with any recommendations or 
discussion points moving forward - excited to get this into C*!

From: [email protected] At: 06/13/25 14:40:24 UTC-4:00
To: [email protected]
Subject: Re: [DISCUSS] CASSSIDECAR-254 - Enabling sidecar to collect async 
profiles

I'd be very happy to see async-profiler included with C*.  I've made extensive 
use of it in my performance evaluations [1][2], and even posted a video about 
it [3] for general Java perf analysis (among others).  It's part of 
easy-cass-lab and is easily the most informative tool I've found for 
getting to the bottom of anything performance related.

There's probably a good case to be made for including it with the C* artifact 
as well as  having it be something you can drop in. I lean towards including it 
all the time, but I haven't run it this way myself yet, so there might be some 
downside I'm unaware of.
When you call the asprof executable, it attaches async-profiler to the 
running JVM using jattach [4].  We could do this as well, if we wanted to 
avoid including it with the release, but I don't know how much we'd really 
benefit from that.  I've run into issues with it when it's unable to detach 
correctly, and then you're unable to reattach it until after the server is 
restarted.  On the flip side, I don't know if you're able to set up all the 
same options for arbitrary profiling when it's loaded as an agent and turned 
on/off dynamically.  I think we can, based on the integration page [6], but I 
haven't tried it yet.  It would be a bummer if we only had a single mode of 
profiling available.  

The default mode, CPU profiling, is fantastic, but I've also made extensive use 
of allocation profiling [5] to identify perf issues, so having that 
available is a must, imo. Wall clock / off-CPU profiling is great for 
identifying when IO is the root cause, which isn't clearly revealed by on-cpu 
profiling due to the way threads are scheduled.  When I look at a system I 
typically do CPU / Wall / Alloc / Off-CPU to be thorough, and the last thing 
you want to do is have to restart between each one.  You can also specify 
specific Java methods, include or exclude frames matching specific regex, and a 
whole slew of other options.  The latest version even supports continuous 
profiling with heatmaps although I haven't tried it yet.  
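
The back-to-back workflow described above can be sketched as a dry run that 
only prints the asprof commands it would issue, one per mode, against the same 
running JVM. This is a minimal illustration rather than anything from the 
thread: the function name, the PID 12345, the 30-second duration, and the 
/tmp/profiles output directory are all hypothetical placeholders. Only the 
-e (event), -d (duration in seconds), and -f (output file) options are taken 
from the asprof command line.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one asprof invocation per profiling mode for the same
# JVM, so the modes can be run one after another without a restart.
# The PID, duration, and output directory are hypothetical placeholders.

profile_commands() {
    local pid="$1" seconds="${2:-30}" outdir="${3:-/tmp/profiles}"
    local event
    for event in cpu wall alloc; do
        # -e selects the event, -d the duration in seconds, -f the output file
        echo "asprof -e ${event} -d ${seconds} -f ${outdir}/${event}.html ${pid}"
    done
}

# Print the commands for a placeholder PID (pass a real one as the first argument).
profile_commands "${1:-12345}"
```

Off-CPU profiling is left out of the loop because it needs the extra kernel 
settings and capabilities covered further down.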

So hopefully the option we go with allows all of that; otherwise the 
limitations would be more of a headache for me, as I'd need to remove it and 
keep bringing my own.

Under the hood, async-profiler uses Linux perf events + asynchronous 
polling of the Java stack to match them up and generate its reports.  As a 
result, it requires certain permissions to run and get all the details I like.  
Specifically these kernel parameters:

sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0
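
Before changing anything, the current values can be checked with a small 
helper like the one below. This is a sketch under stated assumptions: the 
function name is hypothetical, and treating any perf_event_paranoid value of 
1 or lower as acceptable is an assumption based on lower values being more 
permissive. The function reads the live values from /proc/sys/kernel by 
default and also accepts them as arguments.

```shell
#!/usr/bin/env bash
# Sketch: report whether the two kernel settings match the values above.
# Values default to the live /proc entries; pass them explicitly to override.
# The function name and the "1 or lower" threshold are assumptions.

check_perf_settings() {
    local paranoid="${1:-$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null)}"
    local kptr="${2:-$(cat /proc/sys/kernel/kptr_restrict 2>/dev/null)}"
    local status=0

    if [ -z "$paranoid" ] || [ "$paranoid" -gt 1 ]; then
        echo "kernel.perf_event_paranoid=${paranoid:-unknown} (want 1 or lower)"
        status=1
    fi
    if [ "$kptr" != "0" ]; then
        echo "kernel.kptr_restrict=${kptr:-unknown} (want 0)"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "kernel settings look fine for profiling"
    fi
    return "$status"
}
```

Calling check_perf_settings with no arguments checks the running host; a 
non-zero exit status means at least one setting differs.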

You also need to enable some capabilities for off-CPU profiling:

sudo find /usr/lib/jvm/ -type f -name 'java' -exec setcap \
    "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" {} \;

Then you can do off-cpu with this wild cryptic version (shout out to Andrei 
Pangin for helping me with this [7]):

asprof -e kprobe:schedule -i 2 --cstack dwarf -X '*Unsafe.park*' "${@:2}" $PID

There are also some subtle issues when it's run in a container, since by default 
you don't have access to the perf_event_open syscall.  Just something to keep 
in mind.  This is one of my main grievances with container deployments.

Indeed Patrick, I am very happy to see this discussion!  Thanks Doug for 
starting the thread.

Jon

[1] https://issues.apache.org/jira/browse/CASSANDRA-15452
[2] https://issues.apache.org/jira/browse/CASSANDRA-19477
[3] 
https://www.youtube.com/watch?v=yNZtnzjyJRI&t=212s&pp=ygUOYXN5bmMgcHJvZmlsZXI%3D
[4] 
https://github.com/async-profiler/async-profiler/blob/2b556680dc8f5d02c3f26ac119d835dc2381e604/src/jattach/jattach_hotspot.c#L38
[5] https://issues.apache.org/jira/browse/CASSANDRA-20428
[6] 
https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md
[7] https://github.com/async-profiler/async-profiler/issues/907

On Fri, Jun 13, 2025 at 10:18 AM Patrick McFadin <[email protected]> wrote:

The fact o3 used "Bus-factor" as a dimension is just amazing. 

After reading more about the project, the possibilities are pretty interesting. 
I suspect we'll see this in a Haddad talk soon. 

On Fri, Jun 13, 2025 at 1:57 AM Josh McKenzie <[email protected]> wrote:

I was curious if o3 (model from OpenAI) would be able to do a deep dive health 
check on a repo to assist in considering taking it as a dependency. The results 
can be found here: 
https://chatgpt.com/share/684be703-1d4c-8002-b831-f997f829f4b4

Apparently it can, and can do it quite well. This was a useful time saver (and 
honestly did a better job than I usually can in > 10x the time).

I'm +1 to taking this as a dependency on the lib in core C*. The rest of the 
ecosystem can consume it (more easily if we move to a cassandra-shared regime 
shared library build as well), and it opens up some interesting opportunities 
for us in both how we test core C* proper and what we expose in tooling.

On Thu, Jun 12, 2025, at 7:36 PM, Paulo Motta wrote:
I'd prefer to avoid calling an external process and use the library if 
possible. Not sure about including it in the project by default, but also not 
against.

If there's contention about including it, I wonder if it would make sense to 
explore Java's optional module extension [1] to make this available 
optionally? I can see this being useful for other extensions if we haven't 
explored that option.

Then we could have another project, cassandra-sidecar-extensions (or similar), 
that would be linked by sidecar/advanced operators to enable an extended 
feature set in the main process.

[1] - 
https://openjdk.org/projects/jigsaw/doc/topics/optional.html

On Thu, 12 Jun 2025 at 17:57 Doug Rohrer <[email protected]> wrote:
Hey folks!

We're looking into enabling the sidecar to collect async profiles from 
Cassandra and, digging through the async-profiler code and usage, it seems like 
there may be a few different ways to do it. I’m curious if other folks have 
already done this beyond just “run asprof with the pid of the Cassandra 
process”, as I’m a bit hesitant to depend on executing an external process from 
the Sidecar to gather the actual profile if we can avoid it.

There seem to be some opportunities to integrate the profiler into another 
project (see 
https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#using-java-api)
 but it seems this would end up having to be part of Cassandra, and somehow 
callable via the sidecar (JMX? Some virtual table interface where you insert a 
row to start a profile with the profiler options, and it kicks off the profile, 
dumping the results into the table when it’s done?).

The benefit in putting this functionality into Cassandra would be that other 
consumers (in-jvm dtests, python dtests, other monitoring systems where Sidecar 
isn’t available, easy-cass-lab) would be able to leverage the same interface 
rather than having to re-invent the wheel each time.

Drawback is it’s another library, and one with native library dependencies, 
added to the class path and loaded at runtime.

Thoughts? Previous experiences (good or bad)?

Thanks,

Doug
