> The fact o3 used "Bus-factor" as a dimension is just amazing.
Yeah - that got me too.
On Fri, Jun 13, 2025, at 2:38 PM, Jon Haddad wrote:
> I'd be very happy to see async-profiler included with C*. I've made extensive
> use of it in my performance evaluations [1][2], and even posted a video about
> it [3] for general Java perf analysis (among others). It's part of
> easy-cass-lab and is easily the most informative tool I've found for
> getting to the bottom of anything performance-related.
>
> There's probably a good case to be made for including it with the C* artifact
> as well as having it be something you can drop in. I lean towards including
> it all the time, but I haven't run it this way myself yet, so there might be
> some downside I'm unaware of.
>
> When you call the asprof executable, it attaches the async-profiler to the
> running jvm using jattach [4]. We could do this as well, if we wanted to
> avoid including it with the release, but I don't know how much we really
> benefit from that. I've run into issues with it when it's unable to detach
> correctly, then you're unable to reattach it until after the server is
> restarted. On the flip side, I don't know if you're able to set up all the
> same options for arbitrary profiling when it's loaded as an agent and turned
> on/off dynamically. I think we can, based on the integration page [6], but I
> haven't tried it yet. It would be a bummer if we only had a single mode of
> profiling available.
>
> The default mode, CPU profiling, is fantastic, but I've also made extensive
> use of allocation profiling [5] to identify perf issues as well so having
> that available is a must, imo. Wall clock / off cpu profiling is great for
> identifying when IO is the root cause, which isn't clearly revealed by on-cpu
> profiling due to the way threads are scheduled. When I look at a system I
> typically do CPU / Wall / Alloc / Off-CPU to be thorough, and the last thing
> you want to do is have to restart between each one. You can also specify
> specific Java methods, include or exclude frames matching specific regex, and
> a whole slew of other options. The latest version even supports continuous
> profiling with heatmaps although I haven't tried it yet.
>
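Inline, for anyone following along: a rough sketch of that CPU / wall / alloc pass against one running JVM, with no restart in between. The function name, the 30-second default, and the /tmp output paths are my own placeholders, not anything from Jon's setup; -d, -e, and -f are the asprof duration, event, and output-file options. Off-CPU is left out because it needs the longer kprobe invocation quoted further down. The function only prints the commands, so nothing attaches until you run them yourself.

```shell
# Print one asprof invocation per profiling mode for a single JVM.
# Hypothetical helper: the name, default duration, and output paths are placeholders.
profile_plan() {
  local pid="$1"
  local secs="${2:-30}"
  local event
  for event in cpu wall alloc; do
    # -d: duration in seconds, -e: event to sample, -f: output file
    # (a .html extension asks asprof for a flame graph)
    echo "asprof -d ${secs} -e ${event} -f /tmp/${event}-${pid}.html ${pid}"
  done
}
```

Calling `profile_plan 12345 60` prints three lines, one each for cpu, wall, and alloc, all pointing at PID 12345.
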
> So hopefully the option we go with allows all of that, otherwise the limits
> would impose more of a headache to me as I'd need to remove it and continue
> to bring my own.
>
> Under the hood, the async-profiler uses Linux perf events + asynchronous
> polling of the Java stack to match them up and generate its reports. As a
> result, it requires certain permissions to run and get all the details I
> like. Specifically these kernel parameters:
>
> sudo sysctl kernel.perf_event_paranoid=1
> sudo sysctl kernel.kptr_restrict=0
>
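Inline: a small sketch for checking whether a host already has those two settings before profiling. The helper names are made up, and treating "at or below the suggested value" as acceptable is my reading (lower values are more permissive), not something stated above. It only reads /proc, so it needs no sudo.

```shell
# Compare one kernel setting against the highest value we are willing to accept.
# Hypothetical helper; prints a one-line verdict.
check_max() {
  local name="$1" current="$2" max="$3"
  if [ "$current" -le "$max" ]; then
    echo "${name}=${current} ok"
  else
    echo "${name}=${current} too high (want <= ${max})"
  fi
}

# Read the live values from /proc when they are readable (Linux only).
check_host() {
  local f
  f=/proc/sys/kernel/perf_event_paranoid
  [ -r "$f" ] && check_max kernel.perf_event_paranoid "$(cat "$f")" 1
  f=/proc/sys/kernel/kptr_restrict
  [ -r "$f" ] && check_max kernel.kptr_restrict "$(cat "$f")" 0
  return 0
}
```

Run `check_host` on the node; anything reported as too high can be adjusted with the sysctl commands above.
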
> You also need to enable some capabilities for off-cpu profiling:
>
> sudo find /usr/lib/jvm/ -type f -name 'java' -exec setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" {} \;
>
> Then you can do off-cpu with this wild cryptic version (shout out to Andrei
> Pangin for helping me with this [7]):
>
> asprof -e kprobe:schedule -i 2 --cstack dwarf -X '*Unsafe.park*' "${@:2}" $PID
>
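Inline: the "${@:2}" in that command suggests it lives in a script whose first argument is the PID and whose remaining arguments are passed straight through to asprof. That is my guess, not something Jon said. A minimal dry-run sketch of such a wrapper, with a made-up function name:

```shell
# Hypothetical wrapper around the off-CPU command quoted above.
# Usage: offcpu_cmd PID [extra asprof options...]
offcpu_cmd() {
  local pid="$1"
  shift  # after this, "$@" holds what "${@:2}" held before the shift
  # Dry run: print the command rather than executing it.
  echo asprof -e kprobe:schedule -i 2 --cstack dwarf -X "'*Unsafe.park*'" "$@" "$pid"
}
```

For example, `offcpu_cmd 12345 -d 30` prints the full asprof line with `-d 30` placed just before the PID, ready to paste once the kernel settings and capabilities are in place.
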
> There are also some subtle issues when it's run in a container, since by
> default you don't have access to the perf_event_open syscall. Just something
> to keep in mind. This is one of my main grievances with container
> deployments.
>
> Indeed Patrick, I am very happy to see this discussion! Thanks Doug for
> starting the thread.
>
> Jon
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
> [2] https://issues.apache.org/jira/browse/CASSANDRA-19477
> [3]
> https://www.youtube.com/watch?v=yNZtnzjyJRI&t=212s&pp=ygUOYXN5bmMgcHJvZmlsZXI%3D
> [4]
> https://github.com/async-profiler/async-profiler/blob/2b556680dc8f5d02c3f26ac119d835dc2381e604/src/jattach/jattach_hotspot.c#L38
> [5] https://issues.apache.org/jira/browse/CASSANDRA-20428
> [6]
> https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md
> [7] https://github.com/async-profiler/async-profiler/issues/907
>
>
> On Fri, Jun 13, 2025 at 10:18 AM Patrick McFadin <[email protected]> wrote:
>> The fact o3 used "Bus-factor" as a dimension is just amazing.
>>
>> After reading more about the project, the possibilities are pretty
>> interesting. I suspect we'll see this in a Haddad talk soon.
>>
>> On Fri, Jun 13, 2025 at 1:57 AM Josh McKenzie <[email protected]> wrote:
>>> I was curious if o3 (model from OpenAI) would be able to do a deep dive
>>> health check on a repo to assist in considering taking it as a dependency.
>>> The results can be found here:
>>> https://chatgpt.com/share/684be703-1d4c-8002-b831-f997f829f4b4
>>>
>>> Apparently it can, and can do it quite well. This was a useful time saver
>>> (and honestly did a better job than I usually can in > 10x the time)
>>>
>>> I'm +1 to taking the lib as a dependency in core C*. The rest of the
>>> ecosystem can consume it (more easily if we also move to a
>>> cassandra-shared shared-library build), and it opens up some interesting
>>> opportunities for us in both how we test core C* proper and what we expose
>>> in tooling.
>>>
>>> On Thu, Jun 12, 2025, at 7:36 PM, Paulo Motta wrote:
>>>> I'd prefer to avoid calling an external process and use the library if
>>>> possible. Not sure about including it in the project by default, but also
>>>> not against.
>>>>
>>>> If there's contention about including it, I wonder if it would make sense
>>>> to explore Java's optional module extension [1] to make this available
>>>> optionally? I can see this being useful for other extensions if we
>>>> haven't explored that option.
>>>>
>>>> Then we could have another project cassandra-sidecar-extensions (or
>>>> similar) that would be linked by sidecar/advanced operators to enable
>>>> extended featureset in the main process.
>>>>
>>>>
>>>> [1] -
>>>> https://openjdk.org/projects/jigsaw/doc/topics/optional.html
>>>>
>>>> On Thu, 12 Jun 2025 at 17:57 Doug Rohrer <[email protected]> wrote:
>>>>> Hey folks!
>>>>>
>>>>> We're looking into enabling the sidecar to collect async profiles from
>>>>> Cassandra and, digging through the async-profiler code and usage, it
>>>>> seems like there may be a few different ways to do it. I’m curious if
>>>>> other folks have already done this beyond just “run asprof with the pid
>>>>> of the Cassandra process”, as I’m a bit hesitant to depend on executing
>>>>> an external process from the Sidecar to gather the actual profile if we
>>>>> can avoid it.
>>>>>
>>>>> There seem to be some opportunities to integrate the profiler into
>>>>> another project (see
>>>>> https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#using-java-api)
>>>>> but it seems this would end up having to be part of Cassandra, and
>>>>> somehow callable via the sidecar (JMX? Some virtual table interface where
>>>>> you insert a row to start a profile with the profiler options, and it
>>>>> kicks off the profile, dumping the results into the table when it’s
>>>>> done?).
>>>>>
>>>>> The benefit in putting this functionality into Cassandra would be that
>>>>> other consumers (in-jvm dtests, python dtests, other monitoring systems
>>>>> where Sidecar isn’t available, easy-cass-lab) would be able to leverage
>>>>> the same interface rather than having to re-invent the wheel each time.
>>>>>
>>>>> The drawback is that it's another library, and one with native library
>>>>> dependencies, added to the class path and loaded at runtime.
>>>>>
>>>>> Thoughts? Previous experiences (good or bad)?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Doug
>>>