Hi Jon, answers below.

On Fri, Dec 12, 2025 at 2:19 AM Jon Haddad <[email protected]> wrote:
>
> +1 to including it, conceptually. It's easily the best tool for diagnosing
> perf issues that I've used. I've got a few questions / thoughts about
> implementation details & user ergonomics.
>
> - Capturing call stacks in modern kernels requires some params to be set. Are
> we going to be able to check the requirements are met and give the user
> feedback?

Indeed, we are going to inform the user on two occasions. First, the check will be executed as part of the Startup Checks "framework" we already have in place in Cassandra: it reads the respective parameters from /proc and logs a message if their values are not "ideal". We are not going to fail the startup if they are not, though. It is just a warning, because a user can always set them while Cassandra runs, so there is no need to _fail_ the startup. Second, if you later try to profile via "nodetool profile start" and these two parameters are not set as they should be, the command will fail and inform the user that they need to set them first.

> - Profiling in containers is a little weird [1]. Same type of issue as my
> first point.

I have run this in a container (Docker Compose) and I did not need to do anything special. It just ... worked. I think it will be on the user to ensure everything is in place if anything special is needed. We are also not dealing with any PIDs here, as profiling runs inside the JVM via the AsyncProfiler API (2).

> - Getting allocation profiles requires debug symbols. More ergonomics.

Isn't that an outdated recommendation for Cassandra 6.0, which this lands in and which runs on 11+? The docs say "Prior to JDK 11", which does not apply here:
https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingModes.md#installing-debug-symbols

> - The profiler moves a lot faster than we do. Are we going to bump the async
> profiler in bug fix C* releases or are we freezing the version?

I would update major versions of AsyncProfiler only in major versions of Cassandra. Patch versions of AsyncProfiler might be updated within patch versions of Cassandra. That makes the most sense to me. If you want to use something more recent before Cassandra ships it, you can basically do this (1) and it should just work.

> - Can I still attach using the asprof tool? Will there be an issue if I
> attach a newer version of the profiler?

As said, whether Cassandra's built-in profiler can be used at all is driven by a system property, which defaults to false. When it is false, the logic that would check kernel parameters or instantiate the AsyncProfiler object (as shown in (2)) is not exercised at all, so nothing async-profiler-related is instantiated in Cassandra. You can then take async-profiler as you know it and run bin/asprof against Cassandra's PID as you are used to. That also answers what happens if you use a newer version: it behaves the very same way.

> - Are we relocating the jars, or does Corretto?

The current patch simply depends on AsyncProfiler, and the library will eventually be included in the release tarball. So when you start Cassandra, the library will be on the class path. However, until the system property enabling it is set to true, it cannot be used and is not instantiated or initialized in any way; it is also not possible to enable it at runtime.

(1) https://github.com/apache/cassandra/blob/1b6e538c98db4287795692b7df88aa4940c3a7af/doc/modules/cassandra/pages/managing/operating/async-profiler.adoc#using-a-different-async-profiler-version
(2) https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#example-usage-with-the-api

>
> Thanks!
> Jon
>
> [1]
> https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md
>
> On Thu, Dec 11, 2025 at 1:12 PM Josh McKenzie <[email protected]> wrote:
>>
>> If we expose whatever API the 3rd party has and they drift or break it in
>> the future, we could introduce a shim that would keep prior ergonomics at
>> that time w/sane defaults or graceful handling of removals.
>>
>> Think "manager" is referring to the sidecar here.
>>
>> On Thu, Dec 11, 2025, at 2:03 PM, Štefan Miklošovič wrote:
>>
>> Can you help me to understand what you mean by that?
>> I have a feeling I am missing something here or we are not on the same page.
>>
>> When it comes to API, we are not touching anything already there. We
>> expose this through a brand new
>> org.apache.cassandra.profiler.AsyncProfilerMBean.
>>
>> So we are not really breaking anything here?
>>
>> I am also not completely sure what you meant by "manager". What
>> manager? Is that some terminology from your work or something we have
>> here? Genuinely asking what you mean by that, I am a bit lost here.
>>
>> If you mean that "we start to call AsyncProfiler and then in later
>> versions these guys decide that they will change how it is called", I
>> do not think that is really an issue here, is it? A user does not deal
>> with that directly at all, only via the MBean, and there will
>> presumably always be a way to start and stop profiling; that is
>> basically at the very core of what that library does, no?
>>
>> On Thu, Dec 11, 2025 at 7:03 PM David Capwell <[email protected]> wrote:
>> >
>> > If disabled, which is default,
>> >
>> >
>> > I def won’t block on this, I just want us to think about these possible
>> > problems before we touch a public API; I'll leave it to
>> > author(s)/reviewer(s).
>> >
>> > One thing that has been brought up in a different context is whether we can
>> > make breaking changes to public facing APIs if the thing is disabled by
>> > default (debug tables is the example); I personally don’t have clarity
>> > here for the project so hard to say.
>> >
>> > TL;DR I am +0
>> >
>> > On Dec 11, 2025, at 3:30 AM, Štefan Miklošovič <[email protected]>
>> > wrote:
>> >
>> > Oh wow! Thanks Dmitry for all these references. I think the fact that
>> > Corretto includes it in the JDK is a testament to its quality.
>> >
>> > David, I hope this answers your concerns pretty much?
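To make the point above concrete (users only ever go through a project-owned MBean, which is also the natural place for the shim Josh describes), here is a minimal sketch. It is not the code from the patch: ProfilerControlMBean, the cassandra.profiler.enabled property and the org.example.profiler ObjectName are hypothetical, a small ProfilerEngine interface stands in for one.profiler.AsyncProfiler so the example runs without the library, and the command strings are assumed to follow async-profiler's comma-separated command syntax.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class ProfilerShimSketch {

    /** Hypothetical stand-in for the third-party profiler (AsyncProfiler in the real patch). */
    public interface ProfilerEngine {
        String execute(String command) throws Exception;
    }

    /** Project-owned management interface: this, not the library, is what users call. */
    public interface ProfilerControlMBean {
        String start(String event);
        String stop();
        String status();
    }

    public static class ProfilerControl implements ProfilerControlMBean {
        public static final String DISABLED = "Profiling is not enabled";

        private final boolean enabled;
        private final Supplier<ProfilerEngine> engineFactory;
        private ProfilerEngine engine; // created on first use, never when disabled

        public ProfilerControl(boolean enabled, Supplier<ProfilerEngine> engineFactory) {
            this.enabled = enabled;
            this.engineFactory = engineFactory;
        }

        /** "cassandra.profiler.enabled" is a hypothetical property name; absent means disabled. */
        public static ProfilerControl fromSystemProperty(Supplier<ProfilerEngine> engineFactory) {
            return new ProfilerControl(Boolean.getBoolean("cassandra.profiler.enabled"), engineFactory);
        }

        // The shim: stable operation names are translated into library commands in one place,
        // so a change in the library's command syntax is absorbed here, not in nodetool.
        @Override public String start(String event) { return run("start,event=" + event); }
        @Override public String stop() { return run("stop"); }
        @Override public String status() { return run("status"); }

        private synchronized String run(String command) {
            if (!enabled) {
                return DISABLED; // engineFactory is never invoked, so nothing is instantiated
            }
            try {
                if (engine == null) {
                    engine = engineFactory.get();
                }
                return engine.execute(command);
            } catch (Exception e) {
                return "Profiler command failed: " + e.getMessage();
            }
        }
    }

    /** Registers the control on the platform MBean server under a hypothetical ObjectName. */
    public static ObjectName register(ProfilerControl control) {
        try {
            ObjectName name = new ObjectName("org.example.profiler:type=ProfilerControl");
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(new StandardMBean(control, ProfilerControlMBean.class), name);
            return name;
        } catch (Exception e) {
            throw new IllegalStateException("could not register profiler MBean", e);
        }
    }
}
```

Because the disabled path returns before the engine factory is called, nothing profiler-related is created unless the property is set. If a future library version renamed a command, only the three one-line translations would change, while the MBean operations, and therefore nodetool, would stay as they are.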
>> >
>> > On Thu, Dec 11, 2025 at 12:27 PM Dmitry Konstantinov <[email protected]>
>> > wrote:
>> >
>> >
>> > +1 from my side
>> >
>> > 1) it is a well-known, mature profiling tool; there are other cases where
>> > Apache projects embedded it, for example:
>> > - https://issues.apache.org/jira/browse/HADOOP-18055
>> > - https://issues.apache.org/jira/browse/HBASE-29045
>> > - https://issues.apache.org/jira/browse/FLINK-33325
>> > 2) Apache-2.0 license
>> > 3) the dependency is small (less than 1 MB) and does not have
>> > transitive dependencies on other 3rd parties
>> > 4) the main contributors are now at Amazon; it is even included in
>> > Corretto JDK now
>> > (https://aws.amazon.com/about-aws/whats-new/2025/10/amazon-corretto-october-2025-quarterly-updates/)
>> > 5) the logic is disabled by default, so no impact if you do not use it.
>> >
>> >
>> > On Wed, 10 Dec 2025 at 18:08, Štefan Miklošovič <[email protected]>
>> > wrote:
>> >
>> >
>> > This capability is disabled by default. It is driven by a system
>> > property you have to set to true in order to be able to get an
>> > instance of AsyncProfiler, which does the actual profiling. If
>> > disabled, which is the default, any calls via nodetool which need
>> > AsyncProfiler (start, stop, status) return a message that
>> > profiling is not enabled.
>> >
>> > Not sure if this answers your concerns, but without knowingly turning
>> > it on, nothing happens.
>> >
>> > On Wed, Dec 10, 2025 at 6:28 PM David Capwell <[email protected]> wrote:
>> >
>> >
>> > I have no issues adding it. I think my only real comment would be the
>> > same as with manager; w/e we expose to the public api (in this case
>> > Nodetool) we have to support, so if a 3rd party lib breaks compatibility
>> > that puts us in a bind if we didn’t think about that up front.
>> >
>> > Having async-profiler exposed to make profiling easier is a good thing.
>> > Manager has (or is in the process of adding) API auth so we can lock down
>> > async-profiler to those “allowed”, but do we have something similar in Nodetool? We
>> > had an issue in the past that async-profiler would trigger a JVM crash
>> > (JVM bug), so we had to limit calls to it until it was fixed.
>> >
>> > On Dec 10, 2025, at 9:00 AM, Štefan Miklošovič <[email protected]>
>> > wrote:
>> >
>> > It is worth mentioning that we were also contemplating the inclusion
>> > of jfr-convert so a user could also convert raw JFR files to e.g. HTML
>> > with heatmaps, but we concluded that it is not necessary. Sure, it
>> > would be convenient, but it is ultimately not needed. Converting such a
>> > file via nodetool, on the server side, is just not a good idea; it is not
>> > the job of a server to convert anything.
>> >
>> > In the majority of cases, people using the profiler just want to get an
>> > HTML file with a CPU / allocation profile. It can even gather JFR files as
>> > such and fetch them as they are; it is just that the conversion itself can
>> > happen on the client's side instead.
>> >
>> > I am +1 for introducing the core async-profiler library only.
>> >
>> > On Wed, Dec 10, 2025 at 5:46 PM Bernardo Botella
>> > <[email protected]> wrote:
>> >
>> >
>> > Hi everyone!
>> >
>> > I’d like to propose adding the async-profiler library to the Cassandra
>> > project. This will enable us to add a new nodetool command to do profiling
>> > tasks on the process running Cassandra. This information can be useful to
>> > debug a wide range of potential issues and performance optimizations.
>> > CASSANDRA-20854 captures the effort and the details of the proposal, and
>> > this PR proposes its implementation.
>> >
>> > I want to note that this feature was already discussed in this thread, and
>> > this one only wants to make sure that no one has any concerns about adding
>> > the library as a dependency.
>> >
>> > What is async-profiler?
>> > async-profiler is a low overhead sampling profiler for Java that does not
>> > suffer from the Safepoint bias problem. It features HotSpot-specific API
>> > to collect stack traces and to track memory allocations. The profiler
>> > works with OpenJDK and other Java runtimes based on the HotSpot JVM.
>> >
>> > Unlike traditional Java profilers, async-profiler monitors non-Java
>> > threads (e.g., GC and JIT compiler threads) and shows native and kernel
>> > frames in stack traces.
>> >
>> > What can be profiled:
>> >
>> > - CPU time
>> > - Allocations in Java Heap
>> > - Native memory allocations and leaks
>> > - Contended locks
>> > - Hardware and software performance counters like cache misses, page faults,
>> >   context switches
>> > and more.
>> >
>> >
>> > We propose to add async-profiler 4.2 as a dependency to Cassandra.
>> >
>> > Any concerns?
>> > Bernardo
>> >
>> > --
>> > Dmitry Konstantinov
>> >
>>
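For the profiling modes listed in the proposal, here is a small sketch of how each one could be selected when driving the library through its command-string API. The event names (cpu, alloc, nativemem, lock, and the perf counters cache-misses, page-faults, context-switches) and the start/stop command syntax are assumptions based on async-profiler's documentation and should be checked against the 4.2 docs; the ProfilingModes class is hypothetical and only builds strings, so it runs without the library.

```java
public class ProfilingModes {

    /** Modes from the proposal; event names are assumed from async-profiler's docs. */
    public enum Mode {
        CPU_TIME("cpu"),
        HEAP_ALLOCATIONS("alloc"),
        NATIVE_MEMORY("nativemem"), // assumed name, verify against the 4.2 docs
        CONTENDED_LOCKS("lock"),
        CACHE_MISSES("cache-misses"),
        PAGE_FAULTS("page-faults"),
        CONTEXT_SWITCHES("context-switches");

        private final String event;

        Mode(String event) {
            this.event = event;
        }

        public String event() {
            return event;
        }
    }

    /** Command starting a session for one mode, e.g. "start,event=alloc" (assumed syntax). */
    public static String startCommand(Mode mode) {
        return "start,event=" + mode.event();
    }

    /** Command stopping the session and writing the result to a file (assumed syntax). */
    public static String stopCommand(String outputFile) {
        // Options are comma-separated, so a comma inside the file name would be misparsed.
        if (outputFile.isEmpty() || outputFile.contains(",")) {
            throw new IllegalArgumentException("invalid output file: " + outputFile);
        }
        return "stop,file=" + outputFile;
    }
}
```

With the library on the class path, these strings would be passed to AsyncProfiler.getInstance().execute(...), as in the API example linked as (2) earlier in this thread.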
