> A problem I've seen elsewhere with one process trying to manage
> different kinds of workloads is that if you need to scale up one kind
> of workload you may be required to scale them all up and run head
> first into some kind of resource starvation issue.

This is a really good point. If the resource consumption of a CDC process grows in correlation w/data ingestion on the C* node, we would be in for A Bad Time.

@Bernardo - do we resource constrain the CommitLog reading and reporting to some kind of ceiling, so the CDC consumption just falls behind and the sidecar can otherwise keep making forward progress on its other, more critical operations? And/or have internal scheduling and prioritization to help facilitate that?
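Even a crude ceiling would get us most of the way there. Just to sketch the shape of what I mean (class and method names entirely made up, not actual Sidecar code): a token bucket in front of the segment reader, here via Guava's RateLimiter with permits counted in bytes:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.ReadableByteChannel;

    import com.google.common.util.concurrent.RateLimiter;

    /**
     * Hypothetical ceiling on CDC commit log consumption: a token bucket
     * (permits == bytes) in front of the segment reader, so under heavy
     * ingest CDC falls behind rather than starving the sidecar's other work.
     */
    public class ThrottledCdcReader
    {
        private final RateLimiter bytesPerSecond;

        public ThrottledCdcReader(long maxBytesPerSecond)
        {
            this.bytesPerSecond = RateLimiter.create(maxBytesPerSecond);
        }

        /** Blocks until reading up to buffer.remaining() bytes fits under the cap. */
        public int read(ReadableByteChannel segment, ByteBuffer buffer) throws IOException
        {
            bytesPerSecond.acquire(Math.max(1, buffer.remaining()));
            return segment.read(buffer);
        }
    }

That gives you the "CDC falls behind but everything else keeps moving" behavior by construction; prioritization/scheduling could then be layered on top if a static cap proves too blunt.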
On Tue, Oct 14, 2025, at 5:24 PM, Joel Shepherd wrote:
> Thanks for all the additional light shed. A couple more
> comments/questions interleaved below ...
>
> On 10/9/2025 12:31 PM, Maxim Muzafarov wrote:
> > Isaac,
> >> CEP-38 is looking to offer an alternative to JMX for single-node
> >> management operations, whereas Sidecar is focused on holistic
> >> cluster-level operations.
> > Thank you for the summary. You have a perfect understanding of
> > CEP-38's purpose, and I share your vision for the Apache Sidecar. So
> > I think that both CEP-38 and Sidecar complement each other perfectly
> > as a single product.
>
> Yes, that's a really helpful distinction. The CQL Management API
> operates at the node level; Sidecar operates (or is intended to be
> used?) at the cluster level.
>
> When I re-read CEP-38, I also noticed that CQL management commands
> (e.g. EXECUTE) are expected to be sent on a separate port from plain
> old CQL (DDL/DML), so that helps limit the surface area for both.
> Maxim, I'm curious about the point in request handling and execution
> at which the Management API and the existing CQL API will branch.
> E.g. are they going to share the same parser? Aside from permissions,
> is there going to be code-level enforcement that CQL-for-management
> can't be accepted through the existing CQL port?
>
> What I'm wondering about are the layers of protection against a
> misconfigured or buggy cluster allowing an ordinary user to
> successfully invoke management CQL through the existing CQL port.
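On the code-level enforcement question - I'd hope for something beyond configuration, i.e. a hard check at statement-execution time on which endpoint the statement arrived through, even if both ports share a parser. A sketch of the kind of gate I mean (all names hypothetical, not from the CEP-38 patch):

    /**
     * Sketch of one defense layer: even if the management and native ports
     * share a parser, reject management statements that arrive on the
     * normal native-protocol port. Names are hypothetical stand-ins.
     */
    public final class ManagementStatementGate
    {
        public enum Endpoint { NATIVE_TRANSPORT, MANAGEMENT }

        /** Placeholder for whatever AST node CEP-38 gives EXECUTE COMMAND. */
        interface ExecuteCommandStatement {}

        public static void authorize(Object statement, Endpoint receivedOn)
        {
            boolean isManagement = statement instanceof ExecuteCommandStatement;
            if (isManagement && receivedOn != Endpoint.MANAGEMENT)
                throw new SecurityException(
                    "Management statements are only accepted on the management port");
        }
    }

Belt-and-suspenders with permissions, so a misconfigured authorizer alone can't expose management CQL on 9042.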
> > On Thu, 9 Oct 2025 at 21:09, Josh McKenzie <[email protected]> wrote:
> >> A distinction that resonated with me:
> >>
> >> Control Plane = Sidecar
> >> Data Plane = DB
> >>
> >> I think that's directionally true, but there's no objective
> >> definition of what qualifies as one plane or the other.
>
> It's really hazy. You could argue that CREATE TABLE or CREATE KEYSPACE
> are control plane operations, because in some sense they're allocating
> or managing resources ... but it's also totally reasonable to consider
> any DDL/DML a data plane operation, and to consider process, network,
> file, jvm, etc., management to be control plane.
>
> Where does CDC sit? Functionally it's probably part of the data plane.
> I believe Sidecar has, or plans to have, some built-in support for CDC
> (CEP-44). I'm wondering out loud whether there are operational risks
> in having the same process try to push change records into Kafka as
> fast as the node produces them while also remaining available to
> execute long-running control plane workflows (e.g., backup-restore,
> restarts, etc.). A problem I've seen elsewhere with one process trying
> to manage different kinds of workloads is that if you need to scale up
> one kind of workload you may be required to scale them all up and run
> head first into some kind of resource starvation issue.
>
> I realize there's a desire to not require users to deploy and run a
> bunch of different processes on each node to get Cassandra to work,
> and maybe the different workloads in Sidecar can be sandboxed in a way
> that prevents one workload from starving the rest of CPU time, IO,
> etc.
>
> Thanks -- Joel.
>
> >> On top of that, the sidecar is in a unique position where it
> >> supports functionality across multiple versions of C*, so if you're
> >> looking to implement something with a unified interface that may
> >> differ in implementation across multiple versions of C* (say, if
> >> you're running a large fleet w/different versions in it), there's
> >> pressure there driving certain functionality into the sidecar.
> >>
> >> On Thu, Oct 9, 2025, at 1:42 PM, Isaac Reath wrote:
> >>
> >> I don't have too much insight on CQL as a whole, but I can offer my
> >> views on Sidecar & the CQL Management API.
> >>
> >> In terms of a rubric for what belongs in Sidecar, I think, taking
> >> inspiration from CEP-1, it should be functionality needed to manage
> >> a Cassandra cluster. My perspective on how this fits in with the CQL
> >> Management API (and authors of CEP-38 please correct me if I'm
> >> wrong) is that CEP-38 is looking to offer an alternative to JMX for
> >> single-node management operations, whereas Sidecar is focused on
> >> holistic cluster-level operations.
> >>
> >> Using rebuild as an example from CEP-38: a user can invoke the CQL
> >> Management API to run a rebuild for a single node, but would rely on
> >> Sidecar to rebuild an entire datacenter, with Sidecar in turn
> >> calling the CQL Management API on individual nodes. Similarly, a
> >> user could use the CQL Management API to update the configurations
> >> which can be changed without a restart (similar to what nodetool
> >> setconcurrency does today), but Sidecar would provide a single
> >> interface to update all configurations, including those which
> >> require restarts. Additionally, Sidecar will support operations
> >> which may not involve the CQL Management API at all, such as live
> >> instance migration as laid out in CEP-40.
> >>
> >> Happy to hear other perspectives on this.
> >> Isaac
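Isaac's rebuild example captures the layering nicely: Sidecar ends up as a thin pacing/orchestration loop over the per-node API. Roughly this shape (hypothetical client interface, not actual Sidecar code):

    import java.util.List;

    /**
     * Rough shape of cluster-level orchestration over a node-level API:
     * the sidecar walks the datacenter and invokes CEP-38's per-node
     * rebuild. CqlManagementClient and its method are hypothetical.
     */
    public class DatacenterRebuild
    {
        interface CqlManagementClient
        {
            // e.g. EXECUTE COMMAND rebuild WITH sourceDc=... on one node
            void executeCommand(String host, String command, String args);
        }

        public void rebuildDatacenter(List<String> hosts, String sourceDc,
                                      CqlManagementClient client)
        {
            // Sequential on purpose: rebuild streams data, and the
            // cluster-level pacing is exactly the value Sidecar adds.
            for (String host : hosts)
                client.executeCommand(host, "rebuild", "sourceDc=" + sourceDc);
        }
    }

The node API stays dumb and local; all the "don't rebuild the whole DC at once" judgment lives in one place above it.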
> >> On Wed, Oct 8, 2025 at 3:02 PM Joel Shepherd <[email protected]> wrote:
> >>
> >> To clarify, since I was pinged directly about this ...
> >>
> >> It's not my intent to impede any of the efforts listed below, and I
> >> apologize if it sounded otherwise.
> >>
> >> I am deeply curious about, and interested in, the eventual
> >> scope/charter of CQL, the CQL Admin APIs, and Sidecar. A little
> >> overlap is probably unavoidable, but IMO it would be detrimental to
> >> the project overall to not have a clear scope for each area. If
> >> those scopes have already been defined, I'd love pointers to the
> >> decisions so I can get it straight in my head. If they haven't and
> >> the community is comfortable with that, okay too. If they haven't
> >> and anyone else is a little squirmy about that, what's the right way
> >> to drive a conversation?
> >>
> >> Thanks -- Joel.
> >>
> >> On 10/7/2025 4:57 PM, Joel Shepherd wrote:
> >>> Thanks for the clarifications on CEP-38, Maxim: I actually got some
> >>> insights from your comments below that had slipped by me while
> >>> reading the CEP.
> >>>
> >>> I want to fork the thread a bit, so I'm breaking this off from the
> >>> CEP-38 DISCUSS thread.
> >>>
> >>> If I can back away a bit and squint ... it seems to me that there
> >>> are three initiatives floating around at the moment that could make
> >>> Cassandra more awesome and manageable, or make it confusing and
> >>> complex.
> >>>
> >>> 1) Patrick McFadin's proposal (as presented at CoC) to align CQL
> >>> syntax/semantics closely with PostgreSQL's. I haven't heard anyone
> >>> strongly object, but I have heard several expressions of surprise.
> >>> Maybe something is already in the works, but I'd love to see and
> >>> discuss a proposal for this, so there's consensus that it's a good
> >>> idea and (if needed) guidelines on how to evolve CQL in that
> >>> direction.
> >>>
> >>> 2) CQL management API (CEP-38): As mentioned in the CEP, it'll take
> >>> some time to implement all the functionality that could be in scope
> >>> of this CEP. I wonder if it'd be beneficial to have some kind of
> >>> rubric or guidelines for deciding what kinds of things make sense
> >>> to manage via CQL, and what don't. For example, skimming through
> >>> the PostgreSQL management commands, many of them look like they
> >>> could be thin wrappers over SQL executed against "private" tables
> >>> and views in the database. I don't know whether that is how they
> >>> are implemented, but many of the commands are ultimately just
> >>> setting a value, or reading and returning values, that could
> >>> potentially be managed in tables/views of some sort (e.g., like
> >>> Cassandra virtual tables). That seems to fit pretty neatly with
> >>> preserving SQL as a declarative, data-independent language for data
> >>> access, with limited side effects. Is that a useful filter for
> >>> determining which kinds of things can be managed via CQL
> >>> management, and which should be handled elsewhere? E.g., is a
> >>> filesystem operation like nodetool scrub a good candidate for CQL
> >>> management or not? (I'd vote not; interested in what others think.)
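For what it's worth, we already have a taste of that model: if I remember right, since 4.1 a subset of settings can be read and even mutated through the system_views.settings virtual table, i.e. management as plain data access. Via the Java driver it's just ordinary CQL (the specific setting and value below are only an example):

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.Row;

    /**
     * Reading and updating a node's live configuration through the
     * system_views.settings virtual table (mutable for a subset of
     * settings). Setting name and value here are illustrative only.
     */
    public class VirtualTableSettings
    {
        public static void main(String[] args)
        {
            try (CqlSession session = CqlSession.builder().build())
            {
                Row row = session.execute(
                    "SELECT value FROM system_views.settings " +
                    "WHERE name = 'compaction_throughput'").one();
                System.out.println("before: " + (row == null ? "n/a" : row.getString("value")));

                // Mutable settings change with a plain UPDATE -- a
                // declarative write, not a side-effecting verb.
                session.execute(
                    "UPDATE system_views.settings SET value = '64MiB/s' " +
                    "WHERE name = 'compaction_throughput'");
            }
        }
    }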
> >>>
> >>> 3) Cassandra Sidecar: Like the CQL management API, I wonder if it'd
> >>> be beneficial to have a rubric for deciding what kinds of things
> >>> make sense to go into Sidecar. The recent discussion about CEP-55
> >>> (generated role names) landed on implementing the functionality
> >>> both as a CQL statement and as a Sidecar API. There's also activity
> >>> around using Sidecar for rolling restarts, backup and restore,
> >>> etc.: control plane activities that are largely orthogonal to
> >>> interacting with the data. Should operations that are primarily
> >>> generating or manipulating data be available via Sidecar, to give
> >>> folks the option of invoking them via CQL or HTTP/REST, or would
> >>> Sidecar benefit from having a more narrowly scoped charter (e.g.
> >>> data-agnostic control plane operations only)?
> >>>
> >>> I think all of these tools -- CQL, CQL Management API and Sidecar
> >>> -- will be more robust, easier to use, and easier to maintain if we
> >>> have a consistent way of deciding where a given feature should
> >>> live, and a minimal number of choices for accessing the feature.
> >>> Orthogonal controls. Since Sidecar and the CQL Management API are
> >>> pretty new, it's a good time to clarify their charters to ensure
> >>> they evolve well together. And to get consensus on the long-term
> >>> direction for CQL.
> >>>
> >>> Let me know if I can help -- Joel.
> >>>
> >>> On 10/7/2025 12:22 PM, Maxim Muzafarov wrote:
> >>>> Hello Folks,
> >>>>
> >>>> First of all, thank you for your comments. Your feedback motivates
> >>>> me to implement these changes and refine the final result to the
> >>>> highest standard. To keep the vote thread clean, I'm addressing
> >>>> your questions in the discussion thread.
> >>>>
> >>>> The vote is here:
> >>>> https://lists.apache.org/thread/zmgvo2ty5nqvlz1xccsls2kcrgnbjh5v
> >>>>
> >>>> = The idea: =
> >>>>
> >>>> First, let me focus on the general idea, and then I will answer
> >>>> your questions in more detail.
> >>>>
> >>>> The main focus is on introducing a new API (CQL) to invoke the
> >>>> same node management commands. While this has an indirect effect
> >>>> on the tooling (cqlsh, nodetool), the tooling itself is not the
> >>>> main focus. The scope (or Phase 1) of the initial changes is
> >>>> narrowed down to the API only, to ensure the PR remains
> >>>> reviewable.
> >>>>
> >>>> This implies the following:
> >>>> - the nodetool commands and the way they are implemented won't change
> >>>> - the nodetool commands will be accessible via CQL; their
> >>>> implementation (and execution locality) will not change
> >>>> - this change introduces ONLY a new way of invoking management
> >>>> commands
> >>>> - this change is not about the tooling (cqlsh, nodetool); it will
> >>>> help the tooling evolve, however
> >>>> - these changes are being introduced as an experimental API behind
> >>>> a feature flag, disabled by default
> >>>>
> >>>> = The answers: =
> >>>>
> >>>>> how will the new CQL API behave if the user does not specify a
> >>>>> hostname?
> >>>> The changes only affect the API part; improvements to the tooling
> >>>> will follow later. The command is executed on the node that the
> >>>> client is connected to.
> >>>> Note also that the port differs from 9042 (the default), as a new
> >>>> management port will be introduced. See examples here [1].
> >>>>
> >>>> cqlsh 10.20.88.164 11211 -u myusername -p mypassword
> >>>> nodetool -h 10.20.88.164 -p 8081 -u myusername -pw mypassword
> >>>>
> >>>> If a host is not specified, the cli tool will attempt to connect
> >>>> to localhost, I suppose.
> >>>>
> >>>>> My understanding is that commands like nodetool bootstrap
> >>>>> typically run on a single node.
> >>>> This is correct; however, as I don't control the implementation of
> >>>> the command, it may actually involve communication with other
> >>>> nodes. That is not part of this CEP. I'm only reusing the commands
> >>>> we already have.
> >>>>
> >>>>> Will we continue requiring users to specify a hostname/port
> >>>>> explicitly, or will the CQL API be responsible for orchestrating
> >>>>> the command safely across the entire cluster or datacenter?
> >>>> It seems that you are conflating the API with the tooling. The
> >>>> tooling (cqlsh, nodetool) will continue to work as it does now. I
> >>>> am only adding a new way in which commands can be invoked - CQL.
> >>>> Orchestration, however, is the subject of other projects.
> >>>> Cassandra Sidecar?
> >>>>
> >>>>> It might, however, be worth verifying that the proposed CQL
> >>>>> syntax aligns with PostgreSQL conventions, and adjusting it if
> >>>>> needed for cross-compatibility.
> >>>> It's somewhat new information to me that we're targeting
> >>>> PostgreSQL as the main reference and drifting towards invoking
> >>>> management operations the same way. I'm inclined to agree that the
> >>>> syntax should probably be more or less similar, however.
> >>>>
> >>>> We are introducing the new CQL syntax in a minimal and isolated
> >>>> manner. CEP-38 defines a small set of management-oriented CQL
> >>>> statements (EXECUTE COMMAND / DESCRIBE COMMAND) that can be used
> >>>> to match all existing nodetool commands at once, introducing
> >>>> further aliases as an option. This eliminates the need to
> >>>> introduce a new antlr grammar for each management operation.
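(For anyone skimming: the appeal of the generic statement is that EXECUTE COMMAND becomes a single grammar production plus a registry lookup, so adding a command means a registration rather than an antlr change. Something like this shape, with made-up names:)

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * The idea behind a single EXECUTE COMMAND production: one statement
     * shape (name + options) dispatched through a registry. All names
     * here are hypothetical, not the CEP-38 implementation.
     */
    public class CommandRegistry
    {
        @FunctionalInterface
        interface ManagementCommand
        {
            String execute(Map<String, String> options);
        }

        private final Map<String, ManagementCommand> commands = new ConcurrentHashMap<>();

        public void register(String name, ManagementCommand command)
        {
            commands.put(name, command);
        }

        /** What EXECUTE COMMAND <name> WITH k=v AND ... would bottom out in. */
        public String execute(String name, Map<String, String> options)
        {
            ManagementCommand cmd = commands.get(name);
            if (cmd == null)
                throw new IllegalArgumentException("Unknown command: " + name);
            return cmd.execute(options);
        }
    }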
> >>>>
> >>>> The command execution syntax is the main thing that users
> >>>> interact with in this CEP, but I'm taking a more relaxed approach
> >>>> to it for the following reasons:
> >>>> - the tip of the iceberg, the unification of the JMX, CQL, and
> >>>> possible REST APIs for Cassandra, is the priority;
> >>>> - the feature will be in an experimental state in the major
> >>>> release; we need to collect real feedback from users and their
> >>>> deployments;
> >>>> - aliasing will be used for some important commands like
> >>>> compaction and bootstrap;
> >>>>
> >>>> Taking all of the above into account, I still think it's important
> >>>> to reach an agreement, or at least to avoid objections.
> >>>> So, I've checked the PostgreSQL and SQL standards to identify
> >>>> areas of alignment. The latter, I think, is relatively easy to
> >>>> support via aliases.
> >>>>
> >>>> The syntax proposed in the CEP:
> >>>>
> >>>> EXECUTE COMMAND forcecompact WITH keyspace=distributed_test_keyspace
> >>>> AND table=tbl AND keys=["k4", "k2", "k7"];
> >>>>
> >>>> Other Cassandra-style options that I had previously considered:
> >>>>
> >>>> 1. EXECUTE COMMAND forcecompact (keyspace=distributed_test_keyspace,
> >>>> table=tbl, keys=["k4", "k2", "k7"]);
> >>>> 2. EXECUTE COMMAND forcecompact WITH ARGS {"keyspace":
> >>>> "distributed_test_keyspace", "table": "tbl", "keys": ["k4", "k2",
> >>>> "k7"]};
> >>>>
> >>>> With the PostgreSQL context [2] it could look like:
> >>>>
> >>>> COMPACT (keys=["k4", "k2", "k7"]) distributed_test_keyspace.tbl;
> >>>>
> >>>> The SQL-standard [3][4] procedural approach:
> >>>>
> >>>> CALL system_mgmt.forcecompact(
> >>>>   keyspace => 'distributed_test_keyspace',
> >>>>   table => 'tbl',
> >>>>   keys => ['k4','k2','k7'],
> >>>>   options => { "parallel": 2, "verbose": true }
> >>>> );
> >>>>
> >>>> Please let me know if you have any questions, or if you would like
> >>>> us to arrange a call to discuss all the details.
> >>>>
> >>>> [1] https://www.instaclustr.com/support/documentation/cassandra/using-cassandra/connect-to-cassandra-with-cqlsh/
> >>>> [2] https://www.postgresql.org/docs/current/sql-vacuum.html
> >>>> [3] https://en.wikipedia.org/wiki/Stored_procedure#Implementation
> >>>> [4] https://www.postgresql.org/docs/9.3/functions-admin.html
