This is starting to sound more and more like a k8s operator as we are going along here.
On Wed, Oct 15, 2025 at 9:02 AM Isaac Reath <[email protected]> wrote:
> I could also see a future where C* Sidecar manages multiple sub-processes
> to help alleviate the challenges of needing to run multiple different
> sidecars, each configured for a subset of features (e.g., one configured to
> provide SSTable access for analytics, one configured for CDC, one
> configured for managing cluster operations).
>
>
> On Wed, Oct 15, 2025 at 10:28 AM Dinesh Joshi <[email protected]> wrote:
>
>> The C* Sidecar is built with modules. One could deploy specialized
>> instances of Sidecar which only publish CDC streams. The point I’m making
>> is that just because the code lives in a single repo and we have a single
>> artifact doesn’t necessarily mean the user has to enable all the
>> functionality at runtime.
>>
>> On Wed, Oct 15, 2025 at 7:24 AM Josh McKenzie <[email protected]> wrote:
>>
>>> A problem I've seen elsewhere with one
>>> process trying to manage different kinds of workloads is that if you
>>> need to scale up one kind of workload you may be required to scale them
>>> all up and run head first into some kind of resource starvation issue.
>>>
>>> This is a really good point. If the resource consumption by a CDC
>>> process grows in correlation w/data ingestion on the C* node, we would be
>>> in for A Bad Time.
>>>
>>> @Bernardo - do we resource constrain the CommitLog reading and reporting
>>> to some kind of ceiling so the CDC consumption just falls behind and the
>>> sidecar can otherwise keep making forward progress on its other more
>>> critical operations? And/or have internal scheduling and prioritization to
>>> help facilitate that?
>>>
>>> On Tue, Oct 14, 2025, at 5:24 PM, Joel Shepherd wrote:
>>>
>>> Thanks for all the additional light shed. A couple more
>>> comments/questions interleaved below ...
>>>
>>> On 10/9/2025 12:31 PM, Maxim Muzafarov wrote:
>>> > Isaac,
>>> >> CEP-38 is looking to offer an alternative to JMX for single-node
>>> >> management operations, whereas Sidecar is focused on holistic
>>> >> cluster-level operations.
>>> > Thank you for the summary. You have a perfect understanding of the
>>> > CEP-38's purpose, and I share your vision for the Apache Sidecar. So I
>>> > think that both CEP-38 and Sidecar complement each other perfectly as
>>> > a single product.
>>>
>>> Yes, that's a really helpful distinction. CQL Management API operates at
>>> the node level; Sidecar operates (or is intended to be used?) at cluster
>>> level.
>>>
>>> When I re-read CEP-38, I also noticed that CQL management commands (e.g.
>>> EXECUTE) are expected to be sent on a separate port from plain old CQL
>>> (DDL/DML), so that helps limit the surface area for both. Maxim, I'm
>>> curious about at what point in the request handling and execution the
>>> Management API and existing CQL API will branch. E.g. are they going to
>>> share the same parser? Aside from permissions, is there going to be
>>> code-level enforcement that CQL-for-management can't be accepted through
>>> the existing CQL port?
>>>
>>> What I'm wondering about are the layers of protection against a
>>> misconfigured or buggy cluster allowing an ordinary user to successfully
>>> invoke management CQL through the existing CQL port.
>>>
>>> > On Thu, 9 Oct 2025 at 21:09, Josh McKenzie <[email protected]> wrote:
>>> >> A distinction that resonated with me:
>>> >>
>>> >> Control Plane = Sidecar
>>> >> Data Plane = DB
>>> >>
>>> >> I think that's directionally true, but there's no objective
>>> >> definition of what qualifies as one plane or the other.
>>>
>>> It's really hazy. You could argue that CREATE TABLE or CREATE KEYSPACE
>>> are control plane operations because in some sense they're allocating or
>>> managing resources ...
>>> but it's also totally reasonable to consider any
>>> DDL/DML as a data plane operation, and consider process, network, file,
>>> jvm, etc., management to be control plane.
>>>
>>> Where does CDC sit? Functionally it's probably part of the data plane. I
>>> believe Sidecar has or plans to have some built-in support for CDC
>>> (CEP-44). I'm wondering out loud about whether there are operational
>>> risks with having the same process trying to push change records into
>>> Kafka as fast as the node is producing them, and remaining available for
>>> executing things like long-running control plane workflows (e.g.,
>>> backup-restore, restarts, etc.). A problem I've seen elsewhere with one
>>> process trying to manage different kinds of workloads is that if you
>>> need to scale up one kind of workload you may be required to scale them
>>> all up and run head first into some kind of resource starvation issue.
>>>
>>> I realize there is a desire to not require users to deploy and run a bunch
>>> of different processes on each node to get Cassandra to work, and maybe
>>> the different workloads in Sidecar can be sandboxed in a way that
>>> prevents one workload from starving the rest of CPU time, IO, etc.
>>>
>>> Thanks -- Joel.
>>>
>>> >> On top of that, the sidecar is in a unique position where it supports
>>> >> functionality across multiple versions of C*, so if you're looking to
>>> >> implement something with a unified interface that may differ in
>>> >> implementation across multiple versions of C* (say, if you're running a
>>> >> large fleet w/different versions in it), there's pressure there driving
>>> >> certain functionality into the sidecar.
>>> >>
>>> >> On Thu, Oct 9, 2025, at 1:42 PM, Isaac Reath wrote:
>>> >>
>>> >> I don't have too much of an insight on CQL as a whole, but I can
>>> >> offer my views on Sidecar & the CQL Management API.
>>> >>
>>> >> In terms of a rubric for what belongs in Sidecar, I think taking
>>> >> inspiration from CEP-1, it should be functionality needed to manage a
>>> >> Cassandra cluster. My perspective on how this fits in with the CQL
>>> >> Management API (and authors of CEP-38 please correct me if I'm wrong), is
>>> >> that CEP-38 is looking to offer an alternative to JMX for single-node
>>> >> management operations, whereas Sidecar is focused on holistic cluster-level
>>> >> operations.
>>> >>
>>> >> Using rebuild as an example from CEP-38: a user can invoke the CQL
>>> >> Management API to run a rebuild for a single node, but would rely on
>>> >> Sidecar to rebuild an entire datacenter, with Sidecar in turn calling the
>>> >> CQL Management API on individual nodes. Similarly, a user could use the
>>> >> CQL Management API to update the configurations which are able to be
>>> >> changed without a restart (similar to how nodetool setconcurrency does
>>> >> today), but Sidecar would provide a single interface to update all
>>> >> configurations, including those which require restarts. Additionally,
>>> >> Sidecar will support operations which may not involve the CQL Management
>>> >> API at all, such as live instance migration as laid out in CEP-40.
>>> >>
>>> >> Happy to hear other perspectives on this.
>>> >> Isaac
>>> >>
>>> >> On Wed, Oct 8, 2025 at 3:02 PM Joel Shepherd <[email protected]> wrote:
>>> >>
>>> >> To clarify, since I was pinged directly about this ...
>>> >>
>>> >> It's not my intent to impede any of the efforts listed below and I
>>> >> apologize if it sounded otherwise.
>>> >>
>>> >> I am deeply curious and interested in the eventual scope/charter of CQL,
>>> >> CQL Admin APIs, and Sidecar. A little overlap is probably unavoidable
>>> >> but IMO it would be detrimental to the project overall to not have a
>>> >> clear scope for each area. If those scopes have already been defined,
>>> >> I'd love pointers to decisions so I can get it straight in my head.
>>> >> If they haven't and the community is comfortable with that, okay too. If
>>> >> they haven't and anyone else is a little squirmy about that, what's the
>>> >> right way to drive a conversation?
>>> >>
>>> >> Thanks -- Joel.
>>> >>
>>> >> On 10/7/2025 4:57 PM, Joel Shepherd wrote:
>>> >>> Thanks for the clarifications on CEP-38, Maxim: I actually got some
>>> >>> insights from your comments below that had slipped by me while reading
>>> >>> the CEP.
>>> >>>
>>> >>> I want to fork the thread a bit, so breaking this off from the CEP-38
>>> >>> DISCUSS thread.
>>> >>>
>>> >>> If I can back away a bit and squint ... It seems to me that there are
>>> >>> three initiatives floating around at the moment that could make
>>> >>> Cassandra more awesome and manageable, or make it confusing and complex.
>>> >>>
>>> >>> 1) Patrick McFadin's proposal (as presented at CoC) to align CQL
>>> >>> syntax/semantics closely with PostgreSQL's. I haven't heard anyone
>>> >>> strongly object, but have heard several expressions of surprise. Maybe
>>> >>> something is already in the works, but I'd love to see and discuss a
>>> >>> proposal for this, so there's consensus that it's a good idea and (if
>>> >>> needed) guidelines on how to evolve CQL in that direction.
>>> >>>
>>> >>> 2) CQL management API (CEP-38): As mentioned in the CEP, it'll take
>>> >>> some time to implement all the functionality that could be in scope of
>>> >>> this CEP. I wonder if it'd be beneficial to have some kind of rubric
>>> >>> or guidelines for deciding what kind of things make sense to manage
>>> >>> via CQL, and what don't. For example, skimming through the PostgreSQL
>>> >>> management commands, many of them look like they could be thin
>>> >>> wrappers over SQL executed against "private" tables and views in the
>>> >>> database.
>>> >>> I don't know that that is how they are implemented, but many
>>> >>> of the commands are ultimately just setting a value, or reading and
>>> >>> returning values that could potentially be managed in tables/views of
>>> >>> some sort. (E.g., like Cassandra virtual tables). That seems to fit
>>> >>> pretty neatly with preserving SQL as a declarative, data independent
>>> >>> language for data access, with limited side-effects. Is that a useful
>>> >>> filter for determining what kinds of things can be managed via CQL
>>> >>> management, and which should be handled elsewhere? E.g., is a
>>> >>> filesystem operation like nodetool scrub a good candidate for CQL
>>> >>> management or not? (I'd vote not: interested in what others think.)
>>> >>>
>>> >>> 3) Cassandra Sidecar: Like the CQL management API, I wonder if it'd be
>>> >>> beneficial to have a rubric for deciding what kinds of things make
>>> >>> sense to go into Sidecar. The recent discussion about CEP-55
>>> >>> (generated role names) landed on implementing the functionality both
>>> >>> as a CQL statement and as a Sidecar API. There's also activity around
>>> >>> using Sidecar for rolling restarts, backup and restore, etc.: control
>>> >>> plane activities that are largely orthogonal to interacting with the
>>> >>> data. Should operations that are primarily generating or manipulating
>>> >>> data be available via Sidecar to give folks the option of invoking
>>> >>> them via CQL or HTTP/REST, or would Sidecar benefit from having a more
>>> >>> narrowly scoped charter (e.g. data-agnostic control plane operations only)?
>>> >>>
>>> >>> I think all of these tools -- CQL, CQL Management API and Sidecar --
>>> >>> will be more robust, easier to use, and easier to maintain if we have
>>> >>> a consistent way of deciding where a given feature should live, and a
>>> >>> minimal number of choices for accessing the feature. Orthogonal
>>> >>> controls.
>>> >>> Since Sidecar and CQL Management API are pretty new, it's a
>>> >>> good time to clarify their charter to ensure they evolve well
>>> >>> together. And to get consensus on the long-term direction for CQL.
>>> >>>
>>> >>> Let me know if I can help -- Joel.
>>> >>>
>>> >>>
>>> >>> On 10/7/2025 12:22 PM, Maxim Muzafarov wrote:
>>> >>>> Hello Folks,
>>> >>>>
>>> >>>>
>>> >>>> First of all, thank you for your comments. Your feedback motivates me
>>> >>>> to implement these changes and refine the final result to the highest
>>> >>>> standard. To keep the vote thread clean, I'm addressing your questions
>>> >>>> in the discussion thread.
>>> >>>>
>>> >>>> The vote is here:
>>> >>>> https://lists.apache.org/thread/zmgvo2ty5nqvlz1xccsls2kcrgnbjh5v
>>> >>>>
>>> >>>>
>>> >>>> = The idea: =
>>> >>>>
>>> >>>> First, let me focus on the general idea, and then I will answer your
>>> >>>> questions in more detail.
>>> >>>>
>>> >>>> The main focus is on introducing a new API (CQL) to invoke the same
>>> >>>> node management commands. While this has an indirect effect on tooling
>>> >>>> (cqlsh, nodetool), the tooling itself is not the main focus. The scope
>>> >>>> (or Phase 1) of the initial changes is narrowed down to the API
>>> >>>> only, to ensure the PR remains reviewable.
>>> >>>>
>>> >>>> This implies the following:
>>> >>>> - the nodetool commands and the way they are implemented won't change
>>> >>>> - the nodetool commands will be accessible via CQL; their
>>> >>>> implementation (and the execution locality) will not change
>>> >>>> - this change introduces ONLY a new way of how management commands
>>> >>>> will be invoked
>>> >>>> - this change is not about the tooling (cqlsh, nodetool), it will help
>>> >>>> them evolve, however
>>> >>>> - these changes are being introduced as an experimental API with a
>>> >>>> feature flag, disabled by default
>>> >>>>
>>> >>>>
>>> >>>> = The answers: =
>>> >>>>
>>> >>>>> how will the new CQL API behave if the user does not specify a
>>> >>>>> hostname?
>>> >>>> The changes only affect the API part; improvements to the tooling will
>>> >>>> follow later. The command is executed on the node that the client is
>>> >>>> connected to.
>>> >>>> Note also that the port differs from 9042 (default) as a new
>>> >>>> management port will be introduced. See examples here [1].
>>> >>>>
>>> >>>> cqlsh 10.20.88.164 11211 -u myusername -p mypassword
>>> >>>> nodetool -h 10.20.88.164 -p 8081 -u myusername -pw mypassword
>>> >>>>
>>> >>>> If a host is not specified, the cli tool will attempt to connect to
>>> >>>> localhost. I suppose.
>>> >>>>
>>> >>>>
>>> >>>>> My understanding is that commands like nodetool bootstrap
>>> >>>>> typically run on a single node.
>>> >>>> This is correct; however, as I don't control the implementation of the
>>> >>>> command, it may actually involve communication with other nodes. This
>>> >>>> is actually not part of this CEP. I'm only reusing the commands we
>>> >>>> already have.
>>> >>>>
>>> >>>>
>>> >>>>> Will we continue requiring users to specify a hostname/port
>>> >>>>> explicitly, or will the CQL API be responsible for orchestrating the
>>> >>>>> command safely across the entire cluster or datacenter?
>>> >>>> It seems that you are confusing the API with the tooling.
>>> >>>> The tooling (cqlsh, nodetool) will continue to work as it does now.
>>> >>>> I am only adding a new way in which commands can be invoked: CQL.
>>> >>>> Orchestration, however, is the subject of other projects. Cassandra
>>> >>>> Sidecar?
>>> >>>>
>>> >>>>
>>> >>>>> It might, however, be worth verifying that the proposed CQL syntax
>>> >>>>> aligns with PostgreSQL conventions, and adjusting it if needed for
>>> >>>>> cross-compatibility.
>>> >>>> It's a bit new info to me that we're targeting PostgreSQL as the main
>>> >>>> reference and drifting towards invoking management operations the
>>> >>>> same way. I'm inclined to agree that the syntax should probably be
>>> >>>> similar, more or less, however.
>>> >>>>
>>> >>>> We are introducing a new CQL syntax in a minimal and isolated manner.
>>> >>>> The CEP-38 defines a small set of management-oriented CQL statements
>>> >>>> (EXECUTE COMMAND / DESCRIBE COMMAND) that can be used to match all
>>> >>>> existing nodetool commands at once, introducing further aliases as an
>>> >>>> option. This eliminates the need to introduce a new antlr grammar for
>>> >>>> each management operation.
>>> >>>>
>>> >>>> The command execution syntax is the main thing that users interact
>>> >>>> with in this CEP, but I'm taking a more relaxed approach to it for the
>>> >>>> following reasons:
>>> >>>> - the syntax is the tip of the iceberg; the unification of the JMX,
>>> >>>> CQL and possible REST API for Cassandra is the priority;
>>> >>>> - the feature will be in experimental state in the major release, we
>>> >>>> need to collect the real feedback from users and their deployments;
>>> >>>> - the aliasing will be used for some important commands like
>>> >>>> compaction, bootstrap;
>>> >>>>
>>> >>>> Taking all of the above into account, I still think it's important to
>>> >>>> reach an agreement, or at least to avoid objections.
>>> >>>> So, I've checked the PostgreSQL and SQL standards to identify areas of
>>> >>>> alignment. The latter I think is relatively easy to support as
>>> >>>> aliases.
>>> >>>>
>>> >>>>
>>> >>>> The syntax proposed in the CEP:
>>> >>>>
>>> >>>> EXECUTE COMMAND forcecompact WITH keyspace=distributed_test_keyspace
>>> >>>> AND table=tbl AND keys=["k4", "k2", "k7"];
>>> >>>>
>>> >>>> Other Cassandra-style options that I had previously considered:
>>> >>>>
>>> >>>> 1. EXECUTE COMMAND forcecompact (keyspace=distributed_test_keyspace,
>>> >>>> table=tbl, keys=["k4", "k2", "k7"]);
>>> >>>> 2. EXECUTE COMMAND forcecompact WITH ARGS {"keyspace":
>>> >>>> "distributed_test_keyspace", "table": "tbl", "keys":["k4", "k2",
>>> >>>> "k7"]};
>>> >>>>
>>> >>>> With the PostgreSQL context [2] it could look like:
>>> >>>>
>>> >>>> COMPACT (keys=["k4", "k2", "k7"]) distributed_test_keyspace.tbl;
>>> >>>>
>>> >>>> The SQL-standard [3][4] procedural approach:
>>> >>>>
>>> >>>> CALL system_mgmt.forcecompact(
>>> >>>>   keyspace => 'distributed_test_keyspace',
>>> >>>>   table => 'tbl',
>>> >>>>   keys => ['k4','k2','k7'],
>>> >>>>   options => { "parallel": 2, "verbose": true }
>>> >>>> );
>>> >>>>
>>> >>>>
>>> >>>> Please let me know if you have any questions, or if you would like us
>>> >>>> to arrange a call to discuss all the details.
>>> >>>>
>>> >>>>
>>> >>>> [1] https://www.instaclustr.com/support/documentation/cassandra/using-cassandra/connect-to-cassandra-with-cqlsh/
>>> >>>> [2] https://www.postgresql.org/docs/current/sql-vacuum.html
>>> >>>> [3] https://en.wikipedia.org/wiki/Stored_procedure?utm_source=chatgpt.com#Implementation
>>> >>>> [4] https://www.postgresql.org/docs/9.3/functions-admin.html
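
For anyone who wants to play with the statement shape quoted at the bottom of the thread, here is a minimal Python sketch that assembles the EXECUTE COMMAND ... WITH ... AND ... form from a command name and an argument map. The helper name build_execute_command is hypothetical, the value rendering (bare scalars, JSON-style lists) is only inferred from the single forcecompact example, and nothing here escapes or validates input. Treat it as an illustration of the proposed syntax, not as something CEP-38 or any driver provides.

```python
import json


def build_execute_command(command, args):
    """Render a statement in the EXECUTE COMMAND <name> WITH k=v AND ... shape.

    Hypothetical helper: the rendering rules are assumptions inferred from the
    forcecompact example in the thread, not taken from the CEP-38 grammar.
    """

    def render(value):
        # Lists are written JSON-style, e.g. ["k4", "k2", "k7"], as in the example.
        if isinstance(value, (list, tuple)):
            return json.dumps(list(value))
        # Assumption: booleans are lowercase, matching the "verbose": true example.
        if isinstance(value, bool):
            return "true" if value else "false"
        # Scalars are emitted bare (keyspace=distributed_test_keyspace).
        return str(value)

    if not args:
        # Assumption: a command without arguments simply omits the WITH clause.
        return f"EXECUTE COMMAND {command};"

    clauses = " AND ".join(f"{key}={render(value)}" for key, value in args.items())
    return f"EXECUTE COMMAND {command} WITH {clauses};"


if __name__ == "__main__":
    print(
        build_execute_command(
            "forcecompact",
            {
                "keyspace": "distributed_test_keyspace",
                "table": "tbl",
                "keys": ["k4", "k2", "k7"],
            },
        )
    )
```

Called with the forcecompact arguments from the thread, it yields the quoted statement on a single line.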
