Re: Orthogonality (WAS: Re: [DISCUSSION] CEP-38: CQL Management API)

Patrick McFadin Fri, 17 Oct 2025 21:25:42 -0700

This is starting to sound more and more like a k8s operator as we are going
along here.


On Wed, Oct 15, 2025 at 9:02 AM Isaac Reath <[email protected]> wrote:

> I could also see a future where C* Sidecar manages multiple sub-processes
> to help alleviate the challenges of needing to run multiple different
> sidecars, each configured for a subset of features (e.g., one configured to
> provide SSTable access for analytics, one configured for CDC, one
> configured for managing cluster operations).
>
>
> On Wed, Oct 15, 2025 at 10:28 AM Dinesh Joshi <[email protected]> wrote:
>
>> The C* Sidecar is built with modules. One could deploy specialized
>> instances of Sidecar which only publish CDC streams. The point I’m making
>> is that just because the code lives in a single repo and we have a single
>> artifact doesn’t necessarily mean the user has to enable all the
>> functionality at runtime.
>>
>> On Wed, Oct 15, 2025 at 7:24 AM Josh McKenzie <[email protected]>
>> wrote:
>>
>>> A problem I've seen elsewhere with one
>>> process trying to manage different kinds of workloads is that if you
>>> need to scale up one kind of workload you may be required to scale them
>>> all up and run head first into some kind of resource starvation issue.
>>>
>>> This is a really good point. If the resource consumption by a CDC
>>> process grows in correlation w/data ingestion on the C* node, we would be
>>> in for A Bad Time.
>>>
>>> @Bernardo - do we resource constrain the CommitLog reading and reporting
>>> to some kind of ceiling so the CDC consumption just falls behind and the
>>> sidecar can otherwise keep making forward progress on its other more
>>> critical operations? And/or have internal scheduling and prioritization to
>>> help facilitate that?
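One way to picture the ceiling and prioritization asked about here is a scheduler that always drains control-plane work first and caps CDC work per scheduling tick, so CDC falls behind instead of starving everything else. This is only a sketch under assumptions: `BudgetedScheduler`, the two priority classes, and the per-tick budget are hypothetical names for illustration, not anything taken from Sidecar or CEP-44.

```python
import heapq
from typing import List, Tuple

# Hypothetical priority classes: the lower number is served first.
CONTROL_PLANE = 0
CDC = 1


class BudgetedScheduler:
    """Runs queued tasks by priority, capping CDC work per tick."""

    def __init__(self, cdc_budget_per_tick: int) -> None:
        self.cdc_budget_per_tick = cdc_budget_per_tick
        self._queue: List[Tuple[int, int, str]] = []  # (priority, seq, name)
        self._seq = 0  # tie-breaker that keeps FIFO order within a priority

    def submit(self, priority: int, name: str) -> None:
        heapq.heappush(self._queue, (priority, self._seq, name))
        self._seq += 1

    def run_tick(self) -> List[str]:
        """Run all control-plane tasks, then at most the budgeted CDC tasks."""
        ran: List[str] = []
        deferred: List[Tuple[int, int, str]] = []
        cdc_ran = 0
        while self._queue:
            item = heapq.heappop(self._queue)
            priority, _, name = item
            if priority == CDC:
                if cdc_ran >= self.cdc_budget_per_tick:
                    deferred.append(item)  # over budget: CDC falls behind
                    continue
                cdc_ran += 1
            ran.append(name)
        for item in deferred:
            heapq.heappush(self._queue, item)  # retry on the next tick
        return ran
```

With a budget of 2, three queued CDC tasks and one control-plane task, the first tick runs the control-plane task plus two CDC tasks, and the third CDC task waits for the next tick.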
>>>
>>> On Tue, Oct 14, 2025, at 5:24 PM, Joel Shepherd wrote:
>>>
>>> Thanks for all the additional light shed. A couple more
>>> comments/questions interleaved below ...
>>>
>>> On 10/9/2025 12:31 PM, Maxim Muzafarov wrote:
>>> > Isaac,
>>> >> CEP-38 is looking to offer an alternative to JMX for single-node
>>> management operations, whereas Sidecar is focused on holistic cluster-level
>>> operations.
>>> > Thank you for the summary. You have a perfect understanding of the
>>> > CEP-38's purpose, and I share your vision for the Apache Sidecar. So I
>>> > think that both CEP-38 and Sidecar complement each other perfectly as
>>> > a single product.
>>>
>>> Yes, that's a really helpful distinction. CQL Management API operates at
>>> the node level; Sidecar operates (or is intended to be used?) at cluster
>>> level.
>>>
>>> When I re-read CEP-38, I also noticed that CQL management commands (e.g.
>>> EXECUTE) are expected to be sent on a separate port from plain old CQL
>>> (DDL/DML), so that helps limit the surface area for both. Maxim, I'm
>>> curious at what point in the request handling and execution the
>>> Management API and existing CQL API will branch. E.g. are they going to
>>> share the same parser? Aside from permissions, is there going to be
>>> code-level enforcement that CQL-for-management can't be accepted through
>>> the existing CQL port?
>>>
>>> What I'm wondering about are the layers of protection against a
>>> misconfigured or buggy cluster allowing an ordinary user to successfully
>>> invoke management CQL through the existing CQL port.
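As an illustration of what code-level enforcement could mean, the sketch below refuses management statements on anything other than the management port, independent of permissions. It is a simplification under assumptions: `is_management_statement`, `admit`, and the `"data"`/`"management"` port labels are hypothetical, the prefix check stands in for a real parser, and only the statement names EXECUTE COMMAND and DESCRIBE COMMAND come from the CEP discussion. Whether ordinary CQL would be accepted on the management port is not stated in the thread; the sketch allows it.

```python
# Statement prefixes named in the CEP-38 discussion.
MANAGEMENT_PREFIXES = ("EXECUTE COMMAND", "DESCRIBE COMMAND")


def is_management_statement(cql: str) -> bool:
    """Return True if the statement starts with a management prefix."""
    # Collapse whitespace and upper-case so "execute   command" still matches.
    normalized = " ".join(cql.strip().upper().split())
    return normalized.startswith(MANAGEMENT_PREFIXES)


def admit(cql: str, port_kind: str) -> bool:
    """Decide whether a statement may run on the port it arrived on."""
    if port_kind not in ("data", "management"):
        raise ValueError(f"unknown port kind: {port_kind}")
    if is_management_statement(cql):
        # Hard rule applied before any permission check.
        return port_kind == "management"
    return True
```

A check like this sits in front of authorization, so a misconfigured role alone would not be enough to run a management command over the regular CQL port.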
>>>
>>> > On Thu, 9 Oct 2025 at 21:09, Josh McKenzie <[email protected]>
>>> wrote:
>>> >> A distinction that resonated with me:
>>> >>
>>> >> Control Plane = Sidecar
>>> >> Data Plane = DB
>>> >>
>>> >> I think that's directionally true, but there's no objective
>>> definition of what qualifies as one plane or the other.
>>>
>>> It's really hazy. You could argue that CREATE TABLE or CREATE KEYSPACE
>>> are control plane operations because in some sense they're allocating or
>>> managing resources ... but it's also totally reasonable to consider any
>>> DDL/DML as a data plane operation, and consider process, network, file,
>>> jvm, etc., management to be control plane.
>>>
>>> Where does CDC sit? Functionally it's probably part of the data plane. I
>>> believe Sidecar has or plans to have some built-in support for CDC
>>> (CEP-44). I'm wondering out loud about whether there are operational
>>> risks with having the same process trying to push change records into
>>> Kafka as fast as the node is producing them, and remaining available for
>>> executing things like long-running control plane workflows (e.g.,
>>> backup-restore, restarts, etc.). A problem I've seen elsewhere with one
>>> process trying to manage different kinds of workloads is that if you
>>> need to scale up one kind of workload you may be required to scale them
>>> all up and run head first into some kind of resource starvation issue.
>>>
>>> I realize there's a desire to not require users to deploy and run a bunch
>>> of different processes on each node to get Cassandra to work, and maybe
>>> the different workloads in Sidecar can be sandboxed in a way that
>>> prevents one workload from starving the rest of CPU time, IO, etc.
>>>
>>> Thanks -- Joel.
>>>
>>> >> On top of that, the sidecar is in a unique position where it supports
>>> functionality across multiple versions of C*, so if you're looking to
>>> implement something with a unified interface that may differ in
>>> implementation across multiple versions of C* (say, if you're running a
>>> large fleet w/different versions in it), there's pressure there driving
>>> certain functionality into the sidecar.
>>> >>
>>> >> On Thu, Oct 9, 2025, at 1:42 PM, Isaac Reath wrote:
>>> >>
>>> >> I don't have too much of an insight on CQL as a whole, but I can
>>> offer my views on Sidecar & the CQL Management API.
>>> >>
>>> >> In terms of a rubric for what belongs in Sidecar, I think taking
>>> inspiration from CEP-1, it should be functionality needed to manage a
>>> Cassandra cluster. My perspective on how this fits in with the CQL
>>> Management API (and authors of CEP-38 please correct me if I'm wrong), is
>>> that CEP-38 is looking to offer an alternative to JMX for single-node
>>> management operations, whereas Sidecar is focused on holistic cluster-level
>>> operations.
>>> >>
>>> >> Using rebuild as an example from CEP-38: a user can invoke the CQL
>>> Management API to run a rebuild for a single node, but would rely on
>>> Sidecar to rebuild an entire datacenter, with Sidecar in turn calling the
>>> CQL Management API on individual nodes.  Similarly, a user could use the
>>> CQL Management API to update the configurations which are able to be
>>> changed without a restart (similar to how nodetool setconcurrency does
>>> today), but Sidecar would provide a single interface to update all
>>> configurations, including those which require restarts. Additionally,
>>> Sidecar will support operations which may not involve the CQL Management
>>> API at all, such as live instance migration as laid out in CEP-40.
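The division of labor in the rebuild example can be sketched as a cluster-level loop that delegates to a node-level call. This is a minimal illustration under assumptions: `rebuild_datacenter` and the `rebuild_node` callback are hypothetical names, and the callback stands in for whatever invokes the CQL Management API on a single node; none of this is Sidecar's actual API.

```python
from typing import Callable, Dict, Iterable


def rebuild_datacenter(
    nodes: Iterable[str],
    rebuild_node: Callable[[str], None],
    stop_on_failure: bool = True,
) -> Dict[str, str]:
    """Invoke a single-node rebuild on each node in turn and record outcomes."""
    results: Dict[str, str] = {}
    for node in nodes:
        try:
            rebuild_node(node)  # node-level operation (e.g. via management CQL)
            results[node] = "ok"
        except Exception as exc:
            results[node] = f"failed: {exc}"
            if stop_on_failure:
                break  # leave the remaining nodes untouched
    return results
```

The orchestration policy (ordering, stopping on failure) lives entirely in the outer function, while the callback knows nothing about other nodes, which mirrors the node-level versus cluster-level split described above.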
>>> >>
>>> >> Happy to hear other perspectives on this.
>>> >> Isaac
>>> >>
>>> >> On Wed, Oct 8, 2025 at 3:02 PM Joel Shepherd <[email protected]>
>>> wrote:
>>> >>
>>> >> To clarify, since I was pinged directly about this ...
>>> >>
>>> >> It's not my intent to impede any of the efforts listed below and I
>>> >> apologize if it sounded otherwise.
>>> >>
>>> >> I am deeply curious and interested in the eventual scope/charter of
>>> CQL,
>>> >> CQL Admin APIs, and Sidecar. A little overlap is probably unavoidable
>>> >> but IMO it would be detrimental to the project overall to not have a
>>> >> clear scope for each area. If those scopes have already been defined,
>>> >> I'd love pointers to decisions so I can get it straight in my head. If
>>> >> they haven't and the community is comfortable with that, okay too. If
>>> >> they haven't and anyone else is a little squirmy about that, what's
>>> the
>>> >> right way to drive a conversation?
>>> >>
>>> >> Thanks -- Joel.
>>> >>
>>> >> On 10/7/2025 4:57 PM, Joel Shepherd wrote:
>>> >>> Thanks for the clarifications on CEP-38, Maxim: I actually got some
>>> >>> insights from your comments below that had slipped by me while
>>> reading
>>> >>> the CEP.
>>> >>>
>>> >>> I want to fork the thread a bit, so breaking this off from the CEP-38
>>> >>> DISCUSS thread.
>>> >>>
>>> >>> If I can back away a bit and squint ... It seems to me that there are
>>> >>> three initiatives floating around at the moment that could make
>>> >>> Cassandra more awesome and manageable, or make it confusing and
>>> complex.
>>> >>>
>>> >>> 1) Patrick McFadin's proposal (as presented at CoC) to align CQL
>>> >>> syntax/semantics closely with PostgreSQL's. I haven't heard anyone
>>> >>> strongly object, but have heard several expressions of surprise.
>>> Maybe
>>> >>> something is already in the works, but I'd love to see and discuss a
>>> >>> proposal for this, so there's consensus that it's a good idea and (if
>>> >>> needed) guidelines on how to evolve CQL in that direction.
>>> >>>
>>> >>> 2) CQL management API (CEP-38): As mentioned in the CEP, it'll take
>>> >>> some time to implement all the functionality that could be in scope
>>> of
>>> >>> this CEP. I wonder if it'd be beneficial to have some kind of rubric
>>> >>> or guidelines for deciding what kind of things make sense to manage
>>> >>> via CQL, and what don't. For example, skimming through the PostgreSQL
>>> >>> management commands, many of them look like they could be thin
>>> >>> wrappers over SQL executed against "private" tables and views in the
>>> >>> database. I don't know that that is how they are implemented, but
>>> many
>>> >>> of the commands are ultimately just setting a value, or reading and
>>> >>> returning values that could potentially be managed in tables/views of
>>> >>> some sort. (E.g., like Cassandra virtual tables). That seems to fit
>>> >>> pretty neatly with preserving SQL as a declarative, data independent
>>> >>> language for data access, with limited side-effects. Is that a useful
>>> >>> filter for determining what kinds of things can be managed via CQL
>>> >>> management, and which should be handled elsewhere? E.g., is a
>>> >>> filesystem operation like nodetool scrub a good candidate for CQL
>>> >>> management or not? (I'd vote not: interested in what others think.)
>>> >>>
>>> >>> 3) Cassandra Sidecar: Like the CQL management API, I wonder if it'd
>>> be
>>> >>> beneficial to have a rubric for deciding what kinds of things make
>>> >>> sense to go into Sidecar. The recent discussion about CEP-55
>>> >>> (generated role names) landed on implementing the functionality both
>>> >>> as a CQL statement and as a Sidecar API. There's also activity around
>>> >>> using Sidecar for rolling restarts, backup and restore, etc.: control
>>> >>> plane activities that are largely orthogonal to interacting with the
>>> >>> data. Should operations that are primarily generating or manipulating
>>> >>> data be available via Sidecar to give folks the option of invoking
>>> >>> them via CQL or HTTP/REST, or would Sidecar benefit from having a
>>> more
>>> >>> narrowly scoped charter (e.g. data-agnostic control plane operations
>>> only)?
>>> >>>
>>> >>> I think all of these tools -- CQL, CQL Management API and Sidecar --
>>> >>> will be more robust, easier to use, and easier to maintain if we have
>>> >>> a consistent way of deciding where a given feature should live, and a
>>> >>> minimal number of choices for accessing the feature. Orthogonal
>>> >>> controls. Since Sidecar and CQL Management API are pretty new, it's a
>>> >>> good time to clarify their charter to ensure they evolve well
>>> >>> together. And to get consensus on the long-term direction for CQL.
>>> >>>
>>> >>> Let me know if I can help -- Joel.
>>> >>>
>>> >>>
>>> >>> On 10/7/2025 12:22 PM, Maxim Muzafarov wrote:
>>> >>>> Hello Folks,
>>> >>>>
>>> >>>>
>>> >>>> First of all, thank you for your comments. Your feedback motivates
>>> me
>>> >>>> to implement these changes and refine the final result to the
>>> highest
>>> >>>> standard. To keep the vote thread clean, I'm addressing your
>>> questions
>>> >>>> in the discussion thread.
>>> >>>>
>>> >>>> The vote is here:
>>> >>>> https://lists.apache.org/thread/zmgvo2ty5nqvlz1xccsls2kcrgnbjh5v
>>> >>>>
>>> >>>>
>>> >>>> = The idea: =
>>> >>>>
>>> >>>> First, let me focus on the general idea, and then I will answer your
>>> >>>> questions in more detail.
>>> >>>>
>>> >>>> The main focus is on introducing a new API (CQL) to invoke the same
>>> >>>> node management commands. While this has an indirect effect on
>>> tooling
>>> >>>> (cqlsh, nodetool), the tooling itself is not the main focus. The
>>> scope
>>> >>>> (or Phase 1) of the initial changes is narrowed down to the API
>>> >>>> only, to ensure the PR remains reviewable.
>>> >>>>
>>> >>>> This implies the following:
>>> >>>> - the nodetool commands and the way they are implemented won't
>>> change
>>> >>>> - the nodetool commands will be accessible via CQL; their
>>> >>>> implementation (and execution locality) will not change
>>> >>>> - this change introduces ONLY a new way of how management commands
>>> >>>> will be invoked
>>> >>>> - this change is not about the tooling (cqlsh, nodetool), it will
>>> help
>>> >>>> them evolve, however
>>> >>>> - these changes are being introduced as an experimental API with a
>>> >>>> feature flag, disabled by default
>>> >>>>
>>> >>>>
>>> >>>> = The answers: =
>>> >>>>
>>> >>>>> how will the new CQL API behave if the user does not specify a
>>> hostname?
>>> >>>> The changes only affect the API part; improvements to the tooling
>>> will
>>> >>>> follow later. The command is executed on the node that the client is
>>> >>>> connected to.
>>> >>>> Note also that the port differs from 9042 (default) as a new
>>> >>>> management port will be introduced. See examples here [1].
>>> >>>>
>>> >>>> cqlsh 10.20.88.164 11211 -u myusername -p mypassword
>>> >>>> nodetool -h 10.20.88.164 -p 8081 -u myusername -pw mypassword
>>> >>>>
>>> >>>> If a host is not specified, I suppose the CLI tool will attempt to
>>> >>>> connect to localhost.
>>> >>>>
>>> >>>>
>>> >>>>> My understanding is that commands like nodetool bootstrap
>>> typically run on a single node.
>>> >>>> This is correct; however, as I don't control the implementation of
>>> the
>>> >>>> command, it may actually involve communication with other nodes.
>>> This
>>> >>>> is actually not part of this CEP. I'm only reusing the commands we
>>> >>>> already have.
>>> >>>>
>>> >>>>
>>> >>>>> Will we continue requiring users to specify a hostname/port
>>> explicitly, or will the CQL API be responsible for orchestrating the
>>> command safely across the entire cluster or datacenter?
>>> >>>> It seems that you are confusing the API with the tooling. The
>>> tooling
>>> >>>> (cqlsh, nodetool) will continue to work as it does now. I am only
>>> >>>> adding a new way in which commands can be invoked: CQL.
>>> >>>> Orchestration, however, is the subject of other projects. Cassandra
>>> >>>> Sidecar?
>>> >>>>
>>> >>>>
>>> >>>>> It might, however, be worth verifying that the proposed CQL syntax
>>> aligns with PostgreSQL conventions, and adjusting it if needed for
>>> cross-compatibility.
>>> >>>> It's news to me that we're targeting PostgreSQL as the main
>>> >>>> reference and drifting towards invoking management operations the
>>> >>>> same way. I'm inclined to agree that the syntax should probably be
>>> >>>> similar, more or less, however.
>>> >>>>
>>> >>>> We are introducing a new CQL syntax in a minimal and isolated
>>> manner.
>>> >>>> CEP-38 defines a small set of management-oriented CQL statements
>>> >>>> (EXECUTE COMMAND / DESCRIBE COMMAND) that can be used to match all
>>> >>>> existing nodetool commands at once, introducing further aliases as
>>> an
>>> >>>> option. This eliminates the need to introduce a new antlr grammar
>>> for
>>> >>>> each management operation.
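The reason a single generic statement avoids per-operation grammar can be shown with a name-based registry: the parser only needs to extract a command name and its arguments, and dispatch happens by lookup. This is a sketch under assumptions: the `COMMANDS` registry, the `command` decorator, and `execute_command` are hypothetical, and the `forcecompact` body is a placeholder that returns a string rather than doing any work.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping command names to handlers.
COMMANDS: Dict[str, Callable[..., str]] = {}


def command(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Register a handler under the name used in EXECUTE COMMAND <name>."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        COMMANDS[name] = fn
        return fn
    return register


@command("forcecompact")
def forcecompact(keyspace: str, table: str, keys: List[str]) -> str:
    # Placeholder body: a real handler would trigger the compaction.
    return f"compacting {keyspace}.{table} for keys {sorted(keys)}"


def execute_command(name: str, **args: Any) -> str:
    """Dispatch a parsed 'EXECUTE COMMAND name WITH k=v AND ...' by name."""
    try:
        handler = COMMANDS[name]
    except KeyError:
        raise ValueError(f"unknown command: {name}") from None
    return handler(**args)
```

Adding a new management operation then means registering one more handler; the statement syntax and the dispatch code stay unchanged.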
>>> >>>>
>>> >>>> The command execution syntax is the main thing that users interact
>>> >>>> with in this CEP, but I'm taking a more relaxed approach to it for
>>> the
>>> >>>> following reasons:
>>> >>>> - this is the tip of the iceberg; the unification of the JMX, CQL and
>>> >>>> possible REST APIs for Cassandra is the priority;
>>> >>>> - the feature will be in an experimental state in the major release;
>>> >>>> we need to collect real feedback from users and their deployments;
>>> >>>> - the aliasing will be used for some important commands like
>>> >>>> compaction, bootstrap;
>>> >>>>
>>> >>>> Taking all of the above into account, I still think it's important
>>> to
>>> >>>> reach an agreement, or at least to avoid objections.
>>> >>>> So, I've checked the PostgreSQL and SQL standards to identify areas
>>> of
>>> >>>> alignment. The latter I think is relatively easy to support as
>>> >>>> aliases.
>>> >>>>
>>> >>>>
>>> >>>> The syntax proposed in the CEP:
>>> >>>>
>>> >>>> EXECUTE COMMAND forcecompact WITH keyspace=distributed_test_keyspace
>>> >>>> AND table=tbl AND keys=["k4", "k2", "k7"];
>>> >>>>
>>> >>>> Other Cassandra-style options that I had previously considered:
>>> >>>>
>>> >>>> 1. EXECUTE COMMAND forcecompact (keyspace=distributed_test_keyspace,
>>> >>>> table=tbl, keys=["k4", "k2", "k7"]);
>>> >>>> 2. EXECUTE COMMAND forcecompact WITH ARGS {"keyspace":
>>> >>>> "distributed_test_keyspace", "table": "tbl", "keys":["k4", "k2",
>>> >>>> "k7"]};
>>> >>>>
>>> >>>> With the PostgreSQL context [2] it could look like:
>>> >>>>
>>> >>>> COMPACT (keys=["k4", "k2", "k7"]) distributed_test_keyspace.tbl;
>>> >>>>
>>> >>>> The SQL-standard [3][4] procedural approach:
>>> >>>>
>>> >>>> CALL system_mgmt.forcecompact(
>>> >>>>     keyspace => 'distributed_test_keyspace',
>>> >>>>     table    => 'tbl',
>>> >>>>     keys     => ['k4','k2','k7'],
>>> >>>>     options  => { "parallel": 2, "verbose": true }
>>> >>>> );
>>> >>>>
>>> >>>>
>>> >>>> Please let me know if you have any questions, or if you would like
>>> us
>>> >>>> to arrange a call to discuss all the details.
>>> >>>>
>>> >>>>
>>> >>>> [1]
>>> https://www.instaclustr.com/support/documentation/cassandra/using-cassandra/connect-to-cassandra-with-cqlsh/
>>> >>>> [2]https://www.postgresql.org/docs/current/sql-vacuum.html
>>> >>>> [3]
>>> https://en.wikipedia.org/wiki/Stored_procedure?utm_source=chatgpt.com#Implementation
>>> >>>> [4]https://www.postgresql.org/docs/9.3/functions-admin.html
>>> >>>>
>>> >>>>
>>> >>
>>>
>>>
>>>
