Hi Francisco,

Thank you for your response. These explanations are very helpful, but I think I need to go through the documentation and the codebase again to understand properly.
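If I'm reading the RBAC part of your explanation correctly, the identity mapping builds on Cassandra's mTLS identity support, so I'd sketch my understanding roughly as below (the role and SPIFFE identity names are placeholders I made up, and I may have syntax details wrong):

    -- Map the client certificate's identity to a database role
    CREATE ROLE analytics_svc WITH LOGIN = true;
    ADD IDENTITY 'spiffe://example.org/service/analytics' TO ROLE 'analytics_svc';

If that's right, the same client certificate would then authenticate that identity to both Cassandra and the Sidecar. Please correct me if I've misunderstood.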
Thanks, Samson

On Wed, Oct 15, 2025 at 10:49 PM Francisco Guerrero <[email protected]> wrote:
> Hi Samson,
>
> I can speak for the Cassandra Sidecar and the way it has been designed. Cassandra Sidecar is a process that is meant to be run in the same host/pod as your Cassandra process(es). Cassandra Sidecar has been built with security in mind, and for that reason the Sidecar, as a server, supports TLS, JWT authentication, and mutual TLS authentication. You can configure a server certificate and a truststore.
>
> As a client, when connecting to Cassandra (via JMX or CQL), it also supports TLS for these connections, as well as Cassandra's authentication models such as password-based authentication or mutual TLS authentication. For Sidecar-to-Sidecar communication, Sidecar also supports JWT/mTLS.
>
> To be able to take advantage of TLS/mTLS-based authentication, you will need a certificate authority that is able to vend certificates that can be used by the Sidecar process to act as a server as well as a client. If you already have a way to vend these certificates, integrating Sidecar with full mTLS support becomes fairly straightforward.
>
> The Cassandra Sidecar also has Role-Based Access Control (RBAC). RBAC has been implemented in the Sidecar by modeling it on Cassandra's permission system. Assuming you are using mTLS, this means that you can have an identity configured in your Cassandra database. You could use a client certificate to authenticate against the database, but you can also use the same certificate to authenticate against the Sidecar. Sidecar will only allow that user/identity to perform operations that it is allowed to perform. So if that identity can only access a single keyspace, for example, then Sidecar will only allow this identity to access that single keyspace.
>
> The Sidecar model works well for bare metal configurations, as well as Kube, and hybrid models.
>
> Hopefully this helps clarify how Apache Cassandra Sidecar works, but I'm more than happy to elaborate on this topic.
>
> Best,
> - Francisco
>
> On 2025/10/15 22:44:58 mapyourown wrote:
> > Sharing my perspective
> >
> > In many recent discussions and feature proposals, the Cassandra community seems to be moving more toward a sidecar-based architecture. While this direction has its advantages, there are some practical challenges from an operational standpoint.
> >
> > Most Cassandra clusters, especially in production, enforce secure connections through *SSL certificates* and *truststores*. This can make sidecar-based features difficult to adopt, as they often assume that the sidecar has direct access to the Cassandra instance, which isn't always the case.
> >
> > In our environment, for example, we deploy Cassandra outside Kubernetes, on *dedicated servers or regions*, whether in the cloud or on-premises. When new features depend on sidecars running inside Kubernetes, it raises questions about how those features can be extended to customers running Cassandra outside of that model.
> >
> > I've had conversations with teams deploying Cassandra within Kubernetes who face challenges exposing Cassandra endpoints to applications *outside the cluster*. It makes sense when both Cassandra and the application are in the same K8s environment, but that's not always feasible, especially for enterprise environments focused on *security, compliance, and isolation*.
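(Revisiting my questions below with Francisco's explanation in mind: the keyspace scoping he describes reads to me like an ordinary database-side grant that the Sidecar then mirrors. Continuing the placeholder names from my sketch above, with app_data as a hypothetical keyspace:

    -- The grant lives in Cassandra; my reading is that Sidecar
    -- enforces the same boundary for its own operations.
    GRANT SELECT ON KEYSPACE app_data TO analytics_svc;

Happy to be corrected if that's not the intended model.)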
> > > > I’m still trying to fully understand how sidecars are expected to connect > > to clusters that require *certificate-based authentication and > truststores*. > > It would be great to hear more about how this integration is envisioned > for > > secure, non-Kubernetes, or hybrid . > > > > Thanks, > > > > Samson > > > > > > > > On Wed, Oct 15, 2025 at 11:34 AM Patrick McFadin <[email protected]> > wrote: > > > > > This is starting to sound more and more like a k8s operator as we are > > > going along here. > > > > > > On Wed, Oct 15, 2025 at 9:02 AM Isaac Reath <[email protected]> > wrote: > > > > > >> I could also see a future where C* Sidecar manages multiple > sub-processes > > >> to to help alleviate the challenges of needing to run multiple > different > > >> sidecars, each configured for a subset of features (e.g., one > configured to > > >> provide SSTable access for analytics, one configured for CDC, one > > >> configured for managing cluster operations). > > >> > > >> > > >> On Wed, Oct 15, 2025 at 10:28 AM Dinesh Joshi <[email protected]> > wrote: > > >> > > >>> The C* Sidecar is built with modules. One could deploy specialized > > >>> instances of Sidecar which only publish CDC streams. The point I’m > making > > >>> is that just because the code lives in a single repo and we have a > single > > >>> artifact doesn’t necessarily mean the user has to enable all the > > >>> functionality at runtime. > > >>> > > >>> On Wed, Oct 15, 2025 at 7:24 AM Josh McKenzie <[email protected]> > > >>> wrote: > > >>> > > >>>> A problem I've seen elsewhere with one > > >>>> process trying to manage different kinds of workloads is that if you > > >>>> need to scale up one kind of workload you may be required to scale > them > > >>>> all up and run head first into some kind of resource starvation > issue. > > >>>> > > >>>> This is a really good point. If the resource consumption by a CDC > > >>>> process grows in correlation w/data ingestion on the C* node, we > would be > > >>>> in for A Bad Time. > > >>>> > > >>>> @Bernardo - do we resource constrain the CommitLog reading and > > >>>> reporting to some kind of ceiling so the CDC consumption just falls > behind > > >>>> and the sidecar can otherwise keep making forward progress on its > other > > >>>> more critical operations? And/or have internal scheduling and > > >>>> prioritization to help facilitate that? > > >>>> > > >>>> On Tue, Oct 14, 2025, at 5:24 PM, Joel Shepherd wrote: > > >>>> > > >>>> Thanks for all the additional light shed. A couple more > > >>>> comments/questions interleaved below ... > > >>>> > > >>>> On 10/9/2025 12:31 PM, Maxim Muzafarov wrote: > > >>>> > Isaac, > > >>>> >> CEP-38 is looking to offer an alternative to JMX for single-node > > >>>> management operations, whereas Sidecar is focused on holistic > cluster-level > > >>>> operations. > > >>>> > Thank you for the summary.You have a perfect understanding of the > > >>>> > CEP-38's purpose, and I share your vision for the Apache Sidecar. > So I > > >>>> > think that both CEP-38 and Sidecar complement each other > perfectly as > > >>>> > a single product. > > >>>> > > >>>> Yes, that's a really helpful distinction. CQL Management API > operates > > >>>> at > > >>>> the node level; Sidecar operates (or is intended to be used?) at > > >>>> cluster > > >>>> level. > > >>>> > > >>>> When I re-read CEP-38, I also noticed that CQL management commands > > >>>> (e.g. 
> > >>>> (e.g. EXECUTE) are expected to be sent on a separate port from plain old CQL (DDL/DML), so that helps limit the surface area for both. Maxim, I'm curious about at what point in the request handling and execution the Management API and existing CQL API will branch. E.g. are they going to share the same parser? Aside from permissions, is there going to be code-level enforcement that CQL-for-management can't be accepted through the existing CQL port?
> > >>>>
> > >>>> What I'm wondering about are the layers of protection against a misconfigured or buggy cluster allowing an ordinary user to successfully invoke management CQL through the existing CQL port.
> > >>>>
> > >>>> > On Thu, 9 Oct 2025 at 21:09, Josh McKenzie <[email protected]> wrote:
> > >>>> >> A distinction that resonated with me:
> > >>>> >>
> > >>>> >> Control Plane = Sidecar
> > >>>> >> Data Plane = DB
> > >>>> >>
> > >>>> >> I think that's directionally true, but there's no objective definition of what qualifies as one plane or the other.
> > >>>>
> > >>>> It's really hazy. You could argue that CREATE TABLE or CREATE KEYSPACE are control plane operations because in some sense they're allocating or managing resources ... but it's also totally reasonable to consider any DDL/DML as a data plane operation, and consider process, network, file, jvm, etc., management to be control plane.
> > >>>>
> > >>>> Where does CDC sit? Functionally it's probably part of the data plane. I believe Sidecar has or plans to have some built-in support for CDC (CEP-44). I'm wondering out loud about whether there are operational risks with having the same process trying to push change records into Kafka as fast as the node is producing them, and remaining available for executing things like long-running control plane workflows (e.g., backup-restore, restarts, etc.). A problem I've seen elsewhere with one process trying to manage different kinds of workloads is that if you need to scale up one kind of workload you may be required to scale them all up and run head first into some kind of resource starvation issue.
> > >>>>
> > >>>> I realize there is a desire not to require users to deploy and run a bunch of different processes on each node to get Cassandra to work, and maybe the different workloads in Sidecar can be sandboxed in a way that prevents one workload from starving the rest of CPU time, IO, etc.
> > >>>>
> > >>>> Thanks -- Joel.
> > >>>>
> > >>>> >> On top of that, the sidecar is in a unique position where it supports functionality across multiple versions of C*, so if you're looking to implement something with a unified interface that may differ in implementation across multiple versions of C* (say, if you're running a large fleet w/different versions in it), there's pressure there driving certain functionality into the sidecar.
> > >>>> >>
> > >>>> >> On Thu, Oct 9, 2025, at 1:42 PM, Isaac Reath wrote:
> > >>>> >>
> > >>>> >> I don't have too much insight on CQL as a whole, but I can offer my views on Sidecar & the CQL Management API.
> > >>>> >>
> > >>>> >> In terms of a rubric for what belongs in Sidecar, I think, taking inspiration from CEP-1, it should be functionality needed to manage a Cassandra cluster. My perspective on how this fits in with the CQL Management API (and authors of CEP-38, please correct me if I'm wrong) is that CEP-38 is looking to offer an alternative to JMX for single-node management operations, whereas Sidecar is focused on holistic cluster-level operations.
> > >>>> >>
> > >>>> >> Using rebuild as an example from CEP-38: a user can invoke the CQL Management API to run a rebuild for a single node, but would rely on Sidecar to rebuild an entire datacenter, with Sidecar in turn calling the CQL Management API on individual nodes. Similarly, a user could use the CQL Management API to update the configurations which can be changed without a restart (similar to how nodetool setconcurrency does today), but Sidecar would provide a single interface to update all configurations, including those which require restarts. Additionally, Sidecar will support operations which may not involve the CQL Management API at all, such as live instance migration as laid out in CEP-40.
> > >>>> >>
> > >>>> >> Happy to hear other perspectives on this.
> > >>>> >> Isaac
> > >>>> >>
> > >>>> >> On Wed, Oct 8, 2025 at 3:02 PM Joel Shepherd <[email protected]> wrote:
> > >>>> >>
> > >>>> >> To clarify, since I was pinged directly about this ...
> > >>>> >>
> > >>>> >> It's not my intent to impede any of the efforts listed below and I apologize if it sounded otherwise.
> > >>>> >>
> > >>>> >> I am deeply curious and interested in the eventual scope/charter of CQL, CQL Admin APIs, and Sidecar. A little overlap is probably unavoidable, but IMO it would be detrimental to the project overall to not have a clear scope for each area. If those scopes have already been defined, I'd love pointers to decisions so I can get it straight in my head. If they haven't and the community is comfortable with that, okay too. If they haven't and anyone else is a little squirmy about that, what's the right way to drive a conversation?
> > >>>> >>
> > >>>> >> Thanks -- Joel.
> > >>>> >>
> > >>>> >> On 10/7/2025 4:57 PM, Joel Shepherd wrote:
> > >>>> >>> Thanks for the clarifications on CEP-38, Maxim: I actually got some insights from your comments below that had slipped by me while reading the CEP.
> > >>>> >>>
> > >>>> >>> I want to fork the thread a bit, so breaking this off from the CEP-38 DISCUSS thread.
> > >>>> >>>
> > >>>> >>> If I can back away a bit and squint ... It seems to me that there are three initiatives floating around at the moment that could make Cassandra more awesome and manageable, or make it confusing and complex.
> > >>>> >>>
> > >>>> >>> 1) Patrick McFadin's proposal (as presented at CoC) to align CQL syntax/semantics closely with PostgreSQL's. I haven't heard anyone strongly object, but have heard several expressions of surprise.
> > >>>> >>> Maybe something is already in the works, but I'd love to see and discuss a proposal for this, so there's consensus that it's a good idea and (if needed) guidelines on how to evolve CQL in that direction.
> > >>>> >>>
> > >>>> >>> 2) CQL management API (CEP-38): As mentioned in the CEP, it'll take some time to implement all the functionality that could be in scope of this CEP. I wonder if it'd be beneficial to have some kind of rubric or guidelines for deciding what kind of things make sense to manage via CQL, and what don't. For example, skimming through the PostgreSQL management commands, many of them look like they could be thin wrappers over SQL executed against "private" tables and views in the database. I don't know that that is how they are implemented, but many of the commands are ultimately just setting a value, or reading and returning values that could potentially be managed in tables/views of some sort (e.g., like Cassandra virtual tables). That seems to fit pretty neatly with preserving SQL as a declarative, data-independent language for data access, with limited side effects. Is that a useful filter for determining what kinds of things can be managed via CQL management, and which should be handled elsewhere? E.g., is a filesystem operation like nodetool scrub a good candidate for CQL management or not? (I'd vote not: interested in what others think.)
> > >>>> >>>
> > >>>> >>> 3) Cassandra Sidecar: Like the CQL management API, I wonder if it'd be beneficial to have a rubric for deciding what kinds of things make sense to go into Sidecar. The recent discussion about CEP-55 (generated role names) landed on implementing the functionality both as a CQL statement and as a Sidecar API. There's also activity around using Sidecar for rolling restarts, backup and restore, etc.: control plane activities that are largely orthogonal to interacting with the data. Should operations that are primarily generating or manipulating data be available via Sidecar to give folks the option of invoking them via CQL or HTTP/REST, or would Sidecar benefit from having a more narrowly scoped charter (e.g. data-agnostic control plane operations only)?
> > >>>> >>>
> > >>>> >>> I think all of these tools -- CQL, CQL Management API and Sidecar -- will be more robust, easier to use, and easier to maintain if we have a consistent way of deciding where a given feature should live, and a minimal number of choices for accessing the feature. Orthogonal controls. Since Sidecar and CQL Management API are pretty new, it's a good time to clarify their charter to ensure they evolve well together. And to get consensus on the long-term direction for CQL.
> > >>>> >>>
> > >>>> >>> Let me know if I can help -- Joel.
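(A side note on Joel's point 2 above: Cassandra's virtual tables already behave somewhat like that "thin wrapper over tables/views" pattern for node settings, if I understand it correctly. A rough sketch from memory; the setting name, which settings are writable, and the exact value formats vary by version:

    -- Read a config value through the system_views virtual keyspace ...
    SELECT name, value FROM system_views.settings WHERE name = 'compaction_throughput';
    -- ... and some settings can be changed the same way, without a restart.
    UPDATE system_views.settings SET value = '64MiB/s' WHERE name = 'compaction_throughput';

So a filter like the one Joel proposes seems consistent with where the virtual tables work was already heading.)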
> > >>>> >>>
> > >>>> >>> On 10/7/2025 12:22 PM, Maxim Muzafarov wrote:
> > >>>> >>>> Hello Folks,
> > >>>> >>>>
> > >>>> >>>> First of all, thank you for your comments. Your feedback motivates me to implement these changes and refine the final result to the highest standard. To keep the vote thread clean, I'm addressing your questions in the discussion thread.
> > >>>> >>>>
> > >>>> >>>> The vote is here:
> > >>>> >>>> https://lists.apache.org/thread/zmgvo2ty5nqvlz1xccsls2kcrgnbjh5v
> > >>>> >>>>
> > >>>> >>>> = The idea: =
> > >>>> >>>>
> > >>>> >>>> First, let me focus on the general idea, and then I will answer your questions in more detail.
> > >>>> >>>>
> > >>>> >>>> The main focus is on introducing a new API (CQL) to invoke the same node management commands. While this has an indirect effect on tooling (cqlsh, nodetool), the tooling itself is not the main focus. The scope (or Phase 1) of the initial changes is narrowed down to the API only, to ensure the PR remains reviewable.
> > >>>> >>>>
> > >>>> >>>> This implies the following:
> > >>>> >>>> - the nodetool commands and the way they are implemented won't change
> > >>>> >>>> - the nodetool commands will be accessible via CQL; their implementation (and the execution locality) will not change
> > >>>> >>>> - this change introduces ONLY a new way of invoking management commands
> > >>>> >>>> - this change is not about the tooling (cqlsh, nodetool); it will, however, help the tooling evolve
> > >>>> >>>> - these changes are being introduced as an experimental API with a feature flag, disabled by default
> > >>>> >>>>
> > >>>> >>>> = The answers: =
> > >>>> >>>>
> > >>>> >>>>> how will the new CQL API behave if the user does not specify a hostname?
> > >>>> >>>> The changes only affect the API part; improvements to the tooling will follow later. The command is executed on the node that the client is connected to.
> > >>>> >>>> Note also that the port differs from 9042 (default), as a new management port will be introduced. See examples here [1].
> > >>>> >>>>
> > >>>> >>>> cqlsh 10.20.88.164 11211 -u myusername -p mypassword
> > >>>> >>>> nodetool -h 10.20.88.164 -p 8081 -u myusername -pw mypassword
> > >>>> >>>>
> > >>>> >>>> If a host is not specified, the CLI tool will attempt to connect to localhost, I suppose.
> > >>>> >>>>
> > >>>> >>>>> My understanding is that commands like nodetool bootstrap typically run on a single node.
> > >>>> >>>> This is correct; however, as I don't control the implementation of the command, it may actually involve communication with other nodes. This is actually not part of this CEP. I'm only reusing the commands we already have.
> > >>>> >>>>
> > >>>> >>>>> Will we continue requiring users to specify a hostname/port explicitly, or will the CQL API be responsible for orchestrating the command safely across the entire cluster or datacenter?
> > >>>> >>>> It seems that you are confusing the API with the tooling.
> > >>>> >>>> The tooling (cqlsh, nodetool) will continue to work as it does now. I am only adding a new way in which commands can be invoked: CQL. Orchestration, however, is the subject of other projects. Cassandra Sidecar?
> > >>>> >>>>
> > >>>> >>>>> It might, however, be worth verifying that the proposed CQL syntax aligns with PostgreSQL conventions, and adjusting it if needed for cross-compatibility.
> > >>>> >>>> It's a bit new to me that we're targeting PostgreSQL as the main reference and drifting towards invoking management operations the same way. However, I'm inclined to agree that the syntax should probably be more or less similar.
> > >>>> >>>>
> > >>>> >>>> We are introducing a new CQL syntax in a minimal and isolated manner. CEP-38 defines a small set of management-oriented CQL statements (EXECUTE COMMAND / DESCRIBE COMMAND) that can be used to match all existing nodetool commands at once, introducing further aliases as an option. This eliminates the need to introduce a new antlr grammar for each management operation.
> > >>>> >>>>
> > >>>> >>>> The command execution syntax is the main thing that users interact with in this CEP, but I'm taking a more relaxed approach to it for the following reasons:
> > >>>> >>>> - the syntax is only the tip of the iceberg; the unification of the JMX, CQL, and possible REST APIs for Cassandra is the priority;
> > >>>> >>>> - the feature will be in an experimental state in the major release; we need to collect real feedback from users and their deployments;
> > >>>> >>>> - aliasing will be used for some important commands like compaction and bootstrap;
> > >>>> >>>>
> > >>>> >>>> Taking all of the above into account, I still think it's important to reach an agreement, or at least to avoid objections. So, I've checked the PostgreSQL and SQL standards to identify areas of alignment. The latter, I think, is relatively easy to support as aliases.
> > >>>> >>>>
> > >>>> >>>> The syntax proposed in the CEP:
> > >>>> >>>>
> > >>>> >>>> EXECUTE COMMAND forcecompact WITH keyspace=distributed_test_keyspace AND table=tbl AND keys=["k4", "k2", "k7"];
> > >>>> >>>>
> > >>>> >>>> Other Cassandra-style options that I had previously considered:
> > >>>> >>>>
> > >>>> >>>> 1. EXECUTE COMMAND forcecompact (keyspace=distributed_test_keyspace, table=tbl, keys=["k4", "k2", "k7"]);
> > >>>> >>>> 2. EXECUTE COMMAND forcecompact WITH ARGS {"keyspace": "distributed_test_keyspace", "table": "tbl", "keys": ["k4", "k2", "k7"]};
> > >>>> >>>>
> > >>>> >>>> With the PostgreSQL context [2] it could look like:
> > >>>> >>>>
> > >>>> >>>> COMPACT (keys=["k4", "k2", "k7"]) distributed_test_keyspace.tbl;
> > >>>> >>>>
> > >>>> >>>> The SQL-standard [3][4] procedural approach:
> > >>>> >>>>
> > >>>> >>>> CALL system_mgmt.forcecompact(
> > >>>> >>>>   keyspace => 'distributed_test_keyspace',
> > >>>> >>>>   table => 'tbl',
> > >>>> >>>>   keys => ['k4','k2','k7'],
> > >>>> >>>>   options => { "parallel": 2, "verbose": true }
> > >>>> >>>> );
> > >>>> >>>>
> > >>>> >>>> Please let me know if you have any questions, or if you would like us to arrange a call to discuss all the details.
> > >>>> >>>>
> > >>>> >>>> [1] https://www.instaclustr.com/support/documentation/cassandra/using-cassandra/connect-to-cassandra-with-cqlsh/
> > >>>> >>>> [2] https://www.postgresql.org/docs/current/sql-vacuum.html
> > >>>> >>>> [3] https://en.wikipedia.org/wiki/Stored_procedure#Implementation
> > >>>> >>>> [4] https://www.postgresql.org/docs/9.3/functions-admin.html
