Hi Everyone,

Thank you for all the great feedback! I’ve updated the proposed grammar in the CEP to align with our discussion and adopt PostgreSQL-style CQL statements. Below are a few clarifications on specific points:

*Providers*

For SECURITY LABEL, we will accept CQL statements of the form:

SECURITY LABEL [FOR <provider>] ON <object> IS '<text>';

However, FOR <provider> will not be implemented in this CEP, as the scope here is limited to enriching schema elements. If a provider is supplied, its value will be ignored and the server will log a warning about the ignored field. Support for providers can be addressed in a future CEP, which would further strengthen Cassandra's security posture.

*USE KEYSPACE and Autocomplete*

We will support omitting the keyspace when USE KEYSPACE is active, enabling the expected autocomplete behavior.

*CREATE TABLE LIKE*

I agree with the suggestion to add a WITH COMMENTS option. By default, comments will not be copied. For example:

CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND COMMENTS AND SECURITY LABEL;

Thanks again for the thoughtful discussion and valuable input!

Best,
Jyothsna

On Wed, Aug 13, 2025 at 4:03 AM Štefan Miklošovič <smikloso...@apache.org> wrote:

> Thank you very much, Jyothsna, for being so receptive to community
> suggestions. Really appreciate it.
>
> Regarding to your last example of comment creation, as you put that
>
> COMMENT ON COLUMN ks.tb.val IS 'credit card number'
> SECURITY LABEL ON COLUMN ks.tb.val IS 'PII'
>
> having Cassandra which has also the concept of keyspaces, when I compare
> it with PG which has this
>
> COMMENT ON COLUMN my_table.my_column IS 'Employee ID number';
>
> and we would have this
>
> COMMENT ON COLUMN ks.tb.val IS 'credit card number'
>
> The construct of "ks.tb.val" is rather unusual but I think we could
> definitely live with it.
>
> One more caveat to all these examples is that if we have
>
> USE KEYSPACE ks;
>
> then this should "autocomplete" ks:
>
> COMMENT ON COLUMN tb.val IS 'credit card number'
>
> Similarly, it would be nice if it was done like that of all other elements
> which logically reside in a keyspace.
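
To make the keyspace "autocomplete" rule concrete, here is a minimal sketch of how an object name could be resolved against the active keyspace. It is plain Python for illustration, not Cassandra code: `resolve_column` is a hypothetical helper, and it assumes unquoted identifiers separated by dots.

```python
from typing import Optional, Tuple


def resolve_column(name: str, current_keyspace: Optional[str] = None) -> Tuple[str, str, str]:
    """Resolve 'ks.tb.col' or 'tb.col' into (keyspace, table, column).

    Hypothetical helper: quoted identifiers (which may contain dots)
    are deliberately not handled in this sketch.
    """
    parts = name.split(".")
    if any(not part for part in parts):
        raise ValueError(f"malformed column reference: {name!r}")

    if len(parts) == 3:
        # Fully qualified name: an active keyspace is not consulted.
        keyspace, table, column = parts
        return keyspace, table, column

    if len(parts) == 2:
        # Keyspace omitted: fall back to the keyspace set by USE, if any.
        if current_keyspace is None:
            raise ValueError(f"no keyspace given and none active for {name!r}")
        table, column = parts
        return current_keyspace, table, column

    raise ValueError(f"expected [ks.]tb.col, got {name!r}")


print(resolve_column("ks.tb.val"))                       # ('ks', 'tb', 'val')
print(resolve_column("tb.val", current_keyspace="ks"))   # ('ks', 'tb', 'val')
```

The same fallback would apply to any other element that lives in a keyspace; only the expected number of name parts changes.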
>
> There is also "CREATE TABLE LIKE" introduced recently (1, 2, 3) and if
> there is a table we go to copy like that to another one, it is questionable
> if we should automatically create all comments with it. We could follow how
> it is done for indexes:
>
> CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES;
>
> so here it would be
>
> CREATE TABLE ks.tb_copy LIKE ks.tb WITH COMMENTS;
>
> and in case of both specified:
>
> CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND COMMENTS;
>
> and by default comments would _not_ be copied over.
>
> Regards and thank you
>
> (1) https://issues.apache.org/jira/browse/CASSANDRA-7662
> (2) https://issues.apache.org/jira/browse/CASSANDRA-19964
> (3) https://issues.apache.org/jira/browse/CASSANDRA-19965
>
> On Tue, Aug 12, 2025 at 9:53 PM Jyothsna Konisa <jyothsna1...@gmail.com>
> wrote:
>
>>
>> Hi Stefan, Patrick, and everyone,
>>
>> Thank you all for your valuable feedback and suggestions. I've
>> consolidated the key points and wanted to share our thinking on a path
>> forward.
>>
>>
>> *Regarding the PostgreSQL-style Syntax (COMMENT ON & SECURITY LABEL)*
>>
>> We agree with the consensus that adopting PostgreSQL-style syntax is the
>> most promising approach for the following reasons, which were
>> well-articulated in the thread:
>>
>> - Avoids introducing new Syntax
>>
>> - Keeps CQL closer to mainstream SQL
>>
>> - More SQL data for LLM training
>>
>>
>>
>> *Storing Annotations*
>> We propose to store these comments as part of the schema element's
>> metadata, which will be persisted to TCM.
>>
>> Regarding the discussion about a separate table for annotations: We want
>> to present an alternative to store annotations/comments in a virtual table.
>> We can address this during implementation or as a follow-up to this CEP.
>>
>> *Impact on DESCRIBE Statements*
>>
>> Adopting the COMMENT ON syntax will require some changes to how the
>> schema is displayed.
>>
>> To maintain consistency and ensure the schema can be fully reproduced,
>> the COMMENT ON statements must be included in the output of DESCRIBE TABLE.
>> We propose that the output for DESCRIBE TABLE would look something like
>> this:
>>
>>
>> // Comment creation & DESC table output
>> CREATE TABLE ks.tb
>> (
>>     id int PRIMARY KEY,
>>     val text
>> )
>>
>> COMMENT ON COLUMN ks.tb.val IS 'credit card number'
>> SECURITY LABEL ON COLUMN ks.tb.val IS 'PII'
>>
>>
>> Including the comment information within the CREATE TABLE statement
>> itself might be redundant and displaying them as separate COMMENT ON
>> statements might be better.
>>
>> Thanks
>> Jyothsna
>>
>> On Tue, Aug 12, 2025 at 9:31 AM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> One more point I would like to add. If we enrich the output with
>>> comments, I think that seeing comments should be only default if I can take
>>> what DESCRIBE prints and I can copy it as-is and create tables from it.
>>> Very often, DESCRIBE acts as something like "I will copy this schema here
>>> so I can reconstruct it later". So I would expect that, by default, what
>>> DESCRIBE gives is "reconstructable". I think there are a lot of tests
>>> already which tests what DESCRIBE prints can be reconstructed and this
>>> would need to be preserved.
>>>
>>> We might still do "DESCRIBE ks.tb" without comments / annotations and
>>> then "DESCRIBE ks.tb WITH COMMENTS / ANNOTATIONS" to print them.
>>>
>>> If we put comments on this it is "reconstructable by copy-pasting" as
>>> well:
>>>
>>> create table ks.tb
>>> (
>>>     -- my primary key column
>>>     id int primary key,
>>>     -- this is my value
>>>     val text
>>> )
>>>
>>> however this is not
>>>
>>> create table ks.tb
>>> (
>>>     /**
>>>     my primary key column
>>>     */
>>>     id int primary key,
>>>     val text
>>> )
>>>
>>> you got me ...
>>>
>>> Also, if we start to automatically enrich DESCRIBE output, it would be
>>> very nice if this was digestible by previous versions.
Because if I copy >>> DESCRIBE output in 5.1 with @PII then I can not just apply that to 5.0 >>> where that concept is not known yet. However plain comments do work in >>> previous versions as well. >>> >>> For this reason I would not make annotations visible by default, I would >>> opt-in by WITH COMMENTS / WITH ANNOTATIONS only and keep the current output >>> as is. >>> >>> >>> On Tue, Aug 12, 2025 at 10:56 AM Mick <m...@apache.org> wrote: >>> >>>> a point of order and a reminder: aside from suggestions that the CEP >>>> author is free to adopt or not, anything that's assuming to steer what the >>>> CEP should be should be accompanied with the willingness to commit in >>>> helping making it happen. we want to work as a meritocracy: those that >>>> lead the work have the say, and blocking their chosen approach against >>>> their wishes is only on clear technical reasons. API designs (CQL >>>> additions) always needs to be chosen and evolved carefully, and every CEP >>>> proposed should be open to that being naturally part of its discussion >>>> pre-vote. >>>> >>>> following the PG approach does make a lot of sense. >>>> what are your thoughts on it Jyothsna & Yifan ? >>>> >>>> >>>> >>>> > On 12 Aug 2025, at 09:14, Štefan Miklošovič <smikloso...@apache.org> >>>> wrote: >>>> > >>>> > I like the idea of COMMENT ON and alike from PG! Yes, great stuff, as >>>> we do not invent anything custom and we will be as close as possible to >>>> industry standard. >>>> > >>>> > So, if I understand this correctly, on COMMENT ON, we would save each >>>> comment to a dedicated table. Then on DESCRIBE, we would "enrich" the CQL >>>> element we are describing with commentary, if any, from that comment table, >>>> correct? >>>> > >>>> > I, in general, support this idea, but as usual the devil is in the >>>> details. I am just genuinely curious how this would work in practice. >>>> > >>>> > >>>> > If we go with COMMENT ON, is this going to be stored to TCM or not? 
>>>> > >>>> > >>>> > If the answer is yes, then it is way more simpler, because then this >>>> commentary would be dispersed by the means of TCM and each node would apply >>>> this transformation locally to system_schema.annotations. >>>> > >>>> > If the answer is no and if there is a cluster and we do COMMENT ON, >>>> then this comment has to be saved to a table. If we rule out TCM as a >>>> vehicle for the dispersion of these comments, that comment table has to be >>>> distributed / replicated, correct? I do not think that we can create that >>>> table under system_schema then, as that is on LocalStrategy and all >>>> modifications to that are, as I understand it, done via TCM? >>>> > >>>> > Hence, I guess the better place for that is under system_distributed? >>>> That means that if somebody changes that keyspace to NTS or nodes are not >>>> available, we will not be able to create any commentary. >>>> > >>>> > Also, if we remove / alter anything, like dropping a keyspace, table, >>>> index, removing column etc ... all these changes would need to also remove >>>> respective comments from that table etc etc. >>>> > >>>> > For these reasons, I think that having dedicated >>>> system_schema.annotations table while interacting with it via COMMENT ON to >>>> be "PG-compatible" so people can query that table directly, and backing >>>> COMMENT ON by TCM by having it as another transformation (as COMMENT ON is >>>> inherently part of the schema) is the best way to do this. >>>> > >>>> > On Mon, Aug 11, 2025 at 10:55 PM Patrick McFadin <pmcfa...@gmail.com> >>>> wrote: >>>> > One (of many) reasons I'm advocating we migrate away from CQL. It >>>> served a purpose at the time, but this project is evolving and this to me >>>> seems like the logical next iteration. The Cassandra project has built it's >>>> reputation on what it can do, not clever syntax design. 
;) >>>> > >>>> > Patrick >>>> > >>>> > On Mon, Aug 11, 2025 at 1:51 PM Yifan Cai <yc25c...@gmail.com> wrote: >>>> > The reasonings on operator and LLM familiarity are spot on. >>>> > >>>> > I have experimented with LLM generated queries. It typically does a >>>> noticeably better job on SQL than CQL. >>>> > >>>> > - Yifan >>>> > >>>> > On Mon, Aug 11, 2025 at 1:44 PM Patrick McFadin <pmcfa...@gmail.com> >>>> wrote: >>>> > I really love this CEP. +1 on the goal. >>>> > >>>> > As you've already seen, I've been advocating to improve our syntax >>>> ergonomics towards more mainstream SQL and avoiding new/custom syntax. I >>>> would suggest the following changes towards that goal: >>>> > - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map >>>> existing table comments to that). For structured tags, mirror SECURITY >>>> LABEL[2]: >>>> > SECURITY LABEL FOR <provider> ON <object> IS '<text>'; >>>> > >>>> > - Allow multiple providers per object. Store the value as text in v1 >>>> (JSON or key/val later if we want), which avoids inventing new inline @ >>>> syntax. >>>> > >>>> > - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps >>>> schemas readable and the grammar simple. Tools can issue COMMENT >>>> ON/SECURITY LABEL right after DDL, like PG users do today. >>>> > >>>> > - Names & built-ins. Case-insensitive provider names with canonical >>>> lowercase. No separate @Description type. COMMENT ON already covers that >>>> use case cleanly. >>>> > >>>> > - Introspection by query and by DESC. Keep annotations visible in >>>> DESCRIBE, but also expose a single system_schema.annotations view >>>> (provider, object_type, object_name, sub_name, value) so folks can get all >>>> annotations for a table. Example: “find all columns labeled PII,” etc. >>>> > >>>> > Why PG-like? Besides operator familiarity, there’s far more training >>>> data and tooling around COMMENT ON/SECURITY LABEL than around bespoke >>>> @annotation syntax. 
Sticking to that shape reduces LLM/tool friction and >>>> avoids teaching the world a new grammar. This has been a huge challenge for >>>> Cassandra work with LLMs as models tend to drift towards PG SQL in CQL >>>> often. (No Claude, JOIN is not a keyword in Cassandra) >>>> > >>>> > If this direction sounds good, happy to help update the CEP text and >>>> examples. >>>> > >>>> > Patrick >>>> > >>>> > 1: COMMENT ON docs >>>> https://www.postgresql.org/docs/current/sql-comment.html >>>> > 2: SECURITY LABEL docs >>>> https://www.postgresql.org/docs/current/sql-security-label.html >>>> > >>>> > >>>> > On Mon, Aug 11, 2025 at 10:18 AM Yifan Cai <yc25c...@gmail.com> >>>> wrote: >>>> > IMO, the full schema or table schema output already makes it possible >>>> to filter the fields (not limited to columns) that are using certain >>>> annotations, relatively easily. Grepping or parsing, whichever is more >>>> suitable for the scenarios; consumers make the call. >>>> > There is not much added value by providing such a dedicated query, >>>> however, adding quite a lot of complexity in the design of this CEP. Please >>>> correct me if I have the wrong understanding of the queries. >>>> > >>>> > Another reason for preferring the existing "DESCRIBE" statements is >>>> the gen-AI enrichment mentioned in the CEP. We most likely want to feed the >>>> LLM the full (table) schema. >>>> > >>>> > The primary goal is to enrich the schema with annotations. Through >>>> the discussion thread, we will find out whether there is enough motivation >>>> to support such queries to filter by annotation. I appreciate that you >>>> brought up the idea. >>>> > >>>> > Although we are not at the stage of talking about the implementation, >>>> just sharing my thoughts a bit, I am thinking of the approach (1) that >>>> Stefan mentioned. 
>>>> > >>>> > - Yifan >>>> > >>>> > On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero < >>>> fran...@apache.org> wrote: >>>> > Another interesting query would be to retrieve all the fields >>>> annotated with PII >>>> > for example. >>>> > >>>> > On 2025/08/11 01:01:21 Yifan Cai wrote: >>>> > > > >>>> > > > Will there be an option to do a SELECT query to read all the >>>> annotations >>>> > > > of a table? >>>> > > >>>> > > >>>> > > It is an interesting question! Would you mind sharing an example of >>>> the >>>> > > output you'd expect from a query like *"SELECT * FROM >>>> > > system_schema.annotations where keyspace_name=<> and >>>> table_name=<>"*? I am >>>> > > curious how that might differ from what we get when running "DESC >>>> TABLE". >>>> > > >>>> > > - Yifan >>>> > > >>>> > > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia < >>>> chovatia.jayd...@gmail.com> >>>> > > wrote: >>>> > > >>>> > > > >we could explore enriching the syntax with DESCRIBE >>>> > > > >>>> > > > Will there be an option to do a SELECT query to read all the >>>> annotations >>>> > > > of a table? Something like *"SELECT * FROM >>>> system_schema.annotations >>>> > > > where keyspace_name=<> and table_name=<>"* >>>> > > > It would be helpful to have a structured CQL query on top of >>>> printing the >>>> > > > annotations through DESC so that the information can be consumed >>>> easily. >>>> > > > >>>> > > > Jaydeep >>>> > > > >>>> > > > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa < >>>> jyothsna1...@gmail.com> >>>> > > > wrote: >>>> > > > >>>> > > >> Thanks, Joel, for the positive response. >>>> > > >> >>>> > > >> 1. User-defined vs. pre-defined annotation types >>>> > > >> >>>> > > >> We'd like to have one predefined annotation, Description, but >>>> also give >>>> > > >> users the flexibility to create new ones. 
If a user feels that a >>>> custom >>>> > > >> annotation like @Desc suits their use case, they should be >>>> allowed to use >>>> > > >> it, as these elements are purely descriptive and have no actions >>>> associated >>>> > > >> with them. >>>> > > >> >>>> > > >> 2. Syntactically, is it worth considering other alternatives? >>>> > > >> >>>> > > >> You're concerned that having several annotations on multiple >>>> columns >>>> > > >> could make schemas difficult to read. For now, we can have >>>> annotations >>>> > > >> printed as part of DESCRIBE statements. If there's a strong need >>>> to >>>> > > >> suppress annotations for readability, we could explore enriching >>>> the syntax >>>> > > >> with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the >>>> existing >>>> > > >> DESCRIBE [FULL] SCHEMA. >>>> > > >> >>>> > > >> Thanks, >>>> > > >> Jyothsna >>>> > > >> >>>> > > >> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa < >>>> jyothsna1...@gmail.com> >>>> > > >> wrote: >>>> > > >> >>>> > > >>> Thanks, Stefan, for your feedback! >>>> > > >>> >>>> > > >>> To answer your questions, >>>> > > >>> >>>> > > >>> 1. I agree; annotations can optionally take arguments, and if an >>>> > > >>> annotation doesn't have an argument, we can skip the arguments >>>> in the >>>> > > >>> "DESCRIBE" statement's output. >>>> > > >>> >>>> > > >>> 2. Good point. We originally considered using "ANNOTATED WITH" >>>> but found >>>> > > >>> it too verbose. As an alternative, we proposed using "@" >>>> preceding the >>>> > > >>> annotation to signal it to the parser. We are open to using an >>>> explicit >>>> > > >>> phrase like "ANNOTATED WITH" if you think it would make the >>>> code more >>>> > > >>> readable. 
>>>> > > >>> >>>> > > >>> A full example of annotations along with constraints and >>>> masking could >>>> > > >>> be: >>>> > > >>> >>>> > > >>> >>>> > > >>> CREATE TABLE test_ks.test_table ( >>>> > > >>> id int PRIMARY KEY, >>>> > > >>> col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND >>>> @DESCRIPTION('this >>>> > > >>> is column col2') MASKED WITH default() >>>> > > >>> ); >>>> > > >>> >>>> > > >>> OR >>>> > > >>> >>>> > > >>> CREATE TABLE test_ks.test_table ( >>>> > > >>> id int PRIMARY KEY, >>>> > > >>> col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is >>>> column col2') >>>> > > >>> MASKED WITH default() >>>> > > >>> ); >>>> > > >>> >>>> > > >>> >>>> > > >>> >>>> > > >>> 3. We do not have a prototype yet, but I think we will have to >>>> introduce >>>> > > >>> new parsing branch for annotations at the table level >>>> > > >>> >>>> > > >>> I hope I answered all your questions! >>>> > > >>> >>>> > > >>> - Jyothsna >>>> > > >>> >>>> > > >>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd < >>>> sheph...@amazon.com> >>>> > > >>> wrote: >>>> > > >>> >>>> > > >>>> I like the aim of the CEP. Completely onboard with the idea >>>> that GenAI >>>> > > >>>> tooling works better when you can provide it useful context >>>> about the data >>>> > > >>>> it is working with. An organization I worked with in the past >>>> had a lot of >>>> > > >>>> good results with marking up API models (not DB schemas, but >>>> similar idea) >>>> > > >>>> with authorization-related annotations and using those to >>>> drive policy >>>> > > >>>> linters and end-user interfaces. So, sold on the value of the >>>> capability. >>>> > > >>>> >>>> > > >>>> Two things I'm less sure of: >>>> > > >>>> >>>> > > >>>> 1) User-defined vs pre-defined annotation types: I appreciate >>>> the >>>> > > >>>> flexibility that user-defined annotations appears to give, but >>>> it adds >>>> > > >>>> extra room for error. E.g. 
if annotation names are >>>> case-sensitive, do I >>>> > > >>>> (the user) have to actively prevent creation of @description? >>>> Or, police >>>> > > >>>> the accidental creation of alternative names like @Desc? If >>>> the community >>>> > > >>>> settled on a small, fixed set of supported annotations, so >>>> Cassandra itself >>>> > > >>>> was authoritative for valid annotation names, would make the >>>> feature a lot >>>> > > >>>> less valuable, or prevent offering user-defined annotations in >>>> the future? >>>> > > >>>> >>>> > > >>>> 2) Syntactically, is it worth considering other alternatives? >>>> I was >>>> > > >>>> trying to imagine a CREATE TABLE statement marked up with two >>>> or three >>>> > > >>>> types of column-level annotations, and my sense is that it >>>> could get hard >>>> > > >>>> to read quickly. Is it worth considering Javadoc-style >>>> annotations in >>>> > > >>>> schema comments instead? I think in today's world that means >>>> that they >>>> > > >>>> would not be accessible via CQL/Cassandra (CQL comments are >>>> not persisted >>>> > > >>>> as part of the schema, correct?) but they could be accessible >>>> to other >>>> > > >>>> schema-processing tools and IMO be a more readable syntax. >>>> It'd be good to >>>> > > >>>> work through a couple use-cases for actually using the data >>>> provided by the >>>> > > >>>> annotations and get a sense of whether making them first-class >>>> entities in >>>> > > >>>> CQL is necessary for getting most of the value from them. >>>> > > >>>> >>>> > > >>>> Thanks -- Joel. 
>>>> > > >>>> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote: >>>> > > >>>> >>>> > > >>>> Sorry for the incorrect editable link, here is the updated >>>> link to the CEP >>>> > > >>>> 52: Schema Annotations for ApacheCassandra >>>> > > >>>> < >>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP+52%3A+Schema+Annotations+for+ApacheCassandra >>>> > >>>> > > >>>> >>>> > > >>>> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa < >>>> jyothsna1...@gmail.com> >>>> > > >>>> wrote: >>>> > > >>>> >>>> > > >>>>> Hello Everyone! >>>> > > >>>>> >>>> > > >>>>> We would like to propose CEP 52: Schema Annotations for >>>> > > >>>>> ApacheCassandra >>>> > > >>>>> < >>>> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=373887528&draftShareId=339b7f4e-9bc2-45bd-9a80-b0d4215e3f45& >>>> > >>>> > > >>>>> >>>> > > >>>>> This CEP outlines a plan to introduce *Schema Annotations* as >>>> a way >>>> > > >>>>> to add better context to schema elements. We're also >>>> proposing a set of new >>>> > > >>>>> DDL statements to manage these annotations. >>>> > > >>>>> >>>> > > >>>>> We believe these annotations will be highly beneficial for >>>> several key >>>> > > >>>>> areas: >>>> > > >>>>> >>>> > > >>>>> - >>>> > > >>>>> >>>> > > >>>>> GenAI Applications: Providing more context to LLMs could >>>> > > >>>>> significantly improve the accuracy and relevance of >>>> generated content. >>>> > > >>>>> - >>>> > > >>>>> >>>> > > >>>>> Data Governance: Annotations can help in enforcing >>>> policies using >>>> > > >>>>> annotations >>>> > > >>>>> - >>>> > > >>>>> >>>> > > >>>>> Compliance: They can be used to track and manage compliance >>>> > > >>>>> requirements directly within the schema. >>>> > > >>>>> >>>> > > >>>>> We're eager to hear your thoughts and feedback on this >>>> proposal. >>>> > > >>>>> Please keep the discussion within this mailing thread. >>>> > > >>>>> >>>> > > >>>>> Thanks for your time and feedback in advance. 
>>>> > > >>>>> >>>> > > >>>>> Best regards, >>>> > > >>>>> >>>> > > >>>>> Jyothsna & Yifan >>>> > > >>>>> >>>> > > >>>>> >>>> > > >>>>> >>>> > > >>>>> >>>> > > >>>> >>>>