Re: [DISCUSS] How we version our releases

2025-04-11 Thread Benedict
I proposed dropping minor versions a few years ago, so I'm cool with this. But regarding policies, I do think we'd be better off just creating a version compatibility matrix and quit agonising over it. With N policies across N releases I'm not sure they're providing much certainty to users. We really just need a consistent mechanism for deciding upgrades. A compatibility matrix solves that problem, and we can safely update it across all of our present, past and future inanities.

On 11 Apr 2025, at 08:15, Mick Semb Wever wrote:
> On Thu, 10 Apr 2025 at 22:54, Josh McKenzie wrote:
>> …
>> So here's what I'm thinking: a new release strategy that doesn't use .MINOR of semver. Goals:
>> - Simplify versioning for end users
>> - Provide clearer contracts for users as to what they can expect in releases
>> - Simplify support for us (CI, merges, etc)
>> - Clarify our public API deprecation process
>>
>> Structure / heuristic:
>> - Online upgrades are supported for all GA supported releases at time of new .MAJOR
>> - T-1 releases are guaranteed API compatible
>> - We use a deprecate-then-remove strategy for API breaking changes
>> …
>> So: what do we think?
>
> +1 David, yeah, we avoid .1 minor releases altogether.
>
> IIUC this does not imply allowing breaking changes. That ties into the recent thread about aiming to maintain compatibility forever: after a deprecation cycle, wanting to remove/break anything requires a discussion and an evaluation of the cost of keeping that legacy/deprecated code.
>
> WRT jdks: every time we drop a jdk, we drop testing all upgrade paths from versions where that was the highest jdk. In 6.0, if we drop jdk11, then we will stop testing upgrades from 4.x versions. Our tests don't support it, but we also need to do this at some point, for the sake of keeping the test matrix sane for our CI resources.
>
> The previous versioning scheme meant we chose when to drop a jdk (or break the supported upgrade paths). The proposed versioning scheme means we have to wait and align dropping jdks with the T-2 approach.
I think it's a great idea that we internalise this cognitive load, making it simpler for the user.
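The compatibility-matrix idea above could be sketched roughly as below. This is an illustration only: the version pairs and the `can_upgrade_online` helper are hypothetical, not a project decision.

```python
# Illustrative sketch of a version compatibility matrix: a single table
# answers "can I upgrade online from X to Y?" instead of N per-release
# policies. The entries below are hypothetical examples.
COMPAT = {
    ("4.0", "6.0"): True,
    ("4.1", "6.0"): True,
    ("5.0", "6.0"): True,
    ("3.11", "6.0"): False,
}

def can_upgrade_online(src: str, dst: str) -> bool:
    """A single lookup; unknown pairs default to unsupported."""
    return COMPAT.get((src, dst), False)
```

Unknown pairs defaulting to `False` keeps the matrix safe to extend: adding a new release only requires adding its supported source versions.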


Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Benedict
I would prefer require/expect/is over check.

On 11 Apr 2025, at 08:05, Štefan Miklošovič wrote:
> Yes, you will have it like that :) Thank you for this idea. Great example of cooperation across diverse domains.
>
> On Fri, Apr 11, 2025 at 12:29 AM David Capwell wrote:
>> I am biased but I do prefer
>>
>> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>>
>> Here is a similar accord CQL:
>>
>> BEGIN TRANSACTION
>>   LET a = (…);
>>   IF a IS NOT NULL
>>       AND a.b IS NOT NULL
>>       AND a.c IS NULL; THEN
>>     — profit
>>   END IF
>> COMMIT TRANSACTION
>>
>> On Apr 10, 2025, at 8:46 AM, Yifan Cai wrote:





Re reserved keywords: “check” is currently not one, and I don’t think it needs to be a reserved keyword with the proposal.







From: C. Scott Andreas 
Sent: Thursday, April 10, 2025 7:59:35 AM
To: dev@cassandra.apache.org 
Cc: dev@cassandra.apache.org 
Subject: Re: Constraint's "not null" alignment with transactions and their simplification
 



If the proposal does not introduce “check” as a reserved keyword that would require quoting in existing DDL/DML, this concern doesn’t apply and the email below can be ignored. This might be the case if “CHECK NOT NULL” is the full token introduced rather than “CHECK” separately from constraints that are checked.


If “check” is introduced as a standalone reserved keyword: my primary feedback is on the introduction of reserved words in the CQL grammar that may affect compatibility of existing schemas.



In the Cassandra 3.x series, several new CQL reserved words were added (more than necessary) and subsequently backed out, because it required users to begin quoting schemas and introduced incompatibility between 3.x and 4.x for queries and DDL that “just worked” before.


The word “check” is used in many domains (test/evaluation engineering, finance, business processes, etc) and is likely to be used in user schemas. If the proposal introduces this as a reserved word that would require it to be quoted if used in table or column names, this will create incompatibility for existing user queries on upgrade.


Otherwise, ignore me. :)


Thanks,


– Scott




–––
Mobile


On Apr 10, 2025, at 7:47 AM, Jon Haddad  wrote:





This looks like a really nice improvement to me. 




On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič  wrote:


Recently, David Capwell commented on constraints in one of the Slack threads (1) in the dev channel and suggested that the current form of the "not null" constraint we have in place, e.g. like this

create table ks.tb (id int primary key, val int check not_null(val));

could be instead of that form used like this:

create table ks.tb (id int primary key, val int check not null);

That is, without the name of a column in the constraint's argument. The reasoning was that it is not only easier to read, but there is also a concept in transactions (CEP-15) where "not null" is used in a similar fashion, and it would be nice if the two were aligned so a user does not encounter two syntactically different spellings of "not null".

Could the usage of "not null" in transactions be confirmed?

This rather innocent suggestion gave us the idea that constraint syntax could be simplified considerably. Consider this:

val int check not_null(val)
val text check json(val)
val text check length(val) < 1000

to be used like this:

val int check not null
val text check json
val text check length() < 1000

more involved checks like this:

val text check not_null(val) and json(val) and length(val) < 1000

might be just simplified to:

val text check not null and json and length() < 1000

It almost reads like plain English. Isn't this just easier on the eye?

The reason we kept the column names in constraint definitions is that, frankly speaking, we just did not know any better at the time it was implemented. It is a little trickier to support the syntax without column names, because in Parser.g / Antlr we bound the grammar around constraints directly to a column name. When column names are no longer present, we need to bind the constraint to its column later, behind the parser, in server code. It is doable; it is just a little more involved.

Also, one reason to keep the name of a column was that a constraint might reference a different column from the one it is defined on, to allow cross-column constraints; but we abandoned that idea altogether for other reasons, which rendered the occurrence of a column name in a constraint definition redundant.

To have some overview of what would be possible to do with this proposal:

val3 text CHECK SOMECONSTRAINT('a');
val3 text CHECK JSON;
val3 text CHECK SOMECONSTRAINT('a') > 1;
val3 text CHECK SOMECONSTRAINT('a', 'b', 'c') > 1;
val3 text CHECK JSON AND LENGTH() < 600;
afternoon time CHECK a
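The late-binding change described above (dropping the column name from the grammar and attaching it behind the parser) could be sketched as follows. This is not the actual Parser.g / Antlr grammar; the `Constraint` class, `parse_checks`, and `bind` names are hypothetical, for illustration only.

```python
# Illustrative sketch only: constraints are parsed without a column name,
# then bound to their column later in server-side code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Constraint:
    name: str                      # e.g. "not_null", "json", "length"
    column: Optional[str] = None   # bound after parsing, not in the grammar

def parse_checks(expr: str) -> list:
    """Parse e.g. 'not null and json and length() < 1000' into unbound constraints."""
    constraints = []
    for part in (p.strip() for p in expr.lower().split(" and ")):
        if part == "not null":
            constraints.append(Constraint("not_null"))
        elif part == "json":
            constraints.append(Constraint("json"))
        elif part.startswith("length()"):
            constraints.append(Constraint("length"))
        else:
            raise ValueError(f"unknown constraint: {part}")
    return constraints

def bind(constraints, column):
    """Late binding: attach the column the constraints are defined on."""
    for c in constraints:
        c.column = column
    return constraints

# The column name ("val") arrives only after parsing, from the column definition.
checks = bind(parse_checks("not null and json and length() < 1000"), "val")
```

The point of the sketch is only the ordering: the parser produces column-free constraint objects, and the enclosing column definition supplies the name afterwards.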

Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Štefan Miklošovič
While modelling that, we followed how it is done in the SQL world; PostgreSQL
as well as MySQL both use CHECK.

https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html

On Fri, Apr 11, 2025 at 10:43 AM Benedict  wrote:

> I would prefer require/expect/is over check
>
> On 11 Apr 2025, at 08:05, Štefan Miklošovič 
> wrote:
>
> 
> Yes, you will have it like that :) Thank you for this idea. Great example
> of cooperation over diverse domains.
>
> On Fri, Apr 11, 2025 at 12:29 AM David Capwell  wrote:
>
>> I am biased but I do prefer
>>
>> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>>
>> Here is a similar accord CQL
>>
>> BEGIN TRANSACTION
>>   LET a = (…);
>>   IF a IS NOT NULL
>>   AND a.b IS NOT NULL
>>   AND a.c IS NULL; THEN
>> — profit
>>   END IF
>> COMMIT TRANSACTION
>>
>> On Apr 10, 2025, at 8:46 AM, Yifan Cai  wrote:
>>
>> Re: reserved keywords, “check” is currently not, and I don’t think it
>> needs to be a reserved keyword with the proposal.
>>
>> --
>> *From:* C. Scott Andreas 
>> *Sent:* Thursday, April 10, 2025 7:59:35 AM
>> *To:* dev@cassandra.apache.org 
>> *Cc:* dev@cassandra.apache.org 
>> *Subject:* Re: Constraint's "not null" alignment with transactions and
>> their simplification
>>
>> If the proposal does not introduce “check” as a reserved keyword that
>> would require quoting in existing DDL/DML, this concern doesn’t apply and
>> the email below can be ignored. This might be the case if “CHECK NOT NULL”
>> is the full token introduced rather than “CHECK” separately from
>> constraints that are checked.
>>
>> If “check” is introduced as a standalone reserved keyword: my primary
>> feedback is on the introduction of reserved words in the CQL grammar that
>> may affect compatibility of existing schemas.
>>
>> In the Cassandra 3.x series, several new CQL reserved words were added
>> (more than necessary) and subsequently backed out, because it required
>> users to begin quoting schemas and introduced incompatibility between 3.x
>> and 4.x for queries and DDL that “just worked” before.
>>
>> The word “check” is used in many domains (test/evaluation engineering,
>> finance, business processes, etc) and is likely to be used in user schemas.
>> If the proposal introduces this as a reserved word that would require it to
>> be quoted if used in table or column names, this will create
>> incompatibility for existing user queries on upgrade.
>>
>> Otherwise, ignore me. :)
>>
>> Thanks,
>>
>> – Scott
>>
>> –––
>> Mobile
>>
>> On Apr 10, 2025, at 7:47 AM, Jon Haddad  wrote:
>>
>> 
>> This looks like a really nice improvement to me.
>>
>>
>> On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič 
>> wrote:
>>
>> Recently, David Capwell was commenting on constraints in one of Slack
>> threads (1) in dev channel and he suggested that the current form of "not
>> null" constraint we have right now in place, e.g. like this
>>
>> create table ks.tb (id int primary key, val int check not_null(val));
>>
>> could be instead of that form used like this:
>>
>> create table ks.tb (id int primary key, val int check not null);
>>
>> That is - without the name of a column in the constraint's argument. The
>> reasoning behind that was that it is not only easier to read but there is
>> also this concept in transactions (cep-15) where there is also "not null"
>> used in some fashion and it would be nice if this was aligned so a user
>> does not encounter two usages of "not null"-s which are written down
>> differently, syntax-wise.
>>
>> Could the usage of "not null" in transactions be confirmed?
>>
>> This rather innocent suggestion brought an idea to us that constraints
>> could be quite simplified when it comes to their syntax, consider this:
>>
>> val int check not_null(val)
>> val text check json(val)
>> val text check length(val) < 1000
>>
>> to be used like this:
>>
>> val int check not null
>> val text check json
>> val text check length() < 1000
>>
>> more involved checks like this:
>>
>> val text check not_null(val) and json(val) and length(val) < 1000
>>
>> might be just simplified to:
>>
>> val text check not null and json and length() < 1000
>>
>> It almost reads like plain English. Isn't this just easier on the eye?
>>
>> The reason we kept the column names in constraint definitions is that,
>> frankly speaking, we just did not know any better at the time it was about
>> to be implemented. It is a little bit more tricky to be able to use it
>> without column names because in Parser.g / Antlr we just bound the grammar
>> around constraints to a column name directly there. When column names are
>> not going to be there anymore, we need to bind it later in the code behind
>> the parser in server code. It is doable, it was just about being a little
>> bit more involved there.
>>
>> Also, one reason to keep the name of a column was that we might specify

Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Benedict
We have taken a different approach though, as we do not actually take a predicate on the RHS and do not supply the column name. In our examples we had e.g. CHECK JSON, which doesn't parse unambiguously to a human. The equivalent in Postgres would seem to be CHECK is_json(field).

I'm all for following an existing example, but once we decide to diverge, the justification is gone and we should decide holistically what we think is best. So if we want to elide the column entirely and have a list of built-in restrictions, I'd prefer e.g. REQUIRE JSON, since this parses unambiguously to a human; whereas if we want to follow Postgres, let's do that, but that means e.g. CHECK is_json(field).

On 11 Apr 2025, at 10:57, Štefan Miklošovič wrote:
> While modelling that, we followed how it is done in the SQL world; PostgreSQL as well as MySQL both use CHECK.
>
> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>
> On Fri, Apr 11, 2025 at 10:43 AM Benedict wrote:
>> I would prefer require/expect/is over check.
>>
>> On 11 Apr 2025, at 08:05, Štefan Miklošovič wrote:
>>> Yes, you will have it like that :) Thank you for this idea. Great example of cooperation across diverse domains.
>>>
>>> On Fri, Apr 11, 2025 at 12:29 AM David Capwell wrote:
>>>> I am biased but I do prefer
>>>>
>>>> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>>>>
>>>> Here is a similar accord CQL:
>>>>
>>>> BEGIN TRANSACTION
>>>>   LET a = (…);
>>>>   IF a IS NOT NULL
>>>>       AND a.b IS NOT NULL
>>>>       AND a.c IS NULL; THEN
>>>>     — profit
>>>>   END IF
>>>> COMMIT TRANSACTION
>>>>
>>>> On Apr 10, 2025, at 8:46 AM, Yifan Cai wrote:





Re: reserved keywords, “check” is currently not, and I don’t think it needs to be a reserved keyword with the proposal.







From: C. Scott Andreas 
Sent: Thursday, April 10, 2025 7:59:35 AM
To: dev@cassandra.apache.org 
Cc: dev@cassandra.apache.org 
Subject: Re: Constraint's "not null" alignment with transactions and their simplification
 



If the proposal does not introduce “check” as a reserved keyword that would require quoting in existing DDL/DML, this concern doesn’t apply and the email below can be ignored. This might be the case if “CHECK NOT NULL” is the full token introduced rather than “CHECK” separately from constraints that are checked.


If “check” is introduced as a standalone reserved keyword: my primary feedback is on the introduction of reserved words in the CQL grammar that may affect compatibility of existing schemas.



In the Cassandra 3.x series, several new CQL reserved words were added (more than necessary) and subsequently backed out, because it required users to begin quoting schemas and introduced incompatibility between 3.x and 4.x for queries and DDL that “just worked” before.


The word “check” is used in many domains (test/evaluation engineering, finance, business processes, etc) and is likely to be used in user schemas. If the proposal introduces this as a reserved word that would require it to be quoted if used in table or column names, this will create incompatibility for existing user queries on upgrade.


Otherwise, ignore me. :)


Thanks,


– Scott




–––
Mobile


On Apr 10, 2025, at 7:47 AM, Jon Haddad  wrote:





This looks like a really nice improvement to me. 




On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič  wrote:


Recently, David Capwell was commenting on constraints in one of Slack threads (1) in dev channel and he suggested that the current form of "not null" constraint we have right now in place, e.g. like this

create table ks.tb (id int primary key, val int check not_null(val));

could be instead of that form used like this:

create table ks.tb (id int primary key, val int check not null);

That is - without the name of a column in the constraint's argument. The reasoning behind that was that it is not only easier to read but there is also this concept in transactions (cep-15) where there is also "not null" used in some fashion and it would be nice if this was aligned so a user does not encounter two usages of "not null"-s which are written down differently, syntax-wise.

Could the usage of "not null" in transactions be confirmed?

This rather innocent suggestion brought an idea to us that constraints could be quite simplified when it comes to their syntax, consider this:

val int check not_null(val)
val text check json(val)
val text check length(val) < 1000

to be used like this:

val int check not null
val text check json
val text check length() < 1000

more involved checks like this:

val text check not_null(val) and json(val) and length(val) < 1000

might be just simplified to:

val text check not null and json and length() < 1000

It almost reads like plain English. Isn't this just easier on the eye?

The reason we kept the column names 

Re: [DISCUSS] How we version our releases

2025-04-11 Thread Jeremiah Jordan
 +1 from me.
No more wondering what the next version number will be.
No more wondering what version I can upgrade from to use the new release.

-Jeremiah

On Apr 10, 2025 at 3:54:13 PM, Josh McKenzie  wrote:

> This came up in the thread from Jon on "5.1 should be 6.0".
>
> I think it's important that our release versioning is clear and simple.
> The current status quo of:
> - Any .MINOR to next MAJOR is supported
> - Any .MAJOR to next MAJOR is supported
> - We reserve .MAJOR for API breaking changes
> - except for when we get excited about a feature and want to .MAJOR to
> signal that
> - or we change JDK's and need to signal that
> - or any of another slew of caveats that require digging into NEWS.txt
> to see what the hell we're up to. :D
> - And all of our CI pain that ensues from the above
>
> In my opinion the above is overly complex and could use simplification. I
> also believe us re-litigating this on every release is a waste of time and
> energy that could better be spent elsewhere on the project or in life. It's
> also a signal about how confusing our release versioning has been for the
> community.
>
> Let's leave aside the decision about whether we scope releases based on
> time or based on features; let's keep this to the discussion about how we
> version our releases.
>
> So here's what I'm thinking: a new release strategy that doesn't use
> .MINOR of semver. Goals:
> - Simplify versioning for end users
> - Provide clearer contracts for users as to what they can expect in
> releases
> - Simplify support for us (CI, merges, etc)
> - Clarify our public API deprecation process
>
> Structure / heuristic:
> - Online upgrades are supported for all GA supported releases at time of
> new .MAJOR
> - T-1 releases are guaranteed API compatible
> - We use a deprecate-then-remove strategy for API breaking changes
>
> This would translate into the following for our upcoming releases
> (assuming we stick with 3 supported majors at any given time):
> 6.0:
> - 5.0, 4.1, 4.0 online upgrades are supported (grandfather window)
> - We drop support for 4.0
> - API compatibility is guaranteed w/5.0
> 7.0:
> - 6.0, 5.0, 4.1 online upgrades are supported (grandfather window)
> - We drop support for 4.1
> - API compatibility is guaranteed w/6.0
> 8.0:
> - 7.0, 6.0, 5.0 online upgrades are supported (fully on new paradigm)
> - We drop support for 5.0
> - API compatibility guaranteed w/7.0
>
> So: what do we think?
>
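The structure / heuristic quoted above (a window of three supported majors, online upgrades from any GA-supported release, API compatibility guaranteed with T-1 only) could be sketched as below. The `upgrade_sources` and `api_compatible_with` helpers and the three-major window are assumptions drawn from the examples in the thread, not settled policy.

```python
# Hedged sketch of the proposed release-support heuristic.
SUPPORTED_WINDOW = 3  # majors supported at any given time, per the thread

def upgrade_sources(majors: list, new: str) -> list:
    """Majors you can upgrade online from: everything GA-supported at release time."""
    idx = majors.index(new)
    return majors[max(0, idx - SUPPORTED_WINDOW):idx]

def api_compatible_with(majors: list, new: str) -> str:
    """Only T-1 is guaranteed API compatible."""
    return majors[majors.index(new) - 1]

MAJORS = ["4.0", "4.1", "5.0", "6.0", "7.0", "8.0"]
# Per the proposal's examples: 6.0 supports online upgrades from
# 5.0, 4.1 and 4.0, and is API compatible with 5.0.
```

Under these assumptions the sketch reproduces the 6.0/7.0/8.0 examples in the proposal exactly, which is a useful sanity check on the T-3 window.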


Re: Project hygiene on old PRs

2025-04-11 Thread Josh McKenzie
+1 from me.

My intuition is that this is a logical consequence of us not using GitHub to 
merge PRs, so they don't auto-close; which in turn seems to be a consequence of 
us using merge commits instead of per-branch commits of patches.

The band-aid of at least having a human-in-the-loop to close out old inactive 
things is better than the status quo; the information is all still available in 
github but the status of the PR's will communicate different things.

On Thu, Apr 10, 2025, at 7:14 PM, Bernardo Botella wrote:
> Hi everyone!
> 
> First of all, this may have come out before, and I understand it is really 
> hard to keep a tidy house with so many different collaborations. But, I can't 
> help the feeling that coming to the main Apache Cassandra repository and 
> seeing more than 600 open PRs, some of them without activity for 5+ years, 
> gives the wrong impression about the love and care that we all share for this 
> code base. I think we can find an easy to follow agreement to try and keep 
> things a bit tidier. I wanted to propose some kind of "rule" that allows us to 
> directly close PRs that haven't had activity in a reasonable and conservative 
> amount of time of, let's say, 6 months? I want to reiterate that I mean no 
> activity at all for six months from the PR author. I understand that complex 
> PRs can be opened for longer than that period, and that's perfectly fine.
> 
> What do you all think?
> 
> Bernardo


Re: [DISCUSS] How we version our releases

2025-04-11 Thread Josh McKenzie
> So we avoid 6.1, 7.2, etc?  Does this imply that each release is allowed to 
> make breaking changes (assuming they followed the “correct” deprecation 
> process)? 
Yes and no.

A release can't make a breaking change *relative to the immediately preceding 
release*, even if something was deprecated there.

A release *can* make a breaking change *relative to another actively supported 
release* if that release is not adjacent and the feature was signaled as 
deprecated in the interim release.

On Fri, Apr 11, 2025, at 10:39 AM, Jon Haddad wrote:
> +1.
> 
> It's the proper signal to the community.  A .1 release could still be done as 
> an exception, but I have a hard time thinking of a case other than supporting 
> a newer JDK without any other changes. 
> 
> On Fri, Apr 11, 2025 at 7:19 AM Jeremiah Jordan  wrote:
>> +1 from me.
>> No more wondering what the next version number will be.
>> No more wondering what version I can upgrade from to use the new release.
>> 
>> -Jeremiah
>> 
>> On Apr 10, 2025 at 3:54:13 PM, Josh McKenzie  wrote:
>>> 
>>> This came up in the thread from Jon on "5.1 should be 6.0".
>>> 
>>> I think it's important that our release versioning is clear and simple. The 
>>> current status quo of:
>>> - Any .MINOR to next MAJOR is supported  
>>> - Any .MAJOR to next MAJOR is supported  
>>> - We reserve .MAJOR for API breaking changes
>>> - except for when we get excited about a feature and want to .MAJOR to 
>>> signal that
>>> - or we change JDK's and need to signal that
>>> - or any of another slew of caveats that require digging into NEWS.txt 
>>> to see what the hell we're up to. :D
>>> - And all of our CI pain that ensues from the above
>>> 
>>> In my opinion the above is overly complex and could use simplification. I 
>>> also believe us re-litigating this on every release is a waste of time and 
>>> energy that could better be spent elsewhere on the project or in life. It's 
>>> also a signal about how confusing our release versioning has been for the 
>>> community.
>>> 
>>> Let's leave aside the decision about whether we scope releases based on 
>>> time or based on features; let's keep this to the discussion about how we 
>>> version our releases.
>>> 
>>> So here's what I'm thinking: a new release strategy that doesn't use .MINOR 
>>> of semver. Goals:
>>> - Simplify versioning for end users
>>> - Provide clearer contracts for users as to what they can expect in releases
>>> - Simplify support for us (CI, merges, etc)
>>> - Clarify our public API deprecation process
>>> 
>>> Structure / heuristic:
>>> - Online upgrades are supported for all GA supported releases at time of 
>>> new .MAJOR
>>> - T-1 releases are guaranteed API compatible
>>> - We use a deprecate-then-remove strategy for API breaking changes
>>> 
>>> This would translate into the following for our upcoming releases (assuming 
>>> we stick with 3 supported majors at any given time):
>>> 6.0:
>>> - 5.0, 4.1, 4.0 online upgrades are supported (grandfather window)
>>> - We drop support for 4.0
>>> - API compatibility is guaranteed w/5.0
>>> 7.0:
>>> - 6.0, 5.0, 4.1 online upgrades are supported (grandfather window)
>>> - We drop support for 4.1
>>> - API compatibility is guaranteed w/6.0
>>> 8.0:
>>> - 7.0, 6.0, 5.0 online upgrades are supported (fully on new paradigm)
>>> - We drop support for 5.0
>>> - API compatibility guaranteed w/7.0
>>> 
>>> So: what do we think?


Re: [DISCUSS] 5.1 should be 6.0

2025-04-11 Thread Josh McKenzie
> David makes a good point about making sure that we support 4.x to 6.0 
> upgrades.
Supporting live upgrades from every GA supported version today seems obvious as 
a lazy consensus to me. Given how confusing our release versioning has been, 
it's worth explicitly calling that out.


On Fri, Apr 11, 2025, at 10:33 AM, Aaron wrote:
> +1 to 6.0
> 
> And David makes a good point about making sure that we support 4.x to 6.0 
> upgrades.
> 
> Thanks,
> 
> Aaron
> 
> On Fri, Apr 11, 2025 at 1:03 AM guo Maxwell  wrote:
>> +1 to 6.0 
>> 
>> Berenguer Blasi  于2025年4月11日周五 13:53写道:
>>> +1 6.0
>>> 
>>> On 10/4/25 23:57, David Capwell wrote:
 +1 to 6.0
 Strong +1 to T-3, we should support 4.0/4.1 to 6.0 upgrades.
 
> On Apr 10, 2025, at 2:18 PM, C. Scott Andreas  
> wrote:
> 
> +1 6.0
> 
> - Scott
> 
> —
> Mobile
> 
>> On Apr 10, 2025, at 1:34 PM, Jeremy Hanna  
>> wrote:
>>  +1 for 6.0 for TCM/Accord changes, making it easier to make a case to 
>> upgrade dependencies like the Java/Python versions.
>> 
>>> On Apr 10, 2025, at 3:24 PM, Bernardo Botella 
>>>  wrote:
>>> 
>>> +1 on 6.0
>>> 
 On Apr 10, 2025, at 1:07 PM, Josh McKenzie  
 wrote:
 
 Let's keep this thread to just +1's on 6.0; I'll see about a proper 
 isolated [DISCUSS] thread for my proposal above hopefully tomorrow, 
 schedule permitting.
 
 On Thu, Apr 10, 2025, at 3:46 PM, Jeremiah Jordan wrote:
> +1 to 6.0
> 
> On Thu, Apr 10, 2025 at 1:38 PM Josh McKenzie  
> wrote:
>> 
>> +1 to 6.0.
>> 
>> On Thu, Apr 10, 2025, at 2:28 PM, Jon Haddad wrote:
>>> Bringing this back up.
>>> 
>>> I don't think we have any reason to hold up renaming the version.  
>>> We can have a separate discussion about what upgrade paths are 
>>> supported, but let's at least address this one issue of version 
>>> number so we can have consistent messaging.  When i talk to people 
>>> about the next release, I'd like to be consistent with what I call 
>>> it, and have a unified voice as a project.
>>> 
>>> Jon
>>> 
>>> On Thu, Jan 30, 2025 at 1:41 AM Mick Semb Wever  
>>> wrote:
 .

>> If you mean only 4.1 and 5.0 would be online upgrade targets, I 
>> would suggest we change that to T-3 so you encompass all 
>> “currently supported” releases at the time the new branch is 
>> GAed.
> I think that's better actually, yeah. I was originally thinking 
> T-2 from the "what calendar time frame is reasonable" 
> perspective, but saying "if you're on a currently supported 
> branch you can upgrade to a release that comes out" makes clean 
> intuitive sense. That'd mean:
> 
> 6.0: 5.0, 4.1, 4.0 online upgrades supported. Drop support for 
> 4.0. API compatible guaranteed w/5.0.
> 7.0: 6.0, 5.0, 4.1 online upgrades supported. Drop support for 
> 4.1. API compatible guaranteed w/6.0.
> 8.0: 7.0, 6.0, 5.0 online upgrades supported. Drop support for 
> 5.0. API compatible guaranteed w/7.0.
> 
 
 
 
 I like this.


Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Benedict
Sure, and I won't push hard on this, but I'm not sure why we would copy without actually copying.

For comparison, Oracle and SQL Server also have the CHECK keyword for predicates involving the column, but permit other constraints, e.g. NOT NULL or UNIQUE, without first writing CHECK.

So, to me, CHECK is consistently used across the industry as a predicate expression on the field, where the field is specified in the predicate. If we aren't doing this, we'd be better off using our own keyword to distinguish our approach; and having built-in constraints like NOT NULL appears to be standard industry practice.

But, I don't mind terribly. I just want to push a little on the decisions here before we cement them.

On 11 Apr 2025, at 16:33, Bernardo Botella wrote:
> Benedict:
>
> An alternative for that, keeping the CHECK word, would be to change the constraint name to IS_JSON. CHECK IS_JSON would read as you intend, without the need to jump to REQUIRE. I think that's true for the rest of the provided constraints as well.
>
> Bernardo
>
> On Apr 11, 2025, at 6:02 AM, Benedict wrote:
>> We have taken a different approach though, as we do not actually take a predicate on the RHS and do not supply the column name. In our examples we had e.g. CHECK JSON, which doesn't parse unambiguously to a human. The equivalent in Postgres would seem to be CHECK is_json(field).
>>
>> I'm all for following an existing example, but once we decide to diverge, the justification is gone and we should decide holistically what we think is best. So if we want to elide the column entirely and have a list of built-in restrictions, I'd prefer e.g. REQUIRE JSON, since this parses unambiguously to a human; whereas if we want to follow Postgres, let's do that, but that means e.g. CHECK is_json(field).
>>
>> On 11 Apr 2025, at 10:57, Štefan Miklošovič wrote:
>>> While modelling that, we followed how it is done in the SQL world; PostgreSQL as well as MySQL both use CHECK.
>>>
>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
>>> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>>>
>>> On Fri, Apr 11, 2025 at 10:43 AM Benedict wrote:
>>>> I would prefer require/expect/is over check.
>>>>
>>>> On 11 Apr 2025, at 08:05, Štefan Miklošovič wrote:
>>>>> Yes, you will have it like that :) Thank you for this idea. Great example of cooperation across diverse domains.
>>>>>
>>>>> On Fri, Apr 11, 2025 at 12:29 AM David Capwell wrote:
>>>>>> I am biased but I do prefer
>>>>>>
>>>>>> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>>>>>>
>>>>>> Here is a similar accord CQL:
>>>>>>
>>>>>> BEGIN TRANSACTION
>>>>>>   LET a = (…);
>>>>>>   IF a IS NOT NULL
>>>>>>       AND a.b IS NOT NULL
>>>>>>       AND a.c IS NULL; THEN
>>>>>>     — profit
>>>>>>   END IF
>>>>>> COMMIT TRANSACTION
>>>>>>
>>>>>> On Apr 10, 2025, at 8:46 AM, Yifan Cai wrote:





Re: reserved keywords, “check” is currently not, and I don’t think it needs to be a reserved keyword with the proposal.







From: C. Scott Andreas 
Sent: Thursday, April 10, 2025 7:59:35 AM
To: dev@cassandra.apache.org 
Cc: dev@cassandra.apache.org 
Subject: Re: Constraint's "not null" alignment with transactions and their simplification
 



If the proposal does not introduce “check” as a reserved keyword that would require quoting in existing DDL/DML, this concern doesn’t apply and the email below can be ignored. This might be the case if “CHECK NOT NULL” is the full token introduced rather than “CHECK” separately from constraints that are checked.


If “check” is introduced as a standalone reserved keyword: my primary feedback is on the introduction of reserved words in the CQL grammar that may affect compatibility of existing schemas.



In the Cassandra 3.x series, several new CQL reserved words were added (more than necessary) and subsequently backed out, because it required users to begin quoting schemas and introduced incompatibility between 3.x and 4.x for queries and DDL that “just worked” before.


The word “check” is used in many domains (test/evaluation engineering, finance, business processes, etc) and is likely to be used in user schemas. If the proposal introduces this as a reserved word that would require it to be quoted if used in table or column names, this will create incompatibility for existing user queries on upgrade.


Otherwise, ignore me. :)


Thanks,


– Scott




–––
Mobile


On Apr 10, 2025, at 7:47 AM, Jon Haddad  wrote:





This looks like a really nice improvement to me. 




On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič  wrote:


Recently, David Capwell was commenting on constraints in one of the Slack threads (1) in the dev channel, and he suggested that the current form of the "not null" constraint we have in place right now, e.g. like this

create table ks.tb (id int primary key, val int check not_null(val));

could be instead of that form used like this:

create table ks.tb (id int primary key, val int check not null);

That is - without the

Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Bernardo Botella
Benedict:

An alternative for that, keeping the CHECK word, would be to change the 
constraint name to IS_JSON. CHECK IS_JSON would read as you intend without the 
need to jump to REQUIRE. I think that’s true for the rest of provided 
constraints as well.

Bernardo


> On Apr 11, 2025, at 6:02 AM, Benedict  wrote:
> 
> We have taken a different approach though, as we do not actually take a 
> predicate on the RHS and do not supply the column name. In our examples we 
> had eg CHECK JSON, which doesn’t parse unambiguously to a human. The 
> equivalent to Postgres would seem to be CHECK is_json(field).
> 
> I’m all for following an existing example, but once we decide to diverge the 
> justification is gone and we should decide holistically what we think is 
> best. So if we want to elide the column entirely and have a list of built in 
> restrictions, I’d prefer eg REQUIRE JSON since this parses unambiguously to a 
> human, whereas if we want to follow Postgres let’s do that, but that 
> means eg CHECK is_json(field).
> 
>> On 11 Apr 2025, at 10:57, Štefan Miklošovič  wrote:
>> 
>> 
>> While modelling that, we followed how it is done in SQL world, PostgreSQL as 
>> well as MySQL both use CHECK.
>> 
>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
>> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>> 
>> On Fri, Apr 11, 2025 at 10:43 AM Benedict > > wrote:
>>> I would prefer require/expect/is over check
>>> 
 On 11 Apr 2025, at 08:05, Štefan Miklošovič >>> > wrote:
 
 
 Yes, you will have it like that :) Thank you for this idea. Great example 
 of cooperation over diverse domains.
 
 On Fri, Apr 11, 2025 at 12:29 AM David Capwell >>> > wrote:
> I am biased but I do prefer
> 
> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
> 
> Here is a similar accord CQL
> 
> BEGIN TRANSACTION
>   LET a = (…);
>   IF a IS NOT NULL 
>   AND a.b IS NOT NULL 
>   AND a.c IS NULL; THEN
> — profit
>   END IF
> COMMIT TRANSACTION
> 
>> On Apr 10, 2025, at 8:46 AM, Yifan Cai > > wrote:
>> 
>> Re: reserved keywords, “check” is currently not, and I don’t think it 
>> needs to be a reserved keyword with the proposal.
>> 
>> From: C. Scott Andreas > >
>> Sent: Thursday, April 10, 2025 7:59:35 AM
>> To: dev@cassandra.apache.org  
>> mailto:dev@cassandra.apache.org>>
>> Cc: dev@cassandra.apache.org  
>> mailto:dev@cassandra.apache.org>>
>> Subject: Re: Constraint's "not null" alignment with transactions and 
>> their simplification
>>  
>> If the proposal does not introduce “check” as a reserved keyword that 
>> would require quoting in existing DDL/DML, this concern doesn’t apply 
>> and the email below can be ignored. This might be the case if “CHECK NOT 
>> NULL” is the full token introduced rather than “CHECK” separately from 
>> constraints that are checked.
>> 
>> If “check” is introduced as a standalone reserved keyword: my primary 
>> feedback is on the introduction of reserved words in the CQL grammar 
>> that may affect compatibility of existing schemas.
>> 
>> In the Cassandra 3.x series, several new CQL reserved words were added 
>> (more than necessary) and subsequently backed out, because it required 
>> users to begin quoting schemas and introduced incompatibility between 
>> 3.x and 4.x for queries and DDL that “just worked” before.
>> 
>> The word “check” is used in many domains (test/evaluation engineering, 
>> finance, business processes, etc) and is likely to be used in user 
>> schemas. If the proposal introduces this as a reserved word that would 
>> require it to be quoted if used in table or column names, this will 
>> create incompatibility for existing user queries on upgrade.
>> 
>> Otherwise, ignore me. :)
>> 
>> Thanks,
>> 
>> – Scott
>> 
>> –––
>> Mobile
>> 
>>> On Apr 10, 2025, at 7:47 AM, Jon Haddad >> > wrote:
>>> 
>>> 
>>> This looks like a really nice improvement to me. 
>>> 
>>> 
>>> On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič 
>>> mailto:smikloso...@apache.org>> wrote:
>>> Recently, David Capwell was commenting on constraints in one of Slack 
>>> threads (1) in dev channel and he suggested that the current form of 
>>> "not null" constraint we have right now in place, e.g like this
>>> 
>>> create table ks.tb (id int primary key, val int check not_null(val));
>>> 
>>> could be instead of that form used like this:
>>> 

Re: [DISCUSS] How we version our releases

2025-04-11 Thread Jon Haddad
+1.

It's the proper signal to the community.  A .1 release could still be done
as an exception, but I have a hard time thinking of a case other than
supporting a newer JDK without any other changes.

On Fri, Apr 11, 2025 at 7:19 AM Jeremiah Jordan  wrote:

> +1 from me.
> No more wondering what the next version number will be.
> No more wondering what version I can upgrade from to use the new release.
>
> -Jeremiah
>
> On Apr 10, 2025 at 3:54:13 PM, Josh McKenzie  wrote:
>
>> This came up in the thread from Jon on "5.1 should be 6.0".
>>
>> I think it's important that our release versioning is clear and simple.
>> The current status quo of:
>> - Any .MINOR to next MAJOR is supported
>> - Any .MAJOR to next MAJOR is supported
>> - We reserve .MAJOR for API breaking changes
>> - except for when we get excited about a feature and want to .MAJOR
>> to signal that
>> - or we change JDK's and need to signal that
>> - or any of another slew of caveats that require digging into
>> NEWS.txt to see what the hell we're up to. :D
>> - And all of our CI pain that ensues from the above
>>
>> In my opinion the above is overly complex and could use simplification. I
>> also believe us re-litigating this on every release is a waste of time and
>> energy that could better be spent elsewhere on the project or in life. It's
>> also a signal about how confusing our release versioning has been for the
>> community.
>>
>> Let's leave aside the decision about whether we scope releases based on
>> time or based on features; let's keep this to the discussion about how we
>> version our releases.
>>
>> So here's what I'm thinking: a new release strategy that doesn't use
>> .MINOR of semver. Goals:
>> - Simplify versioning for end users
>> - Provide clearer contracts for users as to what they can expect in
>> releases
>> - Simplify support for us (CI, merges, etc)
>> - Clarify our public API deprecation process
>>
>> Structure / heuristic:
>> - Online upgrades are supported for all GA supported releases at time of
>> new .MAJOR
>> - T-1 releases are guaranteed API compatible
>> - We use a deprecate-then-remove strategy for API breaking changes
>>
>> This would translate into the following for our upcoming releases
>> (assuming we stick with 3 supported majors at any given time):
>> 6.0:
>> - 5.0, 4.1, 4.0 online upgrades are supported (grandfather window)
>> - We drop support for 4.0
>> - API compatibility is guaranteed w/5.0
>> 7.0:
>> - 6.0, 5.0, 4.1 online upgrades are supported (grandfather window)
>> - We drop support for 4.1
>> - API compatibility is guaranteed w/6.0
>> 8.0:
>> - 7.0, 6.0, 5.0 online upgrades are supported (fully on new paradigm)
>> - We drop support for 5.0
>> - API compatibility guaranteed w/7.0
>>
>> So: what do we think?
>>
>


Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Štefan Miklošovič
I went through Postgres' docs quite thoroughly, and I do not see
any usage of "constraint functions" as we have them. Both MySQL and
Postgres seem to only provide simple predicates (using relational
operators) and "functions" as we have them (e.g. json / length) are not
supported. So in this regard we are providing more than they do. Where have
you taken "is_json(field)" form from? Not saying it's wrong, I am just
curious where this is coming from.

I do not like that we would have "NOT NULL" without "CHECK". I think we can
go a little bit our own way as we have the comfort of modelling this from
scratch. CQL is already different from SQL as is and I do not think that
trying to follow SQL _orthodoxly_ is absolutely necessary but at the same
time I find it easier and more welcoming for users coming to Cassandra for
the first time to have syntax which is as close as possible to what they
are used to.

I find having constraints starting with "CHECK" _every time_ consistent.
They do not need to think twice if "check" is going to be there or not. It
is there every time. I do not know why SQL did not do it the same way, most
probably because "NOT NULL" was the first being introduced and "CHECK"
followed afterwards and it was just too late to make it consistent.
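
To make the two competing shapes concrete, here is a sketch of both (hypothetical syntax: the grammar is still under discussion, and is_json is an illustrative name, not an existing function):

```sql
-- Consistent CHECK-prefixed form: every constraint starts with CHECK,
-- so readers never have to remember which constraints take the keyword.
CREATE TABLE ks.tb1 (
    id  int PRIMARY KEY,
    val text CHECK NOT NULL AND JSON AND LENGTH() < 1024
);

-- Postgres-style form: NOT NULL stands alone, and CHECK takes a
-- boolean predicate over the column.
CREATE TABLE ks.tb2 (
    id  int PRIMARY KEY,
    val text NOT NULL CHECK (is_json(val) AND length(val) < 1024)
);
```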



On Fri, Apr 11, 2025 at 5:33 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Benedict:
>
> An alternative for that, keeping the CHECK word, would be to change the
> constraint name to IS_JSON. CHECK IS_JSON would read as you intend without
> the need to jump to REQUIRE. I think that’s true for the rest of provided
> constraints as well.
>
> Bernardo
>
>
> On Apr 11, 2025, at 6:02 AM, Benedict  wrote:
>
> We have taken a different approach though, as we do not actually take a
> predicate on the RHS and do not supply the column name. In our examples we
> had eg CHECK JSON, which doesn’t parse unambiguously to a human. The
> equivalent to Postgres would seem to be CHECK is_json(field).
>
> I’m all for following an existing example, but once we decide to diverge
> the justification is gone and we should decide holistically what we think
> is best. So if we want to elide the column entirely and have a list of
> built in restrictions, I’d prefer eg REQUIRE JSON since this parses
> unambiguously to a human, whereas if we want to follow Postgres let’s do
> that, but that means eg CHECK is_json(field).
>
> On 11 Apr 2025, at 10:57, Štefan Miklošovič 
> wrote:
>
> 
> While modelling that, we followed how it is done in SQL world, PostgreSQL
> as well as MySQL both use CHECK.
>
>
> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>
> On Fri, Apr 11, 2025 at 10:43 AM Benedict  wrote:
>
>> I would prefer require/expect/is over check
>>
>> On 11 Apr 2025, at 08:05, Štefan Miklošovič 
>> wrote:
>>
>> 
>> Yes, you will have it like that :) Thank you for this idea. Great example
>> of cooperation over diverse domains.
>>
>> On Fri, Apr 11, 2025 at 12:29 AM David Capwell 
>> wrote:
>>
>>> I am biased but I do prefer
>>>
>>> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>>>
>>> Here is a similar accord CQL
>>>
>>> BEGIN TRANSACTION
>>>   LET a = (…);
>>>   IF a IS NOT NULL
>>>   AND a.b IS NOT NULL
>>>   AND a.c IS NULL; THEN
>>> — profit
>>>   END IF
>>> COMMIT TRANSACTION
>>>
>>> On Apr 10, 2025, at 8:46 AM, Yifan Cai  wrote:
>>>
>>> Re: reserved keywords, “check” is currently not, and I don’t think it
>>> needs to be a reserved keyword with the proposal.
>>>
>>> --
>>> *From:* C. Scott Andreas 
>>> *Sent:* Thursday, April 10, 2025 7:59:35 AM
>>> *To:* dev@cassandra.apache.org 
>>> *Cc:* dev@cassandra.apache.org 
>>> *Subject:* Re: Constraint's "not null" alignment with transactions and
>>> their simplification
>>>
>>> If the proposal does not introduce “check” as a reserved keyword that
>>> would require quoting in existing DDL/DML, this concern doesn’t apply and
>>> the email below can be ignored. This might be the case if “CHECK NOT NULL”
>>> is the full token introduced rather than “CHECK” separately from
>>> constraints that are checked.
>>>
>>> If “check” is introduced as a standalone reserved keyword: my primary
>>> feedback is on the introduction of reserved words in the CQL grammar that
>>> may affect compatibility of existing schemas.
>>>
>>> In the Cassandra 3.x series, several new CQL reserved words were added
>>> (more than necessary) and subsequently backed out, because it required
>>> users to begin quoting schemas and introduced incompatibility between 3.x
>>> and 4.x for queries and DDL that “just worked” before.
>>>
>>> The word “check” is used in many domains (test/evaluation engineering,
>>> finance, business processes, etc) and is likely to be used in user schemas.
>>> If the proposal introduces this as a reserved word that would require it to
>>> be 

Re: [DISCUSS] 5.1 should be 6.0

2025-04-11 Thread Aaron
+1 to 6.0

And David makes a good point about making sure that we support 4.x to 6.0
upgrades.

Thanks,

Aaron

On Fri, Apr 11, 2025 at 1:03 AM guo Maxwell  wrote:

> +1 to 6.0
>
> Berenguer Blasi  于2025年4月11日周五 13:53写道:
>
>> +1 6.0
>> On 10/4/25 23:57, David Capwell wrote:
>>
>> +1 to 6.0
>> Strong +1 to T-3, we should support 4.0/4.1 to 6.0 upgrades.
>>
>> On Apr 10, 2025, at 2:18 PM, C. Scott Andreas 
>>  wrote:
>>
>> +1 6.0
>>
>> - Scott
>>
>> —
>> Mobile
>>
>> On Apr 10, 2025, at 1:34 PM, Jeremy Hanna 
>>  wrote:
>>
>>  +1 for 6.0 for TCM/Accord changes, making it easier to make a case to
>> upgrade dependencies like the Java/Python versions.
>>
>> On Apr 10, 2025, at 3:24 PM, Bernardo Botella
>>   wrote:
>>
>> +1 on 6.0
>>
>> On Apr 10, 2025, at 1:07 PM, Josh McKenzie 
>>  wrote:
>>
>> Let's keep this thread to just +1's on 6.0; I'll see about a proper
>> isolated [DISCUSS] thread for my proposal above hopefully tomorrow,
>> schedule permitting.
>>
>> On Thu, Apr 10, 2025, at 3:46 PM, Jeremiah Jordan wrote:
>>
>> +1 to 6.0
>>
>> On Thu, Apr 10, 2025 at 1:38 PM Josh McKenzie 
>> wrote:
>>
>>
>> +1 to 6.0.
>>
>> On Thu, Apr 10, 2025, at 2:28 PM, Jon Haddad wrote:
>>
>> Bringing this back up.
>>
>> I don't think we have any reason to hold up renaming the version.  We can
>> have a separate discussion about what upgrade paths are supported, but
>> let's at least address this one issue of version number so we can have
>> consistent messaging.  When I talk to people about the next release, I'd
>> like to be consistent with what I call it, and have a unified voice as a
>> project.
>>
>> Jon
>>
>> On Thu, Jan 30, 2025 at 1:41 AM Mick Semb Wever  wrote:
>>
>> .
>>
>>
>> If you mean only 4.1 and 5.0 would be online upgrade targets, I would
>> suggest we change that to T-3 so you encompass all “currently supported”
>> releases at the time the new branch is GAed.
>>
>> I think that's better actually, yeah. I was originally thinking T-2 from
>> the "what calendar time frame is reasonable" perspective, but saying "if
>> you're on a currently supported branch you can upgrade to a release that
>> comes out" makes clean intuitive sense. That'd mean:
>>
>> 6.0: 5.0, 4.1, 4.0 online upgrades supported. Drop support for 4.0. API
>> compatible guaranteed w/5.0.
>> 7.0: 6.0, 5.0, 4.1 online upgrades supported. Drop support for 4.1. API
>> compatible guaranteed w/6.0.
>> 8.0: 7.0, 6.0, 5.0 online upgrades supported. Drop support for 5.0. API
>> compatible guaranteed w/7.0.
>>
>>
>>
>>
>> I like this.
>>
>>
>>
>>
>>


Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Benedict
5.5. Constraints (postgresql.org)

See the note directly above section 5.5.2 - Postgres supports functions in the CHECK predicate and requires they are pure functions (ie always yield the same result).

The point is, CHECK is consistently a predicate defined over a field _expression_. If we aren’t matching the feature, why are we matching the keyword?

I also disagree that CHECK is a useful prefix to all constraints, and would prefer consistency with Postgres here, which also uses plain NOT NULL for a NOT NULL constraint.

On 11 Apr 2025, at 17:05, Štefan Miklošovič wrote:

I went through Postgres' docs quite thoroughly, and I do not see any usage of "constraint functions" as we have them. Both MySQL and Postgres seem to only provide simple predicates (using relational operators) and "functions" as we have them (e.g. json / length) are not supported. So in this regard we are providing more than they do. Where have you taken the "is_json(field)" form from? Not saying it's wrong, I am just curious where this is coming from.

I do not like that we would have "NOT NULL" without "CHECK". I think we can go a little bit our own way as we have the comfort of modelling this from scratch. CQL is already different from SQL as is, and I do not think that trying to follow SQL _orthodoxly_ is absolutely necessary, but at the same time I find it easier and more welcoming for users coming to Cassandra for the first time to have syntax which is as close as possible to what they are used to.

I find having constraints starting with "CHECK" _every time_ consistent. They do not need to think twice if "check" is going to be there or not. It is there every time. I do not know why SQL did not do it the same way, most probably because "NOT NULL" was the first being introduced and "CHECK" followed afterwards and it was just too late to make it consistent.

On Fri, Apr 11, 2025 at 5:33 PM Bernardo Botella wrote:

Benedict:

An alternative for that, keeping the CHECK word, would be to change the constraint name to IS_JSON. CHECK IS_JSON would read as you intend without the need to jump to REQUIRE. I think that’s true for the rest of provided constraints as well.

Bernardo

On Apr 11, 2025, at 6:02 AM, Benedict wrote:

We have taken a different approach though, as we do not actually take a predicate on the RHS and do not supply the column name. In our examples we had eg CHECK JSON, which doesn’t parse unambiguously to a human. The equivalent to Postgres would seem to be CHECK is_json(field).

I’m all for following an existing example, but once we decide to diverge the justification is gone and we should decide holistically what we think is best. So if we want to elide the column entirely and have a list of built in restrictions, I’d prefer eg REQUIRE JSON since this parses unambiguously to a human, whereas if we want to follow Postgres let’s do that, but that means eg CHECK is_json(field).

On 11 Apr 2025, at 10:57, Štefan Miklošovič wrote:

While modelling that, we followed how it is done in SQL world; PostgreSQL as well as MySQL both use CHECK.

https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html

On Fri, Apr 11, 2025 at 10:43 AM Benedict wrote:

I would prefer require/expect/is over check

On 11 Apr 2025, at 08:05, Štefan Miklošovič wrote:

Yes, you will have it like that :) Thank you for this idea. Great example of cooperation over diverse domains.

On Fri, Apr 11, 2025 at 12:29 AM David Capwell wrote:

I am biased but I do prefer

val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024

Here is a similar accord CQL

BEGIN TRANSACTION
  LET a = (…);
  IF a IS NOT NULL
      AND a.b IS NOT NULL
      AND a.c IS NULL; THEN
    — profit
  END IF
COMMIT TRANSACTION

On Apr 10, 2025, at 8:46 AM, Yifan Cai wrote:





Re: reserved keywords, “check” is currently not, and I don’t think it needs to be a reserved keyword with the proposal.







From: C. Scott Andreas 
Sent: Thursday, April 10, 2025 7:59:35 AM
To: dev@cassandra.apache.org 
Cc: dev@cassandra.apache.org 
Subject: Re: Constraint's "not null" alignment with transactions and their simplification
 



If the proposal does not introduce “check” as a reserved keyword that would require quoting in existing DDL/DML, this concern doesn’t apply and the email below can be ignored. This might be the case if “CHECK NOT NULL” is the full token introduced rather than “CHECK” separately from constraints that are checked.


If “check” is introduced as a standalone reserved keyword: my primary feedback is on the introduction of reserved words in the CQL grammar that may affect compatibility of existing schemas.



In the Cassandra 3.x series, several new CQL reserved words were added (more 

Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Jon Haddad
I agree NOT NULL is nicer than check not null.

Taking the JSON bit a step further, a dedicated JSON type is better than a
check, but that's probably out of scope.

For inspiration, I've worked with Postgres' JSONB type in the past, and it
was awesome.

https://www.postgresql.org/docs/current/functions-json.html
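
For reference, the Postgres jsonb support Jon mentions looks roughly like this (Postgres SQL, not CQL; the table and column names are illustrative):

```sql
-- jsonb is a real column type: the database validates JSON on write
-- and supports indexing and querying into the stored document.
CREATE TABLE events (
    id      bigint PRIMARY KEY,
    payload jsonb NOT NULL
);

-- Extract a field as text, filtering on containment (Postgres operators):
SELECT payload->>'user'
FROM events
WHERE payload @> '{"type": "login"}';
```

A CHECK constraint can only accept or reject a value on write; a dedicated type like this also changes how the data can be queried afterwards.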

On Fri, Apr 11, 2025 at 9:15 AM Benedict  wrote:

>
> 5.5. Constraints
> 
> postgresql.org
> 
>
> 
>
>
> See the note directly above section 5.5.2 - Postgres supports functions in
> the CHECK predicate and requires they are pure functions (ie always yield
> the same result).
>
> The point is, CHECK is consistently a predicate defined over a field
> expression. If we aren’t matching the feature, why are we matching the
> keyword?
>
> I also disagree that CHECK is a useful prefix to all constraints, and
> would prefer consistency with Postgres here that also uses plain NOT NULL
> for a NOT NULL constraint.
>
>
> On 11 Apr 2025, at 17:05, Štefan Miklošovič 
> wrote:
>
> 
> I went through Postgres' docs in a quite elaborate manner and I do not see
> any usage of "constraint functions" as we have them. Both MySQL and
> Postgres seem to only provide simple predicates (using relational
> operators) and "functions" as we have them (e.g. json / length) are not
> supported. So in this regard we are providing more than they do. Where have
> you taken "is_json(field)" form from? Not saying it's wrong, I am just
> curious where this is coming from.
>
> I do not like that we would have "NOT NULL" without "CHECK". I think we
> can go a little bit our own way as we have the comfort of modelling this
> from scratch. CQL is already different from SQL as is and I do not think
> that trying to follow SQL _orthodoxly_ is absolutely necessary but at the
> same time I find it easier and more welcoming for users coming to Cassandra
> for the first time to have syntax which is as close as possible to what
> they are used to.
>
> I find having constraints starting with "CHECK" _every time_ consistent.
> They do not need to think twice if "check" is going to be there or not. It
> is there every time. I do not know why SQL did not do it the same way, most
> probably because "NOT NULL" was the first being introduced and "CHECK"
> followed afterwards and it was just too late to make it consistent.
>
>
>
> On Fri, Apr 11, 2025 at 5:33 PM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Benedict:
>>
>> An alternative for that, keeping the CHECK word, would be to change the
>> constraint name to IS_JSON. CHECK IS_JSON would read as you intend without
>> the need to jump to REQUIRE. I think that’s true for the rest of provided
>> constraints as well.
>>
>> Bernardo
>>
>>
>> On Apr 11, 2025, at 6:02 AM, Benedict  wrote:
>>
>> We have taken a different approach though, as we do not actually take a
>> predicate on the RHS and do not supply the column name. In our examples we
>> had eg CHECK JSON, which doesn’t parse unambiguously to a human. The
>> equivalent to Postgres would seem to be CHECK is_json(field).
>>
>> I’m all for following an existing example, but once we decide to diverge
>> the justification is gone and we should decide holistically what we think
>> is best. So if we want to elide the column entirely and have a list of
>> built in restrictions, I’d prefer eg REQUIRE JSON since this parses
>> unambiguously to a human, whereas if we want to follow Postgres let’s do
>> that, but that means eg CHECK is_json(field).
>>
>> On 11 Apr 2025, at 10:57, Štefan Miklošovič 
>> wrote:
>>
>> 
>> While modelling that, we followed how it is done in SQL world, PostgreSQL
>> as well as MySQL both use CHECK.
>>
>>
>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
>>
>> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>>
>> On Fri, Apr 11, 2025 at 10:43 AM Benedict  wrote:
>>
>>> I would prefer require/expect/is over check
>>>
>>> On 11 Apr 2025, at 08:05, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> Yes, you will have it like that :) Thank you for this idea. Great
>>> example of cooperation over diverse domains.
>>>
>>> On Fri, Apr 11, 2025 at 12:29 AM David Capwell 
>>> wrote:
>>>
 I am biased but I do prefer

 val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024

 Here is a similar accord CQL

 BEGIN TRANSACTION
   LET a = (…);
   IF a IS NOT NULL
   AND a.b IS NOT NULL
   AND a.c IS NULL; THEN
 — profit
   END IF
 COMMIT TRANSACTION

 On Apr 10, 2025, at 8:46 AM, Yifan Cai  wrote:

 Re: reserved keywords, “check” is currently not, and I don’t think it
>

Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Štefan Miklošovič
_If we aren’t matching the feature, why are we matching the keyword?_

??? Nobody said that if we do not use the exact same feature we have to
rename it.

On Fri, Apr 11, 2025 at 6:13 PM Benedict  wrote:

>
> 5.5. Constraints
> 
> postgresql.org
> 
>
> 
>
>
> See the note directly above section 5.5.2 - Postgres supports functions in
> the CHECK predicate and requires they are pure functions (ie always yield
> the same result).
>
> The point is, CHECK is consistently a predicate defined over a field
> expression. If we aren’t matching the feature, why are we matching the
> keyword?
>
> I also disagree that CHECK is a useful prefix to all constraints, and
> would prefer consistency with Postgres here that also uses plain NOT NULL
> for a NOT NULL constraint.
>
>
> On 11 Apr 2025, at 17:05, Štefan Miklošovič 
> wrote:
>
> 
> I went through Postgres' docs in a quite elaborate manner and I do not see
> any usage of "constraint functions" as we have them. Both MySQL and
> Postgres seem to only provide simple predicates (using relational
> operators) and "functions" as we have them (e.g. json / length) are not
> supported. So in this regard we are providing more than they do. Where have
> you taken "is_json(field)" form from? Not saying it's wrong, I am just
> curious where this is coming from.
>
> I do not like that we would have "NOT NULL" without "CHECK". I think we
> can go a little bit our own way as we have the comfort of modelling this
> from scratch. CQL is already different from SQL as is and I do not think
> that trying to follow SQL _orthodoxly_ is absolutely necessary but at the
> same time I find it easier and more welcoming for users coming to Cassandra
> for the first time to have syntax which is as close as possible to what
> they are used to.
>
> I find having constraints starting with "CHECK" _every time_ consistent.
> They do not need to think twice if "check" is going to be there or not. It
> is there every time. I do not know why SQL did not do it the same way, most
> probably because "NOT NULL" was the first being introduced and "CHECK"
> followed afterwards and it was just too late to make it consistent.
>
>
>
> On Fri, Apr 11, 2025 at 5:33 PM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Benedict:
>>
>> An alternative for that, keeping the CHECK word, would be to change the
>> constraint name to IS_JSON. CHECK IS_JSON would read as you intend without
>> the need to jump to REQUIRE. I think that’s true for the rest of provided
>> constraints as well.
>>
>> Bernardo
>>
>>
>> On Apr 11, 2025, at 6:02 AM, Benedict  wrote:
>>
>> We have taken a different approach though, as we do not actually take a
>> predicate on the RHS and do not supply the column name. In our examples we
>> had eg CHECK JSON, which doesn’t parse unambiguously to a human. The
>> equivalent to Postgres would seem to be CHECK is_json(field).
>>
>> I’m all for following an existing example, but once we decide to diverge
>> the justification is gone and we should decide holistically what we think
>> is best. So if we want to elide the column entirely and have a list of
>> built in restrictions, I’d prefer eg REQUIRE JSON since this parses
>> unambiguously to a human, whereas if we want to follow Postgres let’s do
>> that, but that means eg CHECK is_json(field).
>>
>> On 11 Apr 2025, at 10:57, Štefan Miklošovič 
>> wrote:
>>
>> 
>> While modelling that, we followed how it is done in SQL world, PostgreSQL
>> as well as MySQL both use CHECK.
>>
>>
>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
>>
>> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>>
>> On Fri, Apr 11, 2025 at 10:43 AM Benedict  wrote:
>>
>>> I would prefer require/expect/is over check
>>>
>>> On 11 Apr 2025, at 08:05, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> Yes, you will have it like that :) Thank you for this idea. Great
>>> example of cooperation over diverse domains.
>>>
>>> On Fri, Apr 11, 2025 at 12:29 AM David Capwell 
>>> wrote:
>>>
 I am biased but I do prefer

 val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024

 Here is a similar accord CQL

 BEGIN TRANSACTION
   LET a = (…);
   IF a IS NOT NULL
   AND a.b IS NOT NULL
   AND a.c IS NULL; THEN
 — profit
   END IF
 COMMIT TRANSACTION

 On Apr 10, 2025, at 8:46 AM, Yifan Cai  wrote:

 Re: reserved keywords, “check” is currently not, and I don’t think it
 needs to be a reserved keyword with the proposal.

 --
 *From:* C. Scott Andreas 
 *Sent:* Thursday, April 10, 2

Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Štefan Miklošovič
We can think about this. I don't mind supporting both. It needs a little
bit more love in the parser, I guess. Most probably doable, as syntactic sugar.
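
If both spellings were accepted, the parser could desugar the bare form into the explicit one, making these two definitions equivalent (hypothetical syntax; neither form is finalized):

```sql
-- Explicit, consistent CHECK prefix:
CREATE TABLE ks.t1 (id int PRIMARY KEY, val text CHECK NOT NULL);

-- Bare NOT NULL, rewritten by the parser to the same constraint:
CREATE TABLE ks.t2 (id int PRIMARY KEY, val text NOT NULL);
```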

On Fri, Apr 11, 2025 at 6:25 PM Jon Haddad  wrote:

> I agree NOT NULL is nicer than check not null.
>
> Taking the JSON bit a step further, a dedicated JSON type is better than a
> check, but that's probably out of scope.
>
> For inspiration, I've worked with Postgres' JSONB type in the past, and it
> was awesome.
>
> https://www.postgresql.org/docs/current/functions-json.html
>
> On Fri, Apr 11, 2025 at 9:15 AM Benedict  wrote:
>
>>
>> 5.5. Constraints
>> 
>> postgresql.org
>> 
>>
>> 
>>
>>
>> See the note directly above section 5.5.2 - Postgres supports functions
>> in the CHECK predicate and requires they are pure functions (ie always
>> yield the same result).
>>
>> The point is, CHECK is consistently a predicate defined over a field
>> expression. If we aren’t matching the feature, why are we matching the
>> keyword?
>>
>> I also disagree that CHECK is a useful prefix to all constraints, and
>> would prefer consistency with Postgres here that also uses plain NOT NULL
>> for a NOT NULL constraint.
>>
>>
>> On 11 Apr 2025, at 17:05, Štefan Miklošovič 
>> wrote:
>>
>> 
>> I went through Postgres' docs in a quite elaborate manner and I do not
>> see any usage of "constraint functions" as we have them. Both MySQL and
>> Postgres seem to only provide simple predicates (using relational
>> operators) and "functions" as we have them (e.g. json / length) are not
>> supported. So in this regard we are providing more than they do. Where have
>> you taken "is_json(field)" form from? Not saying it's wrong, I am just
>> curious where this is coming from.
>>
>> I do not like that we would have "NOT NULL" without "CHECK". I think we
>> can go a little bit our own way as we have the comfort of modelling this
>> from scratch. CQL is already different from SQL as is and I do not think
>> that trying to follow SQL _orthodoxly_ is absolutely necessary but at the
>> same time I find it easier and more welcoming for users coming to Cassandra
>> for the first time to have syntax which is as close as possible to what
>> they are used to.
>>
>> I find having constraints start with "CHECK" _every time_ consistent.
>> Users do not need to think twice about whether "check" is going to be there
>> or not. It is there every time. I do not know why SQL did not do it the same
>> way, most probably because "NOT NULL" was introduced first and "CHECK"
>> followed afterwards, and by then it was too late to make it consistent.
>>
>>
>>
>> On Fri, Apr 11, 2025 at 5:33 PM Bernardo Botella <
>> conta...@bernardobotella.com> wrote:
>>
>>> Benedict:
>>>
>>> An alternative for that, keeping the CHECK word, would be to change the
>>> constraint name to IS_JSON. CHECK IS_JSON would read as you intend without
>>> the need to jump to REQUIRE. I think that’s true for the rest of provided
>>> constraints as well.
>>>
>>> Bernardo
>>>
>>>
>>> On Apr 11, 2025, at 6:02 AM, Benedict  wrote:
>>>
>>> We have taken a different approach though, as we do not actually take a
>>> predicate on the RHS and do not supply the column name. In our examples we
>>> had eg CHECK JSON, which doesn’t parse unambiguously to a human. The
>>> equivalent to Postgres would seem to be CHECK is_json(field).
>>>
>>> I’m all for following an existing example, but once we decide to diverge
>>> the justification is gone and we should decide holistically what we think
>>> is best. So if we want to elide the column entirely and have a list of
>>> built in restrictions, I’d prefer eg REQUIRE JSON since this parses
>>> unambiguously to a human, whereas if we want to follow Postgres, let's do
>>> that, but that means eg CHECK is_json(field).
>>>
>>> On 11 Apr 2025, at 10:57, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> While modelling that, we followed how it is done in SQL world,
>>> PostgreSQL as well as MySQL both use CHECK.
>>>
>>>
>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
>>>
>>> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>>>
>>> On Fri, Apr 11, 2025 at 10:43 AM Benedict  wrote:
>>>
 I would prefer require/expect/is over check

 On 11 Apr 2025, at 08:05, Štefan Miklošovič 
 wrote:

 
 Yes, you will have it like that :) Thank you for this idea. Great
 example of cooperation over diverse domains.

 On Fri, Apr 11, 2025 at 12:29 AM David Capwell 
 wrote:

> I am biased but I do prefer
>
> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>
> Here is a similar accord CQL
>

Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Štefan Miklošovič
Jon,

that dedicated JSON type, I was thinking about this literally yesterday. I
like the idea. There would be an implicit check on its value with the json
constraint under the hood. But that also got me thinking "why just JSON"?
As if it was the only format out there ... It is probably widely used enough
that a JSON type would be justified, but at the same time it starts to
introduce this small "discrepancy" that we seem to favor one specific type
/ format over the others.


Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Jon Haddad
Fair question.  I think the only reason to favor JSON over anything else is
its popularity.

I think if a JSON type is simply a text type with validation, then adding a
simple validator for other formats is probably easy.  We've already got
YAML support in the codebase, so adding YAML would be trivial.
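A JSON type that is "simply a text type with validation" boils down to a pure predicate run on write. A minimal illustrative sketch in Python (the registry name is made up, and Cassandra's actual constraint framework is Java, so this is only the shape of the idea, not its implementation):

```python
import json

def is_valid_json(text: str) -> bool:
    # Pure predicate: same input always yields the same result,
    # which is the property Postgres also requires of CHECK functions.
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

# Hypothetical validator registry: supporting another format (YAML, TOML, ...)
# would just mean registering another pure predicate here.
VALIDATORS = {"json": is_valid_json}
```

The point of the registry is that nothing in this shape is JSON-specific; the "discrepancy" concern reduces to which predicates ship by default.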

If you want to take it a step further (like postgres does), then you can
think of the JSON type as validation on top of an arbitrary object with any
number of nested fields.  If you transform the JSON into that object, you
can do queries on it (again, like PG), but you can also support other
structures like yaml, toml, whatever.  Because at the end of the day you're
only storing the binary optimized version of a data structure, and you can
go back and forth between the binary and the format, just like we already
can do with Jackson.

The true power here would be being able to put indexes on the fields and
having all the path functionality for searches / updates / etc.

I never loved our existing JSON functionality; it never made sense to me.
This does.  It's all the power of collections + UDTs merged together.

I have no clue how hard it would be to build, but I know it would be
incredibly useful for end users.

Happy to discuss it further but probably best to move to another thread :)

Jon



Re: Project hygiene on old PRs

2025-04-11 Thread Štefan Miklošovič
I have a small script which scans GH pull requests (their titles) and looks
into JIRA to see what their status is. When a ticket is "resolved" it prints
the PR to the console. Then I go over the links of the PRs and close them one
by one. This relies on the title of the PR being in an exact format
(CASSANDRA-123 followed by the ticket title) and is not bulletproof, but I
have not come up with anything better so far.
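Not the actual script, just a sketch of the approach described above (the endpoint URLs are the public GitHub and ASF JIRA REST APIs; the "Resolved" status name and unauthenticated access are assumptions): pull the ticket ID out of each open PR title, look up its JIRA status, and print the PRs whose ticket is already resolved.

```python
import json
import re
import urllib.request

TICKET_RE = re.compile(r"CASSANDRA-\d+")

def ticket_from_title(title):
    # Relies on PR titles following the "CASSANDRA-123 <ticket title>" convention.
    match = TICKET_RE.search(title)
    return match.group(0) if match else None

def jira_status(ticket):
    # Read-only lookup against the public ASF JIRA REST API (no auth assumed).
    url = f"https://issues.apache.org/jira/rest/api/2/issue/{ticket}?fields=status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["fields"]["status"]["name"]

def print_closable_prs():
    # Only the first page of open PRs; a real run would paginate and authenticate.
    url = "https://api.github.com/repos/apache/cassandra/pulls?state=open&per_page=100"
    with urllib.request.urlopen(url) as resp:
        for pr in json.load(resp):
            ticket = ticket_from_title(pr["title"])
            if ticket and jira_status(ticket) == "Resolved":
                print(pr["html_url"], ticket)
```

As Štefan notes, the title-matching step is the fragile part: any PR whose title omits the ticket ID is silently skipped.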

On Fri, Apr 11, 2025 at 5:19 PM Josh McKenzie  wrote:

> +1 from me.
>
> My intuition is that this is a logical consequence of us not using github
> to merge PR's so they don't auto-close. Which seems like it's a logical
> consequence of us using merge commits instead of per-branch commits of
> patches.
>
> The band-aid of at least having a human-in-the-loop to close out old
> inactive things is better than the status quo; the information is all still
> available in github but the status of the PR's will communicate different
> things.
>
> On Thu, Apr 10, 2025, at 7:14 PM, Bernardo Botella wrote:
>
> Hi everyone!
>
> First of all, this may have come up before, and I understand it is really
> hard to keep a tidy house with so many different collaborations. But, I
> can't help the feeling that coming to the main Apache Cassandra repository
> and seeing more than 600 open PRs, some of them without activity for 5+
> years, gives the wrong impression about the love and care that we all share
> for this code base. I think we can find an easy to follow agreement to try
> and keep things a bit tidier. I wanted to propose some kind of "rule" that
> allow us to directly close PRs that haven't had activity in a reasonable
> and conservative amount of time of, let's say, 6 months? I want to
> reiterate that I mean no activity at all for six months from the PR author.
> I understand that complex PRs can be opened for longer than that period,
> and that's perfectly fine.
>
> What do you all think?
>
> Bernardo
>
>
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-04-11 Thread Jon Haddad
I've been thinking about this a bit more recently, and I think Joey's
suggestion about improving the yaml based disk configuration is a better
first step than what I wrote (table definition), for a couple reasons.

1. Attaching it to the schema means we need to have the disk configuration
as part of the schema as well, or we need to start enforcing homogeneous
disk setups.  We can't rely on the drive configs to be the same across the
cluster currently.
2. Changing the disk configuration is something we can only do as an
offline operation today.  Making it mutable opens up a world of complexity.
3. I'm not even sure how we'd define this at a configuration level in a way
that wouldn't be insanely confusing.  We currently don't allow people to
have multiple disk configs or allow for custom placement, so this would add
that complexity as well as the complexity of caching.

I also keep running up against my concern about treating object store as a
write back cache instead of write through.  "Tiering" data off has real
consequences for the user, the big one being data loss, especially with
regards to tombstones.  I think this is a pretty serious foot gun.  It's
the same problem we originally had with JBOD, where we could have
tombstones on one disk and the shadowed data on the other.  Losing one disk
results in data getting resurrected.  Anthony covered it in a blog post [1]
and I believe CASSANDRA-6696 was the JIRA that addressed the problem.
Introducing tiering would essentially bring this problem back.

I think we should update the proposal as Joey suggested, improving
data_file_locations to allow for the configuration of treating a local disk
as a cache and another disk or object store as the durable store.  Also
known as a writethrough cache.  It's defined clearly in the LVM docs:

> Writethrough ensures that any data written will be stored both in the
> cache and on the origin LV.  The loss of a device associated with the cache
> in this case would not mean the loss of any data.

We can discuss the other stuff as a follow up, but IMO we should first
focus on this use case, which has the fewest sharp edges.  Getting this in
would be a huge win on its own and I think it's what the majority of users
would benefit from.  That makes it a pretty big project on its own, and we
can work on the follow up CEPs to enhance this body of work in parallel
with the implementation of this CEP, if it's really desired.

Jon

[1]
https://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html
[2] https://issues.apache.org/jira/browse/CASSANDRA-6696
[3] https://www.man7.org/linux/man-pages/man7/lvmcache.7.html



On Sat, Mar 8, 2025 at 3:00 PM Jon Haddad  wrote:

> I really like the data directories and replication configuration.  I think
> it makes a ton of sense to put it in the yaml, but we should probably yaml
> it all, and not nest JSON :), and we can probably simplify it a little with
> a file uri scheme, something like this:
>
> data_file_locations:
>   disk:
> path: "file:///var/lib/cassandra/data"
>   object:
> path: "s3://..."
>
> You've opened up a couple fun options here with this as well.  I hadn't
> thought about 3 tiers of storage at all, but I could definitely see
>
> NVMe -> EBS -> Object Store
>
> as a valuable setup.
>
> Does spread == JBOD here?  Maybe we just call it jbod?  I can't help but
> bikeshed :)
>
> Jon
>
> On Sat, Mar 8, 2025 at 8:06 AM Joseph Lynch  wrote:
>
>> Jon, I like where you are headed with that, just brainstorming out what
>> the end interface might look like (might be getting a bit ahead of things
>> talking about directories if we don't even have files implemented yet).
>> What do folks think about pairing data_file_locations (fka
>> data_file_directories) with three table level tunables: replication
>> strategy ("spread", "tier"), the eviction strategy ("none", "lfu", "lru",
>> etc...) and the writeback duration (0s, 10s, 8d)? So your three examples
>>
>> data_file_locations:
>>   disk: {type: "filesystem", "path": "/var/lib/cassandra/data"}
>>   object: {type: "s3", "path": "s3://..."}
>>
>> data_file_eviction_strategies:
>>   none: {type: "NONE"}
>>   lfu: {type: "LFU", "min_retention": "7d"}
>>   hot-cold: {type: "LRU", "min_retention": "60d"}
>>
>> Then on tables to achieve your three proposals
>> WITH storage = {locations: ["disk", "object"], "replication": "tier"}
>> WITH storage = {locations: ["disk", "object"], "replication": "tier",
>> "eviction": ["lfu"], "writeback": ["10s"]}
>> WITH storage = {locations: ["disk", "object"], "replication": "tier",
>> "eviction": ["hot-cold"], "writeback": ["8d"]}
>>
>> We definitely wouldn't want to implement all of the eviction strategies
>> in the first cut - probably just the object file location (CEP 36 fwict)
>> and eviction strategy "none". Default eviction would be "none" if not
>> specified (throw errors on full storage), default writeback would be "0s".
>> I am thinking that the strategies 

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-04-11 Thread Jeff Jirsa





If you lose one disk, you shoot the instance. We have to stop pretending you 
can have partial failures. That’s it. That’s the fix. You don’t get to lose 
part of a machine and pretend it’s still viable. Just like losing a commit 
log segment or losing an object in a bucket: if you lose one object, you throw 
it away, or you’ve resurrected data / violated consistency. 





Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-04-11 Thread Jon Haddad
Right, exactly.  Which (I think) makes the object store about as valuable
as an ephemeral disk if you don't keep everything on there.  It's a
tradeoff I'd never use given the cost / benefit.

Does that mean you agree that we should focus on writethrough cache mode
first?

Jon





Re: [DISCUSS] How we version our releases

2025-04-11 Thread Mick Semb Wever
On Thu, 10 Apr 2025 at 22:54, Josh McKenzie  wrote:

> …
> So here's what I'm thinking: a new release strategy that doesn't use
> .MINOR of semver. Goals:
> - Simplify versioning for end users
> - Provide clearer contracts for users as to what they can expect in
> releases
> - Simplify support for us (CI, merges, etc)
> - Clarify our public API deprecation process
>
> Structure / heuristic:
> - Online upgrades are supported for all GA supported releases at time of
> new .MAJOR
> - T-1 releases are guaranteed API compatible
> - We use a deprecate-then-remove strategy for API breaking changes
> …
> So: what do we think?
>


+1

David, yeah, we avoid .1 minor releases altogether.

IIUC this does not imply allowing breaking changes.  That ties into the
recent thread about aiming to maintain compatibility forever.  That after a
deprecation cycle, wanting to remove/break anything requires a discussion
and evaluation to the cost of keeping that legacy/deprecated code.

WRT JDKs: every time we drop a JDK, we drop testing all upgrade paths from
versions where that was the highest JDK.  In 6.0, if we drop JDK 11 then we
will stop testing upgrades from 4.x versions.  Our tests don't support it,
but we also need to do this at some point for the sake of keeping the test
matrix sane for our CI resources.

The previous versioning scheme meant we chose when to drop a JDK (or break
the supported upgrade paths).  The proposed versioning scheme means we have
to wait and align dropping JDKs with the T-2 approach.  I think it's a
great idea that we internalise this cognitive load, making it simpler for
the user.
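The upgrade-path rule being discussed reduces to a one-line predicate. A toy sketch (Python; the function names, version numbers, and `window` parameter are purely illustrative, with window=1 modelling the proposal's T-1 guarantee and window=2 the JDK alignment mentioned above):

```python
def upgrade_supported(from_major, to_major, window=1):
    # window=1 models the T-1 compatibility guarantee from the proposal;
    # window=2 would model aligning JDK drops with a T-2 approach.
    return 0 < to_major - from_major <= window

def upgrade_test_matrix(majors, window=1):
    # The (from, to) pairs CI would actually need to exercise under the rule.
    return [(a, b) for a in majors for b in majors
            if upgrade_supported(a, b, window)]
```

The appeal of a compatibility matrix is that the rule becomes data rather than policy: widening the window changes the CI cost in a way anyone can compute.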


Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Štefan Miklošovič
Yes, you will have it like that :) Thank you for this idea. Great example
of cooperation over diverse domains.

On Fri, Apr 11, 2025 at 12:29 AM David Capwell  wrote:

> I am biased but I do prefer
>
> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
>
> Here is a similar accord CQL
>
> BEGIN TRANSACTION
>   LET a = (…);
>   IF a IS NOT NULL
>   AND a.b IS NOT NULL
>   AND a.c IS NULL; THEN
> — profit
>   END IF
> COMMIT TRANSACTION
>
> On Apr 10, 2025, at 8:46 AM, Yifan Cai  wrote:
>
> Re: reserved keywords, “check” is currently not one, and I don’t think it
> needs to become a reserved keyword with this proposal.
>
> --
> *From:* C. Scott Andreas 
> *Sent:* Thursday, April 10, 2025 7:59:35 AM
> *To:* dev@cassandra.apache.org 
> *Cc:* dev@cassandra.apache.org 
> *Subject:* Re: Constraint's "not null" alignment with transactions and
> their simplification
>
> If the proposal does not introduce “check” as a reserved keyword that
> would require quoting in existing DDL/DML, this concern doesn’t apply and
> the email below can be ignored. This might be the case if “CHECK NOT NULL”
> is the full token introduced rather than “CHECK” separately from
> constraints that are checked.
>
> If “check” is introduced as a standalone reserved keyword: my primary
> feedback is on the introduction of reserved words in the CQL grammar that
> may affect compatibility of existing schemas.
>
> In the Cassandra 3.x series, several new CQL reserved words were added
> (more than necessary) and subsequently backed out, because it required
> users to begin quoting schemas and introduced incompatibility between 3.x
> and 4.x for queries and DDL that “just worked” before.
>
> The word “check” is used in many domains (test/evaluation engineering,
> finance, business processes, etc) and is likely to be used in user schemas.
> If the proposal introduces this as a reserved word that would require it to
> be quoted if used in table or column names, this will create
> incompatibility for existing user queries on upgrade.
>
> Otherwise, ignore me. :)
>
> Thanks,
>
> – Scott
>
> –––
> Mobile
>
> On Apr 10, 2025, at 7:47 AM, Jon Haddad  wrote:
>
> 
> This looks like a really nice improvement to me.
>
>
> On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič 
> wrote:
>
> Recently, David Capwell was commenting on constraints in one of Slack
> threads (1) in dev channel and he suggested that the current form of "not
> null" constraint we have right now in place, e.g like this
>
> create table ks.tb (id int primary key, val int check not_null(val));
>
> could be instead of that form used like this:
>
> create table ks.tb (id int primary key, val int check not null);
>
> That is - without the name of a column in the constraint's argument. The
> reasoning behind that was that it is not only easier to read but there is
> also this concept in transactions (cep-15) where there is also "not null"
> used in some fashion and it would be nice if this was aligned so a user
> does not encounter two usages of "not null"-s which are written down
> differently, syntax-wise.
>
> Could the usage of "not null" in transactions be confirmed?
>
> This rather innocent suggestion brought an idea to us that constraints
> could be quite simplified when it comes to their syntax, consider this:
>
> val int check not_null(val)
> val text check json(val)
> val text check length(val) < 1000
>
> to be used like this:
>
> val int check not null
> val text check json
> val text check length() < 1000
>
> more involved checks like this:
>
> val text check not_null(val) and json(val) and length(val) < 1000
>
> might be just simplified to:
>
> val text check not null and json and length() < 1000
>
> It almost reads like plain English. Isn't this just easier on the eye?
>
> The reason we kept the column names in constraint definitions is that,
> frankly speaking, we just did not know any better at the time it was about
> to be implemented. It is a little bit more tricky to be able to use it
> without column names because in Parser.g / Antlr we just bound the grammar
> around constraints to a column name directly there. When column names are
> not going to be there anymore, we need to bind it later in the code behind
> the parser in server code. It is doable, it was just about being a little
> bit more involved there.
>
> Also, one reason to keep the name of a column was that a constraint might
> reference a different column from the one it is defined on, to allow
> cross-column constraints, but we abandoned this idea altogether for other
> reasons, which rendered the occurrence of a column name in a constraint
> definition redundant.
>
> To have some overview of what would be possible to do with this proposal:
>
> val3 text CHECK SOMECONSTRAINT('a');
> val3 text CHECK JSON;
> val3 text CHECK SOMECONSTRAINT('a') > 1;
> val3 text CHECK SOMECONSTRAINT('a', 'b', 'c') > 1;
> val3 text CHECK JSON AND LENGTH() < 600;
> afternoon time CHECK afternoon >= '