Re: 【DISCUSS】What is the current status of triggers in Cassandra ?

2025-02-10 Thread guo Maxwell
Then we don’t support trigger until CASSANDRA-20287 is fixed, and this rule
applies to custom index , too. right ?


Sam Tunnicliffe  于2025年2月5日周三 01:24写道:

> This is really a bug with the current implementation of
> CreateTriggerStatement and I've filed CASSANDRA-20287 to address it.
>
> Thanks,
> Sam
>
> > On 3 Feb 2025, at 21:06, Štefan Miklošovič 
> wrote:
> >
> > Correct, snapshotting is the way to go here via nodetool cms snapshot
> >
> > But, you see? One more "problem" ... I bet my boots that in the majority
> of cases this will be forgotten. Then we would need to put that JAR back
> just to boot the cluster for the sake of snapshotting it.
> >
> > On Mon, Feb 3, 2025 at 9:58 PM Abe Ratnofsky  wrote:
> > AFAIK the TCM replay issue you're describing (something is created and
> dropped during replay, fails if can't create) applies to custom types and a
> few other things, and one way around it is CMS snapshotting so replay
> doesn't start at epoch 0; it wouldn't be safe to remove the trigger from
> the classpath until the trigger drop epoch has been included in a snapshot:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-SnapshottingMetadataLog
> >
> > I also think it's reasonable to not include triggers (or other custom
> fields) in the clone, but if users need to sort out what parts of their
> original table were copied to their clone it's not as convenient to have a
> separate command for it.
>
>


Re: 【DISCUSS】What is the current status of triggers in Cassandra ?

2025-02-10 Thread guo Maxwell
Referring to the opinions of most people, we ignore the cloning of triggers
just for the CREATE TABLE LIKE feature.

guo Maxwell  于2025年2月10日周一 16:39写道:

> Then we don’t support trigger until CASSANDRA-20287 is fixed, and this
> rule applies to custom index , too. right ?
>
>
> Sam Tunnicliffe  于2025年2月5日周三 01:24写道:
>
>> This is really a bug with the current implementation of
>> CreateTriggerStatement and I've filed CASSANDRA-20287 to address it.
>>
>> Thanks,
>> Sam
>>
>> > On 3 Feb 2025, at 21:06, Štefan Miklošovič 
>> wrote:
>> >
>> > Correct, snapshotting is the way to go here via nodetool cms snapshot
>> >
>> > But, you see? One more "problem" ... I bet my boots that in the
>> majority of cases this will be forgotten. Then we would need to put that
>> JAR back just to boot the cluster for the sake of snapshotting it.
>> >
>> > On Mon, Feb 3, 2025 at 9:58 PM Abe Ratnofsky  wrote:
>> > AFAIK the TCM replay issue you're describing (something is created and
>> dropped during replay, fails if can't create) applies to custom types and a
>> few other things, and one way around it is CMS snapshotting so replay
>> doesn't start at epoch 0; it wouldn't be safe to remove the trigger from
>> the classpath until the trigger drop epoch has been included in a snapshot:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-SnapshottingMetadataLog
>> >
>> > I also think it's reasonable to not include triggers (or other custom
>> fields) in the clone, but if users need to sort out what parts of their
>> original table were copied to their clone it's not as convenient to have a
>> separate command for it.
>>
>>


Re: 【DISCUSS】What is the current status of triggers in Cassandra ?

2025-02-10 Thread guo Maxwell
Hi Stefan ,
Yes ,you are right, That is why I send the second email stating that we are
not going to support cloning triggers in CREATE TABLE LIKE .

Štefan Miklošovič  于2025年2月10日周一 18:12写道:

> Hi Maxwell,
>
> even the issues mentioned in CASSANDRA-20287 are fixed, I still do not
> think we should support the copying of triggers. Reasons I think we should
> not support that are:
>
> 1) According to this comment (1) Triggers API is considered beta and can
> change. Unless we promote trigger's API out of beta state, I do not think
> we should build anything on top of that and integrate it deeper with the
> rest of the codebase, as CASSANDRA-19965 patch would do.
>
> 2) What the fix in CASSANDRA-20287 would do is to prevent the cases when
> trigger implementation is not on the class path anymore and a node is
> restarted. For now, it would just fail to start. The fix would enable it to
> start even if it is not present. This is different from supporting copying
> triggers as such. The fact we just did not fail a node to start does not
> mean that we are embracing triggers as a whole.
>
> 3) Even if we supported copying of a trigger between tables, since its
> implementation is outside of our control, such a trigger might contain
> arbitrary logic, which might be applicable for the source table but its
> behavior would not be desirable for the other. The implementation of a
> trigger might be so that there might be logic specifically wired for the
> source table and when applied to a target table, it would stop behaving
> correctly. For example, if a trigger is processing Partition inside its
> "augment" method, and it is coded in such a way that it is specifically
> processing specific columns, what would happen if a target table altered
> its schema after we copy it (e.g. by adding or removing columns) while this
> trigger would be active? And so on, and so on ...
>
> (1)
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/triggers/ITrigger.java#L37-L39
>
> On Mon, Feb 10, 2025 at 10:22 AM guo Maxwell  wrote:
>
>> Referring to the opinions of most people, we ignore the cloning of
>> triggers just for the CREATE TABLE LIKE feature.
>>
>> guo Maxwell  于2025年2月10日周一 16:39写道:
>>
>>> Then we don’t support trigger until CASSANDRA-20287 is fixed, and this
>>> rule applies to custom index , too. right ?
>>>
>>>
>>> Sam Tunnicliffe  于2025年2月5日周三 01:24写道:
>>>
 This is really a bug with the current implementation of
 CreateTriggerStatement and I've filed CASSANDRA-20287 to address it.

 Thanks,
 Sam

 > On 3 Feb 2025, at 21:06, Štefan Miklošovič 
 wrote:
 >
 > Correct, snapshotting is the way to go here via nodetool cms snapshot
 >
 > But, you see? One more "problem" ... I bet my boots that in the
 majority of cases this will be forgotten. Then we would need to put that
 JAR back just to boot the cluster for the sake of snapshotting it.
 >
 > On Mon, Feb 3, 2025 at 9:58 PM Abe Ratnofsky  wrote:
 > AFAIK the TCM replay issue you're describing (something is created
 and dropped during replay, fails if can't create) applies to custom types
 and a few other things, and one way around it is CMS snapshotting so replay
 doesn't start at epoch 0; it wouldn't be safe to remove the trigger from
 the classpath until the trigger drop epoch has been included in a snapshot:
 https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-SnapshottingMetadataLog
 >
 > I also think it's reasonable to not include triggers (or other custom
 fields) in the clone, but if users need to sort out what parts of their
 original table were copied to their clone it's not as convenient to have a
 separate command for it.




Re: 【DISCUSS】What is the current status of triggers in Cassandra ?

2025-02-10 Thread Štefan Miklošovič
Hi Maxwell,

even the issues mentioned in CASSANDRA-20287 are fixed, I still do not
think we should support the copying of triggers. Reasons I think we should
not support that are:

1) According to this comment (1) Triggers API is considered beta and can
change. Unless we promote trigger's API out of beta state, I do not think
we should build anything on top of that and integrate it deeper with the
rest of the codebase, as CASSANDRA-19965 patch would do.

2) What the fix in CASSANDRA-20287 would do is to prevent the cases when
trigger implementation is not on the class path anymore and a node is
restarted. For now, it would just fail to start. The fix would enable it to
start even if it is not present. This is different from supporting copying
triggers as such. The fact we just did not fail a node to start does not
mean that we are embracing triggers as a whole.

3) Even if we supported copying of a trigger between tables, since its
implementation is outside of our control, such a trigger might contain
arbitrary logic, which might be applicable for the source table but its
behavior would not be desirable for the other. The implementation of a
trigger might be so that there might be logic specifically wired for the
source table and when applied to a target table, it would stop behaving
correctly. For example, if a trigger is processing Partition inside its
"augment" method, and it is coded in such a way that it is specifically
processing specific columns, what would happen if a target table altered
its schema after we copy it (e.g. by adding or removing columns) while this
trigger would be active? And so on, and so on ...

(1)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/triggers/ITrigger.java#L37-L39

On Mon, Feb 10, 2025 at 10:22 AM guo Maxwell  wrote:

> Referring to the opinions of most people, we ignore the cloning of
> triggers just for the CREATE TABLE LIKE feature.
>
> guo Maxwell  于2025年2月10日周一 16:39写道:
>
>> Then we don’t support trigger until CASSANDRA-20287 is fixed, and this
>> rule applies to custom index , too. right ?
>>
>>
>> Sam Tunnicliffe  于2025年2月5日周三 01:24写道:
>>
>>> This is really a bug with the current implementation of
>>> CreateTriggerStatement and I've filed CASSANDRA-20287 to address it.
>>>
>>> Thanks,
>>> Sam
>>>
>>> > On 3 Feb 2025, at 21:06, Štefan Miklošovič 
>>> wrote:
>>> >
>>> > Correct, snapshotting is the way to go here via nodetool cms snapshot
>>> >
>>> > But, you see? One more "problem" ... I bet my boots that in the
>>> majority of cases this will be forgotten. Then we would need to put that
>>> JAR back just to boot the cluster for the sake of snapshotting it.
>>> >
>>> > On Mon, Feb 3, 2025 at 9:58 PM Abe Ratnofsky  wrote:
>>> > AFAIK the TCM replay issue you're describing (something is created and
>>> dropped during replay, fails if can't create) applies to custom types and a
>>> few other things, and one way around it is CMS snapshotting so replay
>>> doesn't start at epoch 0; it wouldn't be safe to remove the trigger from
>>> the classpath until the trigger drop epoch has been included in a snapshot:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-SnapshottingMetadataLog
>>> >
>>> > I also think it's reasonable to not include triggers (or other custom
>>> fields) in the clone, but if users need to sort out what parts of their
>>> original table were copied to their clone it's not as convenient to have a
>>> separate command for it.
>>>
>>>


Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
Hi. These was a topic we discussed during the ML thread:
https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj

Here was one of my answers on that:
https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv

It was also specified in the CEP 
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime):
"Note: This constraints are only enforced at write time. So, an ALTER 
CONSTRAINT with more restrictive constraints shouldn’t affect preexisting data.”

Long story short, constraints are only checked at write time. If a constraint 
is added to a table with preexisting offending data, that data stays untouched.

I hope this helps,
Bernardo

> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
> 
> This is counterintuitive to me. The constraint should be applied to the 
> table, not to the update. NOT NULL should imply a value is always specified.
> 
> How are you handling this for tables that already exist? Can we alter table 
> to add constraints, and if so what are the semantics?
> 
>> On 10 Feb 2025, at 14:50, Bernardo Botella  
>> wrote:
>> 
>> Hi everyone,
>> 
>> Stefan Miklosovic and I have been working on a NOT_NULL 
>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>> the constraints tool belt, and a really interesting conversation came up.
>> 
>> First, as a problem statement, let's consider this:
>> 
>> -
>> CREATE TABLE ks.tb2 (
>>   id int,
>>   cl1 int,
>>   cl2 int,
>>   val text CHECK NOT_NULL(val),
>>   PRIMARY KEY (id, cl1, cl2)
>> )
>> 
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>> null);
>> InvalidRequest: Error from server: code=2200 [Invalid query] message="Column 
>> value does not satisfy value constraint for column 'val' as it is null."
>> 
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>> “text");
>> cassandra@cqlsh> select * from ks.tb2;
>> 
>> id | cl1 | cl2 | val
>> +-+-+--
>> 1 |   2 |   3 | text
>> 
>> (1 rows)
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>> cassandra@cqlsh> select * from ks.tb2;
>> 
>> id | cl1 | cl2 | val
>> +-+-+--
>> 1 |   2 |   3 | text
>> 1 |   2 |   4 | null
>> 
>> -
>> 
>> As you see, we have a hole in which a 'null' value is getting written on 
>> column val even if we have a NOT_NULL on that particular column whenever the 
>> column is NOT specified on the write. That raises the question on how this 
>> particular constraint should behave.
>> 
>> If we consider the other constraints (scalar constraint and length 
>> constraint so far), this particular behavior is fine. But, if the constraint 
>> is NOT_NULL, then it becomes a little bit trickier.
>> 
>> The conclusions we have reached is that the meaning of constraints should be 
>> interpreted like: I check whatever you give me as part of the write, 
>> ignoring everything else. Let me elaborate:
>> If we decide to treat this particular NOT_NULL constraint differently, and 
>> check if the value for that column is present in the insert statement, we 
>> then open a different can of worms. What happens if the row already exists 
>> with a valid value, and that insert statement is only trying to do an update 
>> to a different column in the row? If that was the case, we would be forcing 
>> the user to specify the 'val' column value for every update, even if it is 
>> not needed.
>> 
>> Mainly for this reason, we think it is better to treat this NOT_NULL 
>> constraint just like the other constraints, and execute it ONLY on the 
>> values that are present on the insert statement.
>> 
>> The main con is that it may lead to a little bit of confussion (as in, why I 
>> just added a null value to the table even if I have a NOT_NULL constraint?). 
>> We have thought on aliviating this particular confusion by:
>> - Extensive documentation. Let's be upfront on what this constraint does and 
>> does not. 
>> (https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)
>> - Adding, as part of this patch, yet another constraint (STRICTLY_NOT_NULL), 
>> that checks for the actual column value to be present in the insert 
>> statement..
>> 
>> If you've made it until here, that means you are really interested in 
>> constraints. Thanks! The question for you is, would you have any concern 
>> with this approach?
>> 
>> Thanks,
>> Bernardo



Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
I will create a Jira to keep track of that “NO VERIFY” suggestion. For this 
thread, I’d like to stick to the actual proposal for both NOT_NULL and 
STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.


> On Feb 10, 2025, at 7:18 AM, Benedict  wrote:
> 
> Thanks. While I agree we shouldn’t be applying these constraints post hoc on 
> read or compaction, I think we need to make clear to the user whether we are 
> validating a new constraint before accepting it for alter table. Which is to 
> say I think alter table should require something like “NO VERIFY” or some 
> other additional keywords to make clear we aren’t checking the constraint 
> applies to existing data.
> 
> 
>> On 10 Feb 2025, at 15:10, Bernardo Botella  
>> wrote:
>> 
>> Hi. These was a topic we discussed during the ML thread:
>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>> 
>> Here was one of my answers on that:
>> https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv
>> 
>> It was also specified in the CEP 
>> (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime):
>> "Note: This constraints are only enforced at write time. So, an ALTER 
>> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting 
>> data.”
>> 
>> Long story short, constraints are only checked at write time. If a 
>> constraint is added to a table with preexisting offending data, that data 
>> stays untouched.
>> 
>> I hope this helps,
>> Bernardo
>> 
>>> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
>>> 
>>> This is counterintuitive to me. The constraint should be applied to the 
>>> table, not to the update. NOT NULL should imply a value is always specified.
>>> 
>>> How are you handling this for tables that already exist? Can we alter table 
>>> to add constraints, and if so what are the semantics?
>>> 
 On 10 Feb 2025, at 14:50, Bernardo Botella  
 wrote:
 
 Hi everyone,
 
 Stefan Miklosovic and I have been working on a NOT_NULL 
 (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
 the constraints tool belt, and a really interesting conversation came up.
 
 First, as a problem statement, let's consider this:
 
 -
 CREATE TABLE ks.tb2 (
   id int,
   cl1 int,
   cl2 int,
   val text CHECK NOT_NULL(val),
   PRIMARY KEY (id, cl1, cl2)
 )
 
 cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
 null);
 InvalidRequest: Error from server: code=2200 [Invalid query] 
 message="Column value does not satisfy value constraint for column 'val' 
 as it is null."
 
 cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
 “text");
 cassandra@cqlsh> select * from ks.tb2;
 
 id | cl1 | cl2 | val
 +-+-+--
 1 |   2 |   3 | text
 
 (1 rows)
 cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
 cassandra@cqlsh> select * from ks.tb2;
 
 id | cl1 | cl2 | val
 +-+-+--
 1 |   2 |   3 | text
 1 |   2 |   4 | null
 
 -
 
 As you see, we have a hole in which a 'null' value is getting written on 
 column val even if we have a NOT_NULL on that particular column whenever 
 the column is NOT specified on the write. That raises the question on how 
 this particular constraint should behave.
 
 If we consider the other constraints (scalar constraint and length 
 constraint so far), this particular behavior is fine. But, if the 
 constraint is NOT_NULL, then it becomes a little bit trickier.
 
 The conclusions we have reached is that the meaning of constraints should 
 be interpreted like: I check whatever you give me as part of the write, 
 ignoring everything else. Let me elaborate:
 If we decide to treat this particular NOT_NULL constraint differently, and 
 check if the value for that column is present in the insert statement, we 
 then open a different can of worms. What happens if the row already exists 
 with a valid value, and that insert statement is only trying to do an 
 update to a different column in the row? If that was the case, we would be 
 forcing the user to specify the 'val' column value for every update, even 
 if it is not needed.
 
 Mainly for this reason, we think it is better to treat this NOT_NULL 
 constraint just like the other constraints, and execute it ONLY on the 
 values that are present on the insert statement.
 
 The main con is that it may lead to a little bit of confussion (as in, why 
 I just added a null value to the table even if I have a NOT_NULL 
 constraint?). We have thought on aliviating this particular confusion by:
 - Ex

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Benedict
Thanks. While I agree we shouldn’t be applying these constraints post hoc on read or compaction, I think we need to make clear to the user whether we are validating a new constraint before accepting it for alter table. Which is to say I think alter table should require something like “NO VERIFY” or some other additional keywords to make clear we aren’t checking the constraint applies to existing data.On 10 Feb 2025, at 15:10, Bernardo Botella  wrote:Hi. These was a topic we discussed during the ML thread:lists.apache.orgHere was one of my answers on that:lists.apache.orgIt was also specified in the CEP (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime):"Note: This constraints are only enforced at write time. So, an ALTER CONSTRAINT with more restrictive constraints shouldn’t affect preexisting data.”Long story short, constraints are only checked at write time. If a constraint is added to a table with preexisting offending data, that data stays untouched.I hope this helps,BernardoOn Feb 10, 2025, at 7:00 AM, Benedict  wrote:This is counterintuitive to me. The constraint should be applied to the table, not to the update. NOT NULL should imply a value is always specified.How are you handling this for tables that already exist? Can we alter table to add constraints, and if so what are the semantics?On 10 Feb 2025, at 14:50, Bernardo Botella  wrote:Hi everyone,Stefan Miklosovic and I have been working on a NOT_NULL (https://github.com/apache/cassandra/pull/3867) constraint to be added to the constraints tool belt, and a really interesting conversation came up.First, as a problem statement, let's consider this:-CREATE TABLE ks.tb2 (   id int,   cl1 int,   cl2 int,   val text CHECK NOT_NULL(val),   PRIMARY KEY (id, cl1, cl2))cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, null);InvalidRequest: Error from server: code=2200 [Invalid query] message="Column value does not satisfy value constraint for column 'val' as it is null."cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, “text");cassandra@cqlsh> select * from ks.tb2;id | cl1 | cl2 | val+-+-+-- 1 |   2 |   3 | text(1 rows)cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);cassandra@cqlsh> select * from ks.tb2;id | cl1 | cl2 | val+-+-+-- 1 |   2 |   3 | text 1 |   2 |   4 | null-As you see, we have a hole in which a 'null' value is getting written on column val even if we have a NOT_NULL on that particular column whenever the column is NOT specified on the write. That raises the question on how this particular constraint should behave.If we consider the other constraints (scalar constraint and length constraint so far), this particular behavior is fine. But, if the constraint is NOT_NULL, then it becomes a little bit trickier.The conclusions we have reached is that the meaning of constraints should be interpreted like: I check whatever you give me as part of the write, ignoring everything else. Let me elaborate:If we decide to treat this particular NOT_NULL constraint differently, and check if the value for that column is present in the insert statement, we then open a different can of worms. What happens if the row already exists with a valid value, and that insert statement is only trying to do an update to a different column in the row? If that was the case, we would be forcing the user to specify the 'val' column value for every update, even if it is not needed.Mainly for this reason, we think it is better to treat this NOT_NULL constraint just like the other constraints, and execute it ONLY on the values that are present on the insert statement.The main con is that it may lead to a little bit of confussion (as in, why I just added a null value to the table even if I have a NOT_NULL constraint?). We have thought on aliviating this particular confusion by:- Extensive documentation. Let's be upfront on what this constraint does and does not. (https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)- Adding, as part of this patch, yet another constraint (STRICTLY_NOT_NULL), that checks for the actual column value to be present in the insert statement..If you've made it until here, that means you are really interested in constraints. Thanks! The question for you is, would you have any concern with this approach?Thanks,Bernardo

[DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
Hi everyone,

Stefan Miklosovic and I have been working on a NOT_NULL 
(https://github.com/apache/cassandra/pull/3867) constraint to be added to the 
constraints tool belt, and a really interesting conversation came up.

First, as a problem statement, let's consider this:

-
CREATE TABLE ks.tb2 (
id int,
cl1 int,
cl2 int,
val text CHECK NOT_NULL(val),
PRIMARY KEY (id, cl1, cl2)
) 

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, null);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Column 
value does not satisfy value constraint for column 'val' as it is null."

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
“text");
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
+-+-+--
  1 |   2 |   3 | text

(1 rows)
cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
+-+-+--
  1 |   2 |   3 | text
  1 |   2 |   4 | null

-

As you see, we have a hole in which a 'null' value is getting written on column 
val even if we have a NOT_NULL on that particular column whenever the column is 
NOT specified on the write. That raises the question on how this particular 
constraint should behave.

If we consider the other constraints (scalar constraint and length constraint 
so far), this particular behavior is fine. But, if the constraint is NOT_NULL, 
then it becomes a little bit trickier.

The conclusions we have reached is that the meaning of constraints should be 
interpreted like: I check whatever you give me as part of the write, ignoring 
everything else. Let me elaborate:
If we decide to treat this particular NOT_NULL constraint differently, and 
check if the value for that column is present in the insert statement, we then 
open a different can of worms. What happens if the row already exists with a 
valid value, and that insert statement is only trying to do an update to a 
different column in the row? If that was the case, we would be forcing the user 
to specify the 'val' column value for every update, even if it is not needed. 

Mainly for this reason, we think it is better to treat this NOT_NULL constraint 
just like the other constraints, and execute it ONLY on the values that are 
present on the insert statement.

The main con is that it may lead to a little bit of confussion (as in, why I 
just added a null value to the table even if I have a NOT_NULL constraint?). We 
have thought on aliviating this particular confusion by:
- Extensive documentation. Let's be upfront on what this constraint does and 
does not. 
(https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)
- Adding, as part of this patch, yet another constraint (STRICTLY_NOT_NULL), 
that checks for the actual column value to be present in the insert statement..

If you've made it until here, that means you are really interested in 
constraints. Thanks! The question for you is, would you have any concern with 
this approach?

Thanks,
Bernardo

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Štefan Miklošovič
The reason it looks "strange" is that we are dealing with a completely
different concept of "NOT NULL" from e.g. MySQL.

In MySQL (as I just tried here locally so I describe what I saw), a user
has to, by default, specify all columns in the table for an insert. There
is, by default, nothing like what Cassandra has that we may just omit
columns when we are not putting any value there. A user can indeed specify
"null" for a particular column but it has to be explicitly enumerated
anyway.


MySQL also does not have any concept of "upsert".


A user can omit columns in an INSERT only if there is a "default value" for
such column:

mysql> CREATE TABLE test (id INT NOT NULL, name VARCHAR(255) NOT NULL,
surname VARCHAR(255) DEFAULT ('-'), CONSTRAINT pk_test PRIMARY KEY
(id));
Query OK, 0 rows affected (0,04 sec)

mysql> describe test;
+-+--+--+-+---+---+
| Field   | Type | Null | Key | Default   | Extra
  |
+-+--+--+-+---+---+
| id  | int  | NO   | PRI | NULL  |
  |
| name| varchar(255) | NO   | | NULL  |
  |
| surname | varchar(255) | YES  | | _utf8mb4\'-\' |
DEFAULT_GENERATED |
+-+--+--+-+---+---+
3 rows in set (0,00 sec)

mysql> insert into test (id , name) values (5, 'stefan');
Query OK, 1 row affected (0,01 sec)

mysql> select * from test;
+++-+
| id | name   | surname |
+++-+
|  5 | stefan | -   |
+++-+
1 row in set (0,00 sec)

This is not applicable in Cassandra or at least I do not have any clue how
this would look like in practice. I am not a big brain on mysql but I guess
that they just do not need to worry at all and they just slap a default
"-" on that because it is basically a single-machine DB.

If we wanted to check that when a column is not specified it still can not
end up being null, we would need to basically "read before write", no? That
is quite a heavy-weight solution for stuff like this and it complicates
what we agreed on quite dramatically (and we actually agreed on not doing
this, it was part of the discussions, afaik).

NOT_NULL and STRICT_NOT_NULL (if you guys have any better idea how this
should be called, tell us), is the consequence of how Cassandra acts when
it comes to this.

On Mon, Feb 10, 2025 at 4:30 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> I will create a Jira to keep track of that “NO VERIFY” suggestion. For
> this thread, I’d like to stick to the actual proposal for both NOT_NULL and
> STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.
>
>
> On Feb 10, 2025, at 7:18 AM, Benedict  wrote:
>
> Thanks. While I agree we shouldn’t be applying these constraints post hoc
> on read or compaction, I think we need to make clear to the user whether we
> are validating a new constraint before accepting it for alter table. Which
> is to say I think alter table should require something like “NO VERIFY” or
> some other additional keywords to make clear we aren’t checking the
> constraint applies to existing data.
>
>
> On 10 Feb 2025, at 15:10, Bernardo Botella 
> wrote:
>
> Hi. These was a topic we discussed during the ML thread:
> lists.apache.org
> 
> 
> 
> 
>
> Here was one of my answers on that:
> lists.apache.org
> 
> 
> 
> 
>
> It was also specified in the CEP (
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime
> ):
> "Note: This constraints are only enforced at write time. So, an ALTER
> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting
> data.”
>
> Long story short, constraints are only checked at write time. If a
> constraint is added to a table with preexisting offending data, that data
> stays untouched.
>
> I hope this helps,
> Bernardo
>
> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
>
> This is counterintuitive to me. The constraint should be applied to the
> table, not to the update. NOT NULL should imply a value is always specified.
>
> How are you handling this for tables that already exist? Can we alter
> table to add constraints, and if so what are the semantics?
>
> On 10 Feb 2025, at 14:50, Bernardo Botella 
> wrote:
>
> Hi everyone,
>
> Stefan Miklosovic and I have been working on a NOT_NULL (
> https://github.com/apache/cassandra/pull/3867) constraint to be added to
> the constraints tool belt, and a reall

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Jeremiah Jordan
 Having thought about this in the past, some options that have come up in
those discussions  were:

   1. Constraints forcing users to always specify a value for a given
   column or all columns.  Only allow NOT NULL for columns with such a
   constraint applied.
   2. Similar to the above but only requiring that for INSERT, letting
   UPDATE be “user beware”.
   3. Forcing a read before write for all cases where it is not specified.
  1. You have to consider some problem cases here with optimizing
  this.  If you want to only do the check on the replica, you need to
  correctly handle the case where the value only exists on some
replicas and
  not others.


I do think any implementation of NOT NULL that has a way to let NULL in is
bad.  So I would be -1 on the proposal here that lets through INSERTs that
don’t specify the column (also I would be -1 on the option 2 above, but I
included it as something I have discussed with others in the past).

-Jeremiah

On Feb 10, 2025 at 9:27:52 AM, Bernardo Botella <
conta...@bernardobotella.com> wrote:

> I will create a Jira to keep track of that “NO VERIFY” suggestion. For
> this thread, I’d like to stick to the actual proposal for both NOT_NULL and
> STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.
>
>
> On Feb 10, 2025, at 7:18 AM, Benedict  wrote:
>
> Thanks. While I agree we shouldn’t be applying these constraints post hoc
> on read or compaction, I think we need to make clear to the user whether we
> are validating a new constraint before accepting it for alter table. Which
> is to say I think alter table should require something like “NO VERIFY” or
> some other additional keywords to make clear we aren’t checking the
> constraint applies to existing data.
>
>
> On 10 Feb 2025, at 15:10, Bernardo Botella 
> wrote:
>
> Hi. These was a topic we discussed during the ML thread:
> lists.apache.org
> 
> 
> 
> 
>
> Here was one of my answers on that:
> lists.apache.org
> 
> 
> 
> 
>
> It was also specified in the CEP (
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime
> ):
> "Note: This constraints are only enforced at write time. So, an ALTER
> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting
> data.”
>
> Long story short, constraints are only checked at write time. If a
> constraint is added to a table with preexisting offending data, that data
> stays untouched.
>
> I hope this helps,
> Bernardo
>
> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
>
> This is counterintuitive to me. The constraint should be applied to the
> table, not to the update. NOT NULL should imply a value is always specified.
>
> How are you handling this for tables that already exist? Can we alter
> table to add constraints, and if so what are the semantics?
>
> On 10 Feb 2025, at 14:50, Bernardo Botella 
> wrote:
>
> Hi everyone,
>
> Stefan Miklosovic and I have been working on a NOT_NULL (
> https://github.com/apache/cassandra/pull/3867) constraint to be added to
> the constraints tool belt, and a really interesting conversation came up.
>
> First, as a problem statement, let's consider this:
>
> -
> CREATE TABLE ks.tb2 (
>   id int,
>   cl1 int,
>   cl2 int,
>   val text CHECK NOT_NULL(val),
>   PRIMARY KEY (id, cl1, cl2)
> )
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> null);
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Column value does not satisfy value constraint for column 'val' as
> it is null."
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> “text");
> cassandra@cqlsh> select * from ks.tb2;
>
> id | cl1 | cl2 | val
> +-+-+--
> 1 |   2 |   3 | text
>
> (1 rows)
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
> cassandra@cqlsh> select * from ks.tb2;
>
> id | cl1 | cl2 | val
> +-+-+--
> 1 |   2 |   3 | text
> 1 |   2 |   4 | null
>
> -
>
> As you see, we have a hole in which a 'null' value is getting written on
> column val even if we have a NOT_NULL on that particular column whenever
> the column is NOT specified on the write. That raises the question on how
> this particular constraint should behave.
>
> If we consider the other constraints (scalar constraint and length
> constraint so far), this particular behavior is fine. But, if the
> constraint is NOT_NULL, then it becomes a little bit trickier.
>
> The conc

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Yifan Cai
While LOOSE_NOT_NULL might improve the clarity a bit, what is the value of such 
constraint provides to users? It still permits null. Meanwhile, it is easier to 
check the nullness of the bound values on the application side.
IMO, what benefits users is a way to ensure no null value can exist for the 
constrained columns. Reading the thread, it is the behavior of the strict 
version.
How about we just drop the LOOSE one and call the STRICT one “NOT_NULL”?

- Yifan

From: Bernardo Botella 
Sent: Monday, February 10, 2025 8:44:13 AM
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

To recap,

The sentiment I am getting is that NOT_NULL allowing null values is too 
confusing. Nice, that’s why we started the thread.

As an alternative, instead of ditching the loose not null constraint, I propose 
we change the “default” behavior. From my initial proposal, I suggest renaming 
the Constraints:
- NOT_NULL -> LOOSE_NOT_NULL
- STRICTLY_NOT_NULL -> NOT_NULL

The reasoning behind trying to keep it is:
- It is already implemented.
- By being explicit with it being loose, we avoid the confusion of allowing 
nulls.
- It still adds value on its own.

With, the “by default” not null doesn’t allow null or non present values on the 
insert statement, while we still support the more relaxed LOOSE_NOT_NULL for 
updates.

Thoughts?


On Feb 10, 2025, at 8:29 AM, Štefan Miklošovič  wrote:



On Mon, Feb 10, 2025 at 5:20 PM Dinesh Joshi 
mailto:djo...@apache.org>> wrote:
In my head NOT_NULL constraint implies that the column must be specified on 
each write and must not be NULL. If a column with the NOT_NULL constraint is 
omitted during a write then shouldn’t it be treated as if it was specified and 
set to NULL?

Well, yes. One may also look at it that way. But then we would end up with 
"null" in a column, while it would be quite surprising for users to see that 
because they were thinking that if they specified it as NOT NULL on a table 
creation, then it is "guaranteed" that it will not be null ever again. It just 
looks strange to say in table schema it is not null but then it actually might 
be.


If the column has a non-NULL value that was previously written and you’re 
updating the rest of the columns, you still have to force the user to specify 
it otherwise you will have to perform a read before write to validate that the 
column was not NULL. I think this is a fine compromise given that the goal here 
is to ensure that an application shouldn’t inadvertently write a NULL value for 
a column specified as NOT_NULL.


Yes. I see it the same way.

On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella 
mailto:conta...@bernardobotella.com>> wrote:
Hi everyone,

Stefan Miklosovic and I have been working on a NOT_NULL 
(https://github.com/apache/cassandra/pull/3867) constraint to be added to the 
constraints tool belt, and a really interesting conversation came up.

First, as a problem statement, let's consider this:

-
CREATE TABLE ks.tb2 (
id int,
cl1 int,
cl2 int,
val text CHECK NOT_NULL(val),
PRIMARY KEY (id, cl1, cl2)
)

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, null);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Column 
value does not satisfy value constraint for column 'val' as it is null."

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
“text");
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
+-+-+--
  1 |   2 |   3 | text

(1 rows)
cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
+-+-+--
  1 |   2 |   3 | text
  1 |   2 |   4 | null

-

As you see, we have a hole in which a 'null' value is getting written on column 
val even if we have a NOT_NULL on that particular column whenever the column is 
NOT specified on the write. That raises the question on how this particular 
constraint should behave.

If we consider the other constraints (scalar constraint and length constraint 
so far), this particular behavior is fine. But, if the constraint is NOT_NULL, 
then it becomes a little bit trickier.

The conclusions we have reached is that the meaning of constraints should be 
interpreted like: I check whatever you give me as part of the write, ignoring 
everything else. Let me elaborate:
If we decide to treat this particular NOT_NULL constraint differently, and 
check if the value for that column is present in the insert statement, we then 
open a different can of worms. What happens if the row already exists with a 
valid value, and that insert statement is only trying to do an update to a 
different column in the row? If that was the case, we would be forcing the user 
to specify the 'val' column value for every update, even if it i

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Štefan Miklošovič
I agree.

The only reason would be purely practical: if a user has a table consisting
of 1000 columns not being null and a user wants to modify 1 column only,
then a user would be forced to specify the remaining 999 columns just for
the sake of it.

But in this case, I think it would be more practical just to ensure in the
application that what he is putting there is not null rather than having
1000 constraints on the table.

On Mon, Feb 10, 2025 at 5:52 PM Yifan Cai  wrote:

> While LOOSE_NOT_NULL might improve the clarity a bit, what is the value of
> such constraint provides to users? It still permits null. Meanwhile, it is
> easier to check the nullness of the bound values on the application side.
> IMO, what benefits users is a way to ensure no null value can exist for
> the constrained columns. Reading the thread, it is the behavior of the
> strict version.
> How about we just drop the LOOSE one and call the STRICT one “NOT_NULL”?
>
> - Yifan
> --
> *From:* Bernardo Botella 
> *Sent:* Monday, February 10, 2025 8:44:13 AM
> *To:* dev@cassandra.apache.org 
> *Subject:* Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL
> constraint
>
> To recap,
>
> The sentiment I am getting is that NOT_NULL allowing null values is too
> confusing. Nice, that’s why we started the thread.
>
> As an alternative, instead of ditching the loose not null constraint, I
> propose we change the “default” behavior. From my initial proposal, I
> suggest renaming the Constraints:
> - NOT_NULL -> LOOSE_NOT_NULL
> - STRICTLY_NOT_NULL -> NOT_NULL
>
> The reasoning behind trying to keep it is:
> - It is already implemented.
> - By being explicit with it being loose, we avoid the confusion of
> allowing nulls.
> - It still adds value on its own.
>
> With, the “by default” not null doesn’t allow null or non present values
> on the insert statement, while we still support the more relaxed
> LOOSE_NOT_NULL for updates.
>
> Thoughts?
>
>
> On Feb 10, 2025, at 8:29 AM, Štefan Miklošovič 
> wrote:
>
>
>
> On Mon, Feb 10, 2025 at 5:20 PM Dinesh Joshi  wrote:
>
> In my head NOT_NULL constraint implies that the column must be specified
> on each write and must not be NULL. If a column with the NOT_NULL
> constraint is omitted during a write then shouldn’t it be treated as if it
> was specified and set to NULL?
>
>
> Well, yes. One may also look at it that way. But then we would end up with
> "null" in a column, while it would be quite surprising for users to see
> that because they were thinking that if they specified it as NOT NULL on a
> table creation, then it is "guaranteed" that it will not be null ever
> again. It just looks strange to say in table schema it is not null but then
> it actually might be.
>
>
>
> If the column has a non-NULL value that was previously written and you’re
> updating the rest of the columns, you still have to force the user to
> specify it otherwise you will have to perform a read before write to
> validate that the column was not NULL. I think this is a fine compromise
> given that the goal here is to ensure that an application shouldn’t
> inadvertently write a NULL value for a column specified as NOT_NULL.
>
>
> Yes. I see it the same way.
>
>
> On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
> Hi everyone,
>
> Stefan Miklosovic and I have been working on a NOT_NULL (
> https://github.com/apache/cassandra/pull/3867) constraint to be added to
> the constraints tool belt, and a really interesting conversation came up.
>
> First, as a problem statement, let's consider this:
>
> -
> CREATE TABLE ks.tb2 (
> id int,
> cl1 int,
> cl2 int,
> val text CHECK NOT_NULL(val),
> PRIMARY KEY (id, cl1, cl2)
> )
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> null);
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Column value does not satisfy value constraint for column 'val' as
> it is null."
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> “text");
> cassandra@cqlsh> select * from ks.tb2;
>
>  id | cl1 | cl2 | val
> +-+-+--
>   1 |   2 |   3 | text
>
> (1 rows)
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
> cassandra@cqlsh> select * from ks.tb2;
>
>  id | cl1 | cl2 | val
> +-+-+--
>   1 |   2 |   3 | text
>   1 |   2 |   4 | null
>
> -
>
> As you see, we have a hole in which a 'null' value is getting written on
> column val even if we have a NOT_NULL on that particular column whenever
> the column is NOT specified on the write. That raises the question on how
> this particular constraint should behave.
>
> If we consider the other constraints (scalar constraint and length
> constraint so far), this particular behavior is fine. But, if the
> constraint is NOT_NULL, then it becomes a little bit trickier.
>
>

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
We have consensus then. Let’s ditch the non strict version, and rename the 
STRICTLY_NOT_NULL to NOT_NULL.

Thanks everyone!
Bernardo

> On Feb 10, 2025, at 8:58 AM, Štefan Miklošovič  wrote:
> 
> I agree.
> 
> The only reason would be purely practical: if a user has a table consisting 
> of 1000 columns not being null and a user wants to modify 1 column only, then 
> a user would be forced to specify the remaining 999 columns just for the sake 
> of it.
> 
> But in this case, I think it would be more practical just to ensure in the 
> application that what he is putting there is not null rather than having 1000 
> constraints on the table.
> 
> On Mon, Feb 10, 2025 at 5:52 PM Yifan Cai  > wrote:
>> While LOOSE_NOT_NULL might improve the clarity a bit, what is the value of 
>> such constraint provides to users? It still permits null. Meanwhile, it is 
>> easier to check the nullness of the bound values on the application side.
>> IMO, what benefits users is a way to ensure no null value can exist for the 
>> constrained columns. Reading the thread, it is the behavior of the strict 
>> version. 
>> How about we just drop the LOOSE one and call the STRICT one “NOT_NULL”?
>> 
>> - Yifan
>> From: Bernardo Botella > >
>> Sent: Monday, February 10, 2025 8:44:13 AM
>> To: dev@cassandra.apache.org  
>> mailto:dev@cassandra.apache.org>>
>> Subject: Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint
>>  
>> To recap,
>> 
>> The sentiment I am getting is that NOT_NULL allowing null values is too 
>> confusing. Nice, that’s why we started the thread.
>> 
>> As an alternative, instead of ditching the loose not null constraint, I 
>> propose we change the “default” behavior. From my initial proposal, I 
>> suggest renaming the Constraints:
>> - NOT_NULL -> LOOSE_NOT_NULL
>> - STRICTLY_NOT_NULL -> NOT_NULL
>> 
>> The reasoning behind trying to keep it is:
>> - It is already implemented.
>> - By being explicit with it being loose, we avoid the confusion of allowing 
>> nulls.
>> - It still adds value on its own.
>> 
>> With, the “by default” not null doesn’t allow null or non present values on 
>> the insert statement, while we still support the more relaxed LOOSE_NOT_NULL 
>> for updates.
>> 
>> Thoughts?
>> 
>> 
>>> On Feb 10, 2025, at 8:29 AM, Štefan Miklošovič >> > wrote:
>>> 
>>> 
>>> 
>>> On Mon, Feb 10, 2025 at 5:20 PM Dinesh Joshi >> > wrote:
>>> In my head NOT_NULL constraint implies that the column must be specified on 
>>> each write and must not be NULL. If a column with the NOT_NULL constraint 
>>> is omitted during a write then shouldn’t it be treated as if it was 
>>> specified and set to NULL?
>>> 
>>> Well, yes. One may also look at it that way. But then we would end up with 
>>> "null" in a column, while it would be quite surprising for users to see 
>>> that because they were thinking that if they specified it as NOT NULL on a 
>>> table creation, then it is "guaranteed" that it will not be null ever 
>>> again. It just looks strange to say in table schema it is not null but then 
>>> it actually might be.
>>>  
>>> 
>>> If the column has a non-NULL value that was previously written and you’re 
>>> updating the rest of the columns, you still have to force the user to 
>>> specify it otherwise you will have to perform a read before write to 
>>> validate that the column was not NULL. I think this is a fine compromise 
>>> given that the goal here is to ensure that an application shouldn’t 
>>> inadvertently write a NULL value for a column specified as NOT_NULL.
>>> 
>>> 
>>> Yes. I see it the same way. 
>>>  
>>> On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella 
>>> mailto:conta...@bernardobotella.com>> wrote:
>>> Hi everyone,
>>> 
>>> Stefan Miklosovic and I have been working on a NOT_NULL 
>>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>>> the constraints tool belt, and a really interesting conversation came up.
>>> 
>>> First, as a problem statement, let's consider this:
>>> 
>>> -
>>> CREATE TABLE ks.tb2 (
>>> id int,
>>> cl1 int,
>>> cl2 int,
>>> val text CHECK NOT_NULL(val),
>>> PRIMARY KEY (id, cl1, cl2)
>>> ) 
>>> 
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>> null);
>>> InvalidRequest: Error from server: code=2200 [Invalid query] 
>>> message="Column value does not satisfy value constraint for column 'val' as 
>>> it is null."
>>> 
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>> “text");
>>> cassandra@cqlsh> select * from ks.tb2;
>>> 
>>>  id | cl1 | cl2 | val
>>> +-+-+--
>>>   1 |   2 |   3 | text
>>> 
>>> (1 rows)
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>>> cassandra@cqlsh> select * from ks.tb2;
>>> 
>>>  id | cl1 | c

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Štefan Miklošovič
That looks like our STRICT_NOT_NULL would be "NOT NULL", so we would
collapse the second, stricter, case into the default one, if I understand
correctly.

Would you mind telling us what you would actually +1? You have -1 both 1)
and 2).

On Mon, Feb 10, 2025 at 5:00 PM Jeremiah Jordan 
wrote:

> Having thought about this in the past, some options that have come up in
> those discussions  were:
>
>1. Constraints forcing users to always specify a value for a given
>column or all columns.  Only allow NOT NULL for columns with such a
>constraint applied.
>2. Similar to the above but only requiring that for INSERT, letting
>UPDATE be “user beware”.
>3. Forcing a read before write for all cases where it is not specified.
>   1. You have to consider some problem cases here with optimizing
>   this.  If you want to only do the check on the replica, you need to
>   correctly handle the case where the value only exists on some replicas 
> and
>   not others.
>
>
> I do think any implementation of NOT NULL that has a way to let NULL in is
> bad.  So I would be -1 on the proposal here that lets through INSERTs that
> don’t specify the column (also I would be -1 on the option 2 above, but I
> included it as something I have discussed with others in the past).
>
> -Jeremiah
>
> On Feb 10, 2025 at 9:27:52 AM, Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> I will create a Jira to keep track of that “NO VERIFY” suggestion. For
>> this thread, I’d like to stick to the actual proposal for both NOT_NULL and
>> STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.
>>
>>
>> On Feb 10, 2025, at 7:18 AM, Benedict  wrote:
>>
>> Thanks. While I agree we shouldn’t be applying these constraints post hoc
>> on read or compaction, I think we need to make clear to the user whether we
>> are validating a new constraint before accepting it for alter table. Which
>> is to say I think alter table should require something like “NO VERIFY” or
>> some other additional keywords to make clear we aren’t checking the
>> constraint applies to existing data.
>>
>>
>> On 10 Feb 2025, at 15:10, Bernardo Botella 
>> wrote:
>>
>> Hi. These was a topic we discussed during the ML thread:
>> lists.apache.org
>> 
>> 
>> 
>> 
>>
>> Here was one of my answers on that:
>> lists.apache.org
>> 
>> 
>> 
>> 
>>
>> It was also specified in the CEP (
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime
>> ):
>> "Note: This constraints are only enforced at write time. So, an ALTER
>> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting
>> data.”
>>
>> Long story short, constraints are only checked at write time. If a
>> constraint is added to a table with preexisting offending data, that data
>> stays untouched.
>>
>> I hope this helps,
>> Bernardo
>>
>> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
>>
>> This is counterintuitive to me. The constraint should be applied to the
>> table, not to the update. NOT NULL should imply a value is always specified.
>>
>> How are you handling this for tables that already exist? Can we alter
>> table to add constraints, and if so what are the semantics?
>>
>> On 10 Feb 2025, at 14:50, Bernardo Botella 
>> wrote:
>>
>> Hi everyone,
>>
>> Stefan Miklosovic and I have been working on a NOT_NULL (
>> https://github.com/apache/cassandra/pull/3867) constraint to be added to
>> the constraints tool belt, and a really interesting conversation came up.
>>
>> First, as a problem statement, let's consider this:
>>
>> -
>> CREATE TABLE ks.tb2 (
>>   id int,
>>   cl1 int,
>>   cl2 int,
>>   val text CHECK NOT_NULL(val),
>>   PRIMARY KEY (id, cl1, cl2)
>> )
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, null);
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Column value does not satisfy value constraint for column 'val' as
>> it is null."
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, “text");
>> cassandra@cqlsh> select * from ks.tb2;
>>
>> id | cl1 | cl2 | val
>> +-+-+--
>> 1 |   2 |   3 | text
>>
>> (1 rows)
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>> cassandra@cqlsh> select * from ks.tb2;
>>
>> id | cl1 | cl2 | val
>> +-+-+--
>> 1 |   2 |   3 | text
>> 1 |   2 |   4 | null
>>
>> -
>>
>> As you see, we have a hole in which a 'null

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Štefan Miklošovič
Rereading this:

I do think any implementation of NOT NULL that has a way to let NULL in is
bad.  So I would be -1 on the proposal here that lets through INSERTs that
don’t specify the column

and also: " requiring that for INSERT, letting UPDATE be “user beware” -
and you -1 it as well, that looks like our "strict not null" is satisfying
this, yes.

On Mon, Feb 10, 2025 at 5:00 PM Jeremiah Jordan 
wrote:

> Having thought about this in the past, some options that have come up in
> those discussions  were:
>
>1. Constraints forcing users to always specify a value for a given
>column or all columns.  Only allow NOT NULL for columns with such a
>constraint applied.
>2. Similar to the above but only requiring that for INSERT, letting
>UPDATE be “user beware”.
>3. Forcing a read before write for all cases where it is not specified.
>   1. You have to consider some problem cases here with optimizing
>   this.  If you want to only do the check on the replica, you need to
>   correctly handle the case where the value only exists on some replicas 
> and
>   not others.
>
>
> I do think any implementation of NOT NULL that has a way to let NULL in is
> bad.  So I would be -1 on the proposal here that lets through INSERTs that
> don’t specify the column (also I would be -1 on the option 2 above, but I
> included it as something I have discussed with others in the past).
>
> -Jeremiah
>
> On Feb 10, 2025 at 9:27:52 AM, Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> I will create a Jira to keep track of that “NO VERIFY” suggestion. For
>> this thread, I’d like to stick to the actual proposal for both NOT_NULL and
>> STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.
>>
>>
>> On Feb 10, 2025, at 7:18 AM, Benedict  wrote:
>>
>> Thanks. While I agree we shouldn’t be applying these constraints post hoc
>> on read or compaction, I think we need to make clear to the user whether we
>> are validating a new constraint before accepting it for alter table. Which
>> is to say I think alter table should require something like “NO VERIFY” or
>> some other additional keywords to make clear we aren’t checking the
>> constraint applies to existing data.
>>
>>
>> On 10 Feb 2025, at 15:10, Bernardo Botella 
>> wrote:
>>
>> Hi. These was a topic we discussed during the ML thread:
>> lists.apache.org
>> 
>> 
>> 
>> 
>>
>> Here was one of my answers on that:
>> lists.apache.org
>> 
>> 
>> 
>> 
>>
>> It was also specified in the CEP (
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime
>> ):
>> "Note: This constraints are only enforced at write time. So, an ALTER
>> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting
>> data.”
>>
>> Long story short, constraints are only checked at write time. If a
>> constraint is added to a table with preexisting offending data, that data
>> stays untouched.
>>
>> I hope this helps,
>> Bernardo
>>
>> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
>>
>> This is counterintuitive to me. The constraint should be applied to the
>> table, not to the update. NOT NULL should imply a value is always specified.
>>
>> How are you handling this for tables that already exist? Can we alter
>> table to add constraints, and if so what are the semantics?
>>
>> On 10 Feb 2025, at 14:50, Bernardo Botella 
>> wrote:
>>
>> Hi everyone,
>>
>> Stefan Miklosovic and I have been working on a NOT_NULL (
>> https://github.com/apache/cassandra/pull/3867) constraint to be added to
>> the constraints tool belt, and a really interesting conversation came up.
>>
>> First, as a problem statement, let's consider this:
>>
>> -
>> CREATE TABLE ks.tb2 (
>>   id int,
>>   cl1 int,
>>   cl2 int,
>>   val text CHECK NOT_NULL(val),
>>   PRIMARY KEY (id, cl1, cl2)
>> )
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, null);
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Column value does not satisfy value constraint for column 'val' as
>> it is null."
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, “text");
>> cassandra@cqlsh> select * from ks.tb2;
>>
>> id | cl1 | cl2 | val
>> +-+-+--
>> 1 |   2 |   3 | text
>>
>> (1 rows)
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>> cassandra@cqlsh> select * from ks.tb2;
>>
>> id | cl1 | cl2 | val
>> +-+-+--
>> 1 |   2 |   3 | text
>> 1 |

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Štefan Miklošovič
On Mon, Feb 10, 2025 at 5:20 PM Dinesh Joshi  wrote:

> In my head NOT_NULL constraint implies that the column must be specified
> on each write and must not be NULL. If a column with the NOT_NULL
> constraint is omitted during a write then shouldn’t it be treated as if it
> was specified and set to NULL?
>

Well, yes. One may also look at it that way. But then we would end up with
"null" in a column, while it would be quite surprising for users to see
that because they were thinking that if they specified it as NOT NULL on a
table creation, then it is "guaranteed" that it will not be null ever
again. It just looks strange to say in table schema it is not null but then
it actually might be.


>
> If the column has a non-NULL value that was previously written and you’re
> updating the rest of the columns, you still have to force the user to
> specify it otherwise you will have to perform a read before write to
> validate that the column was not NULL. I think this is a fine compromise
> given that the goal here is to ensure that an application shouldn’t
> inadvertently write a NULL value for a column specified as NOT_NULL.
>
>
Yes. I see it the same way.


> On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Hi everyone,
>>
>> Stefan Miklosovic and I have been working on a NOT_NULL (
>> https://github.com/apache/cassandra/pull/3867) constraint to be added to
>> the constraints tool belt, and a really interesting conversation came up.
>>
>> First, as a problem statement, let's consider this:
>>
>> -
>> CREATE TABLE ks.tb2 (
>> id int,
>> cl1 int,
>> cl2 int,
>> val text CHECK NOT_NULL(val),
>> PRIMARY KEY (id, cl1, cl2)
>> )
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, null);
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Column value does not satisfy value constraint for column 'val' as
>> it is null."
>>
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2,
>> 3, “text");
>> cassandra@cqlsh> select * from ks.tb2;
>>
>>  id | cl1 | cl2 | val
>> +-+-+--
>>   1 |   2 |   3 | text
>>
>> (1 rows)
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>> cassandra@cqlsh> select * from ks.tb2;
>>
>>  id | cl1 | cl2 | val
>> +-+-+--
>>   1 |   2 |   3 | text
>>   1 |   2 |   4 | null
>>
>> -
>>
>> As you see, we have a hole in which a 'null' value is getting written on
>> column val even if we have a NOT_NULL on that particular column whenever
>> the column is NOT specified on the write. That raises the question on how
>> this particular constraint should behave.
>>
>> If we consider the other constraints (scalar constraint and length
>> constraint so far), this particular behavior is fine. But, if the
>> constraint is NOT_NULL, then it becomes a little bit trickier.
>>
>> The conclusions we have reached is that the meaning of constraints should
>> be interpreted like: I check whatever you give me as part of the write,
>> ignoring everything else. Let me elaborate:
>> If we decide to treat this particular NOT_NULL constraint differently,
>> and check if the value for that column is present in the insert statement,
>> we then open a different can of worms. What happens if the row already
>> exists with a valid value, and that insert statement is only trying to do
>> an update to a different column in the row? If that was the case, we would
>> be forcing the user to specify the 'val' column value for every update,
>> even if it is not needed.
>>
>> Mainly for this reason, we think it is better to treat this NOT_NULL
>> constraint just like the other constraints, and execute it ONLY on the
>> values that are present on the insert statement.
>>
>> The main con is that it may lead to a little bit of confussion (as in,
>> why I just added a null value to the table even if I have a NOT_NULL
>> constraint?). We have thought on aliviating this particular confusion by:
>> - Extensive documentation. Let's be upfront on what this constraint does
>> and does not. (
>> https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint
>> )
>> - Adding, as part of this patch, yet another constraint
>> (STRICTLY_NOT_NULL), that checks for the actual column value to be present
>> in the insert statement..
>>
>> If you've made it until here, that means you are really interested in
>> constraints. Thanks! The question for you is, would you have any concern
>> with this approach?
>>
>> Thanks,
>> Bernardo
>
>


Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Dinesh Joshi
In my head NOT_NULL constraint implies that the column must be specified on
each write and must not be NULL. If a column with the NOT_NULL constraint
is omitted during a write then shouldn’t it be treated as if it was
specified and set to NULL?

If the column has a non-NULL value that was previously written and you’re
updating the rest of the columns, you still have to force the user to
specify it otherwise you will have to perform a read before write to
validate that the column was not NULL. I think this is a fine compromise
given that the goal here is to ensure that an application shouldn’t
inadvertently write a NULL value for a column specified as NOT_NULL.

On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi everyone,
>
> Stefan Miklosovic and I have been working on a NOT_NULL (
> https://github.com/apache/cassandra/pull/3867) constraint to be added to
> the constraints tool belt, and a really interesting conversation came up.
>
> First, as a problem statement, let's consider this:
>
> -
> CREATE TABLE ks.tb2 (
> id int,
> cl1 int,
> cl2 int,
> val text CHECK NOT_NULL(val),
> PRIMARY KEY (id, cl1, cl2)
> )
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> null);
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Column value does not satisfy value constraint for column 'val' as
> it is null."
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> “text");
> cassandra@cqlsh> select * from ks.tb2;
>
>  id | cl1 | cl2 | val
> +-+-+--
>   1 |   2 |   3 | text
>
> (1 rows)
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
> cassandra@cqlsh> select * from ks.tb2;
>
>  id | cl1 | cl2 | val
> +-+-+--
>   1 |   2 |   3 | text
>   1 |   2 |   4 | null
>
> -
>
> As you see, we have a hole in which a 'null' value is getting written on
> column val even if we have a NOT_NULL on that particular column whenever
> the column is NOT specified on the write. That raises the question on how
> this particular constraint should behave.
>
> If we consider the other constraints (scalar constraint and length
> constraint so far), this particular behavior is fine. But, if the
> constraint is NOT_NULL, then it becomes a little bit trickier.
>
> The conclusions we have reached is that the meaning of constraints should
> be interpreted like: I check whatever you give me as part of the write,
> ignoring everything else. Let me elaborate:
> If we decide to treat this particular NOT_NULL constraint differently, and
> check if the value for that column is present in the insert statement, we
> then open a different can of worms. What happens if the row already exists
> with a valid value, and that insert statement is only trying to do an
> update to a different column in the row? If that was the case, we would be
> forcing the user to specify the 'val' column value for every update, even
> if it is not needed.
>
> Mainly for this reason, we think it is better to treat this NOT_NULL
> constraint just like the other constraints, and execute it ONLY on the
> values that are present on the insert statement.
>
> The main con is that it may lead to a little bit of confussion (as in, why
> I just added a null value to the table even if I have a NOT_NULL
> constraint?). We have thought on aliviating this particular confusion by:
> - Extensive documentation. Let's be upfront on what this constraint does
> and does not. (
> https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint
> )
> - Adding, as part of this patch, yet another constraint
> (STRICTLY_NOT_NULL), that checks for the actual column value to be present
> in the insert statement..
>
> If you've made it until here, that means you are really interested in
> constraints. Thanks! The question for you is, would you have any concern
> with this approach?
>
> Thanks,
> Bernardo


Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
To recap,

The sentiment I am getting is that NOT_NULL allowing null values is too 
confusing. Nice, that’s why we started the thread.

As an alternative, instead of ditching the loose not null constraint, I propose 
we change the “default” behavior. From my initial proposal, I suggest renaming 
the Constraints:
- NOT_NULL -> LOOSE_NOT_NULL
- STRICTLY_NOT_NULL -> NOT_NULL

The reasoning behind trying to keep it is:
- It is already implemented.
- By being explicit with it being loose, we avoid the confusion of allowing 
nulls.
- It still adds value on its own.

With, the “by default” not null doesn’t allow null or non present values on the 
insert statement, while we still support the more relaxed LOOSE_NOT_NULL for 
updates.

Thoughts?


> On Feb 10, 2025, at 8:29 AM, Štefan Miklošovič  wrote:
> 
> 
> 
> On Mon, Feb 10, 2025 at 5:20 PM Dinesh Joshi  > wrote:
>> In my head NOT_NULL constraint implies that the column must be specified on 
>> each write and must not be NULL. If a column with the NOT_NULL constraint is 
>> omitted during a write then shouldn’t it be treated as if it was specified 
>> and set to NULL?
> 
> Well, yes. One may also look at it that way. But then we would end up with 
> "null" in a column, while it would be quite surprising for users to see that 
> because they were thinking that if they specified it as NOT NULL on a table 
> creation, then it is "guaranteed" that it will not be null ever again. It 
> just looks strange to say in table schema it is not null but then it actually 
> might be.
>  
>> 
>> If the column has a non-NULL value that was previously written and you’re 
>> updating the rest of the columns, you still have to force the user to 
>> specify it otherwise you will have to perform a read before write to 
>> validate that the column was not NULL. I think this is a fine compromise 
>> given that the goal here is to ensure that an application shouldn’t 
>> inadvertently write a NULL value for a column specified as NOT_NULL.
>> 
> 
> Yes. I see it the same way. 
>  
>> On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella 
>> mailto:conta...@bernardobotella.com>> wrote:
>>> Hi everyone,
>>> 
>>> Stefan Miklosovic and I have been working on a NOT_NULL 
>>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>>> the constraints tool belt, and a really interesting conversation came up.
>>> 
>>> First, as a problem statement, let's consider this:
>>> 
>>> -
>>> CREATE TABLE ks.tb2 (
>>> id int,
>>> cl1 int,
>>> cl2 int,
>>> val text CHECK NOT_NULL(val),
>>> PRIMARY KEY (id, cl1, cl2)
>>> ) 
>>> 
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>> null);
>>> InvalidRequest: Error from server: code=2200 [Invalid query] 
>>> message="Column value does not satisfy value constraint for column 'val' as 
>>> it is null."
>>> 
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>> “text");
>>> cassandra@cqlsh> select * from ks.tb2;
>>> 
>>>  id | cl1 | cl2 | val
>>> +-+-+--
>>>   1 |   2 |   3 | text
>>> 
>>> (1 rows)
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>>> cassandra@cqlsh> select * from ks.tb2;
>>> 
>>>  id | cl1 | cl2 | val
>>> +-+-+--
>>>   1 |   2 |   3 | text
>>>   1 |   2 |   4 | null
>>> 
>>> -
>>> 
>>> As you see, we have a hole in which a 'null' value is getting written on 
>>> column val even if we have a NOT_NULL on that particular column whenever 
>>> the column is NOT specified on the write. That raises the question on how 
>>> this particular constraint should behave.
>>> 
>>> If we consider the other constraints (scalar constraint and length 
>>> constraint so far), this particular behavior is fine. But, if the 
>>> constraint is NOT_NULL, then it becomes a little bit trickier.
>>> 
>>> The conclusions we have reached is that the meaning of constraints should 
>>> be interpreted like: I check whatever you give me as part of the write, 
>>> ignoring everything else. Let me elaborate:
>>> If we decide to treat this particular NOT_NULL constraint differently, and 
>>> check if the value for that column is present in the insert statement, we 
>>> then open a different can of worms. What happens if the row already exists 
>>> with a valid value, and that insert statement is only trying to do an 
>>> update to a different column in the row? If that was the case, we would be 
>>> forcing the user to specify the 'val' column value for every update, even 
>>> if it is not needed. 
>>> 
>>> Mainly for this reason, we think it is better to treat this NOT_NULL 
>>> constraint just like the other constraints, and execute it ONLY on the 
>>> values that are present on the insert statement.
>>> 
>>> The main con is that it may lead to a little bit of confussion (as in, why 
>>> I just added a null value to the tabl

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Dinesh Joshi
On Mon, Feb 10, 2025 at 9:05 AM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> We have consensus then. Let’s ditch the non strict version, and rename the
> STRICTLY_NOT_NULL to NOT_NULL.
>

Can you give this thread at least 24-48 hours to ensure we capture any
other perspectives?


Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Benedict
This is counterintuitive to me. The constraint should be applied to the table, 
not to the update. NOT NULL should imply a value is always specified.

How are you handling this for tables that already exist? Can we alter table to 
add constraints, and if so what are the semantics?

> On 10 Feb 2025, at 14:50, Bernardo Botella  
> wrote:
> 
> Hi everyone,
> 
> Stefan Miklosovic and I have been working on a NOT_NULL 
> (https://github.com/apache/cassandra/pull/3867) constraint to be added to the 
> constraints tool belt, and a really interesting conversation came up.
> 
> First, as a problem statement, let's consider this:
> 
> -
> CREATE TABLE ks.tb2 (
>id int,
>cl1 int,
>cl2 int,
>val text CHECK NOT_NULL(val),
>PRIMARY KEY (id, cl1, cl2)
> )
> 
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
> null);
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Column 
> value does not satisfy value constraint for column 'val' as it is null."
> 
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
> “text");
> cassandra@cqlsh> select * from ks.tb2;
> 
> id | cl1 | cl2 | val
> +-+-+--
>  1 |   2 |   3 | text
> 
> (1 rows)
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
> cassandra@cqlsh> select * from ks.tb2;
> 
> id | cl1 | cl2 | val
> +-+-+--
>  1 |   2 |   3 | text
>  1 |   2 |   4 | null
> 
> -
> 
> As you see, we have a hole in which a 'null' value is getting written on 
> column val even if we have a NOT_NULL on that particular column whenever the 
> column is NOT specified on the write. That raises the question on how this 
> particular constraint should behave.
> 
> If we consider the other constraints (scalar constraint and length constraint 
> so far), this particular behavior is fine. But, if the constraint is 
> NOT_NULL, then it becomes a little bit trickier.
> 
> The conclusions we have reached is that the meaning of constraints should be 
> interpreted like: I check whatever you give me as part of the write, ignoring 
> everything else. Let me elaborate:
> If we decide to treat this particular NOT_NULL constraint differently, and 
> check if the value for that column is present in the insert statement, we 
> then open a different can of worms. What happens if the row already exists 
> with a valid value, and that insert statement is only trying to do an update 
> to a different column in the row? If that was the case, we would be forcing 
> the user to specify the 'val' column value for every update, even if it is 
> not needed.
> 
> Mainly for this reason, we think it is better to treat this NOT_NULL 
> constraint just like the other constraints, and execute it ONLY on the values 
> that are present on the insert statement.
> 
> The main con is that it may lead to a little bit of confussion (as in, why I 
> just added a null value to the table even if I have a NOT_NULL constraint?). 
> We have thought on aliviating this particular confusion by:
> - Extensive documentation. Let's be upfront on what this constraint does and 
> does not. 
> (https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)
> - Adding, as part of this patch, yet another constraint (STRICTLY_NOT_NULL), 
> that checks for the actual column value to be present in the insert 
> statement..
> 
> If you've made it until here, that means you are really interested in 
> constraints. Thanks! The question for you is, would you have any concern with 
> this approach?
> 
> Thanks,
> Bernardo


Re: [VOTE] Release Apache Cassandra Java Driver 4.19.0

2025-02-10 Thread Tolbert, Andy
+1 (nb)

Tried a few things
* Added the maven repository to a project and did some quick sanity testing
w/ 4.19.0 (java-driver-core and java-driver-core-shaded)
* signature matches Bret's from KEYS
 and also verified checksums
from subversion
* built the source tarball with JDK8

Thanks,
Andy

On Thu, Feb 6, 2025 at 4:34 PM Bret McGuire  wrote:

>Greetings all!  I’m proposing the test build of Cassandra Java Driver
> 4.19.0 for release.
>
> sha1: 46444eaabdbd23e9231123198536d070e99aca27
>
> Git:  
> https://github.com/apache/cassandra-java-driver/tree/4.19.0
>
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1364/
>
>The vote will be open for 120 hours (longer if needed). Everyone who
> has tested the build is invited to vote. Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
>
>Thanks!
>


Re: [VOTE] Release Apache Cassandra Java Driver 4.19.0

2025-02-10 Thread Nate McCall
+1
Verified sigs and artifact coordinates.

On Tue, Feb 11, 2025 at 12:30 PM Brandon Williams  wrote:

> +1
>
> Checked sha/sig, maven artifacts, built on j8.
>
> Kind Regards,
> Brandon
>
> On Thu, Feb 6, 2025 at 4:34 PM Bret McGuire 
> wrote:
> >
> >Greetings all!  I’m proposing the test build of Cassandra Java Driver
> 4.19.0 for release.
> >
> >
> > sha1: 46444eaabdbd23e9231123198536d070e99aca27
> >
> > Git: https://github.com/apache/cassandra-java-driver/tree/4.19.0
> >
> > Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1364/
> >
> >
> >The vote will be open for 120 hours (longer if needed). Everyone who
> has tested the build is invited to vote. Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
> >
> >
> >Thanks!
>


Re: CVE-2025-23015: Apache Cassandra: User with MODIFY permission on ALL KEYSPACES can escalate privileges to superuser via unsafe actions

2025-02-10 Thread Paulo Motta
A performance regression was detected in the security releases 3.0.31
[1] and 3.11.18 [2]. Users affected by this vulnerability are
recommended to upgrade to versions 3.0.32 and 3.11.19 instead.

Remaining versions are unaffected.

[1] - https://lists.apache.org/thread/yprngr9cmp9c43m1c56thv1v0v6y5ywq
[2] - https://lists.apache.org/thread/hc9shwlm1kmxdxosbh3qo2xooqoo3sc6

On Mon, Feb 3, 2025 at 6:19 PM Paulo Motta  wrote:
>
> Severity: moderate
>
> Affected versions:
>
> - Apache Cassandra 3.0.0 through 3.0.30
> - Apache Cassandra 3.1.0 through 3.11.17
> - Apache Cassandra 4.0.0 through 4.0.15
> - Apache Cassandra 4.1.0 through 4.1.7
> - Apache Cassandra 5.0.0 through 5.0.2
>
> Description:
>
> Privilege Defined With Unsafe Actions vulnerability in Apache Cassandra. An 
> user with MODIFY permission ON ALL KEYSPACES can escalate privileges to 
> superuser within a targeted Cassandra cluster via unsafe actions to a system 
> resource. Operators granting data MODIFY permission on all keyspaces on 
> affected versions should review data access rules for potential breaches.
>
> This issue affects Apache Cassandra through 3.0.30, 3.11.17, 4.0.15, 4.1.7, 
> 5.0.2.
>
> Users are recommended to upgrade to versions 3.0.31, 3.11.18, 4.0.16, 4.1.8, 
> 5.0.3, which fixes the issue.
>
> This issue was reported by Adam Pond, Ali Mirheidari, Terry Thibault, and 
> Will Brattain of Apple Services Engineering Security.
>
> References:
>
> https://cassandra.apache.org/
> https://www.cve.org/CVERecord?id=CVE-2025-23015
>


Re: [VOTE] Release Apache Cassandra Java Driver 4.19.0

2025-02-10 Thread Brandon Williams
+1

Checked sha/sig, maven artifacts, built on j8.

Kind Regards,
Brandon

On Thu, Feb 6, 2025 at 4:34 PM Bret McGuire  wrote:
>
>Greetings all!  I’m proposing the test build of Cassandra Java Driver 
> 4.19.0 for release.
>
>
> sha1: 46444eaabdbd23e9231123198536d070e99aca27
>
> Git: https://github.com/apache/cassandra-java-driver/tree/4.19.0
>
> Maven 
> Artifacts:https://repository.apache.org/content/repositories/orgapachecassandra-1364/
>
>
>The vote will be open for 120 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
>
>Thanks!