Having thought about this in the past, some options that have come up in those discussions were:
1. Constraints forcing users to always specify a value for a given column or all columns. Only allow NOT NULL for columns with such a constraint applied. 2. Similar to the above but only requiring that for INSERT, letting UPDATE be “user beware”. 3. Forcing a read before write for all cases where it is not specified. 1. You have to consider some problem cases here with optimizing this. If you want to only do the check on the replica, you need to correctly handle the case where the value only exists on some replicas and not others. I do think any implementation of NOT NULL that has a way to let NULL in is bad. So I would be -1 on the proposal here that lets through INSERTs that don’t specify the column (also I would be -1 on the option 2 above, but I included it as something I have discussed with others in the past). -Jeremiah On Feb 10, 2025 at 9:27:52 AM, Bernardo Botella < conta...@bernardobotella.com> wrote: > I will create a Jira to keep track of that “NO VERIFY” suggestion. For > this thread, I’d like to stick to the actual proposal for both NOT_NULL and > STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch. > > > On Feb 10, 2025, at 7:18 AM, Benedict <bened...@apache.org> wrote: > > Thanks. While I agree we shouldn’t be applying these constraints post hoc > on read or compaction, I think we need to make clear to the user whether we > are validating a new constraint before accepting it for alter table. Which > is to say I think alter table should require something like “NO VERIFY” or > some other additional keywords to make clear we aren’t checking the > constraint applies to existing data. > > > On 10 Feb 2025, at 15:10, Bernardo Botella <conta...@bernardobotella.com> > wrote: > > Hi. These was a topic we discussed during the ML thread: > lists.apache.org > <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj> > <favicon.ico> > <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj> > <https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj> > > Here was one of my answers on that: > lists.apache.org > <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv> > <favicon.ico> > <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv> > <https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv> > > It was also specified in the CEP ( > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime > ): > "Note: This constraints are only enforced at write time. So, an ALTER > CONSTRAINT with more restrictive constraints shouldn’t affect preexisting > data.” > > Long story short, constraints are only checked at write time. If a > constraint is added to a table with preexisting offending data, that data > stays untouched. > > I hope this helps, > Bernardo > > On Feb 10, 2025, at 7:00 AM, Benedict <bened...@apache.org> wrote: > > This is counterintuitive to me. The constraint should be applied to the > table, not to the update. NOT NULL should imply a value is always specified. > > How are you handling this for tables that already exist? Can we alter > table to add constraints, and if so what are the semantics? > > On 10 Feb 2025, at 14:50, Bernardo Botella <conta...@bernardobotella.com> > wrote: > > Hi everyone, > > Stefan Miklosovic and I have been working on a NOT_NULL ( > https://github.com/apache/cassandra/pull/3867) constraint to be added to > the constraints tool belt, and a really interesting conversation came up. > > First, as a problem statement, let's consider this: > > ----------------------------------------- > CREATE TABLE ks.tb2 ( > id int, > cl1 int, > cl2 int, > val text CHECK NOT_NULL(val), > PRIMARY KEY (id, cl1, cl2) > ) > > cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, > null); > InvalidRequest: Error from server: code=2200 [Invalid query] > message="Column value does not satisfy value constraint for column 'val' as > it is null." > > cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, > “text"); > cassandra@cqlsh> select * from ks.tb2; > > id | cl1 | cl2 | val > ----+-----+-----+------ > 1 | 2 | 3 | text > > (1 rows) > cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4); > cassandra@cqlsh> select * from ks.tb2; > > id | cl1 | cl2 | val > ----+-----+-----+------ > 1 | 2 | 3 | text > 1 | 2 | 4 | null > > ----------------------------------------- > > As you see, we have a hole in which a 'null' value is getting written on > column val even if we have a NOT_NULL on that particular column whenever > the column is NOT specified on the write. That raises the question on how > this particular constraint should behave. > > If we consider the other constraints (scalar constraint and length > constraint so far), this particular behavior is fine. But, if the > constraint is NOT_NULL, then it becomes a little bit trickier. > > The conclusions we have reached is that the meaning of constraints should > be interpreted like: I check whatever you give me as part of the write, > ignoring everything else. Let me elaborate: > If we decide to treat this particular NOT_NULL constraint differently, and > check if the value for that column is present in the insert statement, we > then open a different can of worms. What happens if the row already exists > with a valid value, and that insert statement is only trying to do an > update to a different column in the row? If that was the case, we would be > forcing the user to specify the 'val' column value for every update, even > if it is not needed. > > Mainly for this reason, we think it is better to treat this NOT_NULL > constraint just like the other constraints, and execute it ONLY on the > values that are present on the insert statement. > > The main con is that it may lead to a little bit of confussion (as in, why > I just added a null value to the table even if I have a NOT_NULL > constraint?). We have thought on aliviating this particular confusion by: > - Extensive documentation. Let's be upfront on what this constraint does > and does not. ( > https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint > ) > - Adding, as part of this patch, yet another constraint > (STRICTLY_NOT_NULL), that checks for the actual column value to be present > in the insert statement.. > > If you've made it until here, that means you are really interested in > constraints. Thanks! The question for you is, would you have any concern > with this approach? > > Thanks, > Bernardo > > > >