Re: Constraint's "not null" alignment with transactions and their simplification

Jon Haddad Thu, 17 Apr 2025 09:17:57 -0700

Maybe I misremember the convo, but I thought both of these was possible:

CREATE table test ( id int primary key
val int check val > 0,
val2 int
)


INSERT INTO test (id) values (1);  ## inserts with empty val
UPDATE test set val2 = 1;  ## inserts with empty val


Are you saying supplying val is required in both cases?

In the second case, does a user always have to overwrite the field?

Jon


On Wed, Apr 16, 2025 at 11:43 PM Štefan Miklošovič <smikloso...@apache.org>
wrote:

> _A bigger issue, is the actual behavior, in that you can say NOT NULL and
> still easily get null values by not supplying the value.  The constraint is
> more of a weakly applied suggestion._
>
> This is not true. You can not really put there null. See other discussion
> about this behaviour on ML.
>
> On Wed, Apr 16, 2025 at 6:36 PM Jon Haddad <j...@rustyrazorblade.com>
> wrote:
>
>> I think the world is fairly used to NOT NULL in line with the type.  I
>> don't see much value in paving our own path for things that are established
>> norms.  Let's not reinvent wheels that don't need reinventing.
>>
>> I also think it's fine if the generated CQL from describe doesn't match
>> the user's input.  We already do this with all the table level fields that
>> get added.  I'm not sure why we should care if a user drifts away from not
>> null either.  I think these are pretty irrelevant details in the grand
>> scheme of things.
>>
>> A bigger issue, is the actual behavior, in that you can say NOT NULL and
>> still easily get null values by not supplying the value.  The constraint is
>> more of a weakly applied suggestion.  I'm expecting a bit of confusion
>> around this.  I wonder if it makes sense to not allow prepared INSERT
>> statements that don't contain the field if it's null constrained, although
>> it doesn't really help us in the case of an upsert.  It does narrow the
>> edge case a bit, which could be argued is good or bad depending on
>> someone's mood.
>>
>> I think it should be made clear that what we're really giving the user
>> when we say `value check not null` is this:
>>
>> INSERT INTO blah (id, value) VALUES(?, ? not null);
>>
>> Does accord change the behavior of INSERT to actually mean INSERT and not
>> UPSERT, or do you still have to specify IF NOT EXISTS?
>>
>> Jon
>>
>>
>>
>> On Wed, Apr 16, 2025 at 12:50 AM Štefan Miklošovič <
>> smikloso...@apache.org> wrote:
>>
>>> Maybe I am too naive and idealist but I can see a world where, even we
>>> enable them to do it like "val int not null", after they describe a table
>>> and they see "val int check not null" they start to use the latter form
>>> because they would be tired of switching two forms in their head all the
>>> time when they see that all other checks they want to specify have to be
>>> specified in "val int check ..." form. They might actually start to
>>> appreciate the unified fashion of it and they will themselves drift away
>>> from "val int not null" eventually. I would :D.
>>>
>>> As we are among leaders in NoSQL databases we have also a great leverage
>>> over how stuff is going to be used and we can actively form this space. We
>>> do not need to just follow.
>>>
>>> I am open to supporting specifying "val int not null" _together with_
>>> "val int check not null" form. Not _exclusively_ SQL syntax only. This
>>> syntax sugar can go in anytime and it is more or less purely "additive".
>>> Even if a user specifies it like "val int not null" it will be internally
>>> stored as "val int check not null" to be consistent with how it is going to
>>> be represented internally with all other checks. NOT NULL will ultimately
>>> be a check as any other we have, it is just how it might be defined on the
>>> CQL level which we are arguing about here.
>>>
>>> On Wed, Apr 16, 2025 at 12:56 AM Patrick McFadin <pmcfa...@gmail.com>
>>> wrote:
>>>
>>>> I may be to blame for some of that energy. :D
>>>>
>>>> No, we don't have consensus on that direction, but I think we will
>>>> eventually. CQL started out as a subset of SQL but has drifted because
>>>> features get added async and there just hasn't been any formal standard or
>>>> road map or lofty goal. We've been laying out track in front of us as we go
>>>> along. A couple of years ago, I had a conversation at ApacheCon, and it
>>>> started slowly rattling around my brain that the developer experience
>>>> destination is regression to a mean and not charting a bold new path.
>>>> Everything else in Cassandra was already charting plenty of new paths, but
>>>> if we give users a calm, familiar API, that feels like a winner. SQL is
>>>> already a standard and could be our roadmap.
>>>>
>>>> With all that, I've been getting back into the data modeling game and
>>>> seeing how a slightly different syntax can lead to angry users. In
>>>> conversations with users, the idea of being SQL compatible in syntax
>>>> (features can come whenever) is highly appealing. In reality, we are
>>>> talking about CQL4, but why create tech debt for later if we design
>>>> features now? My dream would be to have a compatibility test for Cassandra
>>>> SQL that ticks off what's left for implementation. This is probably a
>>>> DISCUSS thread but here we are now.
>>>>
>>>> Patrick
>>>>
>>>> On Tue, Apr 15, 2025 at 1:47 PM Josh McKenzie <jmcken...@apache.org>
>>>> wrote:
>>>>
>>>>> If we have a goal of eventually providing ANSI SQL support one day, we
>>>>> should at least stick to the ANSI SQL standard where applicable for
>>>>> features in the meantime.
>>>>>
>>>>> Do we collectively have that goal? Not disagreeing with it at all,
>>>>> genuinely curious.
>>>>>
>>>>> Broadly I agree that we should definitely keep ourselves *compatible *with
>>>>> ANSI SQL where possible so as not to paint ourselves into a corner nor
>>>>> alienate our users, but I do believe there's value in having a consistent
>>>>> UX while acknowledging and accommodating that things aren't necessarily
>>>>> consistent in the SQL space.
>>>>>
>>>>> Are you against offering a superset of that standard (i.e. CHECK on
>>>>> all constraints a supported form, ANSI SQL standard on things like IS NOT
>>>>> NULL etc)?
>>>>>
>>>>> On Tue, Apr 15, 2025, at 3:05 AM, Benedict wrote:
>>>>>
>>>>>
>>>>> I am not certain if the ANSI SQL standard requires that any expression
>>>>> be supported, but either way it is much better to implement a strict 
>>>>> subset
>>>>> of a feature than to implement an overlapping feature we can never
>>>>> synchronise with the standard.
>>>>>
>>>>> Accord is also capable of imposing multi column expressions without
>>>>> constraining what user queries can do.
>>>>>
>>>>>
>>>>>
>>>>> On 15 Apr 2025, at 02:10, David Capwell <dcapw...@apple.com> wrote:
>>>>>
>>>>> One argument I have against using columns in the constraint is that
>>>>> we currently are not able to actually handle the CHECK spec in SQL.  In 
>>>>> SQL
>>>>> the CHECK expression is a valid search expression (eg. price > 42 AND tag 
>>>>> =
>>>>> ‘foo’), and is limited to the current row.
>>>>>
>>>>> Where we differ in Apache Cassandra
>>>>>
>>>>> 1) the mutation might not include the columns referenced in the search
>>>>> expression: price numeric CHECK (name IS NOT NULL) (requires each mutation
>>>>> to define all columns referenced in any constraint)
>>>>> 2) our current checks are not normal search expressions (that limit
>>>>> functions to deterministic ones), its a allowed listed set of expressions
>>>>> (fixable)
>>>>>
>>>>> If we actually want to be the SQL version of constraints then we need
>>>>> to put more restrictions on what people can do.
>>>>>
>>>>> For example
>>>>>
>>>>> price numeric CHECK (length(name) > 42)
>>>>>
>>>>> This constraint can only be handled if both price and name are present
>>>>> in the query, so we would need to force all mutations to have price and
>>>>> name, as we wouldn’t know the value of name in the following query
>>>>>
>>>>> INSERT INTO tbl (pk, ck, price) VALUES (0, 0, 0);
>>>>>
>>>>> So, do we put such a limitation on users and force them to define
>>>>> every column present in a constraint, or do we break the SQL spec?  If
>>>>> we choose to break the spec, then why allow column names in the search
>>>>> expressions?  Only “this” column is safe
>>>>>
>>>>> On Apr 14, 2025, at 1:24 PM, Benedict <bened...@apache.org> wrote:
>>>>>
>>>>>
>>>>> If we have a goal of eventually providing ANSI SQL support one day, we
>>>>> should at least stick to the ANSI SQL standard where applicable for
>>>>> features in the meantime. AFAICT the reason everyone else does this the
>>>>> same is it is part of the standard. So, I am more than happy to stick to
>>>>> the CHECK qualifier for all of our unique extensions, but for NOT NULL we
>>>>> should absolutely follow the SQL standard, and for the same reason we
>>>>> should use the field name in the CHECK expression.
>>>>>
>>>>>
>>>>> On 14 Apr 2025, at 21:10, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>
>>>>> 
>>>>> Consistency *within* our own ecosystem supersedes consistency with
>>>>> other familiar ecosystems IMO.
>>>>>
>>>>> I'd prefer we consistently apply the CHECK keyword and don't have
>>>>> special cases that omit it, or perhaps have those as optional syntactic
>>>>> sugar but at its base the syntax is uniform and consistent.
>>>>>
>>>>> On Mon, Apr 14, 2025, at 3:31 PM, Štefan Miklošovič wrote:
>>>>>
>>>>> To be honest, I agree with Bernardo here.
>>>>>
>>>>> The requirement to have "val int not null" makes sense at first, but
>>>>> it is not so straightforward.
>>>>>
>>>>> I think that what we are trying to do here is to "copy like they have
>>>>> it" because ... well, because they have it like that. And we do not want 
>>>>> to
>>>>> differ too much, because we like adoption and if it is easier for 
>>>>> newcomers
>>>>> who are used to some style already, better to support it as well in this
>>>>> case because, why not, right?
>>>>>
>>>>> Yeah, right ... but we are in a different position than SQL
>>>>> implementation. I already said it, but I think that what happened first 
>>>>> was
>>>>> that they started to support "val int not null" and then they had a need 
>>>>> to
>>>>> support checks as well, so they slapped "val int not null check ..." on
>>>>> that. They could not start from scratch, but we can. So not supporting 
>>>>> "val
>>>>> int not null" in favor of "val int check not null" makes more sense to me.
>>>>> So what if it is different? It is at least consistent.
>>>>>
>>>>> This is a broader discussion to have though. AFAIK, we do not have any
>>>>> official policy which would command us to follow SQL as closely as
>>>>> possible. We have never agreed on following it like SQL has it officially.
>>>>> So "doing it like they have it" as it most a "nice to have" but not a
>>>>> strict requirement.
>>>>>
>>>>> On the other hand, if we ever codified that "we are following SQL"
>>>>> then everything would need to follow it to the t. Not only constraints,
>>>>> every other new feature which would change CQL would need to behave like
>>>>> that.
>>>>>
>>>>> We do not have this policy anywhere afaik so when it comes to that I
>>>>> do not think that supporting "val int not null" is necessary. I think that
>>>>> their mixture of two styles is actually wrong and misleading. Having one
>>>>> way to specify it with "val int check ..." is just a superior solution 
>>>>> here
>>>>> imho.
>>>>>
>>>>> On Mon, Apr 14, 2025 at 9:14 PM Bernardo Botella <
>>>>> conta...@bernardobotella.com> wrote:
>>>>>
>>>>> Now this is becoming a really interesting discussion. Thanks everyone
>>>>> for pitching in!
>>>>>
>>>>> Here is my take on some of the proposed changes:
>>>>>
>>>>> We are talking about treating some constraints (NOT_NULL, JSON) as
>>>>> special cases by omitting the CHECK keyword (not reserved as per current
>>>>> implementation). Now, while this may seem like a nice approach to feature
>>>>> gaps on our CQL, it really worries me that by doing so we open the door to
>>>>> not needed complexity both at implementation and conceptually with the
>>>>> constraints framework.
>>>>>
>>>>> In my mind, what’s the constraints framework? It is a simple and
>>>>> really easy to extend integration point for validators for a row. (LENGHT,
>>>>> SCALAR, REGEX, are really good examples of it).
>>>>>
>>>>> What’s NOT the responsibility of the constraints framework? I don’t
>>>>> think this should be used to deliver partial solutions to feature gaps on
>>>>> CQL data modeling. Let’s take JSON constraint as an example. In the
>>>>> constraints case, it is as simple as checking that the provided string is
>>>>> valid json. Easy. Simple. But, how would JSON look like if it was a first
>>>>> class citizen in CQL? Setting the grammar aside, it would be handled
>>>>> differently probably. Things like: Can we store it better? Do we allow
>>>>> making queries for fields inside the json blob? Are there any 
>>>>> optimizations
>>>>> that can be done when serializing/deserializing it? All of those 
>>>>> definitely
>>>>> fall out of the scope of the constraints framework. So, I guess the
>>>>> question then becomes, is the JSON constraint a valid constraint to have?
>>>>> Just a temporal patch until (if) JSON type is in? Should we just remove it
>>>>> and keep ignoring JSON? Those are valid questions and discussions to have.
>>>>> But, I really think that we shouldn’t see this simple validator as a full
>>>>> fledged, first class citizen, type in CQL. Similar arguments could be have
>>>>> for the NOT_NULL constraint that has spawned so many interesting
>>>>> conversations.
>>>>>
>>>>> Now, having made that distinction, I don’t think we should have
>>>>> constraints that can be defined differently on the CQL statement. They
>>>>> should all have a CHECK keyword, specifying that they are a constraint 
>>>>> that
>>>>> will be checked (aka, row value will be validated against whatever
>>>>> function). That’s easy to identify, and it’s conceptually easy to
>>>>> understand the limitations it comes with (as opposed to the JSON example
>>>>> mentioned above).
>>>>>
>>>>> Bernardo
>>>>>
>>>>>
>>>>>
>>>>> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič <
>>>>> smikloso...@apache.org> wrote:
>>>>>
>>>>> As Yifan already said, "check" is not a reserved word now and its
>>>>> usage does not collide with anything.
>>>>>
>>>>> If people have columns, tables, keyspaces with name "check" that seems
>>>>> to work already so they don't need to do anything:
>>>>>
>>>>> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0,
>>>>> primary key (id));
>>>>>
>>>>> ALTER TABLE ks.tb ADD check int check check > 0;
>>>>>
>>>>> DESCRIBE ks.tb;
>>>>>
>>>>> CREATE TABLE ks.tb (
>>>>>     id int CHECK id > 0 PRIMARY KEY,
>>>>>     check int CHECK check > 0,
>>>>>     val int CHECK val > 0
>>>>> ) ....
>>>>>
>>>>> CREATE TABLE ks.check (id int check id > 0, check int check check > 0,
>>>>> primary key (id));
>>>>> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy',
>>>>> 'replication_factor': 1};
>>>>> CREATE TABLE check.check (check int check check > 0, val int check val
>>>>> > 0, primary key (check));
>>>>> INSERT INTO check.check (check , val ) VALUES ( 1, 1);
>>>>>
>>>>> PostgreSQL has this:
>>>>>
>>>>> CREATE TABLE products (
>>>>>     product_no integer,
>>>>>     name text,
>>>>>     price numeric CHECK (price > 0)
>>>>> );
>>>>>
>>>>> we follow this approach (minus parenthesis). We can also chain
>>>>> constraints whatever we like
>>>>>
>>>>> val int CHECK val > 0 and age < 100
>>>>>
>>>>> We can make a stab in trying to model
>>>>>
>>>>> val int not null check val > 0
>>>>>
>>>>> this is how PostgreSQL has it (1).
>>>>>
>>>>> but that would be more complicated on the implementation side because
>>>>> we would need to also accommodate "CQL describe" to dump it like that, 
>>>>> plus
>>>>> I am not sure how complicated it would be to tweak the parser as well.
>>>>>
>>>>> I will try to make some progress and will report back.
>>>>>
>>>>> Regards
>>>>>
>>>>> (1)
>>>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL
>>>>>
>>>>> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <djo...@apache.org>
>>>>> wrote:
>>>>>
>>>>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <pmcfa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> I'm loving all the syntax discussion lately. It's a good debate and
>>>>> essential for the project's future with a good developer experience.
>>>>>
>>>>>
>>>>> +1
>>>>>
>>>>>
>>>>> On NULL. I've been asked this a million times by end users. Why is
>>>>> there no "NOT NULL" in the schema?
>>>>>
>>>>>
>>>>> I would've expected this to be in billions by now ;)
>>>>>
>>>>>
>>>>> I'm in favor of the standard SQL syntax here because it's what users
>>>>> have been using forever:
>>>>> name       text NOT NULL
>>>>>
>>>>>
>>>>> I hold a weak opinion on this. We don't have to necessarily align on
>>>>> to the standard SQL syntax. In my experience, users subconsciously feel
>>>>> Cassandra is a SQL database and try to design their schema to fit the
>>>>> traditional SQL / RDBMS design and then later are disappointed to find out
>>>>> it doesn't have joins or foreign key constraints. But that's just my
>>>>> personal experience working with users. However, I think we should strive
>>>>> for consistency and if it aligns with SQL I have no issues with the 
>>>>> syntax.
>>>>> Just sharing my experience.
>>>>>
>>>>>
>>>>> On CHECK. Also in favor of making this a reserved word but in context.
>>>>> Namely, how Postgres SQL works. CHECK ( boolean_expression_on_column)
>>>>>
>>>>>
>>>>> Making CHECK a reserved keyword may result in issues that Scott
>>>>> described. It will present a substantial barrier for users to upgrade as
>>>>> applications will have to be updated.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dinesh
>>>>>
>>>>>
>>>>>
>>>>>

Re: Constraint's "not null" alignment with transactions and their simplification

Reply via email to