Maybe I misremember the convo, but I thought both of these was possible: CREATE table test ( id int primary key val int check val > 0, val2 int )
INSERT INTO test (id) values (1); ## inserts with empty val UPDATE test set val2 = 1; ## inserts with empty val Are you saying supplying val is required in both cases? In the second case, does a user always have to overwrite the field? Jon On Wed, Apr 16, 2025 at 11:43 PM Štefan Miklošovič <smikloso...@apache.org> wrote: > _A bigger issue, is the actual behavior, in that you can say NOT NULL and > still easily get null values by not supplying the value. The constraint is > more of a weakly applied suggestion._ > > This is not true. You can not really put there null. See other discussion > about this behaviour on ML. > > On Wed, Apr 16, 2025 at 6:36 PM Jon Haddad <j...@rustyrazorblade.com> > wrote: > >> I think the world is fairly used to NOT NULL in line with the type. I >> don't see much value in paving our own path for things that are established >> norms. Let's not reinvent wheels that don't need reinventing. >> >> I also think it's fine if the generated CQL from describe doesn't match >> the user's input. We already do this with all the table level fields that >> get added. I'm not sure why we should care if a user drifts away from not >> null either. I think these are pretty irrelevant details in the grand >> scheme of things. >> >> A bigger issue, is the actual behavior, in that you can say NOT NULL and >> still easily get null values by not supplying the value. The constraint is >> more of a weakly applied suggestion. I'm expecting a bit of confusion >> around this. I wonder if it makes sense to not allow prepared INSERT >> statements that don't contain the field if it's null constrained, although >> it doesn't really help us in the case of an upsert. It does narrow the >> edge case a bit, which could be argued is good or bad depending on >> someone's mood. >> >> I think it should be made clear that what we're really giving the user >> when we say `value check not null` is this: >> >> INSERT INTO blah (id, value) VALUES(?, ? not null); >> >> Does accord change the behavior of INSERT to actually mean INSERT and not >> UPSERT, or do you still have to specify IF NOT EXISTS? >> >> Jon >> >> >> >> On Wed, Apr 16, 2025 at 12:50 AM Štefan Miklošovič < >> smikloso...@apache.org> wrote: >> >>> Maybe I am too naive and idealist but I can see a world where, even we >>> enable them to do it like "val int not null", after they describe a table >>> and they see "val int check not null" they start to use the latter form >>> because they would be tired of switching two forms in their head all the >>> time when they see that all other checks they want to specify have to be >>> specified in "val int check ..." form. They might actually start to >>> appreciate the unified fashion of it and they will themselves drift away >>> from "val int not null" eventually. I would :D. >>> >>> As we are among leaders in NoSQL databases we have also a great leverage >>> over how stuff is going to be used and we can actively form this space. We >>> do not need to just follow. >>> >>> I am open to supporting specifying "val int not null" _together with_ >>> "val int check not null" form. Not _exclusively_ SQL syntax only. This >>> syntax sugar can go in anytime and it is more or less purely "additive". >>> Even if a user specifies it like "val int not null" it will be internally >>> stored as "val int check not null" to be consistent with how it is going to >>> be represented internally with all other checks. NOT NULL will ultimately >>> be a check as any other we have, it is just how it might be defined on the >>> CQL level which we are arguing about here. >>> >>> On Wed, Apr 16, 2025 at 12:56 AM Patrick McFadin <pmcfa...@gmail.com> >>> wrote: >>> >>>> I may be to blame for some of that energy. :D >>>> >>>> No, we don't have consensus on that direction, but I think we will >>>> eventually. CQL started out as a subset of SQL but has drifted because >>>> features get added async and there just hasn't been any formal standard or >>>> road map or lofty goal. We've been laying out track in front of us as we go >>>> along. A couple of years ago, I had a conversation at ApacheCon, and it >>>> started slowly rattling around my brain that the developer experience >>>> destination is regression to a mean and not charting a bold new path. >>>> Everything else in Cassandra was already charting plenty of new paths, but >>>> if we give users a calm, familiar API, that feels like a winner. SQL is >>>> already a standard and could be our roadmap. >>>> >>>> With all that, I've been getting back into the data modeling game and >>>> seeing how a slightly different syntax can lead to angry users. In >>>> conversations with users, the idea of being SQL compatible in syntax >>>> (features can come whenever) is highly appealing. In reality, we are >>>> talking about CQL4, but why create tech debt for later if we design >>>> features now? My dream would be to have a compatibility test for Cassandra >>>> SQL that ticks off what's left for implementation. This is probably a >>>> DISCUSS thread but here we are now. >>>> >>>> Patrick >>>> >>>> On Tue, Apr 15, 2025 at 1:47 PM Josh McKenzie <jmcken...@apache.org> >>>> wrote: >>>> >>>>> If we have a goal of eventually providing ANSI SQL support one day, we >>>>> should at least stick to the ANSI SQL standard where applicable for >>>>> features in the meantime. >>>>> >>>>> Do we collectively have that goal? Not disagreeing with it at all, >>>>> genuinely curious. >>>>> >>>>> Broadly I agree that we should definitely keep ourselves *compatible *with >>>>> ANSI SQL where possible so as not to paint ourselves into a corner nor >>>>> alienate our users, but I do believe there's value in having a consistent >>>>> UX while acknowledging and accommodating that things aren't necessarily >>>>> consistent in the SQL space. >>>>> >>>>> Are you against offering a superset of that standard (i.e. CHECK on >>>>> all constraints a supported form, ANSI SQL standard on things like IS NOT >>>>> NULL etc)? >>>>> >>>>> On Tue, Apr 15, 2025, at 3:05 AM, Benedict wrote: >>>>> >>>>> >>>>> I am not certain if the ANSI SQL standard requires that any expression >>>>> be supported, but either way it is much better to implement a strict >>>>> subset >>>>> of a feature than to implement an overlapping feature we can never >>>>> synchronise with the standard. >>>>> >>>>> Accord is also capable of imposing multi column expressions without >>>>> constraining what user queries can do. >>>>> >>>>> >>>>> >>>>> On 15 Apr 2025, at 02:10, David Capwell <dcapw...@apple.com> wrote: >>>>> >>>>> One argument I have against using columns in the constraint is that >>>>> we currently are not able to actually handle the CHECK spec in SQL. In >>>>> SQL >>>>> the CHECK expression is a valid search expression (eg. price > 42 AND tag >>>>> = >>>>> ‘foo’), and is limited to the current row. >>>>> >>>>> Where we differ in Apache Cassandra >>>>> >>>>> 1) the mutation might not include the columns referenced in the search >>>>> expression: price numeric CHECK (name IS NOT NULL) (requires each mutation >>>>> to define all columns referenced in any constraint) >>>>> 2) our current checks are not normal search expressions (that limit >>>>> functions to deterministic ones), its a allowed listed set of expressions >>>>> (fixable) >>>>> >>>>> If we actually want to be the SQL version of constraints then we need >>>>> to put more restrictions on what people can do. >>>>> >>>>> For example >>>>> >>>>> price numeric CHECK (length(name) > 42) >>>>> >>>>> This constraint can only be handled if both price and name are present >>>>> in the query, so we would need to force all mutations to have price and >>>>> name, as we wouldn’t know the value of name in the following query >>>>> >>>>> INSERT INTO tbl (pk, ck, price) VALUES (0, 0, 0); >>>>> >>>>> So, do we put such a limitation on users and force them to define >>>>> every column present in a constraint, or do we break the SQL spec? If >>>>> we choose to break the spec, then why allow column names in the search >>>>> expressions? Only “this” column is safe >>>>> >>>>> On Apr 14, 2025, at 1:24 PM, Benedict <bened...@apache.org> wrote: >>>>> >>>>> >>>>> If we have a goal of eventually providing ANSI SQL support one day, we >>>>> should at least stick to the ANSI SQL standard where applicable for >>>>> features in the meantime. AFAICT the reason everyone else does this the >>>>> same is it is part of the standard. So, I am more than happy to stick to >>>>> the CHECK qualifier for all of our unique extensions, but for NOT NULL we >>>>> should absolutely follow the SQL standard, and for the same reason we >>>>> should use the field name in the CHECK expression. >>>>> >>>>> >>>>> On 14 Apr 2025, at 21:10, Josh McKenzie <jmcken...@apache.org> wrote: >>>>> >>>>> >>>>> Consistency *within* our own ecosystem supersedes consistency with >>>>> other familiar ecosystems IMO. >>>>> >>>>> I'd prefer we consistently apply the CHECK keyword and don't have >>>>> special cases that omit it, or perhaps have those as optional syntactic >>>>> sugar but at its base the syntax is uniform and consistent. >>>>> >>>>> On Mon, Apr 14, 2025, at 3:31 PM, Štefan Miklošovič wrote: >>>>> >>>>> To be honest, I agree with Bernardo here. >>>>> >>>>> The requirement to have "val int not null" makes sense at first, but >>>>> it is not so straightforward. >>>>> >>>>> I think that what we are trying to do here is to "copy like they have >>>>> it" because ... well, because they have it like that. And we do not want >>>>> to >>>>> differ too much, because we like adoption and if it is easier for >>>>> newcomers >>>>> who are used to some style already, better to support it as well in this >>>>> case because, why not, right? >>>>> >>>>> Yeah, right ... but we are in a different position than SQL >>>>> implementation. I already said it, but I think that what happened first >>>>> was >>>>> that they started to support "val int not null" and then they had a need >>>>> to >>>>> support checks as well, so they slapped "val int not null check ..." on >>>>> that. They could not start from scratch, but we can. So not supporting >>>>> "val >>>>> int not null" in favor of "val int check not null" makes more sense to me. >>>>> So what if it is different? It is at least consistent. >>>>> >>>>> This is a broader discussion to have though. AFAIK, we do not have any >>>>> official policy which would command us to follow SQL as closely as >>>>> possible. We have never agreed on following it like SQL has it officially. >>>>> So "doing it like they have it" as it most a "nice to have" but not a >>>>> strict requirement. >>>>> >>>>> On the other hand, if we ever codified that "we are following SQL" >>>>> then everything would need to follow it to the t. Not only constraints, >>>>> every other new feature which would change CQL would need to behave like >>>>> that. >>>>> >>>>> We do not have this policy anywhere afaik so when it comes to that I >>>>> do not think that supporting "val int not null" is necessary. I think that >>>>> their mixture of two styles is actually wrong and misleading. Having one >>>>> way to specify it with "val int check ..." is just a superior solution >>>>> here >>>>> imho. >>>>> >>>>> On Mon, Apr 14, 2025 at 9:14 PM Bernardo Botella < >>>>> conta...@bernardobotella.com> wrote: >>>>> >>>>> Now this is becoming a really interesting discussion. Thanks everyone >>>>> for pitching in! >>>>> >>>>> Here is my take on some of the proposed changes: >>>>> >>>>> We are talking about treating some constraints (NOT_NULL, JSON) as >>>>> special cases by omitting the CHECK keyword (not reserved as per current >>>>> implementation). Now, while this may seem like a nice approach to feature >>>>> gaps on our CQL, it really worries me that by doing so we open the door to >>>>> not needed complexity both at implementation and conceptually with the >>>>> constraints framework. >>>>> >>>>> In my mind, what’s the constraints framework? It is a simple and >>>>> really easy to extend integration point for validators for a row. (LENGHT, >>>>> SCALAR, REGEX, are really good examples of it). >>>>> >>>>> What’s NOT the responsibility of the constraints framework? I don’t >>>>> think this should be used to deliver partial solutions to feature gaps on >>>>> CQL data modeling. Let’s take JSON constraint as an example. In the >>>>> constraints case, it is as simple as checking that the provided string is >>>>> valid json. Easy. Simple. But, how would JSON look like if it was a first >>>>> class citizen in CQL? Setting the grammar aside, it would be handled >>>>> differently probably. Things like: Can we store it better? Do we allow >>>>> making queries for fields inside the json blob? Are there any >>>>> optimizations >>>>> that can be done when serializing/deserializing it? All of those >>>>> definitely >>>>> fall out of the scope of the constraints framework. So, I guess the >>>>> question then becomes, is the JSON constraint a valid constraint to have? >>>>> Just a temporal patch until (if) JSON type is in? Should we just remove it >>>>> and keep ignoring JSON? Those are valid questions and discussions to have. >>>>> But, I really think that we shouldn’t see this simple validator as a full >>>>> fledged, first class citizen, type in CQL. Similar arguments could be have >>>>> for the NOT_NULL constraint that has spawned so many interesting >>>>> conversations. >>>>> >>>>> Now, having made that distinction, I don’t think we should have >>>>> constraints that can be defined differently on the CQL statement. They >>>>> should all have a CHECK keyword, specifying that they are a constraint >>>>> that >>>>> will be checked (aka, row value will be validated against whatever >>>>> function). That’s easy to identify, and it’s conceptually easy to >>>>> understand the limitations it comes with (as opposed to the JSON example >>>>> mentioned above). >>>>> >>>>> Bernardo >>>>> >>>>> >>>>> >>>>> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič < >>>>> smikloso...@apache.org> wrote: >>>>> >>>>> As Yifan already said, "check" is not a reserved word now and its >>>>> usage does not collide with anything. >>>>> >>>>> If people have columns, tables, keyspaces with name "check" that seems >>>>> to work already so they don't need to do anything: >>>>> >>>>> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, >>>>> primary key (id)); >>>>> >>>>> ALTER TABLE ks.tb ADD check int check check > 0; >>>>> >>>>> DESCRIBE ks.tb; >>>>> >>>>> CREATE TABLE ks.tb ( >>>>> id int CHECK id > 0 PRIMARY KEY, >>>>> check int CHECK check > 0, >>>>> val int CHECK val > 0 >>>>> ) .... >>>>> >>>>> CREATE TABLE ks.check (id int check id > 0, check int check check > 0, >>>>> primary key (id)); >>>>> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy', >>>>> 'replication_factor': 1}; >>>>> CREATE TABLE check.check (check int check check > 0, val int check val >>>>> > 0, primary key (check)); >>>>> INSERT INTO check.check (check , val ) VALUES ( 1, 1); >>>>> >>>>> PostgreSQL has this: >>>>> >>>>> CREATE TABLE products ( >>>>> product_no integer, >>>>> name text, >>>>> price numeric CHECK (price > 0) >>>>> ); >>>>> >>>>> we follow this approach (minus parenthesis). We can also chain >>>>> constraints whatever we like >>>>> >>>>> val int CHECK val > 0 and age < 100 >>>>> >>>>> We can make a stab in trying to model >>>>> >>>>> val int not null check val > 0 >>>>> >>>>> this is how PostgreSQL has it (1). >>>>> >>>>> but that would be more complicated on the implementation side because >>>>> we would need to also accommodate "CQL describe" to dump it like that, >>>>> plus >>>>> I am not sure how complicated it would be to tweak the parser as well. >>>>> >>>>> I will try to make some progress and will report back. >>>>> >>>>> Regards >>>>> >>>>> (1) >>>>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL >>>>> >>>>> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <djo...@apache.org> >>>>> wrote: >>>>> >>>>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <pmcfa...@gmail.com> >>>>> wrote: >>>>> >>>>> I'm loving all the syntax discussion lately. It's a good debate and >>>>> essential for the project's future with a good developer experience. >>>>> >>>>> >>>>> +1 >>>>> >>>>> >>>>> On NULL. I've been asked this a million times by end users. Why is >>>>> there no "NOT NULL" in the schema? >>>>> >>>>> >>>>> I would've expected this to be in billions by now ;) >>>>> >>>>> >>>>> I'm in favor of the standard SQL syntax here because it's what users >>>>> have been using forever: >>>>> name text NOT NULL >>>>> >>>>> >>>>> I hold a weak opinion on this. We don't have to necessarily align on >>>>> to the standard SQL syntax. In my experience, users subconsciously feel >>>>> Cassandra is a SQL database and try to design their schema to fit the >>>>> traditional SQL / RDBMS design and then later are disappointed to find out >>>>> it doesn't have joins or foreign key constraints. But that's just my >>>>> personal experience working with users. However, I think we should strive >>>>> for consistency and if it aligns with SQL I have no issues with the >>>>> syntax. >>>>> Just sharing my experience. >>>>> >>>>> >>>>> On CHECK. Also in favor of making this a reserved word but in context. >>>>> Namely, how Postgres SQL works. CHECK ( boolean_expression_on_column) >>>>> >>>>> >>>>> Making CHECK a reserved keyword may result in issues that Scott >>>>> described. It will present a substantial barrier for users to upgrade as >>>>> applications will have to be updated. >>>>> >>>>> Thanks, >>>>> >>>>> Dinesh >>>>> >>>>> >>>>> >>>>>