I think having JSON validation on existing text fields is a pretty
reasonable idea, regardless of whether we have a JSON type.  I could see
folks wanting to add a JSON constraint to an existing text field, for
example.
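
As a rough sketch of what such a check boils down to (Python here, and
`is_json` is a hypothetical helper name, not a proposed Cassandra API),
validating a text value means parsing the whole thing:

```python
import json


def is_json(text: str) -> bool:
    """Return True if `text` parses as a JSON document (hypothetical helper)."""
    try:
        json.loads(text)  # full parse; the parsed result is discarded
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
```

The full parse on every write is the cost to keep in mind, whether it
happens in the client or on the coordinator.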

I like the idea of a postgres-style JSONB type, but I don't want to derail
this convo into a JSON one.  I'd be happy to see a JSONB added to Cassandra
along with all the functionality that is included in postgres, especially
searching / indexes on JSON fields, but I think it should be its own CEP.

DB constraints vs client-side logic: I see both sides here.  I've gone
back and forth over the years on what belongs in the DB vs not, and there
are good arguments to be made for both.  For example, supporting a regex
constraint on a field can be done, but from a cost and
scalability perspective it's way better to do it in the application logic.
However, putting a constraint in like this could make sense in some cases:

```
CREATE TABLE circles (
  id int primary key,
  radius double,
  diameter double,
  CONSTRAINT diameter = 2 * radius
)
```

which is also a (maybe contrived) example of an equality constraint.
There's a good argument to be made in this case that the constraint isn't
what we really need here - it's default values (`diameter double
default radius * 2`), and that's a whole read-before-write can of worms we
probably don't need to get into on this thread.
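
For comparison, here is a minimal sketch of the same check done in
application logic (Python; `validate_circle_write` and the dict-of-columns
shape are assumptions for illustration, not any driver's API).  It also shows
where partial writes get awkward: without both columns in the write, the
check can't be evaluated without reading the row first.

```python
import math


def validate_circle_write(columns: dict) -> None:
    """Raise ValueError if a write to `circles` can't be shown to be valid.

    `columns` maps column name -> value for the columns present in one
    write (hypothetical shape, for illustration only).
    """
    missing = {"radius", "diameter"} - columns.keys()
    if missing:
        # A partial update can't be checked without a read-before-write.
        raise ValueError(f"cannot check constraint without {sorted(missing)}")
    if not math.isclose(columns["diameter"], 2 * columns["radius"]):
        raise ValueError("diameter must equal 2 * radius")
```

A write with `radius=1.0, diameter=2.0` passes, while one that only sets
`radius` is rejected rather than silently leaving a stale `diameter`.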

Jon




On Wed, Jun 12, 2024 at 8:46 AM Abe Ratnofsky <a...@aber.io> wrote:

> Hey Bernardo,
>
> Thanks for the proposal and putting together your summary of the
> discussion. A few thoughts:
>
> I'm not completely convinced of the value of CONSTRAINTS for a database
> like Cassandra, which doesn't support any referential integrity checks,
> doesn't do read-before-write for all queries, and doesn't have a wide
> library of built-in functions.
>
> I'd be a supporter of more BIFs, and that's a solvable problem. String
> size, collection size, timestamp conversions, etc. could all be useful,
> even though there's not much gained over doing them in the client.
>
> With constraints only being applied during write coordination, there's not
> much of an advantage over implementing the equivalent constraints in
> clients. Writes that don't include all columns could violate multi-column
> constraints, like your (a > b) example, for the same reason as
> CASSANDRA-19007 <https://issues.apache.org/jira/browse/CASSANDRA-19007>.
> Constraints could be limited to only apply to frozen columns, where it's
> known that the entire value will be updated at once.
>
> I don't think we should include any constraints where valid user action
> would lead to a violated constraint, like permitting multi-column
> constraints on regular columns or non-frozen types, since they would be too
> prone to mis-use.
>
> Regarding 19007, it could be useful to have a constraint that indicates
> that a subset of columns will always be updated together, since that would
> actually allow Cassandra to know which read queries are safe, and permit a
> fix for 19007 that minimizes the additional data replicas need to send to
> coordinators on ALLOW FILTERING queries. That's a very specific situation
> and shouldn't justify a new framework / API, but might be a useful
> consequence of it.
>
> > - isJson (is the text a json?)
>
> Wouldn't it be more compelling to have a new type, analogous to the
> Postgres JSONB type?
> https://www.postgresql.org/docs/current/datatype-json.html
>
> If we're going to parse the entire JSON blob for validation, we might as
> well store it in an optimized format, support better access patterns, etc.
>
