If we're going to introduce a feature that looks like SQL constraints, we should make sure it's "reasonably" compliant with the semantics users already expect from SQL constraints. In particular, we should avoid situations where a user creates a constraint, writes some data, then reads data that violates that constraint, unless they've expressed that violations on read would be acceptable.
For Postgres, when adding a new constraint you can specify NOT VALID to avoid scanning all existing relevant data [1] (sketched below). If we want to avoid scan-on-DDL, that tradeoff needs to be made clear to users.

As we've already discussed, constraints must deal with operations that appear to be within limits on the write path but, once reconciled at read time or during compaction, produce a violation. Appending to non-frozen collections is one example (also sketched below). Expecting users to understand the write path for collections feels unrealistic to me; I wonder if we should express in the constraint itself that it only applies during writes.

Anything that uses "nodetool import" (including cassandra-analytics) could theoretically push constraint-violating mutations into a table. We could update import to scan table contents first, or add a flag to trust the data in imported SSTables and make cassandra-analytics executors aware of table-level constraints.

Some client implementations read the system_schema tables to build their object mappers; I'd like to confirm that nothing here will require clients to be aware of these new schema constructs.

Overall, I'm supportive of the distinctions discussed between constraints and guardrails and like the direction this is heading. I'd just like to make sure the more detailed semantics aren't confusing or misleading for our users, since semantics are much harder to change in the future.

[1]: https://www.postgresql.org/docs/current/sql-altertable.html
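For reference, a minimal sketch of the Postgres behavior I'm referring to (table, column, and constraint names are made up):

    -- new writes are checked immediately, but existing rows are NOT scanned
    ALTER TABLE orders ADD CONSTRAINT price_positive CHECK (price > 0) NOT VALID;

    -- later, scan existing rows and mark the constraint as validated
    ALTER TABLE orders VALIDATE CONSTRAINT price_positive;

Until VALIDATE CONSTRAINT runs, reads can still return rows that violate the constraint; that is the window we'd be asking users to accept if we skip the scan on DDL.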
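And a sketch of the non-frozen collection case. The constraint here is hypothetical (I'm only assuming some way to cap the size of the collection, and the keyspace/table names are made up), but the reconciliation behavior is plain CQL:

    CREATE TABLE ks.tagged (pk int PRIMARY KEY, tags set<text>);
    -- assume a (hypothetical) constraint that tags may hold at most 3 elements

    -- each write only carries its own delta, so each one looks fine in isolation
    UPDATE ks.tagged SET tags = tags + {'a', 'b'} WHERE pk = 1;
    UPDATE ks.tagged SET tags = tags + {'c', 'd'} WHERE pk = 1;

    -- once the two updates reconcile on read or during compaction,
    -- tags = {'a', 'b', 'c', 'd'}: four elements, violating the intended cap

No single mutation exceeds the limit, so a write-path check passes both updates; the violation only materializes after merge, which is why I'd like the constraint to state explicitly that it's enforced at write time only.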