Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Claude Warren, Jr via dev
>
> 2)
> "Is part of an enum" somehow makes up for the lack of enum types. The constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );


Are these not just "CONSTRAINT inList([List of valid values], field);"  and
"CONSTRAINT not inList([List of valid values], field);"?
At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not
inList([p1], p2);"?

Can "[List of values]" point to a variable containing a list?  Or does it
require hard coding in the constraint itself?
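
To make the equivalence concrete, here is a minimal sketch in Python (not CQL, and not anything taken from the CEP). The names in_list, belongs_to_enum, is_not_blocked and not_equal are hypothetical and exist only for illustration:

```python
def in_list(values, field):
    """Hypothetical inList check: True when field is one of values."""
    return field in values


def belongs_to_enum(valid_values, field):
    # belongsToEnum([...], field) is just inList([...], field)
    return in_list(valid_values, field)


def is_not_blocked(blocked_values, field):
    # isNotBlocked([...], field) is just "not inList([...], field)"
    return not in_list(blocked_values, field)


def not_equal(p1, p2):
    # "p1 != p2" expressed as "not inList([p1], p2)"
    return not in_list([p1], p2)
```

If that reading is right, a single inList primitive plus negation would cover all three cases.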



On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi Štefan,
>
> I'll address the different points:
> 1)
> An example (possibly a stretch) of a use case for the != constraint would be:
> Let's say you have a table in which you want to record a movement, from
> position p1 to position p2. You may want to check that those two are
> different to make sure there is actual movement.
>
> CREATE TABLE keyspace.table (
>   p1 int,
>   p2 int,
>   ...,
>   CONSTRAINT p1 != p2
> );
>
> For the case of ==, I agree that it is harder to come up with a valid use
> case, and I added it for completeness.
>
> 2)
> "Is part of an enum" somehow makes up for the lack of enum types. The constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
>
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );
>
> Please let me know if this helps,
> Bernardo
>
>
>
> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
> Hi Bernardo,
>
> 1) Could you elaborate on these two constraints?
>
> == and != ?
>
> What is the use case? Why would I want to have data in a database stored
> in some column which would need to be _same as my constraint_ and which
> _could not_ be same as my constraint? Can you give me at least one example
> of each? It looks like I am going to put a constant into a database in case
> of ==, wouldn't a static column be better?
>
> 2) For examples of text based types you mentioned: "is part of an enum" -
> how would you enforce this in Cassandra? What enum do we have in CQL?
> 3) What does "is it block listed" mean?
>
> In the meantime, I made changes to CEP-24 to move transactionality into
> optional features.
>
> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Hi everyone,
>>
>> After the feedback, I'd like to make a recap of what we have discussed in
>> this thread and try to move forward with the conversation.
>>
>> I made some clarifications:
>> - Constraints are only applied at write time.
>> - Guardrail configurations should maintain preference over what's being
>> defined as a constraint.
>>
>> *Specify constraints:*
>> There is general feedback asking for more concrete examples than the
>> ones that can be found in the CEP document.
>> Basically, the initial constraints I am proposing are:
>> - SizeOf Constraint for String types, as in
>> name text CONSTRAINT sizeOf(name) < 256
>>
>> - Value Constraint for numeric types
>> number_of_items int CONSTRAINT number_of_items < 1000
>>
>> Those two alone and combined provide a lot of flexibility, and allow
>> complex validations that enable "new types" such as:
>>
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_address inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>>
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r < 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g < 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b < 255
>> )
>>
>>
>> Those two initial Constraints are the fundamental constraints that would
>> give value to the feature. The framework can (and will) be extended with
>> other Constraints, leaving us with the following:
>>
>> For numeric types:
>> - Max (<)
>> - Min (>)
>> - Equality (==)
>> - Difference (!=)
>>
>> For date types:
>> - Before (<)
>> - After (>)
>>
>> For text based types:
>> - Size (sizeOf)
>> - isJson (is the text a json?)
>> - complies with a given pattern
>> - Is it block listed?
>> - Is it part of an enum?
>>
>> General table constraints (including more than one column):
>> - Compare between numeric types (a < b, a > b, a != b, …)
>> - Compare between date types (date1 < date2, date1 > date2, date1 != date2, …)
>>
>> I have updated the CEP with this information.
>>
>> *Potential 

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Štefan Miklošovič
My gut feeling is that anything beyond simple comparisons is just too
problematic / complex. I think this belongs in the application logic
rather than in the database. Is there any major database
out there which has constraints modelled like that? (belongsToEnum,
isNotBlocked, inList ...). It just opens a lot of questions, like how would
we treat nulls? How would this be supported in the driver? Etc ...




Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Bernardo Botella
Hi again,

I completely agree that anything beyond simple comparisons poses a problem. My 
point is that the definition of simple may vary, and each of the constraints I 
mentioned deserves a conversation of its own, as I previously mentioned on the 
dev thread:
https://lists.apache.org/thread/qln8cbkhlw9j9563p0kl12wrm5w62nq0

What I am trying to do here is propose the two constraints that will add a lot 
of value to the framework (size and value), and illustrate how the framework is 
to be extended.

The final list I proposed can either be expanded (I'm more than happy to hear 
more proposals :-) ) or reduced (you and Claude present very valid points), but 
I think using this thread to discuss them one by one may derail the 
conversation and make it hard to follow. Having said that, we can leave the 
inList type of constraints out of the CEP and defer them to a future 
conversation if the constraints framework CEP is approved. Once we have the 
basic ones in place, we can have a deeper discussion on those.

What do you think?



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Abe Ratnofsky
Hey Bernardo,

Thanks for the proposal and putting together your summary of the discussion. A 
few thoughts:

I'm not completely convinced of the value of CONSTRAINTS for a database like 
Cassandra, which doesn't support any referential integrity checks, doesn't do 
read-before-write for all queries, and doesn't have a wide library of built-in 
functions.

I'd be a supporter of more BIFs, and that's a solvable problem. String size, 
collection size, timestamp conversions, etc. could all be useful, even though 
there's not much gained over doing them in the client.

With constraints only being applied during write coordination, there's not much 
of an advantage over implementing the equivalent constraints in clients. Writes 
that don't include all columns could violate multi-column constraints, like 
your (a > b) example, for the same reason as CASSANDRA-19007. Constraints could 
be limited to only apply to frozen columns, where it's known that the entire 
value will be updated at once.

I don't think we should include any constraints where valid user action would 
lead to a violated constraint, like permitting multi-column constraints on 
regular columns or non-frozen types, since they would be too prone to mis-use.
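
To illustrate the partial-write problem, here is a minimal Python sketch. It is a toy model, not Cassandra code: check_a_gt_b and apply_write are hypothetical names, and the assumption is that the constraint is evaluated only against the columns present in the incoming write, with no read-before-write:

```python
def check_a_gt_b(write):
    """Hypothetical multi-column constraint a > b. It can only be evaluated
    when the write itself supplies both columns (no read-before-write)."""
    if "a" in write and "b" in write:
        return write["a"] > write["b"]
    return True  # a partial write cannot be checked in isolation


def apply_write(stored, write):
    """Validate the incoming write on its own, then merge it into the stored row."""
    if not check_a_gt_b(write):
        raise ValueError("constraint a > b violated")
    stored.update(write)
    return stored


row = apply_write({}, {"a": 10, "b": 5})  # full write, accepted
row = apply_write(row, {"b": 20})         # partial write, also accepted
violated = not (row["a"] > row["b"])      # the stored row now breaks a > b
```

Both writes are valid user actions, yet the stored row ends up violating the constraint.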

Regarding 19007, it could be useful to have a constraint that indicates that a 
subset of columns will always be updated together, since that would actually 
allow Cassandra to know which read queries are safe, and permit a fix for 19007 
that minimizes the additional data replicas need to send to coordinators on 
ALLOW FILTERING queries. That's a very specific situation and shouldn't justify 
a new framework / API, but might be a useful consequence of it.

> - isJson (is the text a json?)

Wouldn't it be more compelling to have a new type, analogous to the Postgres 
JSONB type? https://www.postgresql.org/docs/current/datatype-json.html

If we're going to parse the entire JSON blob for validation, we might as well 
store it in an optimized format, support better access patterns, etc.

Re: [Discuss] CEP-24 Password validation and generation

2024-06-12 Thread Bernardo Botella
+1 on Francisco’s comments. TCM is a general feature that a lot of other things 
will benefit from, and the fact that this CEP is one of those that will benefit 
shouldn’t block it from moving forward.

> On Jun 11, 2024, at 11:16 PM, Francisco Guerrero wrote:
> 
> Stefan, thanks for moving this CEP forward. This CEP brings a lot of value
> to Cassandra without needing to wait for TCM. I can see how a misconfigured
> node can be problematic, but the issue is not something introduced in
> this CEP, and it affects many other features in Cassandra. I think it needs to
> be addressed separately.
> 
> Mature database offerings have functionality that is proposed in your CEP such
> as password strength, and preventing usage of previously used passwords.
> 
> I'm looking forward to seeing what shape this CEP takes in the coming weeks,
> and also looking forward to the pull request when it lands.
> 
> I think we can even extend this concept to MutualTLS authentication where
> we can impose certain restrictions on certificates. I recently contributed
> https://issues.apache.org/jira/browse/CASSANDRA-18951 to Cassandra to
> add restrictions to the allowed certificate validity period. We can consider
> having CEP-24 as a pluggable way to configure restrictions that are not
> necessarily scoped just to passwords, but apply more generally to other
> authentication methods.
> 
> Best,
> - Francisco
> 
> On 2024/06/07 17:58:34 Štefan Miklošovič wrote:
>> Hi Shailaja,
>> 
>> thanks for taking a look at this.
>> 
>> That was indeed just an example we can change. It was more about showing
>> what might be possible in the future, nothing is set in stone yet, as the
>> last sentence "this is not the part of the initial implementation" explains.
>> 
>> When it comes to these very specific features you mentioned, I feel like
>> this is very "business specific" and I do not want to "pollute" Cassandra
>> system tables unnecessarily. It has been a long time since I wrote that
>> CEP, and it made sense to me back then to have a table for previous
>> passwords, but then I started to reconsider it because I do not know of
>> any database out there which offers something similar (correct me if I
>> am wrong), and I have started to question its actual benefit for a database
>> user. We are not trying to mimic the behavior of a website after all. Moreover,
>> the password rotation itself is quite a topic and there are opinions that
>> password should not be actually rotated at all. Hence I think that it is
>> not the role of Cassandra to define how passwords are going to be rotated,
>> with what frequency etc. Let's just keep it simple and let's just enforce
>> the password strength itself.
>> 
>> More to this CEP in general, after I read in the other thread about CEP-42
>> that Dinesh does not consider TCM to be a hard requirement for this CEP and
>> he finds it very useful already, I think I will consolidate what I have and
>> I will remove TCM part of that in order to make it happen sooner.
>> 
>> I think I made a mistake by waiting for config in TCM but it was only with
>> good intentions - to provide a comprehensive feature without any
>> compromises. It seems to me that providing a well rounded config in TCM +
>> guardrails in TCM was too much for me to handle and it would take way more
>> time than I anticipated and it will be better if this is a more iterative
>> process. I think that based on where I am with the implementation of
>> guardrails in TCM (POC is basically done) it is more or less just a coding
>> exercise to integrate it into general config in TCM once config in TCM is
>> introduced.
>> 
>> I think I will restructure the current CEP-24 a little bit and I will move
>> more optional features into possible extensions in the future in order to
>> keep the core functionality at the minimum in order to reason about it more
>> easily. I will try to get back to this in the upcoming weeks and I will
>> eventually start a voting thread.
>> 
>> Regards
>> 
>> 
>> On Fri, Jun 7, 2024 at 6:00 PM  wrote:
>> 
>>> Hi Stefan,
>>> 
>>> Thanks for the CEP, sounds great. Regarding
>>> 
>>> If we were about to make this even harder to bypass, we may say that
>>> password can be changed once per day, for example (anytime for a
>>> superuser). Since we have "created" column which is of type timeuuid, we
>>> would check this table and see if there was some password already set that
>>> day or not and fail the request eventually. This is not the part of the
>>> initial implementation.
>>> 
>>> Allowing password change only once a day would be too restrictive and may
>>> create chaos for users. For example, I am trying to file a tax return on
>>> the last day of deadline, I forgot the password I had set last year, now
>>> changed it. Assume I forgot the password I just set either due to an
>>> unclear/faulty website or due to my bad memory with stress to file tax
>>> returns on the last day. In that case either I should be able to change t

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Jon Haddad
I think having JSON validation on existing text fields is a pretty
reasonable idea, regardless of whether we have a JSON type or not.  I could
see folks wanting to add a JSON constraint to an existing text field, for
example.

I like the idea of a postgres-style JSONB type, but I don't want to derail
this convo into a JSON one.  I'd be happy to see a JSONB added to Cassandra
along with all the functionality that is included in postgres, especially
searching / indexes on JSON fields, I think it should be its own CEP though.

DB Constraints vs Client side logic, I see both aspects here.  I've gone
back and forth over the years on what belongs in the DB vs not, and there are
good arguments to be made for both.  For example, supporting a regex
constraint on a field can be done, but from a cost and
scalability perspective it's way better to do it in the application logic.
However, putting a constraint in like this could make sense in some cases:

```
CREATE TABLE circles (
  id int PRIMARY KEY,
  radius double,
  diameter double,
  CONSTRAINT diameter = 2 * radius
)
```

which is also a (maybe contrived) example of an equality constraint.
There's a good argument to be made in this case that the constraint isn't
what we really need here - it's default values (`diameter double
default 2 * radius`), and that's a whole read-before-write can of worms we
probably don't need to get into on this thread.

Jon






Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Abe Ratnofsky
I've thought about this some more. It would be useful for Cassandra to support 
user-defined "guardrails" (or constraints, whatever you want to call them), 
that could be applied per keyspace or table. Whether a user or an operator is 
considered the owner of a table depends on the organization deploying 
Cassandra, so allowing both parties to protect their tables against mis-use 
seems good to me, especially for large multi-tenant clusters with diverse 
workloads.

For example, it would be really useful if a user could set the 
Guardrails.{read,write}ConsistencyLevels for their tables, or declare whether 
all operations should be over LWTs to avoid mixing regular and LWT workloads.

I'm hesitant about adding lots of expression syntax to the CONSTRAINT clause. I 
think I'd prefer a function calling syntax that represents:
1. Whether the constraint is system / keyspace / table scoped
2. Where in query processing the constraint is checked
3. What is executed by the check
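
As a rough sketch of what those three parts could look like as data, here is a small Python model. Nothing here is proposed syntax: Scope, Phase, Constraint, violations and the "is_lwt" request key are all hypothetical names, and the two phases are an assumption made for the example:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Scope(Enum):
    """1. Where the constraint applies."""
    SYSTEM = "system"
    KEYSPACE = "keyspace"
    TABLE = "table"


class Phase(Enum):
    """2. Where in query processing it is checked (assumed phases)."""
    READ_COORDINATION = "read_coordination"
    WRITE_COORDINATION = "write_coordination"


@dataclass(frozen=True)
class Constraint:
    scope: Scope
    phase: Phase
    check: Callable[[dict], bool]  # 3. What the check executes


def violations(constraints, phase, request):
    """Return the constraints registered for this phase that the request fails."""
    return [c for c in constraints if c.phase is phase and not c.check(request)]


# Example: a table-scoped rule that every write must be an LWT.
lwt_only = Constraint(
    scope=Scope.TABLE,
    phase=Phase.WRITE_COORDINATION,
    check=lambda request: request.get("is_lwt", False),
)
```

A function-call form in the CONSTRAINT clause would then only need to name the check and its arguments, with scope and phase fixed by the definition of the function.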

Re: [DISCUSS] LWT conditions behavior on collections is inconsistent (CASSANDRA-19637)

2024-06-12 Thread Ekaterina Dimitrova
Hey Benjamin,

Thanks for the fix and for raising the point here. The patch seems reasonable
to me. The only thing left to confirm, in my mind, is whether people consider
it a behavior-breaking change or only a bug fix. Based on that, we can move
forward with a patch for 4.0+ or for trunk only.

It would also be good to ensure we have similar behavior in Accord.

Thoughts, anyone?

Best regards,
Ekaterina

On Tue, 11 Jun 2024 at 18:35, Benjamin Lerer wrote:

> Hi everybody,
>
> I wanted to raise attention to CASSANDRA-19637, which I already
> mentioned in the "[DISCUSS] NULL handling and the unfrozen collection
> issue" thread.
>
> The patch attempts to fix some inconsistencies in the way LWT conditions
> work with collections when being compared to null values.
>
> In case you feel that the solution is not going in the right direction or
> you believe that there are some valid use cases that will be impacted by
> this change in behavior feel free to raise the problem.
>