Re: CEP-15 multi key transaction syntax

2022-06-15 Thread Konstantin Osipov
* bened...@apache.org  [22/06/15 10:00]:
> It sounds like we’re zeroing in on a solution.
> 
> To draw attention back to Jon’s email, I think the last open question at this 
> point is the scope of identifiers declared by LET, and how we handle name 
> clashes with table columns in an UPDATE.
> 
> I think we have basically two options:
> 
> 1. Require LET for all input parameters to an assignment in UPDATE
> 2. Add some additional syntax to local variables to identify them, e.g. 
> 


I'm curious: regardless of the syntax you choose, will LET or
SELECT return the static row if there is no match for the
clustering key, or return a NULL row?

I am asking because SELECT currently does not return any rows if
no clustering key matches the WHERE clause, but a conditional UPDATE
checks its conditions against the static row instead, if one is present.

-- 
Konstantin Osipov, Moscow, Russia


Re: CEP-15 multi key transaction syntax

2022-06-15 Thread bened...@apache.org
I expect LET to behave like SELECT, and I don’t expect this work to modify the 
behaviour of normal CQL expressions. Do you think there is something wrong or 
inconsistent about the behaviours you mention?

Static columns are a bit weird, but at the very least the following would 
permit the user to reliably obtain a static value, if it exists:

LET x = some_static_column FROM table WHERE partitionKey = someKey LIMIT 1

This could be combined with a clustering key query:

LET y = some_regular_column FROM table WHERE partitionKey = someKey AND 
clusteringKey = someOtherKey
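To make the intended LET semantics concrete, here is a minimal Python model. It assumes, as stated above, that LET behaves like a SELECT whose first row (LIMIT 1) is bound to the variable, and that a partition holding static data but no regular rows yields one result row with null regular columns unless the clustering key is restricted. The partition layout and the `select_rows` and `let` helpers are hypothetical illustrations, not Cassandra or CEP-15 code.

```python
from typing import Any, Dict, List, Optional

# Hypothetical in-memory partition: {"static": {col: value}, "rows": {c: {col: value}}}.
Partition = Dict[str, Any]


def select_rows(partition: Partition, clustering: Optional[Any] = None) -> List[Dict[str, Any]]:
    """Rows a SELECT on one partition would return, with static columns merged in."""
    static = partition.get("static", {})
    rows = partition.get("rows", {})
    if clustering is not None:
        # WHERE restricts the clustering key: only a matching regular row qualifies.
        if clustering in rows:
            return [{**static, **rows[clustering], "c": clustering}]
        return []
    if not rows:
        # Static data only: one result row whose regular columns are null (absent).
        return [dict(static)] if static else []
    return [{**static, **row, "c": c} for c, row in sorted(rows.items())]


def let(partition: Partition, column: str, clustering: Optional[Any] = None) -> Optional[Any]:
    """Hypothetical LET: bind `column` from the first row (LIMIT 1), or None if no row."""
    first = select_rows(partition, clustering)[:1]
    return first[0].get(column) if first else None


static_only = {"static": {"s": 10}, "rows": {}}
with_row = {"static": {"s": 10}, "rows": {1: {"r": 5}}}

print(let(static_only, "s"))                # 10: static value found without any regular row
print(let(static_only, "r", clustering=1))  # None: no regular row matches c = 1
print(let(with_row, "s"), let(with_row, "r", clustering=1))  # 10 5
```

Under this model the static value is reachable whether or not regular rows exist, while a clustering-key LET binds nothing when the row is missing.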


From: Konstantin Osipov 
Date: Wednesday, 15 June 2022 at 14:04
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax


Re: Cassandra project biweekly status update 2022-06-14

2022-06-15 Thread Mick Semb Wever
>
> I'm going to jump off the email list to JIRA for this one - we've had an
> ongoing discussion about when we cut a Major vs. a Minor, what qualifies as
> an API, etc., on CASSANDRA-16844 (
> https://issues.apache.org/jira/browse/CASSANDRA-16844). Expect something
> to formally hit the dev mailing list about this soon, but until then we can
> keep going on the JIRA ticket.
>


I need to take some blame here, for leading people down a bit of a garden
path.

The idea was that trunk is by default the next minor, and that when a patch
lands that warrants a bump to the next major, the patch itself includes that
change to build.xml's base.version.

The devil is in the detail here, and it becomes a lot clearer when
reading CASSANDRA-16844.  I appreciate it when we can tackle these things
lazily, as they arise, since real examples often bring that extra
clarity.

I agree that a broader consensus, beyond those on the JIRA ticket, should be
sought before committing the patch that bumps to a new major. The broader
audience may also help propose better solutions that don't require a major
change (as was done in 16844), and help coordinate with other tickets that
also warrant a new major…


Re: Cassandra project biweekly status update 2022-06-14

2022-06-15 Thread bened...@apache.org
> I agree a broader consensus beyond those on the jira ticket should be sought 
> before committing the patch that bumps a new major.

Broader consensus should be sought on any ticket that breaks backwards 
compatibility, even if we have already bumped the major version.

A major version bump should NOT be taken as carte blanche to break users; we 
should decide each case on a balance of benefit and cost.



From: Mick Semb Wever 
Date: Wednesday, 15 June 2022 at 17:44
To: dev@cassandra.apache.org 
Subject: Re: Cassandra project biweekly status update 2022-06-14


Re: Cassandra project biweekly status update 2022-06-14

2022-06-15 Thread Dinesh Joshi
Better yet, strive to maintain backward compatibility. There are very, very few 
occasions where breaking backward compatibility is warranted.

> On Jun 15, 2022, at 10:59 AM, bened...@apache.org wrote:


Re: CEP-15 multi key transaction syntax

2022-06-15 Thread Konstantin Osipov
* bened...@apache.org  [22/06/15 18:38]:
> I expect LET to behave like SELECT, and I don’t expect this work to modify 
> the behaviour of normal CQL expressions. Do you think there is something 
> wrong or inconsistent about the behaviours you mention?
> 
> Static columns are a bit weird, but at the very least the following would 
> permit the user to reliably obtain a static value, if it exists:
> 
> LET x = some_static_column FROM table WHERE partitionKey = someKey LIMIT 1
> 
> This could be mixed with a clustering key query
> 
> LET y = some_regular_column FROM table WHERE partitionKey = someKey AND 
> clusteringKey = someOtherKey

I think static rows should not be selectable outside clustering
rows; allowing that violates the relational model. Unfortunately,
they currently sometimes are.

Here's an example:


> create table t (p int, c int, r int, s int static, primary key(p, c));
OK
> insert into t (p, s) values (1,1) if not exists;
+-------------+------+------+------+------+
| [applied]   | p    | c    | s    | r    |
|-------------+------+------+------+------|
| True        | null | null | null | null |
+-------------+------+------+------+------+
> -- that's right, there is a row now; what row though?
> select count(*) from t;
+---------+
|   count |
|---------|
|       1 |
+---------+
> -- let's add more rows
> insert into t (p, c, s) values (1,1,1) if not exists;
+-------------+------+------+------+------+
| [applied]   | p    | c    | s    | r    |
|-------------+------+------+------+------|
| True        | null | null | null | null |
+-------------+------+------+------+------+
> -- we did not add more rows?
> select count(*) from t;
+---------+
|   count |
|---------|
|       1 |
+---------+

In LWT, a static row appears to exist when there is no regular row
matching WHERE. It would be nice for LET either to be consistent
with existing SELECTs or, so to speak, to be consistently
inconsistent, i.e. consistent with some other vendor, rather than
come up with a whole new semantics for static rows, different from
both LWT and SELECT.

This is why I was making all these comments about missing rows:
there is no such incongruence in classic SQL, with any vendor,
because a) there are no static rows, and b) NULLs are first-class
values, distinguishable from unset values.


-- 
Konstantin Osipov, Moscow, Russia


Re: CEP-15 multi key transaction syntax

2022-06-15 Thread bened...@apache.org
Ok, so I am not a huge fan of static rows, but I disagree with your analysis.

First, some history: static rows are an efficiency sop to those who migrated 
from the historical wide-row world, where you could have “global” partition 
state fetched with every query. To support the deprecation of Thrift and its 
horrible data model, something needed to give, and static rows were the result.

However, is the concept generally consistent? I think so. At least, your 
examples seem fine to me, and I can’t see how they violate the “relational 
model” (whatever that may be). If it helps, you can think of the static columns 
as actually creating a second table, so that you now have two separate tables 
with the same partition key. These tables are implicitly related via a “full 
outer join” on the partition key, and you can imagine that you are generally 
querying a view of this relation.

AFAICT, this is exactly the outcome you would expect in this case. If you have 
no restriction on the results, and you have no regular rows and one static row, 
you would see a single static row result with null regular columns (and a count 
of 1 row). If you imposed a restriction on regular columns, you would not see 
the static column, as the null regular columns would not match the condition.
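The “two tables joined by a full outer join” reading can be checked against Konstantin's trace with a small Python sketch. It hard-codes the example schema `t (p, c, r, s static)`; the `joined_view` and `select` helpers are hypothetical and model only this reading of the semantics, not Cassandra's storage engine.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
StaticTable = Dict[int, Optional[int]]               # p -> s
RegularTable = Dict[Tuple[int, int], Optional[int]]  # (p, c) -> r


def joined_view(static_tbl: StaticTable, regular_tbl: RegularTable) -> List[Row]:
    """Full outer join of the static and regular 'tables' on the partition key p."""
    view: List[Row] = []
    for p in sorted(set(static_tbl) | {pk for pk, _ in regular_tbl}):
        s = static_tbl.get(p)  # null when the partition has no static value
        regular = sorted(
            ((c, r) for (pk, c), r in regular_tbl.items() if pk == p),
            key=lambda cr: cr[0],
        )
        if regular:
            view.extend({"p": p, "c": c, "r": r, "s": s} for c, r in regular)
        else:
            # Static row with no regular rows: regular columns come back null.
            view.append({"p": p, "c": None, "r": None, "s": s})
    return view


def select(
    static_tbl: StaticTable,
    regular_tbl: RegularTable,
    where: Optional[Callable[[Row], bool]] = None,
) -> List[Row]:
    """SELECT over the joined view, optionally filtered by a predicate."""
    return [row for row in joined_view(static_tbl, regular_tbl) if where is None or where(row)]


static_tbl: StaticTable = {}
regular_tbl: RegularTable = {}

# insert into t (p, s) values (1,1) if not exists;
static_tbl[1] = 1
print(len(select(static_tbl, regular_tbl)))  # 1: the static row, with null c and r
print(len(select(static_tbl, regular_tbl, where=lambda row: row["r"] == 1)))  # 0

# insert into t (p, c, s) values (1,1,1) if not exists;
regular_tbl[(1, 1)] = None  # the row (1, 1) now exists, with r null
static_tbl[1] = 1
print(len(select(static_tbl, regular_tbl)))  # 1: one regular row, static value merged in
```

Both counts of 1 in the trace fall out of the join: first the static row stands alone, then it is absorbed into the single regular row.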

> In LWT, a static row appears to exist when there is no regular row matching 
> WHERE

I assume you mean that the IF clause matches against a static row if you run 
UPDATE tbl SET v = a WHERE p = b IF s = c. This could be an inconsistency, but I 
think it is not. Recall that UPDATE in CQL is not UPDATE in SQL: SQL would do 
nothing if the row doesn’t exist, whatever the IF clause might say. CQL is 
really performing an UPSERT.

So, what happens when the WHERE clause doesn’t match a primary key with UPSERT? 
A row is created. If you consider that this empty nascent row is joined with 
the static “table” to evaluate the IF condition, and so to decide what you 
UPSERT, then it all makes sense, to me anyway.
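The UPSERT reading of a conditional UPDATE can be sketched in the same style. This is a hypothetical Python model of the explanation above, reusing the example table `t (p, c, r, s static)`: when no row matches the primary key, the IF condition is evaluated against an empty nascent row joined with the partition's static value. `conditional_upsert` and its parameters are illustrative names, not a Cassandra API.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Row = Dict[str, Any]
StaticTable = Dict[int, Optional[int]]               # p -> s
RegularTable = Dict[Tuple[int, int], Optional[int]]  # (p, c) -> r


def conditional_upsert(
    static_tbl: StaticTable,
    regular_tbl: RegularTable,
    p: int,
    c: int,
    assignments: Dict[str, Any],
    condition: Callable[[Row], bool],
) -> bool:
    """Model of UPDATE t SET ... WHERE p = ? AND c = ? IF <condition>; returns [applied]."""
    # The existing row, or an empty nascent row if the primary key matches nothing,
    # joined with the partition's static value (null if there is none).
    row = {"p": p, "c": c, "r": regular_tbl.get((p, c)), "s": static_tbl.get(p)}
    if not condition(row):
        return False
    for column, value in assignments.items():
        if column == "s":
            static_tbl[p] = value
        elif column == "r":
            regular_tbl[(p, c)] = value
        else:
            raise ValueError(f"unknown column {column!r}")
    return True


static_tbl: StaticTable = {1: 1}
regular_tbl: RegularTable = {}

# UPDATE t SET r = 7 WHERE p = 1 AND c = 1 IF s = 1: no row (1, 1) yet, but s = 1 matches.
print(conditional_upsert(static_tbl, regular_tbl, 1, 1, {"r": 7}, lambda row: row["s"] == 1))  # True
# The same statement on partition 2, which has no static value: s is null, so not applied.
print(conditional_upsert(static_tbl, regular_tbl, 2, 1, {"r": 7}, lambda row: row["s"] == 1))  # False
print(regular_tbl)  # {(1, 1): 7}
```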

> NULLs are first-class values, distinguishable from unset values

Could you give an example?


From: Konstantin Osipov 
Date: Wednesday, 15 June 2022 at 20:56
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax