Re: [Marketing] For Review: Changelog blog #16 for May

2022-06-06 Thread Chris Thornett
Yes, sorry, I'd locked it down. I've added you as an editor if you need to
make any more changes.

On Wed, Jun 1, 2022 at 4:22 AM Erick Ramirez 
wrote:

> Chris, I just realised that no one had access to the doc.
>
>
> Or maybe you've removed access since the review period has passed. My bad.
> 🙂
>


Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Henrik Ingo
Thank you Blake and team!

Just some personal reactions and thoughts...

My first instinct is to support the shorter format, where UPDATE ... AS car is
also its own implicit SELECT.

However, a subtle thing to note is that a reasonable user might expect each
UPDATE in a sequence to read its row at the position where that UPDATE
appears in the list of statements. The fact that Accord executes all reads
first is not at all obvious from the syntax. One way to make it obvious is
to require the user to type the SELECTs explicitly, and to require that all
SELECTs appear before UPDATE/INSERT/DELETE.
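To make the concern concrete, here is a minimal Python model, not Accord's implementation: the single-integer table and the names `reads_first` and `sequential` are illustrative assumptions. It contrasts the two readings of two `UPDATE ... SET miles_driven += 30` statements that target the same row.

```python
from typing import Callable

# Hypothetical model: a table maps a primary key to one integer column,
# and each UPDATE computes its new value from the value it reads.
Table = dict[str, int]
Update = tuple[str, Callable[[int], int]]


def reads_first(table: Table, updates: list[Update]) -> Table:
    """Every update reads from the snapshot taken before any write."""
    snapshot = dict(table)
    result = dict(table)
    for key, compute in updates:
        result[key] = compute(snapshot[key])
    return result


def sequential(table: Table, updates: list[Update]) -> Table:
    """Each update reads the value left by the updates listed before it."""
    result = dict(table)
    for key, compute in updates:
        result[key] = compute(result[key])
    return result


start = {"blake": 100}
two_increments = [("blake", lambda v: v + 30), ("blake", lambda v: v + 30)]
print(reads_first(start, two_increments))  # {'blake': 130}
print(sequential(start, two_increments))   # {'blake': 160}
```

The same statements give different answers, which is why the position of reads matters for how intuitive the syntax is.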


I like the idea of a RETURN or RETURNING keyword to specify exactly what
you want to return. This would also allow returning results from
UPDATE/INSERT, since the user explicitly asked for them.

Returning the "result" from an UPDATE raises the question of whether it
should be the data at the start of the transaction or the end state.
Interestingly, the MongoDB findAndModify operation lets you choose between
both options. There seems to be a valid use case for both. The obvious
examples are:

  UPDATE t SET c=100 WHERE id=1 AS t RETURNING BEFORE c;
COMMIT TRANSACTION IF t.c <= 100;

I want to know what c was before I replaced it with a new value.

  INSERT INTO t (c) VALUES (100) AS t RETURNING AFTER d;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the defaulted value of d. (...as was already pointed out in
another email.)

  UPDATE t SET c+=1 WHERE id=1 AS t RETURNING AFTER c;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the result of c after the transaction. (Which I know will be
at most 100, but I want to know exactly.)

I kind of sympathize with the intuitive opinion that we should return the
values from the start of the transaction, since that's how Accord works:
reads first, updates second.
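A minimal Python sketch of the two proposed RETURNING modes, assuming a toy in-memory table of integer columns. `apply_update`, `UpdateResult` and `returning` are hypothetical names, and the `COMMIT TRANSACTION IF` conditions in the examples above are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict[str, int]


@dataclass
class UpdateResult:
    before: Optional[Row]  # row as read at the start of the transaction, None if absent
    after: Row             # row as written by the update


def apply_update(table: dict[int, Row], key: int,
                 change: Callable[[Row], Row]) -> UpdateResult:
    """Apply `change` to the row at `key`, keeping both the old and new versions."""
    before = dict(table[key]) if key in table else None
    after = change(dict(before or {}))
    table[key] = after
    return UpdateResult(before, dict(after))


def returning(result: UpdateResult, mode: str, column: str) -> Optional[int]:
    """Mirror the proposed RETURNING BEFORE / RETURNING AFTER keywords."""
    if mode == "BEFORE":
        return None if result.before is None else result.before.get(column)
    if mode == "AFTER":
        return result.after.get(column)
    raise ValueError(f"unknown RETURNING mode: {mode}")


table = {1: {"c": 42}}
replaced = apply_update(table, 1, lambda row: {**row, "c": 100})
print(returning(replaced, "BEFORE", "c"))    # 42

counter = {1: {"c": 99}}
incremented = apply_update(counter, 1, lambda row: {**row, "c": row["c"] + 1})
print(returning(incremented, "AFTER", "c"))  # 100
```

Keeping both versions of the row is what lets one statement serve either mode; returning only the AFTER image would be the simpler choice.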


Finally, I wanted to share a thought on how to implement the returning of
multiple result sets. While you don't address it, I'm assuming the driver
API will get new functionality that lets you fetch a specific result set out
of many.

I was thinking the following coordinator-side implementation would also
allow old drivers to be used:

BEGIN TRANSACTION;
   SELECT * FROM table1 WHERE  AS t1;
   SELECT * FROM table2 WHERE  AS t2;
   UPDATE something...
COMMIT TRANSACTION;
SELECT * FROM t1;
SELECT * FROM t2;

The coordinator-level implementation here would be to store the results of
the SELECTs inside a transaction in temporary tables that the client can
then read from after the transaction. Even if those later selects are
outside the transaction, their contents would be a constant snapshot
representing the state of those rows at the time of the transaction. The
tables should be visible only to the same client session, and only until the
start of the next transaction or a timeout, whichever comes first.
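This temporary-table idea amounts to a per-session cache with two expiry rules. A minimal Python sketch, assuming named results are held in coordinator memory as lists of row dicts. `SessionResultStore` and its methods are hypothetical, and the injectable clock exists only to make expiry testable.

```python
import time
from typing import Any, Callable, Optional

Rows = list[dict[str, Any]]


class SessionResultStore:
    """Hypothetical coordinator-side store for named SELECT results.

    A snapshot is visible only to the session that produced it and is discarded
    when that session begins its next transaction or after `ttl_seconds`,
    whichever comes first.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._snapshots: dict[str, tuple[float, dict[str, Rows]]] = {}

    def begin_transaction(self, session_id: str) -> None:
        # Starting a new transaction invalidates the previous snapshot.
        self._snapshots.pop(session_id, None)

    def commit(self, session_id: str, named_results: dict[str, Rows]) -> None:
        # Keep the transaction's named SELECT results as a constant snapshot.
        self._snapshots[session_id] = (self.clock(), dict(named_results))

    def read(self, session_id: str, name: str) -> Optional[Rows]:
        # Serve a later "SELECT * FROM t1" from the snapshot, if still valid.
        entry = self._snapshots.get(session_id)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._snapshots[session_id]
            return None
        return results.get(name)


now = [0.0]
store = SessionResultStore(ttl_seconds=30.0, clock=lambda: now[0])
store.begin_transaction("session-a")
store.commit("session-a", {"t1": [{"id": 1}], "t2": []})
print(store.read("session-a", "t1"))  # [{'id': 1}]
print(store.read("session-b", "t1"))  # None
```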

henrik




On Fri, Jun 3, 2022 at 6:39 PM Blake Eggleston  wrote:

> Hi dev@,
>
> I’ve been working on a draft syntax for Accord transactions and wanted to
> bring what I have to the dev list to solicit feedback and build consensus
> before moving forward with it. The proposed transaction syntax is intended
> to be an extended batch syntax. Basically batches with selects, and an
> optional condition at the end. To facilitate conditions against an
> arbitrary number of select statements, you can also name the statements,
> and reference columns in the results. To cut down on the number of
> operations needed, select values can also be used in updates, including
> some math operations. Parameterization of literals is supported the same as
> other statements.
>
> Here's an example selecting a row from 2 tables, and issuing updates for
> each row if a condition is met:
>
> BEGIN TRANSACTION;
>   SELECT * FROM users WHERE name='blake' AS user;
>   SELECT * from cars WHERE model='pinto' AS car;
>   UPDATE users SET miles_driven = user.miles_driven + 30 WHERE
> name='blake';
>   UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
> COMMIT TRANSACTION IF car.is_running;
>
> This can be simplified by naming the updates with an AS  syntax. If
> updates are named, a corresponding read is generated behind the scenes and
> its values inform the update.
>
> Here's an example, the query is functionally identical to the previous
> query. In the case of the user update, a read is still performed behind the
> scenes to enable the calculation of miles_driven + 30, but doesn't need to
> be named since it's not referenced anywhere else.
>
> BEGIN TRANSACTION;
>   UPDATE users SET miles_driven += 30 WHERE name='blake';
>   UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
> COMMIT TRANSACTION IF car.is_running;
>
> Here’s another example, performing the canonical bank transfer:
>
> BEGIN TRANSACTION;
>   UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
>   UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
> COMMIT TRANSACTION

Re: CEP-15 multi key transaction syntax

2022-06-06 Thread bened...@apache.org
> One way to make it obvious is to require the user to explicitly type the 
> SELECTs and then to require that all SELECTs appear before 
> UPDATE/INSERT/DELETE.

Yes, I agree that SELECT statements should be required to go first.

However, I think this is sufficient and we can retain the shorter format for 
RETURNING. There only remains the issue of conditions imposed upon 
UPDATE/INSERT/DELETE statements when there are multiple statements that affect 
the same primary key. I think we can (and should) simply reject such queries 
for now, as it doesn’t make much sense to have multiple statements for the same 
primary key in the same transaction.
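The two rules in this reply (SELECTs first, and no two modifications of the same primary key) are simple to check up front. A minimal Python sketch, assuming each statement has already been parsed into its kind, table and full primary key. `Statement` and `check_transaction` are hypothetical names, not part of any proposed API.

```python
from dataclasses import dataclass

MODIFICATIONS = {"INSERT", "UPDATE", "DELETE"}


@dataclass(frozen=True)
class Statement:
    kind: str           # "SELECT", "INSERT", "UPDATE" or "DELETE"
    table: str
    primary_key: tuple  # values for the full primary key


def check_transaction(statements: list[Statement]) -> None:
    """Raise ValueError if a SELECT follows a modification, or if two
    modifications target the same (table, primary key)."""
    seen_modification = False
    modified: set[tuple[str, tuple]] = set()
    for stmt in statements:
        if stmt.kind == "SELECT":
            if seen_modification:
                raise ValueError("SELECT statements must precede all modifications")
            continue
        if stmt.kind not in MODIFICATIONS:
            raise ValueError(f"unsupported statement kind: {stmt.kind}")
        seen_modification = True
        target = (stmt.table, stmt.primary_key)
        if target in modified:
            raise ValueError(f"more than one modification of {target}")
        modified.add(target)


# Passes: one SELECT first, then modifications of two distinct keys.
check_transaction([Statement("SELECT", "users", ("blake",)),
                   Statement("UPDATE", "users", ("blake",)),
                   Statement("UPDATE", "cars", ("pinto",))])
```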

> Returning the "result" from an UPDATE presents the question should it be the 
> data at the start of the transaction or end state?

I am inclined to only return the new values (as proposed by Alex) for the 
purpose of returning new auto-increment values etc. If you require the prior 
value, SELECT is available to express this.

> I was thinking the following coordinator-side implementation would allow to 
> use also old drivers

I am inclined to return just the first result set to old clients. I think it’s 
fine to require a client upgrade to get multiple result sets.



Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Henrik Ingo
On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org 
wrote:

> > One way to make it obvious is to require the user to explicitly type
> the SELECTs and then to require that all SELECTs appear before
> UPDATE/INSERT/DELETE.
>
>
>
> Yes, I agree that SELECT statements should be required to go first.
>
>
>
> However, I think this is sufficient and we can retain the shorter format
> for RETURNING. There only remains the issue of conditions imposed upon
> UPDATE/INSERT/DELETE statements when there are multiple statements that
> affect the same primary key. I think we can (and should) simply reject such
> queries for now, as it doesn’t make much sense to have multiple statements
> for the same primary key in the same transaction.
>
>
I guess I was thinking ahead to a future where an UPDATE write set may or
may not intersect with a previous update, due to allowing the WHERE clause
to use secondary keys, etc.

That said, I'm not saying we SHOULD require explicit SELECT statements for
every update. I'm sure that would be more annoying than useful. I was just
following a train of thought.



>
>
> > Returning the "result" from an UPDATE presents the question should it
> be the data at the start of the transaction or end state?
>
>
>
> I am inclined to only return the new values (as proposed by Alex) for the
> purpose of returning new auto-increment values etc. If you require the
> prior value, SELECT is available to express this.
>
>
That's a great point!


>
>
> > I was thinking the following coordinator-side implementation would
> allow to use also old drivers
>
>
>
> I am inclined to return just the first result set to old clients. I think
> it’s fine to require a client upgrade to get multiple result sets.
>
>
Possibly. I just wanted to share an idea for consideration. IMO the temp
table idea might not be too hard to implement*, but sure, the syntax does
feel a bit bolted on.

*) I'm maybe the wrong person to judge that, of course :-)

henrik

-- 

Henrik Ingo

+358 40 569 7354



Re: [Marketing] For Review: Cassandra World Party 2022

2022-06-06 Thread Chris Thornett
FYI - as there are a few details that need confirming on the World Party,
I've pushed the publication of this post back to Thursday, 9 June. Thanks!

On Tue, May 31, 2022 at 10:07 PM Chris Thornett  wrote:

> We have a 72-hr community review for the first blog announcing this year's
> Cassandra World Party for the release of 4.1:
>
> https://docs.google.com/document/d/1ed5NEkxQjkk__EUS6HnuvqDisAKZoYuh-rEZdp3RKpU/edit?usp=sharing
>
> FYI - We're also looking to post two blogs a week during June as we ramp
> up for the next release and promote all the great features that are
> included.
>
> This is due to be published on 7 June.
>
> Thanks,
> --
>
> Chris Thornett
> senior content strategist, Constantia.io
> ch...@constantia.io
>


Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Blake Eggleston
Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond 
to a question you asked here, you can interpret that as “what Benedict said” :).


Jeff, 

> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?

I'd intended this to be worked out from the select statement, ie: if the 
read/reference is null/empty, then it doesn't exist, whether you're interested 
in the partition, row, or cell. So I don't think we'd need an additional 
keyword there. I think that would address partition exists / not exists use 
cases?
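That inference can be sketched in a few lines of Python, assuming a partition read evaluates to a list of rows, a row to a dict, and a cell to its value. `exists` is a hypothetical helper, not a proposed keyword. Empty collections count as absent because Cassandra does not distinguish an empty collection from null.

```python
from typing import Any


def exists(reference: Any) -> bool:
    """Hypothetical rule: a named read or column reference 'exists' unless it
    is null or empty. `reference` may be the rows of a partition read (list),
    a single row (dict) or a cell value.
    """
    if reference is None:
        return False
    if isinstance(reference, (list, dict, set)):
        # An empty partition, row or collection cell counts as absent.
        return len(reference) > 0
    return True


user = []                                        # SELECT found no rows
car = [{"model": "pinto", "is_running": True}]
print(exists(user), exists(car))                 # False True
print(exists(car[0].get("mileage")))             # False: the cell is null
```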

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) would 
be part of the syntax. Also, running a txn that doesn’t contain updates 
wouldn’t be a problem.
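A toy Python model of these commit semantics, under stated assumptions: named reads are keyed by primary key, `condition=None` stands for a plain `COMMIT TRANSACTION;`, and the guard in the usage below is a hypothetical condition, not part of the bank transfer example in the original proposal. `run_transaction` is an illustrative name only.

```python
from typing import Callable, Optional

Row = dict[str, int]
Reads = dict[str, Row]


def run_transaction(
    table: dict[str, Row],
    named_selects: dict[str, str],
    updates: list[tuple[str, Callable[[Reads], Row]]],
    condition: Optional[Callable[[Reads], bool]] = None,
) -> tuple[bool, Reads]:
    """Hypothetical model: `named_selects` maps a name to the primary key it reads,
    each update is (primary key, function from the named reads to the new cells),
    and condition=None stands for COMMIT TRANSACTION without IF."""
    # 1. Every read happens first, against the state at the start of the transaction.
    reads = {name: dict(table[key]) for name, key in named_selects.items() if key in table}
    # 2. The IF condition, if any, decides whether the writes are applied.
    if condition is not None and not condition(reads):
        return False, reads
    # 3. Apply the writes; they may use values from the named reads.
    for key, new_cells in updates:
        table.setdefault(key, {}).update(new_cells(reads))
    return True, reads


accounts = {"blake": {"balance": 500}, "benedict": {"balance": 300}}
committed, _ = run_transaction(
    accounts,
    {"blake": "blake", "benedict": "benedict"},
    [("blake", lambda r: {"balance": r["blake"]["balance"] + 100}),
     ("benedict", lambda r: {"balance": r["benedict"]["balance"] - 100})],
    condition=lambda r: r["benedict"]["balance"] >= 100,  # hypothetical guard
)
print(committed, accounts)  # True {'blake': {'balance': 600}, 'benedict': {'balance': 200}}

# A read-only transaction with an unconditional commit just returns its reads.
print(run_transaction(accounts, {"b": "blake"}, []))  # (True, {'b': {'balance': 600}})
```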

Patrick, I think Benedict answered your questions? Glad you got the joke :)

Alex,

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

The full primary key must be defined as part of the statement, and you can’t 
use column references to define them, so you wouldn’t be able to run these.
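A sketch of how these initial restrictions might be checked, assuming the WHERE clause has been parsed into a column-to-value map. `ColumnRef` and `validate_where` are hypothetical. Items 1 and 2 in the list above would fail the column-reference check; items 3 and 5 would fail the primary key checks.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ColumnRef:
    """A reference to a column of a named read, e.g. car.model."""
    name: str
    column: str


def validate_where(primary_key: list[str], where: dict[str, Any]) -> None:
    """Hypothetical check of the initial restrictions: the WHERE clause binds every
    primary key column, only primary key columns, and only to literal values."""
    missing = [c for c in primary_key if c not in where]
    if missing:
        raise ValueError(f"primary key column(s) not specified: {missing}")
    non_key = [c for c in where if c not in primary_key]
    if non_key:
        raise ValueError(f"predicate on non-primary-key column(s): {non_key}")
    for column, value in where.items():
        if isinstance(value, ColumnRef):
            raise ValueError(
                f"{column} cannot be taken from {value.name}.{value.column}")


validate_where(["name"], {"name": "blake"})  # passes: full key, literal value
```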

> MVs

To prevent being spread too thin, both in syntax design and implementation 
work, I’d like to limit read and write operations in the initial implementation 
to vanilla selects, updates, inserts, and deletes. Once we have a solid 
implementation of multi-key/table transactions supporting foundational 
operations, we can start figuring out how the more advanced pieces can be best 
supported. Not a great answer to your question, but a related tangent I should 
have included in my initial email.

> ... RETURNING ...

I like the idea of the returning statement, but to echo what Benedict said, I 
think any scheme for specifying data to be returned should apply the same to 
select and update statements, since updates can have underlying reads that the 
user may be interested in. I’d mentioned having an optional RETURN statement in 
addition to automatically returning selects in my original email.

> ... WITH ...

I like the idea of defining statement names at the beginning of a statement, 
since I could imagine mapping names to selects might get difficult if there are 
a lot of columns in the select or update, but beginning each statement with 
`WITH ` reduces readability imo. Maybe putting the name after the first 
term of the statement (ie: `SELECT * AS  WHERE...`, `UPDATE table AS  SET ...`, 
`INSERT INTO table AS  (...) VALUES (...);`) would make names easier to find 
without harming overall readability?

Benedict,

> I agree that SELECT statements should be required to go first.

+1

> There only remains the issue of conditions imposed upon UPDATE/INSERT/DELETE 
> statements when there are multiple statements that affect the same primary 
> key. I think we can (and should) simply reject such queries for now, as it 
> doesn’t make much sense to have multiple statements for the same primary key 
> in the same transaction.

Unfortunately, I think there are use cases for both multiple selects and 
updates for the same primary key in a txn. Selects aren’t as problematic, but 
if multiple updates end up touching the same cell, I’d expect the last one to 
win. This would make dealing with range tombstones a little trickier, since the 
default behavior of alternating updates and range tombstones affecting the same 
cells is not intuitive, but I don’t think it would be too bad.
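The range tombstone subtlety can be sketched in Python. Assuming every write in the transaction shares one timestamp (as statements in a batch do by default), Cassandra's usual reconciliation lets a deletion win a timestamp tie, so a range deletion shadows a write in its range even when the write comes later in the statement list. The toy partition, statement tuples and function names are hypothetical.

```python
# Hypothetical statements on one partition:
#   ("UPSERT", clustering_key, value)
#   ("DELETE_RANGE", low, high)  # deletes clustering keys with low <= ck < high
Statement = tuple


def apply_in_order(partition: dict[int, str], statements: list[Statement]) -> dict[int, str]:
    """Statement-order semantics: the last statement touching a cell wins."""
    result = dict(partition)
    for stmt in statements:
        if stmt[0] == "UPSERT":
            _, ck, value = stmt
            result[ck] = value
        elif stmt[0] == "DELETE_RANGE":
            _, low, high = stmt
            for ck in [k for k in result if low <= k < high]:
                del result[ck]
        else:
            raise ValueError(f"unknown statement: {stmt[0]}")
    return result


def apply_same_timestamp(partition: dict[int, str], statements: list[Statement]) -> dict[int, str]:
    """All writes share one timestamp and a deletion wins the tie, modeled by
    applying every range deletion after every upsert (a stable sort keeps the
    upserts in statement order among themselves, for simplicity)."""
    ordered = sorted(statements, key=lambda s: s[0] == "DELETE_RANGE")
    return apply_in_order(partition, ordered)


partition = {1: "a", 2: "b", 3: "c"}
stmts = [("DELETE_RANGE", 1, 3), ("UPSERT", 2, "new")]
print(apply_in_order(partition, stmts))        # {3: 'c', 2: 'new'}
print(apply_same_timestamp(partition, stmts))  # {3: 'c'}
```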


Something that’s come up a few times, and that I’ve also been thinking about, is 
whether to return to the client the values that were originally read or the 
values written by the update, and there are use cases for both. I don’t 
remember who suggested it, but I think returning the original values from named 
select statements, and the post-update values from named update statements, is a 
good way to handle both. Also, while returning the contents of the mutation 
would be the easiest, implementation-wise, swapping the written cell values into 
the update's named read would be most useful, since a txn won’t always result in 
an update, in which case we’d just return the select.
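A minimal sketch of that scheme, assuming each named read and each update's written cells are available as plain dicts. `transaction_results` and `named_update_result` are hypothetical. Unwritten cells come from the read rather than from the mutation alone, and the read is returned unchanged when the update was not applied.

```python
from typing import Any, Optional

Row = dict[str, Any]


def named_update_result(read: Optional[Row], written: Row, applied: bool) -> Optional[Row]:
    """Overlay the cells an UPDATE wrote onto the row read for it; if the update
    was not applied (e.g. the IF condition failed), return the read unchanged."""
    if not applied:
        return read
    merged = dict(read or {})
    merged.update(written)
    return merged


def transaction_results(select_reads: dict[str, Optional[Row]],
                        update_reads: dict[str, Optional[Row]],
                        update_writes: dict[str, Row],
                        applied: bool) -> dict[str, Optional[Row]]:
    # Named SELECTs return the values as originally read.
    results = dict(select_reads)
    # Named UPDATEs return their post-update values.
    for name, read in update_reads.items():
        results[name] = named_update_result(read, update_writes.get(name, {}), applied)
    return results


user = {"name": "blake", "miles_driven": 1000}
car = {"model": "pinto", "miles_driven": 100, "is_running": True}
out = transaction_results({"user": user}, {"car": car},
                          {"car": {"miles_driven": 130}}, applied=True)
print(out["car"])  # {'model': 'pinto', 'miles_driven': 130, 'is_running': True}
```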

Thanks,

Blake




Re: CEP-15 multi key transaction syntax

2022-06-06 Thread bened...@apache.org
> if multiple updates end up touching the same cell, I’d expect the last one to 
> win

Hmm, yes I suppose range tombstones are a plausible and reasonable thing to mix 
with inserts over the same key range.

What’s your present thinking about the idea of handling returning the values as 
of a given point in the sequential execution then?

The succinct syntax is, I think, highly desirable for user experience, but this 
does complicate things a bit if we want it to remain intuitive.





Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Blake Eggleston
That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tombstones, we could require them to precede any updates, as we do for 
selects, but there's still the question of how to handle multiple updates to 
the same cell when the user has requested we return the post-update state of 
the cell.


Re: CEP-15 multi key transaction syntax

2022-06-06 Thread bened...@apache.org
It affects not just RETURNING but also conditions that are evaluated against 
the row, as well as function calls and writes to another table, if we in future 
permit using the values from one select there (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECTs as you like), would that perhaps make it 
simple enough? Require for now that SELECTs go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending on what we want to make simple)?
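The ordering rule suggested here could be checked in one pass. A Python sketch that assumes the SELECT, DELETE, INSERT/UPDATE order (not the "vice versa" alternative) and statements parsed into (kind, table, primary key) tuples. `PHASE` and `check_phased` are hypothetical names.

```python
# Hypothetical phase order: SELECT, then DELETE, then INSERT/UPDATE.
PHASE = {"SELECT": 0, "DELETE": 1, "INSERT": 2, "UPDATE": 2}


def check_phased(statements: list[tuple[str, str, tuple]]) -> None:
    """`statements` are (kind, table, primary key) tuples in written order.
    SELECTs and DELETEs may overlap freely; each (table, key) may be the target
    of at most one INSERT/UPDATE."""
    current = 0
    upserted: set[tuple[str, tuple]] = set()
    for kind, table, key in statements:
        if kind not in PHASE:
            raise ValueError(f"unsupported statement kind: {kind}")
        phase = PHASE[kind]
        if phase < current:
            raise ValueError(f"{kind} on {table} appears out of phase order")
        current = phase
        if phase == 2:
            if (table, key) in upserted:
                raise ValueError(f"more than one INSERT/UPDATE of {(table, key)}")
            upserted.add((table, key))


# Passes: overlapping SELECTs and DELETEs, then a single INSERT for the key.
check_phased([("SELECT", "t", (1,)), ("DELETE", "t", (1,)),
              ("DELETE", "t", (1,)), ("INSERT", "t", (1,))])
```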

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to the single rows we are updating, so we could simply maintain a 
collection of rows and upsert into them as we process the execution. Most 
transactions won’t need it, I suspect, so we don’t need to worry about perfect 
efficiency.
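A minimal sketch of that bookkeeping, assuming rows are dicts keyed by (table, primary key). `RowTracker` is a hypothetical name. Only rows the transaction touches are tracked, and each statement gets a copy of the row as of its position in the sequence.

```python
from typing import Any, Optional

Row = dict[str, Any]
Key = tuple[str, tuple]  # (table, primary key)


class RowTracker:
    """Hypothetical per-transaction row state: seeded with the rows read up front,
    then upserted into as each statement executes, so a condition or RETURNING
    clause can see a row as of its position in the sequence."""

    def __init__(self, initial_reads: dict[Key, Optional[Row]]) -> None:
        self._rows = {k: (dict(v) if v is not None else None)
                      for k, v in initial_reads.items()}

    def upsert(self, key: Key, cells: Row) -> Row:
        row = dict(self._rows.get(key) or {})
        row.update(cells)
        self._rows[key] = row
        return dict(row)  # snapshot as of this statement

    def delete(self, key: Key) -> None:
        self._rows[key] = None

    def current(self, key: Key) -> Optional[Row]:
        row = self._rows.get(key)
        return None if row is None else dict(row)


key = ("t", (1,))
tracker = RowTracker({key: {"id": 1, "c": 10}})
first = tracker.upsert(key, {"c": 11})
second = tracker.upsert(key, {"c": 12})
print(first["c"], second["c"])  # 11 12
tracker.delete(key)
print(tracker.current(key))     # None
```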

