Re: CEP-15 multi key transaction syntax

bened...@apache.org Mon, 06 Jun 2022 07:28:26 -0700

> One way to make it obvious is to require the user to explicitly type the 
> SELECTs and then to require that all SELECTs appear before 
> UPDATE/INSERT/DELETE.

Yes, I agree that SELECT statements should be required to go first.

However, I think this is sufficient and we can retain the shorter format for 
RETURNING. There only remains the issue of conditions imposed upon 
UPDATE/INSERT/DELETE statements when there are multiple statements that affect 
the same primary key. I think we can (and should) simply reject such queries 
for now, as it doesn’t make much sense to have multiple statements for the same 
primary key in the same transaction.

> Returning the "result" from an UPDATE presents the question should it be the 
> data at the start of the transaction or end state?

I am inclined to only return the new values (as proposed by Alex) for the 
purpose of returning new auto-increment values etc. If you require the prior 
value, SELECT is available to express this.

> I was thinking the following coordinator-side implementation would allow to 
> use also old drivers

I am inclined to return just the first result set to old clients. I think it’s 
fine to require a client upgrade to get multiple result sets.

From: Henrik Ingo <henrik.i...@datastax.com>
Date: Monday, 6 June 2022 at 15:18
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
Thank you Blake and team!

Just some personal reactions and thoughts...

First instinct is to support the shorter format where UPDATE ... AS car  is 
also its own implicit select.

However, a subtle thing to note is that a reasonable user might expect that in 
a sequence of multiple UPDATEs, each of them is also read at the position where 
the UPDATE is in the list of statements. The fact that Accord executes all 
reads first is not at all obvious from the syntax. One way to make it obvious 
is to require the user to explicitly type the SELECTs and then to require that 
all SELECTs appear before UPDATE/INSERT/DELETE.

I like the idea of a RETURN or RETURNING keyword to specify what exactly you 
want to return. This would allow to also return results from UPDATE/INSERT 
since the user explicitly told us to do so.

Returning the "result" from an UPDATE presents the question should it be the 
data at the start of the transaction or end state? Interestingly the MongoDB 
$findAndModify operation allows you to choose between both options. There seems 
to be a valid use case for both. The obvious examples are:

      UPDATE t SET c=100 WHERE id=1 AS t RETURNING BEFORE c;
    COMMIT TRANSACTION IF t.c <= 100;

I want to know the value of what c was before I replaced with a new value.

      INSERT INTO t (c) VALUES (100) AS t RETURNING AFTER d;
    COMMIT TRANSACTION IF t.c <= 100;

I want to know the defaulted value of d. (...as was already pointed out in 
another email.)

      UPDATE t SET c+=1 WHERE id=1 AS t RETURNING AFTER c;
    COMMIT TRANSACTION IF t.c <= 100;

I want to know the result of c after the transaction. (Which I know will be at 
most 100, but I want to know exactly.)

I kind of sympathize with the intuitive opinion that we should return the 
values from the start of the transaction, since that's how Accord works: reads 
first, updates second.

Finally, I wanted to share a thought on how to implement the returning of 
multiple result sets. While you don't address it, I'm assuming the driver api 
will get new functionality where you can get a specific result set out of many.

I was thinking the following coordinator-side implementation would allow to use 
also old drivers:

BEGIN TRANSACTION;
   SELECT * FROM table1 WHERE .... AS t1;
   SELECT * FROM table2 WHERE .... AS t2;
   UPDATE something...
COMMIT TRANSACTION;
SELECT * FROM t1;
SELECT * FROM t2;

The coordinator-level implementation here would be to store the results of the 
SELECTs inside a transaction into temporary tables that the client can the read 
from after the transaction. Even if those later selects are outside the 
transaction, their contents would be a constant snapshot representing the state 
of those rows at the time of the transaction. The tables should be visible only 
to the same client session and until the start of the next transaction or a 
timeout, whichever comes first.

henrik

On Fri, Jun 3, 2022 at 6:39 PM Blake Eggleston 
<beggles...@apple.com<mailto:beggles...@apple.com>> wrote:
Hi dev@,

I’ve been working on a draft syntax for Accord transactions and wanted to bring 
what I have to the dev list to solicit feedback and build consensus before 
moving forward with it. The proposed transaction syntax is intended to be an 
extended batch syntax. Basically batches with selects, and an optional 
condition at the end. To facilitate conditions against an arbitrary number of 
select statements, you can also name the statements, and reference columns in 
the results. To cut down on the number of operations needed, select values can 
also be used in updates, including some math operations. Parameterization of 
literals is supported the same as other statements.

Here's an example selecting a row from 2 tables, and issuing updates for each 
row if a condition is met:

BEGIN TRANSACTION;
  SELECT * FROM users WHERE name='blake' AS user;
  SELECT * from cars WHERE model='pinto' AS car;
  UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
COMMIT TRANSACTION IF car.is_running;

This can be simplified by naming the updates with an AS <name> syntax. If 
updates are named, a corresponding read is generated behind the scenes and its 
values inform the update.

Here's an example, the query is functionally identical to the previous query. 
In the case of the user update, a read is still performed behind the scenes to 
enable the calculation of miles_driven + 30, but doesn't need to be named since 
it's not referenced anywhere else.

BEGIN TRANSACTION;
  UPDATE users SET miles_driven += 30 WHERE name='blake';
  UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
COMMIT TRANSACTION IF car.is_running;

Here’s another example, performing the canonical bank transfer:

BEGIN TRANSACTION;
  UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
  UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
COMMIT TRANSACTION IF blake EXISTS AND benedict.balance >= 100;

As you can see from the examples, column values can be referenced via a dot 
syntax, ie: <select_name>.<column> -> select1.value. Since the read portion of 
the transaction is performed before evaluating conditions or applying updates, 
values read can be freely applied to non-primary key values in updates. Select 
statements used either in checking a condition or creating an update must be 
restricted to a single row, either by specifying the full primary key or a 
limit of 1. Multi-row selects are allowed, but only for returning data to the 
client (see below).

For evaluating conditions, = & != are available for all types, <, <=, >, >= are 
available for numerical types, and EXISTS, NOT EXISTS can be used for 
partitions, rows, and values. If any column references cannot be satisfied by 
the result of the reads, the condition implicitly fails. This prevents having 
to include a bunch of exists statements.

On completion, an operation would return a boolean value indicating the 
operation had been applied, and a result set for each named select (but not 
named update). We could also support an optional RETURN keyword, which would 
allow the user to only return specific named selects (ie: RETURN select1, 
select2).

Let me know what you think!

Blake

--

Henrik Ingo

+358 40 569 7354<tel:358405697354>

[Visit us online.]<https://www.datastax.com/>  [Visit us on Twitter.] 
<https://twitter.com/DataStaxEng>   [Visit us on YouTube.] 
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
   [Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>

Re: CEP-15 multi key transaction syntax

Reply via email to