Re: CEP-15 multi key transaction syntax

2022-06-05 Thread bened...@apache.org
> 1. Dependent SELECTs
> 2. Dependent UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

I think these are all likely to be rejected the same way they are today, as 
the individual statements would either fail to parse [1,2] or fail validation 
[3,5]: I’m fairly sure UPDATE and INSERT require a primary key to be specified 
and that only SELECT supports secondary indexes.

It could be nice to have dedicated messages explaining the limitation for 
[1,2], at least until the restriction is lifted.

> 4. The presence of a materialized view

This is a bit more complex. I think in principle MVs could function as they do 
today, i.e. with eventually consistent updates. MVs remain experimental, 
however, with known shortcomings, and I am not keen to validate them with Accord.

Since I think our plan is to opt tables into transactional behaviour (to 
minimise the potential for misusing them, unlike LWTs, which are easily used 
unsafely), I would prefer to ensure that MVs are mutually exclusive with 
transactions for now.

I anticipate follow-up work will deliver global secondary indexes on top of 
Accord. I’ve no idea if that will replace or coexist with MVs as they exist 
today. Perhaps it will be possible to create MVs and specify their consistency 
properties on creation once the existing MVs are reliable.

> 6. Large SELECTs Are Actually Okay But Look Like They Shouldn't Be

I’m not sure what our plans are around aggregations and transactions; perhaps 
Blake can speak more to his thoughts. Since aggregations are relatively new, I 
am inclined to exclude them initially, at least for write transactions, since 
LWTs do not support them.

Otherwise we will need some deterministic measure for aborting transactions – 
even after we have agreed to execute them. E.g. a 5000 row limit on live rows 
read as input before a transaction is converted to a no-op. We will have to be 
especially careful here for unconditional transactions without any 
SELECT/RETURN, as these must still wait for the result of execution before 
notifying the user of the outcome, if it may be aborted.

Suggestions welcome here.
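
As a way to picture the deterministic abort described above, here is a minimal Python sketch. The 5000-row figure comes from the message; `MAX_LIVE_ROWS`, `execute_txn`, and the dict-based rows are hypothetical names for illustration and are not Accord or Cassandra APIs. The idea it tries to show is that the abort decision depends only on the input read at execution time, so it can be made the same way everywhere, and that even an unconditional write has to wait for that decision before the client is told the outcome.

```python
from typing import Callable, Dict, List, Tuple

# Limit suggested in the message; the constant name is hypothetical.
MAX_LIVE_ROWS = 5000

Row = Dict[str, object]


def execute_txn(
    live_rows: List[Row],
    compute_writes: Callable[[List[Row]], List[Row]],
    limit: int = MAX_LIVE_ROWS,
) -> Tuple[str, List[Row]]:
    """Turn the transaction into a no-op if it read too many live rows.

    The decision uses only the rows read as input, so it is deterministic
    for a given execution. The caller learns the outcome from the return
    value, even when the transaction has no SELECT/RETURN of its own.
    """
    if len(live_rows) > limit:
        return "ABORTED", []
    return "APPLIED", compute_writes(live_rows)
```

An unconditional transaction would still call `execute_txn` and inspect the first element of the result before acknowledging the write to the user.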

> 7. Triggers

Good question!

It looks like LWTs don’t integrate with triggers today, so I guess we can 
ignore them too. I don’t know how stable triggers are, or how widely they are 
used. I’m sure we have some use cases, but I’m not aware of any community 
members that use them, so usage is likely sparse.

In principle a trigger could modify the transaction submitted by a client to 
include additional updates, but this would likely require changes to the 
trigger API. I anticipate ignoring them until we have community demand.

> Random Syntax Thoughts

I like the RETURNING syntax, and consistency with SQL dialects is a plus. I’m 
concerned about consistency with SELECT statements, though – these already 
imply RETURNING, but we might use them to compute constraint clauses on tables 
we are not updating, and this would leave no consistent way of doing this 
without returning all of its fields to the user, at least not without multiple 
SELECT statements over the same data.

We could introduce a new keyword such as CONSTRAIN in this case, with syntax 
equivalent to UPDATE/DELETE but supporting RETURNING and by default not 
returning any fields?

The idea of a RETURNING syntax on the transaction itself was previously floated 
and is nice, but I worry about having multiple inconsistent ways of returning 
data that can be commingled. How would you envisage these keywords interacting?


From: Alex Miller 
Date: Sunday, 5 June 2022 at 03:39
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
All of my text below largely extends your question of syntax in a few
directions:
 - What is the user experience of trying to run different statements
with this syntax?
 - How do transactions interact with other Cassandra constructs?
 - What are the execution semantics of these statements?
which I do acknowledge is a moderate re-scoping of the question.

Also, please take my understanding of existing CQL and DDL constructs with
an impractically large grain of salt.


Undesirable Transactions
------------------------

I tried to match CQL docs up against a number of ways of writing statements
which Accord wouldn't like, or users might not like the effect of running.
I'm assuming it'd be good to think through how one would express the error
message or guidance given to users?  Or at least just making sure I
understand correctly what is writable but not executable or desirable.

=== Likely Unexecutable

All the cases here are predicated on the lack of automatic reconnaissance
transaction support.

1. Dependent SELECTs

CREATE TABLE users (name text primary key, home_state text);
CREATE TABLE states (name text primary key, population int);

BEGIN TRANSACTION;
  /*1*/ SELECT home_state FROM users WHERE name='blake' AS user;
  /*2*/ SELECT population FROM states WHE

Re: CEP-15 multi key transaction syntax

2022-06-05 Thread bened...@apache.org
> In the case that the condition is met, is the mutation applied at that point, 
> or has it already happened and there is something like a rollback segment?

The condition is a part of the transaction execution, so no mutation is applied 
until it has been evaluated – there is no rollback.

> What is the case when the condition is not met and what is presented to the 
> end-user?

I think you can expect to have any SELECT/RETURN (whatever we settle on) 
results returned, along with FALSE for the executed result set.
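
A minimal Python sketch of the behaviour described in the two answers above, under stated assumptions: the store is a plain dict, and `run_conditional` and its parameters are hypothetical names, not part of Accord or CQL. It reads first, evaluates the condition next, and writes only if the condition held, so nothing ever needs to be rolled back. When the condition fails, the caller still gets the read results, paired with `False`. Treating a missing condition as always true is also an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Store = Dict[str, int]
Reads = Dict[str, Optional[int]]


def run_conditional(
    store: Store,
    read_keys: List[str],
    condition: Optional[Callable[[Reads], bool]],
    writes: Dict[str, int],
) -> Tuple[bool, Reads]:
    """Evaluate the condition before any mutation is applied."""
    # 1. Read the values the transaction selected.
    reads: Reads = {key: store.get(key) for key in read_keys}
    # 2. Evaluate the condition against those reads (assumed true if absent).
    applied = condition is None or condition(reads)
    # 3. Apply the writes only if the condition held; otherwise change nothing.
    if applied:
        store.update(writes)
    # The reads are returned either way, together with the applied flag.
    return applied, reads
```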

> More importantly, what happens with respect to the A & I in ACID when the 
> transaction is applied?

Not sure what you mean? They’re maintained at all times, but I would be happy 
to explain more if I can understand the question better.

> If UPDATE is used, returning the number of rows changed would be helpful.

Do we support updates that affect an uncertain number of rows at the moment, 
besides DELETE? For DELETE we don’t want to calculate the count, as it’s costlier.

> Is this something that can be done interactively in cqlsh or does it all have 
> to be submitted in one statement block?

These are non-interactive, so it needs to be declared in a single statement. I 
think Accord can be extended to natively support interactive transactions in 
future, in a manner consistent with its fast non-interactive transactions, but 
that’s a whole other endeavour.

From: Patrick McFadin 
Date: Sunday, 5 June 2022 at 01:47
To: dev 
Subject: Re: CEP-15 multi key transaction syntax
I've been waiting for this email! I'll echo what Jeff said about how exciting 
this is for the project.

On the SELECT inside the transaction:

In the first example, I'm making an assumption that you are doing a select on a 
partition key and only expect one result but is any valid CQL SELECT allowed 
here? If 'model' were a non-partition key column name and was indexed, then you 
could potentially have multiple rows returned and that isn't an allowed 
operation. Are only partition key lookups allowed or is there some logic 
looking for only one row?

I'm asking because I can see in reverse time series models where you can select 
the latest temperature
  SELECT temperature FROM weather_station WHERE id=1234 AND DATE='2022-06-04' 
LIMIT 1;

(also, horrible example. Everyone knows that the return value for a 
Pinto.is_running will always evaluate to FALSE)

On COMMIT TRANSACTION:

So much to unpack here. In the case that the condition is met, is the mutation 
applied at that point, or has it already happened and there is something like a 
rollback segment? What is the case when the condition is not met and what is 
presented to the end-user? More importantly, what happens with respect to the A 
& I in ACID when the transaction is applied?

If UPDATE is used, returning the number of rows changed would be helpful.

Is this something that can be done interactively in cqlsh or does it all have 
to be submitted in one statement block?

I'll stop here for now.

Patrick

On Sat, Jun 4, 2022 at 3:34 PM bened...@apache.org wrote:
> The returned result set is after the updates are applied?
Returning the prior values is probably more powerful, as you can perform 
unconditional updates and respond to the prior state, which you would otherwise 
not know. It’s also simpler to implement.
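
To illustrate why returning prior values is useful with unconditional updates, here is a small Python sketch. The dict-based store and the `update_returning_prior` helper are hypothetical and only model the idea; they are not a proposed API.

```python
from typing import Dict, Optional


def update_returning_prior(
    store: Dict[str, str], key: str, new_value: str
) -> Optional[str]:
    """Overwrite a value unconditionally and return what was there before."""
    prior = store.get(key)   # state as of before the update (None if absent)
    store[key] = new_value   # the update itself has no condition
    return prior
```

A client that calls `update_returning_prior(store, "owner", "bob")` learns who the previous owner was in the same round trip, which it could not have learned from the post-update state alone.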

My inclination is to require that SELECT statements are declared first, so that 
we leave open the option of (in future) supporting SELECT statements in any 
place in the transaction, returning the values as of their position in a 
sequential execution of the statements.
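
The "SELECT statements first" rule could be checked with something like the following Python sketch. It assumes a transaction has already been reduced to a list of statement kinds such as `"SELECT"`, `"UPDATE"`, or `"INSERT"`; that representation and the function name are hypothetical.

```python
from typing import List


def selects_declared_first(statement_kinds: List[str]) -> bool:
    """Return True if no SELECT appears after a non-SELECT statement."""
    seen_mutation = False
    for kind in statement_kinds:
        if kind == "SELECT":
            if seen_mutation:
                # A SELECT after a mutation would need positional semantics,
                # which this rule deliberately leaves for the future.
                return False
        else:
            seen_mutation = True
    return True
```

Rejecting the mixed ordering today keeps the option open to accept it later with "as of this position" semantics, without changing the meaning of any transaction that was previously valid.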

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

My preference is that the IF condition is anyway optional, as it is much more 
obvious to a user than concocting some always-true condition. But yes, 
read-only transactions involving multiple tables will definitely be supported.


From: Jeff Jirsa <jji...@gmail.com>
Date: Saturday, 4 June 2022 at 22:49
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax

And would you allow a transaction that had > 1 named select and no modification 
statements, but commit if 1=1 ?

> On Jun 4, 2022, at 2:45 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> 
>
>> On Jun 3, 2022, at 8:39 AM, Blake Eggleston <beggles...@apple.com> wrote:
>>
>> Hi dev@,
>
> First, I’m ridiculously excited to see this.
>
>>
>> I’ve been working on a draft syntax for Accord transactions and wanted to 
>> bring what I have to the dev list to solicit feedback and build consensus 
>> before moving forward with it. The proposed transaction syntax is intended 
>> to be an extended batch syntax. Basically batches with selects, and an 
>> optional condition at the end. To facilitate conditions against an arbitrary 
>> number of select statements, you