Re: CEP-15 multi key transaction syntax

2022-06-10 Thread Blake Eggleston
Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to 

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow-on work.
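
As a rough illustration of the branching idea above (not an implementation from the CEP), the sketch below picks the first branch whose condition holds against a previously selected row. The names `choose_branch`, `branches`, and `commit_if` are hypothetical, and the statement strings are placeholders. It also shows how COMMIT IF could be read as a single guarded branch with no ELSE.

```python
from typing import Callable, Optional, Sequence, Tuple

Row = dict
# A branch pairs a condition with the statement to run; a None condition plays the role of ELSE.
Branch = Tuple[Optional[Callable[[Row], bool]], str]


def choose_branch(selected: Row, branches: Sequence[Branch]) -> Optional[str]:
    """Return the statement of the first branch whose condition holds, or None if none matches."""
    for condition, statement in branches:
        if condition is None or condition(selected):
            return statement
    return None


# Mirrors the IF / ELSE IF / ELSE example; the statement strings are placeholders only.
branches = [
    (lambda row: row["column"] == 5, "UPDATE ... WHERE key=1"),
    (lambda row: row["column"] == 6, "UPDATE ... WHERE key=2"),
    (None, "UPDATE ... WHERE key=3"),
]

# COMMIT IF read as a shorthand: one guarded branch and no ELSE, so nothing is
# applied when the condition fails.
commit_if = [(lambda row: row["column"] == 5, "apply all writes")]
```

Branches are checked in order, so only one statement is ever chosen, which is the usual reading of IF / ELSE IF / ELSE.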

> On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:
> 
> I imagine that conditions would be evaluated against the state prior to the 
> execution of the statement against which they are being evaluated, but after 
> the prior statements. I think that should be OK to reason about.
>  
> i.e. we might have a contrived example like:
>  
> BEGIN TRANSACTION
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
> COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
>  
> So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
>  
> I think this is probably adequately intuitive? It is a bit atypical to have 
> conditions that wrap the whole transaction though.
>  
> We have another option, of course, which is to offer IF x ROLLBACK 
> TRANSACTION, which is closer to SQL, which would translate the above to:
>  
> BEGIN TRANSACTION
> SELECT a FROM tbl WHERE k = 1 AS q0
> IF q0.a != 0 ROLLBACK TRANSACTION
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
> IF q1.a != 1 ROLLBACK TRANSACTION
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
> COMMIT TRANSACTION
>  
> This is less succinct, but might be more familiar to users. We could also 
> eschew the ability to read from UPDATE statements entirely in this scheme, as 
> this would then look very much like SQL.
>  
>  
> From: Blake Eggleston 
> Date: Wednesday, 8 June 2022 at 20:59
> To: dev@cassandra.apache.org 
> Subject: Re: CEP-15 multi key transaction syntax
> 
> > It affects not just RETURNING but also conditions that are evaluated 
> > against the row, and if we in future permit using the values from one 
> > select in a function call / write to another table (which I imagine we 
> > will).
> 
> I hadn’t thought about that... using intermediate or even post update values 
> in condition evaluation or function calls seems like it would make it 
> difficult to understand why a condition is or is not applying. On the other 
> hand, it would be powerful, especially when using things like database generated 
> values in queries (auto incrementing integer clustering keys or server 
> generated timeuuids being examples that come to mind). Additionally, if we 
> return these values, I guess that would solve the visibility issues I’m 
> worried about. 
> 
> Agreed intermediate values would be straightforward to calculate though.
> 
> 
> On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:
>  
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).
>  
> I think that for it to be intuitive we need it to make sense sequentially, 
> which means either calculating it or restricting what can be stated (or 
> abandoning the syntax).
>  
> If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
> overlapping DELETE (and as many SELECT as you like) that would perhaps make 
> it simple enough? Require for now that SELECTS go first, then DELETE and then 
> INSERT/UPDATE (or vice versa, depending what we want to make simple)?
>  
> FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
> restricted to single rows we are updating, so we could simply maintain a 
> collection of rows and upsert into them as we process the execution. Most 
> transactions won’t need it, I suspect, so we don’t need to worry about 
> perfect efficiency.
>  
>  
> From: Blake Eggleston <beggles...@apple.com>
> Date: Tuesday, 7 June 2022 at 00:21
> To: dev@cassandra.apache.org
> Subject: Re: CEP-15 multi key transaction syntax
> 
> That's a good question. I'd lean towards returning the final state of things, 
> although I could understand expecting to see intermediate state. Regarding 
> range tombstones, we could require them to precede any updates like selects, 
> but there's still the question of how to handle multiple updates to the same 
> cell when the user has requested we return the post-update state of the cell.
> 
> 
> 
> On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:
>  
> > if multiple updates end up touching the same cell, I’d expect the last one 
> > to win
>  
> Hmm, yes I suppose range tombstones are a plausible and reasonable thing to 
> mix with inserts over the same key range.
>  
> What’s your present thinking about the idea of handling returning the values 
> as of a given point

Re: CEP-15 multi key transaction syntax

2022-06-10 Thread bened...@apache.org
So, thinking on it myself some more, I think an option that *doesn’t* require 
the user to reason about the point at which the read happens in order to 
understand how the condition is applied would probably be better.

What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?

It’s compatible with more advanced IF functionality later, and probably not 
much trickier to implement?

The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
get one chance to make this API right.


From: Blake Eggleston 
Date: Friday, 10 June 2022 at 18:56
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow-on work.


On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:

I imagine that conditions would be evaluated against the state prior to the 
execution of the statement against which they are being evaluated, but after the 
prior statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
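
To make that reading order concrete, here is a small simulation (a sketch only, not how the CEP would implement it). Each named statement records the row as it stood just before that statement ran, and the COMMIT IF condition is then checked against those recorded reads. One assumption to flag: the second update is modelled as adding one to the value it reads itself, chosen to match the stated outcome of a = 2. `run_transaction` and the tuple layout are hypothetical.

```python
def run_transaction(state, statements):
    """Apply statements in order against a working copy.

    Each statement is (name, key, apply_update), where apply_update maps the
    row seen before the statement ran to the columns it writes.
    Returns the working state and the per-statement reads.
    """
    working = {key: dict(row) for key, row in state.items()}
    reads = {}
    for name, key, apply_update in statements:
        before = dict(working.get(key, {}))  # state after prior statements, before this one
        reads[name] = before
        working[key] = {**before, **apply_update(before)}
    return working, reads


initial = {1: {"a": 0}}
statements = [
    ("q1", 1, lambda row: {"a": 1}),             # UPDATE ... SET a = 1
    ("q2", 1, lambda row: {"a": row["a"] + 1}),  # assumed: increments the value it reads
]

final, reads = run_transaction(initial, statements)

# COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
commit = reads["q1"]["a"] == 0 and reads["q2"]["a"] == 1
result = final if commit else initial
```

With the starting row a = 0, q1 reads 0, q2 reads 1, and the committed row ends with a = 2; with any other starting value the condition fails and the original state is kept.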

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.


From: Blake Eggleston <beggles...@apple.com>
Date: Wednesday, 8 June 2022 at 20:59
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about.

Agreed intermediate values would be straightforward to calculate though.



On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?
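
The restriction floated above could be checked up front with something like the following sketch. It assumes the SELECT, then DELETE, then INSERT/UPDATE ordering (the "vice versa" ordering would only change the phase numbers), and it represents each statement as a hypothetical (kind, key) pair. `validate` and `PHASE` are illustrative names, not part of any proposal.

```python
# Lower phase numbers must come first; INSERT and UPDATE share the last phase.
PHASE = {"SELECT": 0, "DELETE": 1, "INSERT": 2, "UPDATE": 2}


def validate(statements):
    """Return a list of error strings for a list of (kind, key) statements."""
    errors = []
    highest_phase = 0
    written_keys = set()
    for index, (kind, key) in enumerate(statements):
        phase = PHASE[kind]
        if phase < highest_phase:
            errors.append(f"statement {index}: {kind} appears after a later-phase statement")
        highest_phase = max(highest_phase, phase)
        if kind in ("INSERT", "UPDATE"):
            # Only one INSERT/UPDATE per key; SELECT and DELETE may overlap freely.
            if key in written_keys:
                errors.append(f"statement {index}: second INSERT/UPDATE to key {key!r}")
            written_keys.add(key)
    return errors
```

For example, `validate([("SELECT", 1), ("DELETE", 1), ("DELETE", 1), ("UPDATE", 1)])` reports no errors, while two UPDATEs to the same key, or a SELECT placed after an UPDATE, each produce one.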

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collection of rows and upsert into them as we process the execution. Most 
transactions won’t need it, I suspect, so we don’t need to worry about perfect 
efficiency.
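
A minimal sketch of that "collection of rows we upsert into" idea, under the assumption that rows are plain column-to-value mappings and that the underlying store is reached through a `read_base` callable. Both of those, and the `RowOverlay` name, are hypothetical. Only rows the transaction touches are held in memory.

```python
class RowOverlay:
    """Holds the in-transaction view of each touched row on top of a base store."""

    def __init__(self, read_base):
        self._read_base = read_base  # callable: key -> row mapping, or None if absent
        self._rows = {}

    def _row(self, key):
        # Load the row from the base store the first time it is touched.
        if key not in self._rows:
            base = self._read_base(key)
            self._rows[key] = dict(base) if base else {}
        return self._rows[key]

    def get(self, key):
        """Return a copy of the row as seen at this point in the transaction."""
        return dict(self._row(key))

    def upsert(self, key, columns):
        """Merge the written columns into the in-transaction view of the row."""
        self._row(key).update(columns)


base = {1: {"a": 0}}
overlay = RowOverlay(base.get)
overlay.upsert(1, {"a": 1})                        # first update
overlay.upsert(1, {"a": overlay.get(1)["a"] + 1})  # second update sees the first
```

Intermediate values fall out of `get` at any point, and the base store is left untouched until whatever commit step follows.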


From: Blake Eggleston <beggles...@apple.com>
Date: Tuesday, 7 June 2022 at 00:21
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tombstones, we could require them to precede any updates like selects

Re: CEP-15 multi key transaction syntax

2022-06-10 Thread bened...@apache.org
This might also permit us to remove one result set (the success/failure one) 
and return instead an exception if the transaction is aborted. This is also 
more consistent with SQL, if memory serves. That might conflict with returning 
the other result sets in the event of abort (though that’s up to us 
ultimately), but it feels like a nicer API for the user – depending on how 
these exceptions are surfaced in client APIs.
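
For a feel of what the exception-based shape might look like from the caller's side, here is a sketch. `TransactionAborted`, `execute`, and the step tuples are all hypothetical and do not correspond to any existing driver API. Guards are checked against the working state at their position in the sequence, and an abort leaves the caller's state untouched.

```python
class TransactionAborted(Exception):
    """Hypothetical error raised when an abort guard fires."""


def execute(state, steps):
    """Run (kind, key, fn) steps against a working copy and return the new state.

    kind is "abort_if" (fn is a predicate on the current row) or
    "update" (fn maps the current row to the columns to write).
    """
    working = {key: dict(row) for key, row in state.items()}
    for kind, key, fn in steps:
        row = working.get(key, {})
        if kind == "abort_if":
            if fn(row):
                raise TransactionAborted(f"guard on key {key!r} failed")
        elif kind == "update":
            working[key] = {**row, **fn(row)}
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return working


# Loosely follows the guarded example quoted in this thread.
steps = [
    ("abort_if", 1, lambda row: row.get("a") != 0),
    ("update", 1, lambda row: {"a": 1}),
    ("abort_if", 1, lambda row: row.get("a") != 1),
    ("update", 1, lambda row: {"a": row["a"] + 1}),
]
```

A caller would wrap `execute` in try/except and treat `TransactionAborted` as the failure signal, rather than inspecting a separate success/failure result set.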

From: bened...@apache.org 
Date: Friday, 10 June 2022 at 19:59
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
So, thinking on it myself some more, I think an option that *doesn’t* require 
the user to reason about the point at which the read happens in order to 
understand how the condition is applied would probably be better.

What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?

It’s compatible with more advanced IF functionality later, and probably not 
much trickier to implement?

The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
get one chance to make this API right.


From: Blake Eggleston 
Date: Friday, 10 June 2022 at 18:56
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow-on work.



On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:

I imagine that conditions would be evaluated against the state prior to the 
execution of the statement against which they are being evaluated, but after the 
prior statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.


From: Blake Eggleston <beggles...@apple.com>
Date: Wednesday, 8 June 2022 at 20:59
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about.

Agreed intermediate values would be straightforward to calculate though.




On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collection of rows and upsert into them as we process the exec