Re: CEP-15 multi key transaction syntax

2022-06-12 Thread bened...@apache.org
> I would love hearing from people on what they think.

^^ It would be great to have more participants in this conversation

> For context, my questions earlier were based on my 20+ years of using SQL 
> transactions across different systems.

We probably don’t come from a very different place. I spent too many years with 
T-SQL.

> When you start a SQL transaction, you are creating a branch of your data that 
> you can operate with until you reach your desired state and then merge it 
> back with a commit.

That’s the essential complexity we’re grappling with: how much do we permit 
your “branch” to do, how do we let you express it, and how do we let you 
express conditions?

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

Right now, we have the issue that read-your-writes introduces some complexity 
to the semantics, particularly around the conditions of execution.

LWTs impose conditions on the state of all records prior to execution, but 
their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean expr) 
is most consistent with this approach. This can be confusing, though, if the 
condition is evaluated on a value that has been updated by a prior statement in 
the batch – what value does this global condition get evaluated against?*

SQL has no such concept, but SQL is also designed to be interactive. Depending 
on the dialect there are probably many ways to do this non-interactively in 
SQL, but we probably cannot cheaply replicate that functionality exactly, as we 
do not (yet) support the interactive transactions it was designed for. To 
submit a conditional non-interactive transaction in SQL, you would likely use

IF (X) THEN
    ROLLBACK
    RETURN (ERRCODE)
END IF

or

IF (X) THEN RAISERROR

So, that is in essence the question we are currently asking: do we want to have 
a more LWT-like approach (and if so, how do we address this complexity for the 
user), or do we want a more SQL-like approach (and if so, how do we modify it 
to make non-interactive transactions convenient, and implementation tractable)?

* This is anyway a shortcoming of existing batches, I think? So it might be we 
can sweep it under the rug, but I think it will be more relevant here as people 
execute more complex transactions, and we should ideally have semantics that 
will work well into the future – including if we later introduce interactive 
transactions.





From: Patrick McFadin 
Date: Saturday, 11 June 2022 at 15:33
To: dev 
Subject: Re: CEP-15 multi key transaction syntax
I think the syntax is evolving into something pretty complicated, which may be 
warranted but I wanted to take a step back and be a bit more reflective on what 
we are trying to accomplish.

For context, my questions earlier were based on my 20+ years of using SQL 
transactions across different systems. That's my personal bias when I see the 
word "database transaction" in this case. When you start a SQL transaction, you 
are creating a branch of your data that you can operate with until you reach 
your desired state and then merge it back with a commit. Or if you don't like 
what you see, use a rollback and act like it never happened. That was the 
thinking when I asked about interactive sessions. If you are using a driver, 
that all happens in a batch. I realize that is out of scope here, but that's 
probably knowledge that is pre-installed in the majority of the user community.

Getting to the point, which is developer experience. I'm seeing a philosophical 
fork in the road which hopefully will generate some comments in the larger user 
community.

Path 1)
Mimic what's already been available in the SQL community, using existing CQL 
syntax. (SQL Example using JDBC: https://www.baeldung.com/java-jdbc-auto-commit)

Path 2)
Chart a new direction with new syntax

I genuinely don't have a clear answer, but I would love hearing from people on 
what they think.

Patrick

On Fri, Jun 10, 2022 at 12:07 PM bened...@apache.org wrote:
This might also permit us to remove one result set (the success/failure one) 
and return instead an exception if the transaction is aborted. This is also 
more consistent with SQL, if memory serves. That might conflict with returning 
the other result sets in the event of abort (though that’s up to us 
ultimately), but it feels like a nicer API for the user – depending on how 
these exceptions are surfaced in client APIs.

From: bened...@apache.org
Date: Friday, 10 June 2022 at 19:59
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
So, thinking on it myself some more, I think if there’s an option that 
*doesn’t* require the user to 

Re: CEP-15 multi key transaction syntax

2022-06-12 Thread Li Boxuan
Thank you team for this exciting update! I just joined the dev mailing list to 
take part in this discussion. I am not a Cassandra developer and haven’t 
understood Accord myself yet, so my questions are more from a user’s standpoint.

> The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
> get one chance to make this API right.

I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax, I took 
it for granted that the condition was evaluated against the state before the 
updates, but it turned out to be the opposite. Thus I prefer the IF (Boolean 
expr) ABORT TRANSACTION idea. In addition, when the transaction is large and 
there are many conditions, using the COMMIT IF syntax might make the CQL query 
uglier and developers’ lives harder. Another very subtle point is if there are 
many conditions combined using AND clauses, wouldn't it make the execution 
slightly slower because, for each SELECT statement, you would have to check 
every condition? The IF (Boolean expr) ABORT TRANSACTION would suffer less 
because users may tend to put the condition closer to the related SELECT 
statement.

> read-only transactions involving multiple tables will definitely be supported.

Would you consider allowing users to start a read-only transaction explicitly 
like BEGIN TRANSACTION READONLY? This could help catch some developers’ bugs 
like unintentional updates. This might also give Cassandra a hint for 
optimization.

Finally, I wonder if the community would be interested in idempotency support. 
DynamoDB has this interesting feature 
(https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html#transaction-apis-txwriteitems),
 which guards against the situation where the same transaction is submitted multiple 
times due to a connection time-out or other connectivity issue. I have no idea 
how that is implemented under the hood and I don’t even know if this is 
technically possible with the Accord design, but I thought it would be 
interesting to think about.

Best regards,
Boxuan



Re: CEP-15 multi key transaction syntax

2022-06-12 Thread Li Boxuan
Correcting my typo:

>  I took it for granted that the condition was evaluated against the state 
> before the updates

I took it for granted that the condition was evaluated against the state AFTER 
the updates


Re: CEP-15 multi key transaction syntax

2022-06-12 Thread bened...@apache.org
Welcome Li, and thanks for your input.

> When I first saw the syntax, I took it for granted that the condition was 
> evaluated against the state AFTER the updates

Depending on what you mean, I think this is one of the options being considered. 
At least, it seems this syntax is most likely to be evaluated against the 
values written by preceding statements in the batch, but not the statement 
itself (or later ones), as this could lead to nonsensical statements like

BEGIN TRANSACTION
UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
COMMIT TRANSACTION IF tbl.v = 0

where v is never 0 afterwards, so this never succeeds. I take it in this simple 
case you would expect the condition to be evaluated against the state prior to 
the statement (i.e. the initial state)?
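
To make the two readings concrete, here is a minimal Python sketch of the example above. It is only a toy model, not CQL and not how Accord evaluates anything: the function run_transaction, the dict-based table, and the evaluate_against switch are all hypothetical names introduced for illustration.

```python
import copy

def run_transaction(initial, updates, condition, evaluate_against):
    """Toy model: apply `updates` only if `condition` holds.

    `evaluate_against` selects which state the condition sees:
    "initial" (before any update) or "after" (all updates applied).
    Returns (committed, resulting_state).
    """
    before = copy.deepcopy(initial)
    after = copy.deepcopy(initial)
    for key, column, value in updates:
        after.setdefault(key, {})[column] = value
    view = before if evaluate_against == "initial" else after
    if condition(view):
        return True, after
    return False, before

# Models: UPDATE tbl SET v = 1 WHERE key = 1 ... COMMIT TRANSACTION IF tbl.v = 0
initial = {1: {"v": 0}}
updates = [(1, "v", 1)]
condition = lambda tbl: tbl[1]["v"] == 0

committed_initial, state_initial = run_transaction(initial, updates, condition, "initial")
committed_after, state_after = run_transaction(initial, updates, condition, "after")
```

Under the "initial" reading the transaction commits and v becomes 1; under the "after" reading the condition can never hold, which is the nonsensical case described above.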

But we have a blank slate, so every option is available to us! We just need to 
make sure it makes sense to the user, even in uncommon cases.

> The IF (Boolean expr) ABORT TRANSACTION would suffer less because users may 
> tend to put the condition closer to the related SELECT statement.

This is probably not going to matter in practice. The SELECTs all happen 
upfront no matter what the CQL might look like, and the UPDATEs all happen only 
after the IF conditions are evaluated. This is all just a question of how the 
user expresses things.
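
As a rough illustration of that ordering, the Python sketch below runs a transaction in three phases (all reads, then all conditions, then all writes) regardless of the order in which the statements were written. The tuple encoding of statements and the execute function are assumptions made up for this sketch; they do not reflect the actual CQL grammar or Accord internals.

```python
def execute(statements, store):
    """Toy three-phase execution: reads, then conditions, then writes.

    Statements are hypothetical tuples:
      ("select", name, key)    bind store[key] to `name`
      ("abort_if", predicate)  abort when predicate(snapshot) is true
      ("update", key, value)   write value to store[key]
    Returns (committed, snapshot_of_reads).
    """
    reads = [s for s in statements if s[0] == "select"]
    conditions = [s for s in statements if s[0] == "abort_if"]
    writes = [s for s in statements if s[0] == "update"]

    # Phase 1: every read sees the initial state.
    snapshot = {name: store.get(key) for _, name, key in reads}
    # Phase 2: every condition is checked before any write is applied.
    for _, predicate in conditions:
        if predicate(snapshot):
            return False, snapshot
    # Phase 3: writes are applied only if no condition aborted.
    for _, key, value in writes:
        store[key] = value
    return True, snapshot

select = ("select", "balance", "acct")
abort_if = ("abort_if", lambda rows: rows["balance"] < 100)
update = ("update", "acct", 50)

store_a = {"acct": 120}
store_b = {"acct": 120}
result_a = execute([select, abort_if, update], store_a)
result_b = execute([update, abort_if, select], store_b)
```

Both statement orders produce the same outcome in this model, which is why the placement of a condition relative to its SELECT would not affect cost in a single-step transaction.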

In future we may offer interactive transactions, or transactions that are 
multi-step, in which case this would be more relevant and could have an 
efficiency impact.

> Would you consider allowing users to start a read-only transaction explicitly 
> like BEGIN TRANSACTION READONLY?

Good question. I would be OK with this, for sure, and will defer to the 
opinions of others here. There won’t be any optimisation impact, as we simply 
check if the transaction contains any updates, but some validation could be 
helpful for the user.
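
The validation being discussed could look roughly like the Python sketch below. The names is_read_only, ReadOnlyViolation, and WRITE_KINDS are hypothetical, and the statement kinds are plain strings rather than parsed CQL; this is only meant to show the check, under the assumption that a READONLY declaration would be rejected if the transaction contains any write.

```python
class ReadOnlyViolation(Exception):
    """Raised when a transaction declared read-only contains a write."""

# Hypothetical set of statement kinds that count as writes.
WRITE_KINDS = {"insert", "update", "delete"}

def is_read_only(statement_kinds, declared_read_only=False):
    """Return True if no statement is a write.

    If the caller declared the transaction read-only but it contains
    writes, raise instead of silently executing them.
    """
    writes = [kind for kind in statement_kinds if kind in WRITE_KINDS]
    if declared_read_only and writes:
        raise ReadOnlyViolation("read-only transaction contains: " + ", ".join(writes))
    return not writes
```

The first half of the function is the check that already happens implicitly; only the raise is new behaviour, which matches the point that the declaration helps the user rather than the optimiser.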

> Finally, I wonder if the community would be interested in idempotency support.

This is something that has been considered, and that Accord is able to support 
(in a couple of ways), but as an end-to-end feature this requires client 
support and other scaffolding that is not currently planned/scheduled. The 
simplest (least robust) approach is for the server to include the transaction’s 
identifier in its timeout, so that it can be queried by the client to establish 
whether it has been made durable. This should be quite easy to deliver on the 
server-side, but would require some application or client integration, and is 
unreliable in the face of coordinator failure (so the transaction id is unknown 
to the client). The more complete approach is for the client to include an 
idempotency token in its submission to the server, and for C* to record this 
alongside the transaction id, and for some bounded time window to either reject 
re-submissions of this token or to evaluate it as a no-op. This requires much 
tighter integration from the clients, and more work server-side.
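
A minimal sketch of the token-based approach, in Python, might look like the following. Everything here is an assumption for illustration: the IdempotencyRegistry class, its in-memory dict, and the choice to treat a re-submission as a no-op that returns the original transaction id. A real implementation would need durable, replicated storage and a decision between rejecting and no-op'ing.

```python
import time

class IdempotencyRegistry:
    """Toy server-side record of idempotency tokens within a time window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._seen = {}  # token -> (transaction id, time recorded)

    def submit(self, token, apply):
        """Run `apply()` once per token within the window.

        Returns (transaction_id, executed). A re-submission inside the
        window is a no-op that returns the original transaction id.
        """
        now = self._clock()
        # Forget tokens that have fallen outside the bounded window.
        self._seen = {
            t: (txn_id, at)
            for t, (txn_id, at) in self._seen.items()
            if now - at < self._window
        }
        if token in self._seen:
            return self._seen[token][0], False
        txn_id = apply()
        self._seen[token] = (txn_id, now)
        return txn_id, True
```

A client that times out can then retry with the same token and learn the transaction id without the transaction being applied twice, as long as the retry arrives inside the window.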

Which is simply to say, this is on our radar but I can’t make promises about 
what form it will take, or when it will arrive, only that it has been planned 
for enough to ensure we can achieve it when resources permit.
