Re: [DISCUSS] Change code style guide WRT to @Override in subclasses / interface implementations

2022-06-08 Thread bened...@apache.org
I’ve opened a PR: https://github.com/apache/cassandra-website/pull/137

Not sure what our commit norms are for the website, but I’m assuming we would 
normally expect a +1 from somebody else.

From: Dinesh Joshi 
Date: Saturday, 4 June 2022 at 19:59
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Change code style guide WRT to @Override in subclasses / 
interface implementations
Sounds good. Lazy consensus it is.
On Jun 4, 2022, at 11:09 AM, bened...@apache.org wrote:

I think lazy consensus is good enough here, since there has been no dissent so 
far as I can tell. It’s easier to modify if we assume lazy consensus until a 
dispute arises. If anyone wants to escalate to a formal vote, feel free to say 
so.

I’ll update the wiki in a couple of days; we can always roll back if a 
dissenting voice appears.


From: Dinesh Joshi 
Date: Friday, 3 June 2022 at 18:34
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Change code style guide WRT to @Override in subclasses / 
interface implementations
Let’s bring it to vote? We can update the docs as we evolve the guidance but I 
think it’s in a good enough shape to publish.

On Jun 3, 2022, at 9:07 AM, bened...@apache.org wrote:

I always ask if we’re ready, get a few acks, then one or two new queries come 
out of the woodwork.

Perhaps I will just publish, and we can start addressing these queries in a 
follow-up process.

From: Dinesh Joshi 
Date: Friday, 3 June 2022 at 16:57
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Change code style guide WRT to @Override in subclasses / 
interface implementations
I don’t think the guide has yet been published to the official website, has it? 
Maybe we should just get it out there.
On Jun 3, 2022, at 8:54 AM, bened...@apache.org wrote:

Somebody hasn’t looked at the new style guide*, the conversation for which 
keeps rolling on and so it never quite gets promoted to the wiki. It says:

Always use @Override annotations when implementing abstract or interface 
methods or overriding a parent method.

* 
https://docs.google.com/document/d/1sjw0crb0clQin2tMgZLt_ob4hYfLJYaU4lRX722htTo


From: Josh McKenzie 
Date: Friday, 3 June 2022 at 16:14
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Change code style guide WRT to @Override in subclasses / 
interface implementations
> Avoid redundant @Override annotations when implementing abstract or interface 
> methods.
I'd argue they're not redundant. We're humans and infinitely fallible. :)

+1 to changing this to just always annotate for all the reasons you enumerate.

On Fri, Jun 3, 2022, at 10:16 AM, Alex Petrov wrote:
Right, my thinking matches what David has mentioned:

https://issues.apache.org/jira/browse/CASSANDRA-16096
https://lists.apache.org/thread/mkskwxn921t5bkfmnog032qvnyjk82t7

I'll make sure to update the style guide itself, too, since it looks like there 
was a vote and the IntelliJ file is updated; we just need to fix up the website.


On Fri, Jun 3, 2022, at 4:02 PM, Dinesh Joshi wrote:
So your proposal is to always add override annotation? Or are there situations 
where you don’t want to add them?


On Jun 3, 2022, at 6:53 AM, Alex Petrov  wrote:

Hi everyone,

In our style guide [1], we have the following statement:

> Avoid redundant @Override annotations when implementing abstract or interface 
> methods.

I'd like to suggest we change this.

@Override annotation in subclasses might be annoying when you're writing the 
code for the first time, or reading already familiar code, but when you're 
working on large changes and have complex class hierarchies, or multiple 
overloads for the method, it's easy to overlook methods that were not marked as 
overrides, and leave a wrong method in the code, or misinterpret the call chain.

I think @Override annotations are extremely useful and serve their purpose, 
especially when refactoring: I can change the interface, and will not only be 
pointed to all classes that do not implement the new version (which the compiler 
will do anyway), but also will be pointed to the classes that, to the human 
eye, may look like they're overriding the method, but in fact they do not.

More concrete example: there is an abstract class between the interface and a 
concrete implementation: you change the interface, modify the method in the 
abstract class, but then forget to change the signature in the overridden 
implementation of the concrete class, and get a behaviour from the abstract 
class rather than the concrete implementation.
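
A minimal sketch of that scenario in Java. All names here (Visitor, 
AbstractVisitor, ConcreteVisitor, OverrideDemo) are hypothetical and not taken 
from the Cassandra codebase. It assumes the interface method was changed from 
visit(int) to visit(long) and the concrete class was left behind:

```java
// Hypothetical types for illustration only.
interface Visitor
{
    // Signature after the refactor: the parameter used to be an int.
    String visit(long value);
}

abstract class AbstractVisitor implements Visitor
{
    @Override
    public String visit(long value)
    {
        return "abstract";
    }
}

class ConcreteVisitor extends AbstractVisitor
{
    // Still has the old int signature, so this is now an overload rather than
    // an override. Annotating it with @Override would make the compiler reject it.
    public String visit(int value)
    {
        return "concrete";
    }
}

class OverrideDemo
{
    static String callThroughInterface()
    {
        Visitor v = new ConcreteVisitor();
        // Only visit(long) is visible through the interface type, so the
        // inherited AbstractVisitor implementation runs.
        return v.visit(1);
    }

    public static void main(String[] args)
    {
        System.out.println(callThroughInterface());
    }
}
```

The code compiles cleanly as written, which is the problem: nothing flags that 
ConcreteVisitor no longer overrides anything. With @Override on 
ConcreteVisitor.visit(int), the same mistake becomes a compile error.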

The question is not about taste or code aesthetics, but about simplifying the 
maintenance of a large, complex codebase that has evolved over many years. If 
you could provide an example where @Override would be counter-productive or 
overly burdensome, we could compare this cost of maintenance with the cost of 
potential errors.

Thank you,
--Alex

[1] https://cassandra.apache.org/_/development/code_style.html




Re: CEP-15 multi key transaction syntax

2022-06-08 Thread Blake Eggleston
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about. 

Agreed intermediate values would be straightforward to calculate though.

> On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:
> 
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).
>  
> I think that for it to be intuitive we need it to make sense sequentially, 
> which means either calculating it or restricting what can be stated (or 
> abandoning the syntax).
>  
> If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
> overlapping DELETE (and as many SELECT as you like) that would perhaps make 
> it simple enough? Require for now that SELECTS go first, then DELETE and then 
> INSERT/UPDATE (or vice versa, depending what we want to make simple)?
>  
> FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
> restricted to single rows we are updating, so we could simply maintain a 
> collection of rows and upsert into them as we process the execution. Most 
> transactions won’t need it, I suspect, so we don’t need to worry about 
> perfect efficiency.
>  
>  
> From: Blake Eggleston 
> Date: Tuesday, 7 June 2022 at 00:21
> To: dev@cassandra.apache.org 
> Subject: Re: CEP-15 multi key transaction syntax
> 
> That's a good question. I'd lean towards returning the final state of things, 
> although I could understand expecting to see intermediate state. Regarding 
> range tombstones, we could require them to precede any updates like selects, 
> but there's still the question of how to handle multiple updates to the same 
> cell when the user has requested we return the post-update state of the cell.
> 
> 
> On Jun 6, 2022, at 4:00 PM, bened...@apache.org  
> wrote:
>  
> > if multiple updates end up touching the same cell, I’d expect the last one 
> > to win
>  
> Hmm, yes I suppose range tombstones are a plausible and reasonable thing to 
> mix with inserts over the same key range.
>  
> What’s your present thinking about the idea of handling returning the values 
> as of a given point in the sequential execution then?
>  
> The succinct syntax is I think highly desirable for user experience, but this 
> does complicate it a bit if we want to remain intuitive.
>  
>  
>  
>  
> From: Blake Eggleston <beggles...@apple.com>
> Date: Monday, 6 June 2022 at 23:17
> To: dev@cassandra.apache.org
> Subject: Re: CEP-15 multi key transaction syntax
> 
> Hi all,
> 
> Thanks for all the input and questions so far. Glad people are excited about 
> this!
> 
> I didn’t have any free time to respond this weekend, although it looks like 
> Benedict has responded to most of the questions so far, so if I don’t respond 
> to a question you asked here, you can interpret that as “what Benedict said” 
> :).
> 
> 
> Jeff, 
> 
> > Is there a new keyword for “partition (not) exists” or is it inferred by 
> > the select?
> 
> I'd intended this to be worked out from the select statement, ie: if the 
> read/reference is null/empty, then it doesn't exist, whether you're 
> interested in the partition, row, or cell. So I don't think we'd need an 
> additional keyword there. I think that would address partition exists / not 
> exists use cases?
> 
> > And would you allow a transaction that had > 1 named select and no 
> > modification statements, but commit if 1=1 ?
> 
> Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) 
> would be part of the syntax. Also, running a txn that doesn’t contain updates 
> wouldn’t be a problem.
> 
> Patrick, I think Benedict answered your questions? Glad you got the joke :)
> 
> Alex,
> 
> > 1. Dependant SELECTs
> > 2. Dependant UPDATEs
> > 3. UPDATE from secondary index (or SASI)
> > 5. UPDATE with predicate on non-primary key
> 
> The full primary key must be defined as part of the statement, and you can’t 
> use column references to define them, so you wouldn’t be able to run these.
> 
> > MVs
> 
> To prevent being spread too thin, both in syntax design and implementation 
> work, I’d like to limit read

Re: CEP-15 multi key transaction syntax

2022-06-08 Thread bened...@apache.org
I imagine that conditions would be evaluated against the state prior to the 
execution of the statement against which they are being evaluated, but after 
the prior statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
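
As a rough illustration of that ordering, here is a small Java sketch. It is 
not CEP-15 code: the class and method names are hypothetical, the table is 
reduced to a map from k to a, a missing row is assumed to read as 0, and the 
second update is modelled as incrementing the value it observes so that the 
outcome matches the a = 2 result described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical toy model: each named statement observes the state left by the
// statements before it, and only then applies its own write.
class SequentialTxnSketch
{
    // Working copy of the table: key k -> value of column a.
    private final Map<Integer, Integer> working;
    // Value each named statement observed before applying its write.
    private final Map<String, Integer> reads = new HashMap<>();

    SequentialTxnSketch(Map<Integer, Integer> committed)
    {
        this.working = new HashMap<>(committed);
    }

    void update(String name, int key, IntUnaryOperator newValue)
    {
        int seen = working.getOrDefault(key, 0); // assumption: missing row reads as 0
        reads.put(name, seen);
        working.put(key, newValue.applyAsInt(seen));
    }

    int readBy(String name)
    {
        return reads.get(name);
    }

    int current(int key)
    {
        return working.get(key);
    }

    // Returns { value read by q1, value read by q2, final value of a }.
    static int[] run()
    {
        Map<Integer, Integer> committed = new HashMap<>();
        committed.put(1, 0); // row k = 1 starts with a = 0
        SequentialTxnSketch txn = new SequentialTxnSketch(committed);
        txn.update("q1", 1, seen -> 1);        // sets a = 1
        txn.update("q2", 1, seen -> seen + 1); // increments what it observes
        return new int[] { txn.readBy("q1"), txn.readBy("q2"), txn.current(1) };
    }

    public static void main(String[] args)
    {
        int[] r = run();
        // Mirrors COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
        boolean commit = r[0] == 0 && r[1] == 1;
        System.out.println("q1.a=" + r[0] + " q2.a=" + r[1] + " a=" + r[2] + " commit=" + commit);
    }
}
```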

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.


From: Blake Eggleston 
Date: Wednesday, 8 June 2022 at 20:59
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about.

Agreed intermediate values would be straightforward to calculate though.


On Jun 6, 2022, at 4:33 PM, bened...@apache.org 
wrote:

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collection of rows and upsert into them as we process the execution. Most 
transactions won’t need it, I suspect, so we don’t need to worry about perfect 
efficiency.
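
A sketch of what "maintain a collection of rows and upsert into them" could 
look like, in Java. This is only an illustration under assumptions: the class 
name RowOverlaySketch, the string primary keys and the column maps are all 
hypothetical, and nothing here reflects the actual implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical overlay of in-flight rows kept alongside the rows that were read.
class RowOverlaySketch
{
    // Rows as read at the start of the transaction: primary key -> (column -> value).
    private final Map<String, Map<String, Object>> base;
    // Rows written so far by this transaction, in their current in-flight state.
    private final Map<String, Map<String, Object>> overlay = new HashMap<>();

    RowOverlaySketch(Map<String, Map<String, Object>> base)
    {
        this.base = base;
    }

    // State of a row as of the current point in the sequential execution.
    Map<String, Object> read(String key)
    {
        Map<String, Object> row = overlay.get(key);
        if (row == null)
            row = base.getOrDefault(key, new HashMap<>());
        return new HashMap<>(row); // copy, so callers cannot mutate our state
    }

    // Merge the written columns into the in-flight row; the last write to a cell wins.
    void upsert(String key, Map<String, Object> columns)
    {
        Map<String, Object> row = overlay.get(key);
        if (row == null)
        {
            row = read(key);
            overlay.put(key, row);
        }
        row.putAll(columns);
    }

    // Returns { intermediate a, final a, final b, a in the untouched base row }.
    static Object[] demo()
    {
        Map<String, Object> original = new HashMap<>();
        original.put("a", 0);
        Map<String, Map<String, Object>> base = new HashMap<>();
        base.put("k1", original);

        RowOverlaySketch rows = new RowOverlaySketch(base);
        rows.upsert("k1", Collections.singletonMap("a", 1));
        Object intermediate = rows.read("k1").get("a");

        Map<String, Object> second = new HashMap<>();
        second.put("a", 2);
        second.put("b", "x");
        rows.upsert("k1", second);

        Map<String, Object> last = rows.read("k1");
        return new Object[] { intermediate, last.get("a"), last.get("b"), original.get("a") };
    }

    public static void main(String[] args)
    {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

Because the overlay only holds rows the transaction writes, a transaction that 
never writes the same row twice pays for little more than a map lookup.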


From: Blake Eggleston <beggles...@apple.com>
Date: Tuesday, 7 June 2022 at 00:21
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tombstones, we could require them to precede any updates like selects, 
but there's still the question of how to handle multiple updates to the same 
cell when the user has requested we return the post-update state of the cell.



On Jun 6, 2022, at 4:00 PM, bened...@apache.org 
wrote:

> if multiple updates end up touching the same cell, I’d expect the last one to 
> win

Hmm, yes I suppose range tombstones are a plausible and reasonable thing to mix 
with inserts over the same key range.

What’s your present thinking about the idea of handling returning the values as 
of a given point in the sequential execution then?

The succinct syntax is I think highly desirable for user experience, but this 
does complicate it a bit if we want to remain intuitive.




From: Blake Eggleston <beggles...@apple.com>
Date: Monday, 6 June 2022 at 23:17
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond 
to a question you asked here, you can interpret that as “what Benedi