Re: [DISCUSS] Clear rules about sstable versioning and downgrade support

2023-01-16 Thread Claude Warren, Jr via dev
What does this mean for the Trie sstable format?

Would it perhaps make sense to version the sstable upgrader (and future
downgrader) based on the highest version they understand? For example,
sstableupgrader version N will handle the n? versions so it can upgrade
from m? while sstabledowngrader version N can downgrade from m? to
n, where  is the lower limit that downgrader can write.  In
this way users trying to upgrade from out of support major releases can use
multiple upgraders and downgraders to move forward and/or recover from a
failed upgrade.

In this case the sstableupgrader and sstabledowngrader should report any
tables that they cannot handle and not execute the upgrade/downgrade.
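
A rough illustration of the "report and refuse" behaviour described above: check
every sstable version against the range the tool understands before converting
anything. This is only a sketch. `plan_conversion`, the example version strings,
and the supported range are hypothetical and do not describe the real
upgrade tooling.

```python
from typing import Iterable, List, Tuple


def plan_conversion(sstable_versions: Iterable[str],
                    oldest_supported: str,
                    newest_supported: str) -> Tuple[bool, List[str]]:
    """Return (ok, unsupported); ok is True only if every version is in range.

    Assumption of this sketch: versions are two-letter strings that order
    correctly under plain string comparison, e.g. "ma" < "nb" < "oa".
    """
    unsupported = sorted({v for v in sstable_versions
                          if not (oldest_supported <= v <= newest_supported)})
    return (not unsupported, unsupported)


ok, unsupported = plan_conversion(["mc", "nb", "ka"], "ma", "oa")
if not ok:
    # Report everything that cannot be handled and do not start the conversion.
    print("refusing to run; unsupported sstable versions:", unsupported)
```

Doing the whole check up front means a run either converts every table or
touches none, which is what makes chaining several tool versions safe.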


On Fri, Jan 13, 2023 at 1:17 PM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Hi,
>
> I'd like to bring that topic to your attention. I think that we should
> think about allowing users to downgrade under certain conditions. For
> example, always allow for downgrading to any previous minor release.
>
> Clear rules should make users feel safer when upgrading and perhaps
> encourage trying Cassandra at all.
>
> One of the things related to that is sstable format version. It consists
> of major and minor components and is incremented independently from
> Cassandra releases. One rule here is that a Cassandra release producing
> sstables at version XY should be able to read any sstable with version
> (X-1)* and X* (which means that all the minor future versions X. Perhaps we
> could make some commitment to change major sstable format only with new
> major release?
>
> What do you think?
>
> Thanks
> - - -- --- -  -
> Jacek Lewandowski
>
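
The read-compatibility rule quoted above (a release writing version XY reads
any (X-1)* and X* sstable) can be sketched as follows. The sketch assumes
two-letter version strings where the first letter is the major component and
the second the minor one; `can_read` is a hypothetical helper, not a Cassandra
API.

```python
def can_read(release_version: str, sstable_version: str) -> bool:
    """True if a release writing `release_version` may read `sstable_version`.

    Assumption: two-letter versions, first letter = major, second = minor.
    """
    release_major = ord(release_version[0])
    sstable_major = ord(sstable_version[0])
    # Accept the same major (X*, any minor) or the previous major ((X-1)*).
    return sstable_major in (release_major, release_major - 1)


print(can_read("nb", "mc"))  # previous major
print(can_read("nb", "nz"))  # same major, later minor
print(can_read("nb", "la"))  # two majors back
```

Note that the minor component never enters the check, which is why a later
minor of the same major is still accepted under this reading of the rule.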


Merging CEP-15 to trunk

2023-01-16 Thread Benedict
Hi Everyone, I hope you all had a lovely holiday period. 

Those who have been following along will have seen a steady drip of progress 
into the cep-15-accord feature branch over the past year. We originally 
discussed that feature branches would merge periodically into trunk, and we are 
long overdue. With the release of 4.1, it’s time to rectify that. 

Barring complaints, I hope to merge the current state to trunk within a couple 
of weeks. This remains a work in progress, but will permit users to experiment 
with the alpha version of Accord and provide feedback, as well as phase the 
changes to trunk.


Intra-project dependencies

2023-01-16 Thread Benedict
Those of us who have developed the in-jvm-dtest-api will know that the 
project’s approach to developing libraries is untenable for more complex 
projects. Accord makes this a pressing concern, but we would also benefit from 
separating utilities to their own library for use by the dtest-api and Accord, 
and also the dtest-api could be evolved more easily. I see basically four 
options:

1. Continue requiring a release vote for every library change prior to importing 
   it to another project
2. Bring libraries into the C* tree
3. Deploy snapshots for our internal modules, that we import until release (at 
   which point we publish real jars)
4. Use git submodules

I think (4) is the only sensible option. It permits different development 
branches to easily reference different versions of a library and also to easily 
co-develop them - from within the same IDE project, even. We might even be able 
to avoid additional release votes as a matter of course, by compiling the 
library source as part of the C* release, so that they adopt the C* release 
vote (or else we may periodically release the library as we do other releases)

(1) is unworkable, as this means a release vote for every patch that affects 
both a library and a module that imports it. Even for the dtest-api, this has 
been excruciating and has led to workarounds.
(2) incurs additional development work porting changes between C* versions that 
share a logical library version, and the potential for the “same” version to 
drift unintentionally
(3) makes parallel branch development trickier as each needs its own unique 
snapshot release, and release votes become more complicated as we require a 
chain of releases, one for each dependency.

I might be missing something, does anyone have any other bright ideas for 
approaching this problem? I’m sure there are plenty of opinions out there.


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>
> I think (4) is the only sensible option. It permits different development
> branches to easily reference different versions of a library and also to
> easily co-develop them - from within the same IDE project, even.
>


I've only heard horror stories about submodules. The challenges they bring
should be listed and checked.

Some examples
 - you can no longer just `git clone …`  (and we clone automatically in a
number of places)
 - same with `git pull …` (easy to be left with out-of-sync submodules)
 - permanence from a git SHA no longer exists
 - our releases get more complicated (our source tarballs are the asf
releases)
 - handling patches that cover submodules
 - switching branches, and using git worktrees, during dev

I see (4) as a valid option, but concerned with the amount of work required
to adapt to it, and whether it will only make it more complicated for the
new contributor to the project. For example the first two points are
addressed by remembering to do `git clone --recurse-submodules …` . And who
would be fixing our build/test/release scripts to accommodate?

Not blockers, just concerns we need to raise and address.



> We might even be able to avoid additional release votes as a matter of
> course, by compiling the library source as part of the C* release, so that
> they adopt the C* release vote (or else we may periodically release the
> library as we do other releases)
>


Yes. Today we do a combination of first (3) and then (1). Having to make a
release of these libraries every time a patch (/feature branch) is
completing is a horror story in itself.

> I might be missing something, does anyone have any other bright ideas for
> approaching this problem? I’m sure there are plenty of opinions out there.
>


Looking at the problem with these libraries,
 - we don't need releases
 - we don't have a clean version/branch parity to in-tree
 - codebase parity between branches is important for upgrade tests (shared
classloaders)

 For (2) you mention drift of the "same" version, isn't this only a problem
for dtest-api in the way it requires the "same version" of a codebase for
compatibility when running upgrade tests? As the library itself no longer
has an explicit version, what I presume you meant by logical version.

To begin with, I'm leaning towards (2) because it is a cognitive re-use of
our release branches, and the problems around classpath compatibility can
be solved with tests. I'm sure I'm not seeing the whole picture though…


Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-01-16 Thread Miklosovic, Stefan
Based on the voting we should go with option 4?

Two weeks passed without anybody joining so I guess folks are all happy with 
that or this just went unnoticed?

Let's give it time until the end of this week (Friday 12:00 UTC).

Regards


From: Maxim Muzafarov 
Sent: Tuesday, January 3, 2023 14:31
To: dev@cassandra.apache.org
Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis


Folks,

Let me update the voting status and put together everything we have so
far. We definitely need more votes to have a solid foundation for this
change, so I encourage everyone to consider the options above and
share them in this thread.


Total for each applicable option:

4-th option -- 4 votes
3-rd option -- 3 votes
5-th option -- 1 vote
1-st option -- 0 votes
2-nd option -- 0 votes

On Thu, 22 Dec 2022 at 22:06, Mick Semb Wever  wrote:
>>
>>
>> 3. Total 5 groups, 2968 files to change
>>
>> ```
>> org.apache.cassandra.*
>> [blank line]
>> java.*
>> [blank line]
>> javax.*
>> [blank line]
>> all other imports
>> [blank line]
>> static all other imports
>> ```
>
>
>
> 3, then 5.
> There's lots under com.*, net.*, org.* that is essentially the same as "all 
> other imports", what's the reason to separate those?
>
> My preference for 3 is simply that imports are by default collapsed, and if I 
> expand them it's the dependencies on other cassandra stuff I'm first 
> grokking. It's also our only imports that lead to cyclic dependencies (which 
> we're not good at).


Re: Intra-project dependencies

2023-01-16 Thread Benedict
I guess option 5 is what we have today in cep-15, have the build file grab the 
relevant SHA for the library. This way you maintain a precise SHA for builds 
and scripts don’t have to be modified.

I believe this is also possible with git submodules, but I’m happy to bake this 
into our build file instead with a script.
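
A minimal sketch of what "have the build file grab the relevant SHA" could look
like as a script step. Everything here is illustrative: `checkout_commands`, the
repository URL, the SHA, and the destination path are hypothetical, and this is
not the actual cep-15 build logic. It only uses `git clone` and
`git -C <dir> checkout --detach <sha>`.

```python
from typing import List


def checkout_commands(repo_url: str, sha: str, dest: str) -> List[List[str]]:
    """Build the git commands that pin a library checkout to one exact commit."""
    # Insist on a full SHA so the build never silently follows a moving branch.
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError("expected a full 40-character commit SHA")
    return [
        ["git", "clone", repo_url, dest],
        ["git", "-C", dest, "checkout", "--detach", sha],
    ]


# Hypothetical values; a real build would read the SHA from a tracked file.
for command in checkout_commands("https://example.org/library.git",
                                 "0123456789abcdef0123456789abcdef01234567",
                                 "modules/library"):
    print(" ".join(command))
```

A build script could pass each command to `subprocess.run(command, check=True)`.
Keeping the SHA in a file tracked by the parent repository gives each branch
its own pinned library version.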

> As the library itself no longer has an explicit version, what I presume you 
> meant by logical version.

I mean that we don’t want to duplicate work and risk diverging functionality 
maintaining what is logically (meant to be) the same code. As a developer, 
managing all of the branches is already a pain. Libraries naturally have a 
different development cadence to the main project, and tying the development to 
C* versions is just an unnecessary ongoing burden (and risk) that we can avoid.

There’s also an additional penalty: we reduce the likelihood of outside 
contributions to the libraries only. Accord in particular I hope will attract 
outside interest if it is maintained as a separate library, as it has broad 
applicability, and is likely of academic interest. Tying it to C* version and 
more tightly coupling with C* codebase makes that less likely. We might also 
see folk interested in our utilities, or our simulator framework, if they were 
to be maintained separately, which could be valuable.




> On 16 Jan 2023, at 10:49, Mick Semb Wever  wrote:
> 
> 
>> I think (4) is the only sensible option. It permits different development 
>> branches to easily reference different versions of a library and also to 
>> easily co-develop them - from within the same IDE project, even.
> 
> 
> I've only heard horror stories about submodules. The challenges they bring 
> should be listed and checked.
> 
> Some examples
>  - you can no longer just `git clone …`  (and we clone automatically in a 
> number of places)
>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>  - permanence from a git SHA no longer exists
>  - our releases get more complicated (our source tarballs are the asf 
> releases)
>  - handling patches that cover submodules
>  - switching branches, and using git worktrees, during dev
> 
> I see (4) as a valid option, but concerned with the amount of work required 
> to adapt to it, and whether it will only make it more complicated for the new 
> contributor to the project. For example the first two points are addressed by 
> remembering to do `git clone --recurse-submodules …` . And who would be 
> fixing our build/test/release scripts to accommodate?
> 
> Not blockers, just concerns we need to raise and address.
> 
>  
>> We might even be able to avoid additional release votes as a matter of 
>> course, by compiling the library source as part of the C* release, so that 
>> they adopt the C* release vote (or else we may periodically release the 
>> library as we do other releases)
> 
> 
> Yes. Today we do a combination of first (3) and then (1). Having to make a 
> release of these libraries every time a patch (/feature branch) is completing 
> is a horror story in itself.
> 
>> I might be missing something, does anyone have any other bright ideas for 
>> approaching this problem? I’m sure there are plenty of opinions out there.
> 
> 
> Looking at the problem with these libraries, 
>  - we don't need releases
>  - we don't have a clean version/branch parity to in-tree
>  - codebase parity between branches is important for upgrade tests (shared 
> classloaders)
> 
>  For (2) you mention drift of the "same" version, isn't this only a problem 
> for dtest-api in the way it requires the "same version" of a codebase for 
> compatibility when running upgrade tests? As the library itself no longer has 
> an explicit version, what I presume you meant by logical version.
> 
> To begin with, I'm leaning towards (2) because it is a cognitive re-use of 
> our release branches, and the problems around classpath compatibility can be 
> solved with tests. I'm sure I'm not seeing the whole picture though…
> 


Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-01-16 Thread Maxim Muzafarov
Stefan,

Thank you for bringing this topic up. I'll prepare the PR shortly with
option 4, so everyone can take a look at the amount of changes. This
does not force us to go exactly this path, but it may shed light on
changes in general.

What exactly we're planning to do in the PR:

1. Checkstyle AvoidStarImport rule, so no star imports will be allowed.
2. Checkstyle ImportOrder rule, for controlling the order.
3. The IDE code style configuration for IntelliJ IDEA, NetBeans, and
Eclipse (it doesn't exist for Eclipse yet).
4. The import order according to option 4:

```
java.*
javax.*
[blank line]
com.*
net.*
org.*
[blank line]
org.apache.cassandra.*
[blank line]
all other imports
[blank line]
static all other imports
```
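
To make the option 4 layout above concrete, here is a small sketch that sorts
import names into those five groups and prints them with blank lines in
between. It is an illustration only: `group_of` and `layout` are hypothetical
helpers, and this is not how the Checkstyle ImportOrder rule is implemented.

```python
from typing import List, Tuple


def group_of(name: str, is_static: bool = False) -> int:
    """Map an import to its option 4 group index (0..4)."""
    if is_static:
        return 4                                    # static imports go last
    if name.startswith(("java.", "javax.")):
        return 0
    if name.startswith("org.apache.cassandra."):
        return 2                                    # checked before plain org.*
    if name.startswith(("com.", "net.", "org.")):
        return 1
    return 3                                        # all other imports


def layout(imports: List[Tuple[str, bool]]) -> List[str]:
    """Order (name, is_static) pairs and separate groups with a blank line."""
    ordered = sorted(imports, key=lambda i: (group_of(*i), i[0]))
    lines: List[str] = []
    previous = None
    for name, is_static in ordered:
        group = group_of(name, is_static)
        if previous is not None and group != previous:
            lines.append("")
        lines.append(f"import {'static ' if is_static else ''}{name};")
        previous = group
    return lines


print("\n".join(layout([
    ("org.apache.cassandra.db.Keyspace", False),
    ("java.util.List", False),
    ("com.google.common.collect.Lists", False),
    ("org.junit.Assert.assertEquals", True),
    ("javax.annotation.Nullable", False),
    ("io.netty.buffer.ByteBuf", False),
])))
```

The only subtle point is that `org.apache.cassandra.*` has to be matched before
the general `org.*` prefix, otherwise it would fall into the second group.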



On Mon, 16 Jan 2023 at 12:39, Miklosovic, Stefan
 wrote:
>
> Based on the voting we should go with option 4?
>
> Two weeks passed without anybody joining so I guess folks are all happy with 
> that or this just went unnoticed?
>
> Let's give it time until the end of this week (Friday 12:00 UTC).
>
> Regards
>
> 
> From: Maxim Muzafarov 
> Sent: Tuesday, January 3, 2023 14:31
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>
>
> Folks,
>
> Let me update the voting status and put together everything we have so
> far. We definitely need more votes to have a solid foundation for this
> change, so I encourage everyone to consider the options above and
> share them in this thread.
>
>
> Total for each applicable option:
>
> 4-th option -- 4 votes
> 3-rd option -- 3 votes
> 5-th option -- 1 vote
> 1-st option -- 0 votes
> 2-nd option -- 0 votes
>
> On Thu, 22 Dec 2022 at 22:06, Mick Semb Wever  wrote:
> >>
> >>
> >> 3. Total 5 groups, 2968 files to change
> >>
> >> ```
> >> org.apache.cassandra.*
> >> [blank line]
> >> java.*
> >> [blank line]
> >> javax.*
> >> [blank line]
> >> all other imports
> >> [blank line]
> >> static all other imports
> >> ```
> >
> >
> >
> > 3, then 5.
> > There's lots under com.*, net.*, org.* that is essentially the same as "all 
> > other imports", what's the reason to separate those?
> >
> > My preference for 3 is simply that imports are by default collapsed, and if 
> > I expand them it's the dependencies on other cassandra stuff I'm first 
> > grokking. It's also our only imports that lead to cyclic dependencies 
> > (which we're not good at).


Re: Intra-project dependencies

2023-01-16 Thread Josh McKenzie
>  - permanence from a git SHA no longer exists
With the caveat that I haven't worked w/submodules before and only know about 
them from a cursory search, it looks like git-submodule status would show us 
the sha for submodules and we could have parent projects reference specific 
shas to pull for submodules to build? 
https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203

It seems like our use case is one of the primary ones git submodules are 
designed to address.
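
As a sketch of how a build or CI script might act on `git submodule status`,
the snippet below parses its output and flags anything that is not at the SHA
recorded by the parent repository. It relies on the documented one-character
prefix (`-` not initialized, `+` checked-out commit differs from the recorded
SHA, `U` merge conflicts). `parse_status`, the sample paths, and the sample
SHAs are made up for illustration, and paths containing spaces are not handled.

```python
from typing import List, NamedTuple


class SubmoduleState(NamedTuple):
    sha: str
    path: str
    state: str


# Meaning of the first character of each `git submodule status` line.
_PREFIXES = {
    " ": "in sync",
    "-": "not initialized",
    "+": "out of sync",
    "U": "merge conflict",
}


def parse_status(output: str) -> List[SubmoduleState]:
    """Parse `git submodule status` output into one record per submodule."""
    states = []
    for line in output.splitlines():
        if not line.strip():
            continue
        prefix, rest = line[0], line[1:]
        sha, path = rest.split()[:2]   # a trailing "(describe)" part is ignored
        states.append(SubmoduleState(sha, path, _PREFIXES.get(prefix, "unknown")))
    return states


sample = (
    " " + "a" * 40 + " modules/accord (heads/trunk)\n"
    "+" + "b" * 40 + " modules/dtest-api (heads/trunk)\n"
    "-" + "c" * 40 + " modules/utils\n"
)
for entry in parse_status(sample):
    if entry.state != "in sync":
        print(f"{entry.path}: {entry.state} ({entry.sha[:8]})")
```

The real output would come from something like
`subprocess.run(["git", "submodule", "status"], capture_output=True, text=True)`.
The leading space on in-sync lines is significant, so the output must not be
stripped before parsing.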

On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote:
> 
> I guess option 5 is what we have today in cep-15, have the build file grab 
> the relevant SHA for the library. This way you maintain a precise SHA for 
> builds and scripts don’t have to be modified.
> 
> I believe this is also possible with git submodules, but I’m happy to bake 
> this into our build file instead with a script.
> 
> > As the library itself no longer has an explicit version, what I presume you 
> > meant by logical version.
> 
> I mean that we don’t want to duplicate work and risk diverging functionality 
> maintaining what is logically (meant to be) the same code. As a developer, 
> managing all of the branches is already a pain. Libraries naturally have a 
> different development cadence to the main project, and tying the development 
> to C* versions is just an unnecessary ongoing burden (and risk) that we can 
> avoid.
> 
> There’s also an additional penalty: we reduce the likelihood of outside 
> contributions to the libraries only. Accord in particular I hope will attract 
> outside interest if it is maintained as a separate library, as it has broad 
> applicability, and is likely of academic interest. Tying it to C* version and 
> more tightly coupling with C* codebase makes that less likely. We might also 
> see folk interested in our utilities, or our simulator framework, if they 
> were to be maintained separately, which could be valuable.
> 
> 
> 
> 
>> On 16 Jan 2023, at 10:49, Mick Semb Wever  wrote:
>> 
>>> I think (4) is the only sensible option. It permits different development 
>>> branches to easily reference different versions of a library and also to 
>>> easily co-develop them - from within the same IDE project, even.
>>> 
>> 
>> 
>> I've only heard horror stories about submodules. The challenges they bring 
>> should be listed and checked.
>> 
>> Some examples
>>  - you can no longer just `git clone …`  (and we clone automatically in a 
>> number of places)
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>>  - permanence from a git SHA no longer exists
>>  - our releases get more complicated (our source tarballs are the asf 
>> releases)
>>  - handling patches that cover submodules
>>  - switching branches, and using git worktrees, during dev
>> 
>> I see (4) as a valid option, but concerned with the amount of work required 
>> to adapt to it, and whether it will only make it more complicated for the 
>> new contributor to the project. For example the first two points are 
>> addressed by remembering to do `git clone --recurse-submodules …` . And who 
>> would be fixing our build/test/release scripts to accommodate?
>> 
>> Not blockers, just concerns we need to raise and address.
>> 
>>  
>>> We might even be able to avoid additional release votes as a matter of 
>>> course, by compiling the library source as part of the C* release, so that 
>>> they adopt the C* release vote (or else we may periodically release the 
>>> library as we do other releases)
>>> 
>> 
>> 
>> Yes. Today we do a combination of first (3) and then (1). Having to make a 
>> release of these libraries every time a patch (/feature branch) is 
>> completing is a horror story in itself.
>> 
>> 
>>> I might be missing something, does anyone have any other bright ideas for 
>>> approaching this problem? I’m sure there are plenty of opinions out there.
>>> 
>> 
>> 
>> Looking at the problem with these libraries, 
>>  - we don't need releases
>>  - we don't have a clean version/branch parity to in-tree
>>  - codebase parity between branches is important for upgrade tests (shared 
>> classloaders)
>> 
>>  For (2) you mention drift of the "same" version, isn't this only a problem 
>> for dtest-api in the way it requires the "same version" of a codebase for 
>> compatibility when running upgrade tests? As the library itself no longer 
>> has an explicit version, what I presume you meant by logical version.
>> 
>> To begin with, I'm leaning towards (2) because it is a cognitive re-use of 
>> our release branches, and the problems around classpath compatibility can be 
>> solved with tests. I'm sure I'm not seeing the whole picture though…
>> 


Re: Merging CEP-15 to trunk

2023-01-16 Thread J. D. Jordan
I haven’t been following the progress of the feature branch, but I would think 
the requirements for merging it into master would be the same as any other 
merge.

A subset of those requirements being:
Is the code to be merged in releasable quality? Is it disabled by a feature 
flag by default if not?
Do all the tests pass?
Has there been review and +1 by two committers?

If the code in the feature branch meets all of the merging criteria of the 
project then I see no reason to keep it in a feature branch for ever.

-Jeremiah


> On Jan 16, 2023, at 3:21 AM, Benedict  wrote:
> 
> Hi Everyone, I hope you all had a lovely holiday period. 
> 
> Those who have been following along will have seen a steady drip of progress 
> into the cep-15-accord feature branch over the past year. We originally 
> discussed that feature branches would merge periodically into trunk, and we 
> are long overdue. With the release of 4.1, it’s time to rectify that. 
> 
> Barring complaints, I hope to merge the current state to trunk within a 
> couple of weeks. This remains a work in progress, but will permit users to 
> experiment with the alpha version of Accord and provide feedback, as well as 
> phase the changes to trunk.


Re: Merging CEP-15 to trunk

2023-01-16 Thread Benedict
My goal isn’t to ask if others believe we have the right to merge, only to 
invite feedback if there are any specific concerns. Large pieces of work like 
this cause headaches and concerns for other contributors, and so it’s only 
polite to provide notice of our intention, since probably many haven’t even 
noticed the feature branch developing.

The relevant standard for merging a feature branch, if we want to rehash that, 
is that it is feature- and bug-neutral by default, ie that a release could be 
cut afterwards while maintaining our usual quality standards, and that the 
feature is disabled by default, yes. It is not however feature-complete or 
production ready as a feature; that would prevent any incremental merging of 
feature development.

> On 16 Jan 2023, at 15:57, J. D. Jordan  wrote:
> 
> I haven’t been following the progress of the feature branch, but I would 
> think the requirements for merging it into master would be the same as any 
> other merge.
> 
> A subset of those requirements being:
> Is the code to be merged in releasable quality? Is it disabled by a feature 
> flag by default if not?
> Do all the tests pass?
> Has there been review and +1 by two committers?
> 
> If the code in the feature branch meets all of the merging criteria of the 
> project then I see no reason to keep it in a feature branch for ever.
> 
> -Jeremiah
> 
> 
>> On Jan 16, 2023, at 3:21 AM, Benedict  wrote:
>> 
>> Hi Everyone, I hope you all had a lovely holiday period. 
>> 
>> Those who have been following along will have seen a steady drip of progress 
>> into the cep-15-accord feature branch over the past year. We originally 
>> discussed that feature branches would merge periodically into trunk, and we 
>> are long overdue. With the release of 4.1, it’s time to rectify that. 
>> 
>> Barring complaints, I hope to merge the current state to trunk within a 
>> couple of weeks. This remains a work in progress, but will permit users to 
>> experiment with the alpha version of Accord and provide feedback, as well as 
>> phase the changes to trunk.



Re: Intra-project dependencies

2023-01-16 Thread Henrik Ingo
Hi all

I was invited to share my thoughts just as an additional and somewhat fresh
point of view...

On a high level: We talked through this with Mick and a few other
colleagues, and I/we came to the conclusion that fundamentally all of the
mentioned options 1-5 are just variations of the same problem being moved
into different places. That is to say there's complexity here that isn't
going away. This is good to recognize just so that you realize when you are
feeling that you don't quite like any of the available options, this is
why. At least for me it's somehow calming when you understand this is the
reality and you just have to face it.



It seems to me the fundamental question is, will the link from Cassandra to
Accord be a 1-1 or n-1 mapping? Superficially we would think that Accord is
a separate library and all future Cassandra versions will use the same
version of Accord. But is that really the case? Isn't it rather expected
that Cassandra 5.1, 5.2 will probably come with more and improved
functionality than what will be in 5.0? Fundamental additional
functionality like less-than-strict consistency, mvcc, and maybe one day
interactive transactions. What I'd expect to see here is then that the
separate Accord library in fact is rather closely tied to its parent
Cassandra release, and as soon as we have a 5.0 GA, we will also need a
stable Accord branch to match, while significant new development will
happen in tandem with Cassandra trunk/5.1?

If the latter scenario is more likely, then having Accord in tree seems to
be the easiest choice, because it's actually not the case that you are
maintaining three copies of the same codebase. (Any more than that's the
case for all Cassandra code.)


FWIW MongoDB does in fact use option 5: At build time there's a bash script
that copies your separate WiredTiger repository into the source tree, then
compiles. A major reason they did it this way was to support the possibility
that some modules would be closed source. Git submodules would not work - or
at least be very annoying - for a case where the parent directory is open
source but the sub-module is not available to everyone.

But having used the MongoDB system - which apparently is also Accord's
system today - I'd say in the end it's just git submodules in a different
form: You get to choose whether to manage the library dependency with git
or a bash script.


Finally, and I know this was stated before as well, the Accord developers
seem hopeful that Accord will gain interest and contributors from outside
of Cassandra, and as such warrants its own repository. For argument's sake,
let's assume this is possible/likely...



I didn't write this email to support any particular alternative or opinion.
But combining the above thoughts, I feel like there is a conclusion
sticking out of this email... And the conclusion is of the form "we can
always change this later"...

It seems to me that especially now, and probably also after 5.0 is
released, we will in any case only have a single version of Cassandra using
a single version of Accord. So at least to begin with, it's the least
effort to keep it in-tree, to avoid the overhead of git submodules, or
having to make releases, etc.  The separate constituency of Accord-only
developers can be satisfied by keeping Accord in its own directory, could
even be a top-level directory, and a small build system that can build a
separate Accord jar file. You could even maintain a separate github repo
just for advertising purposes. (Just like github.com/apache/cassandra isn't
the official git repo for Cassandra either.)

If both of my assumptions above are true, then from a Cassandra point of
view there's not much benefit having Accord separately, but if 3rd party
interest in Accord grows, then it could indeed be split out into its own
repository at that point. The main motivation then would be to service
those 3rd party developers who aren't so interested in Cassandra. But this
split would only be done once it is known that such a community will form.

Thoughts?

henrik


On Mon, Jan 16, 2023 at 2:30 PM Josh McKenzie  wrote:

>  - permanence from a git SHA no longer exists
>
> With the caveat that I haven't worked w/submodules before and only know
> about them from a cursory search, it looks like git-submodule status would
> show us the sha for submodules and we could have parent projects reference
> specific shas to pull for submodules to build?
> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
> 
>
> It seems like our use case is one of the primary ones git submodules are
> designed to address.
>
> On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote:
>
>
> I guess option 5 

Re: Merging CEP-15 to trunk

2023-01-16 Thread Josh McKenzie
Did we document this or is it in an email thread somewhere?

I don't see it on the confluence wiki nor does a cursory search of ponymail 
turn it up.

What was it for something flagged experimental?
1. Same tests pass on the branch as to the root it's merging back to
2. 2 committers eyes on (author + reviewer or 2 reviewers, etc)
3. Disabled by default w/flag to enable

So really only the 3rd thing is different right? Probably ought to add an 
informal step 4 which Benedict is doing here which is "hit the dev ML w/a 
DISCUSS thread about the upcoming merge so it's on people's radar and they can 
coordinate".

On Mon, Jan 16, 2023, at 11:08 AM, Benedict wrote:
> My goal isn’t to ask if others believe we have the right to merge, only to 
> invite feedback if there are any specific concerns. Large pieces of work like 
> this cause headaches and concerns for other contributors, and so it’s only 
> polite to provide notice of our intention, since probably many haven’t even 
> noticed the feature branch developing.
> 
> The relevant standard for merging a feature branch, if we want to rehash 
> that, is that it is feature- and bug-neutral by default, ie that a release 
> could be cut afterwards while maintaining our usual quality standards, and 
> that the feature is disabled by default, yes. It is not however 
> feature-complete or production ready as a feature; that would prevent any 
> incremental merging of feature development.
> 
> > On 16 Jan 2023, at 15:57, J. D. Jordan  wrote:
> > 
> > I haven’t been following the progress of the feature branch, but I would 
> > think the requirements for merging it into master would be the same as any 
> > other merge.
> > 
> > A subset of those requirements being:
> > Is the code to be merged in releasable quality? Is it disabled by a feature 
> > flag by default if not?
> > Do all the tests pass?
> > Has there been review and +1 by two committers?
> > 
> > If the code in the feature branch meets all of the merging criteria of the 
> > project then I see no reason to keep it in a feature branch for ever.
> > 
> > -Jeremiah
> > 
> > 
> >> On Jan 16, 2023, at 3:21 AM, Benedict  wrote:
> >> 
> >> Hi Everyone, I hope you all had a lovely holiday period. 
> >> 
> >> Those who have been following along will have seen a steady drip of 
> >> progress into the cep-15-accord feature branch over the past year. We 
> >> originally discussed that feature branches would merge periodically into 
> >> trunk, and we are long overdue. With the release of 4.1, it’s time to 
> >> rectify that. 
> >> 
> >> Barring complaints, I hope to merge the current state to trunk within a 
> >> couple of weeks. This remains a work in progress, but will permit users to 
> >> experiment with the alpha version of Accord and provide feedback, as well 
> >> as phase the changes to trunk.
> 
> 


Re: Merging CEP-15 to trunk

2023-01-16 Thread J. D. Jordan
My only concern to merging (given all normal requirements are met) would be if 
there was a possibility that the feature would never be finished. Given all of 
the excitement and activity around accord, I do not think that is a concern 
here. So I see no reason not to merge incremental progress behind a feature 
flag.

-Jeremiah

> On Jan 16, 2023, at 10:30 AM, Josh McKenzie  wrote:
> 
> Did we document this or is it in an email thread somewhere?
> 
> I don't see it on the confluence wiki nor does a cursory search of ponymail 
> turn it up.
> 
> What was it for something flagged experimental?
> 1. Same tests pass on the branch as to the root it's merging back to
> 2. 2 committers eyes on (author + reviewer or 2 reviewers, etc)
> 3. Disabled by default w/flag to enable
> 
> So really only the 3rd thing is different right? Probably ought to add an 
> informal step 4 which Benedict is doing here which is "hit the dev ML w/a 
> DISCUSS thread about the upcoming merge so it's on people's radar and they 
> can coordinate".
> 
> On Mon, Jan 16, 2023, at 11:08 AM, Benedict wrote:
>> My goal isn’t to ask if others believe we have the right to merge, only to 
>> invite feedback if there are any specific concerns. Large pieces of work 
>> like this cause headaches and concerns for other contributors, and so it’s 
>> only polite to provide notice of our intention, since probably many haven’t 
>> even noticed the feature branch developing.
>> 
>> The relevant standard for merging a feature branch, if we want to rehash 
>> that, is that it is feature- and bug-neutral by default, ie that a release 
>> could be cut afterwards while maintaining our usual quality standards, and 
>> that the feature is disabled by default, yes. It is not however 
>> feature-complete or production ready as a feature; that would prevent any 
>> incremental merging of feature development.
>> 
>>> On 16 Jan 2023, at 15:57, J. D. Jordan  wrote:
>>> 
>>> I haven’t been following the progress of the feature branch, but I would 
>>> think the requirements for merging it into master would be the same as any 
>>> other merge.
>>> 
>>> A subset of those requirements being:
>>> Is the code to be merged in releasable quality? Is it disabled by a 
>>> feature flag by default if not?
>>> Do all the tests pass?
>>> Has there been review and +1 by two committers?
>>> 
>>> If the code in the feature branch meets all of the merging criteria of the 
>>> project then I see no reason to keep it in a feature branch for ever.
>>> 
>>> -Jeremiah
>>> 
>>>> On Jan 16, 2023, at 3:21 AM, Benedict  wrote:
>>>> 
>>>> Hi Everyone, I hope you all had a lovely holiday period. 
>>>> 
>>>> Those who have been following along will have seen a steady drip of 
>>>> progress into the cep-15-accord feature branch over the past year. We 
>>>> originally discussed that feature branches would merge periodically into 
>>>> trunk, and we are long overdue. With the release of 4.1, it’s time to 
>>>> rectify that. 
>>>> 
>>>> Barring complaints, I hope to merge the current state to trunk within a 
>>>> couple of weeks. This remains a work in progress, but will permit users 
>>>> to experiment with the alpha version of Accord and provide feedback, as 
>>>> well as phase the changes to trunk.

Re: Intra-project dependencies

2023-01-16 Thread Benedict
How often have we modified Paxos?

There are currently no proposals to develop Accord further after the initial release. So I think it is very likely that Accord development will decouple from Cassandra version, unless there is significant external interest that drives it.

Furthermore, the idea of revisiting this later is problematic. We can’t easily decouple Accord if it becomes tightly coupled with Cassandra, which becomes quite likely when the builds are co-dependent. We have spent great effort developing them separately to avoid this.

You can’t go back later and recover lost interest. How many projects have adopted ZAB, versus Raft?

None of this also addresses the wider need for reform of our approach here, for both the dtest-api and the simulator.

I’m still not clear on the concrete downsides of maintaining a separate tree here? Could somebody explain what they expect to go wrong? I respond to Mick’s points below, as I do not recognise them from our experience. We’ve been doing this for a year without incident.

I will note we explicitly voted to develop Accord as a standalone library as part of the original CEP, and this was debated quite extensively, so to change that will require a new dedicated DISCUSS thread and vote.

>  - you can no longer just `git clone …`  (and we clone automatically in a number of places)

Yes you can, if your build script updates the sub modules like we have been doing.

>  - same with `git pull …` (easy to be left with out-of-sync submodules)

Yes you can, again for the same reason. This is no different to ensuring your libraries are in sync, which must be done on every pull or checkout.

>  - permanence from a git SHA no longer exists

It is intact, if you link to a SHA.

>  - our releases get more complicated (our source tarballs are the asf releases)

How?

>  - handling patches cover submodules

How is this different to patches affecting multiple versions in C*?

>  - switching branches, and using git worktrees, during dv

Elaborate? I don’t see any problem, but I might be missing something.

On 16 Jan 2023, at 16:11, Henrik Ingo wrote:

> Hi all
>
> I was invited to share my thoughts just as an additional and somewhat fresh point of view...
>
> On a high level: We talked through this with Mick and a few other colleagues, and I/we came to the conclusion that fundamentally all of the mentioned options 1-5 are just variations of the same problem being moved into different places. That is to say there's complexity here that isn't going away. This is good to recognize just so that you realize when you are feeling that you don't quite like any of the available options, this is why. At least for me it's somehow calming when you understand this is the reality and you just have to face it.
>
> It seems to me the fundamental question is, will the link from Cassandra to Accord be a 1-1 or n-1 mapping? Superficially we would think that Accord is a separate library and all future Cassandra versions will use the same version of Accord. But is that really the case? Isn't it rather expected that Cassandra 5.1, 5.2 will probably come with more and improved functionality than what will be in 5.0? Fundamental additional functionality like less-than-strict consistency, mvcc, and maybe one day interactive transactions. What I'd expect to see here is then that the separate Accord library in fact is rather closely tied to its parent Cassandra release, and as soon as we have a 5.0 GA, we will also need a stable Accord branch to match, while significant new development will happen in tandem with Cassandra trunk/5.1?
>
> If the latter scenario is more likely, then having Accord in tree seems to be the easiest choice, because it's actually not the case that you are maintaining three copies of the same codebase. (Any more than that's the case for all Cassandra code.)
>
> FWIW MongoDB does in fact use option 5: At build time there's a bash script that copies your separate WiredTiger repository into the source tree, then compiles. A major reason they did it this way was to support the possibility that some modules would be closed source. Git modules would not work - or at least be very annoying - for a case where the parent directory is open source but the sub-module is not available to everyone. But having used the MongoDB system - which apparently is also Accord's system today - I'd say in the end it's just git submodules in a different form: You get to choose whether to manage the library dependency with git or a bash script.
>
> Finally, and I know this was stated before as well, the Accord developers seem hopeful that Accord will gain interest and contributors from outside of Cassandra, and as such warrants its own repository. For argument's sake, let's assume this is possible/likely...
>
> I didn't write this email to support any particular alternative or opinion. But combining the above thoughts, I feel like there is a conclusion sticking out of this email... And the conclusion is of the form "we can always change this later"...
>
> It seems to me that especially now, and proba

Re: Merging CEP-15 to trunk

2023-01-16 Thread Benedict
That’s fair, though for long term contributors probably the risk is relatively low on that front. I guess that’s something we can perhaps raise as part of each CEP if we envisage it taking several months of development?

> Did we document this or is it in an email thread somewhere?

It’s probably buried in one of the many threads we’ve had about related topics on releases and development. We’ve definitely discussed feature branches before, and I recall discussing a goal of merging ~quarterly. But perhaps like most sub topics it didn’t get enough visibility, in which case this thread I suppose can serve as a dedicated rehash and we can formalise whatever falls out.

In theory as Jeremiah says there’s only the normal merge criteria. But that includes nobody saying no to a piece of work or raising concerns, and advertising the opportunity to say no is important for that IMO.

On 16 Jan 2023, at 16:36, J. D. Jordan wrote:

> My only concern to merging (given all normal requirements are met) would be if there was a possibility that the feature would never be finished. Given all of the excitement and activity around accord, I do not think that is a concern here. So I see no reason not to merge incremental progress behind a feature flag.
>
> -Jeremiah
>
> On Jan 16, 2023, at 10:30 AM, Josh McKenzie wrote:
>
> Did we document this or is it in an email thread somewhere?
>
> I don't see it on the confluence wiki nor does a cursory search of ponymail turn it up.
>
> What was it for something flagged experimental?
> 1. Same tests pass on the branch as to the root it's merging back to
> 2. 2 committers eyes on (author + reviewer or 2 reviewers, etc)
> 3. Disabled by default w/flag to enable
>
> So really only the 3rd thing is different right? Probably ought to add an informal step 4 which Benedict is doing here which is "hit the dev ML w/a DISCUSS thread about the upcoming merge so it's on people's radar and they can coordinate".
>
> On Mon, Jan 16, 2023, at 11:08 AM, Benedict wrote:
>
> My goal isn’t to ask if others believe we have the right to merge, only to invite feedback if there are any specific concerns. Large pieces of work like this cause headaches and concerns for other contributors, and so it’s only polite to provide notice of our intention, since probably many haven’t even noticed the feature branch developing.
>
> The relevant standard for merging a feature branch, if we want to rehash that, is that it is feature- and bug-neutral by default, ie that a release could be cut afterwards while maintaining our usual quality standards, and that the feature is disabled by default, yes. It is not however feature-complete or production ready as a feature; that would prevent any incremental merging of feature development.
>
> > On 16 Jan 2023, at 15:57, J. D. Jordan wrote:
> >
> > I haven’t been following the progress of the feature branch, but I would think the requirements for merging it into master would be the same as any other merge.
> >
> > A subset of those requirements being:
> > Is the code to be merged in releasable quality? Is it disabled by a feature flag by default if not?
> > Do all the tests pass?
> > Has there been review and +1 by two committers?
> >
> > If the code in the feature branch meets all of the merging criteria of the project then I see no reason to keep it in a feature branch for ever.
> >
> > -Jeremiah
> >
> >> On Jan 16, 2023, at 3:21 AM, Benedict wrote:
> >>
> >> Hi Everyone, I hope you all had a lovely holiday period.
> >>
> >> Those who have been following along will have seen a steady drip of progress into the cep-15-accord feature branch over the past year. We originally discussed that feature branches would merge periodically into trunk, and we are long overdue. With the release of 4.1, it’s time to rectify that.
> >>
> >> Barring complaints, I hope to merge the current state to trunk within a couple of weeks. This remains a work in progress, but will permit users to experiment with the alpha version of Accord and provide feedback, as well as phase the changes to trunk.

Re: Merging CEP-15 to trunk

2023-01-16 Thread Jacek Lewandowski
Hi,

It would be great if some documentation got added to the code you want to
merge. To me, it would be enough to just quickly characterize on the class
level what the class is for and what the expectations are. This is
especially important for Accord API classes because now it is hard to
review whether the implementation in Cassandra conforms to the API
requirements.

Given it is going to be a possibility for others to try Accord before the
release, it would be good to create some CQL syntax documentation,
something like a chapter in
https://cassandra.apache.org/doc/latest/cassandra/cql/index.html but for
unreleased Cassandra version or a blog post, so that the syntax is known to
the users and they can quickly get up to speed, hopefully reporting any
problems soon.

- - -- --- -  -
Jacek Lewandowski


On Mon, 16 Jan 2023 at 17:52, Benedict  wrote:

> That’s fair, though for long term contributors probably the risk is
> relatively low on that front. I guess that’s something we can perhaps raise
> as part of each CEP if we envisage it taking several months of development?
>
> > Did we document this or is it in an email thread somewhere?
>
> It’s probably buried in one of the many threads we’ve had about related
> topics on releases and development. We’ve definitely discussed feature
> branches before, and I recall discussing a goal of merging ~quarterly. But
> perhaps like most sub topics it didn’t get enough visibility, in which case
> this thread I suppose can serve as a dedicated rehash and we can formalise
> whatever falls out.
>
> In theory as Jeremiah says there’s only the normal merge criteria. But
> that includes nobody saying no to a piece of work or raising concerns, and
> advertising the opportunity to say no is important for that IMO.
>
> On 16 Jan 2023, at 16:36, J. D. Jordan  wrote:
>
> 
> My only concern to merging (given all normal requirements are met) would
> be if there was a possibility that the feature would never be finished.
> Given all of the excitement and activity around accord, I do not think that
> is a concern here. So I see no reason not to merge incremental progress
> behind a feature flag.
>
> -Jeremiah
>
> On Jan 16, 2023, at 10:30 AM, Josh McKenzie  wrote:
>
> 
> Did we document this or is it in an email thread somewhere?
>
> I don't see it on the confluence wiki nor does a cursory search of
> ponymail turn it up.
>
> What was it for something flagged experimental?
> 1. Same tests pass on the branch as to the root it's merging back to
> 2. 2 committers eyes on (author + reviewer or 2 reviewers, etc)
> 3. Disabled by default w/flag to enable
>
> So really only the 3rd thing is different right? Probably ought to add an
> informal step 4 which Benedict is doing here which is "hit the dev ML w/a
> DISCUSS thread about the upcoming merge so it's on people's radar and they
> can coordinate".
>
> On Mon, Jan 16, 2023, at 11:08 AM, Benedict wrote:
>
> My goal isn’t to ask if others believe we have the right to merge, only to
> invite feedback if there are any specific concerns. Large pieces of work
> like this cause headaches and concerns for other contributors, and so it’s
> only polite to provide notice of our intention, since probably many haven’t
> even noticed the feature branch developing.
>
> The relevant standard for merging a feature branch, if we want to rehash
> that, is that it is feature- and bug-neutral by default, ie that a release
> could be cut afterwards while maintaining our usual quality standards, and
> that the feature is disabled by default, yes. It is not however
> feature-complete or production ready as a feature; that would prevent any
> incremental merging of feature development.
>
> > On 16 Jan 2023, at 15:57, J. D. Jordan 
> wrote:
> >
> > I haven’t been following the progress of the feature branch, but I
> would think the requirements for merging it into master would be the same
> as any other merge.
> >
> > A subset of those requirements being:
> > Is the code to be merged in releasable quality? Is it disabled by a
> feature flag by default if not?
> > Do all the tests pass?
> > Has there been review and +1 by two committers?
> >
> > If the code in the feature branch meets all of the merging criteria of
> the project then I see no reason to keep it in a feature branch for ever.
> >
> > -Jeremiah
> >
> >
> >> On Jan 16, 2023, at 3:21 AM, Benedict  wrote:
> >>
> >> Hi Everyone, I hope you all had a lovely holiday period.
> >>
> >> Those who have been following along will have seen a steady drip of
> progress into the cep-15-accord feature branch over the past year. We
> originally discussed that feature branches would merge periodically into
> trunk, and we are long overdue. With the release of 4.1, it’s time to
> rectify that.
> >>
> >> Barring complaints, I hope to merge the current state to trunk within a
> couple of weeks. This remains a work in progress, but will permit users to
> experiment with the alpha version of Accord a

Re: Merging CEP-15 to trunk

2023-01-16 Thread Benedict
Could you file a bug report with more detail about which classes you think are lacking adequate documentation in each project, and what you would like to see? We would welcome your participation.

On 16 Jan 2023, at 17:28, Jacek Lewandowski wrote:

> Hi,
>
> It would be great if some documentation got added to the code you want to merge. To me, it would be enough to just quickly characterize on the class level what the class is for and what the expectations are. This is especially important for Accord API classes because now it is hard to review whether the implementation in Cassandra conforms to the API requirements.
>
> Given it is going to be a possibility for others to try Accord before the release, it would be good to create some CQL syntax documentation, something like a chapter in https://cassandra.apache.org/doc/latest/cassandra/cql/index.html but for unreleased Cassandra version or a blog post, so that the syntax is known to the users and they can quickly get up to speed, hopefully reporting any problems soon.
>
> - - -- --- -  -
> Jacek Lewandowski
>
> On Mon, 16 Jan 2023 at 17:52, Benedict wrote:
>
> > That’s fair, though for long term contributors probably the risk is relatively low on that front. I guess that’s something we can perhaps raise as part of each CEP if we envisage it taking several months of development?
> >
> > > Did we document this or is it in an email thread somewhere?
> >
> > It’s probably buried in one of the many threads we’ve had about related topics on releases and development. We’ve definitely discussed feature branches before, and I recall discussing a goal of merging ~quarterly. But perhaps like most sub topics it didn’t get enough visibility, in which case this thread I suppose can serve as a dedicated rehash and we can formalise whatever falls out.
> >
> > In theory as Jeremiah says there’s only the normal merge criteria. But that includes nobody saying no to a piece of work or raising concerns, and advertising the opportunity to say no is important for that IMO.
> >
> > On 16 Jan 2023, at 16:36, J. D. Jordan wrote:
> >
> > My only concern to merging (given all normal requirements are met) would be if there was a possibility that the feature would never be finished. Given all of the excitement and activity around accord, I do not think that is a concern here. So I see no reason not to merge incremental progress behind a feature flag.
> >
> > -Jeremiah
> >
> > On Jan 16, 2023, at 10:30 AM, Josh McKenzie wrote:
> >
> > Did we document this or is it in an email thread somewhere?
> >
> > I don't see it on the confluence wiki nor does a cursory search of ponymail turn it up.
> >
> > What was it for something flagged experimental?
> > 1. Same tests pass on the branch as to the root it's merging back to
> > 2. 2 committers eyes on (author + reviewer or 2 reviewers, etc)
> > 3. Disabled by default w/flag to enable
> >
> > So really only the 3rd thing is different right? Probably ought to add an informal step 4 which Benedict is doing here which is "hit the dev ML w/a DISCUSS thread about the upcoming merge so it's on people's radar and they can coordinate".
> >
> > On Mon, Jan 16, 2023, at 11:08 AM, Benedict wrote:
> >
> > My goal isn’t to ask if others believe we have the right to merge, only to invite feedback if there are any specific concerns. Large pieces of work like this cause headaches and concerns for other contributors, and so it’s only polite to provide notice of our intention, since probably many haven’t even noticed the feature branch developing.
> >
> > The relevant standard for merging a feature branch, if we want to rehash that, is that it is feature- and bug-neutral by default, ie that a release could be cut afterwards while maintaining our usual quality standards, and that the feature is disabled by default, yes. It is not however feature-complete or production ready as a feature; that would prevent any incremental merging of feature development.
> >
> > > On 16 Jan 2023, at 15:57, J. D. Jordan wrote:
> > >
> > > I haven’t been following the progress of the feature branch, but I would think the requirements for merging it into master would be the same as any other merge.
> > >
> > > A subset of those requirements being:
> > > Is the code to be merged in releasable quality? Is it disabled by a feature flag by default if not?
> > > Do all the tests pass?
> > > Has there been review and +1 by two committers?
> > >
> > > If the code in the feature branch meets all of the merging criteria of the project then I see no reason to keep it in a feature branch for ever.
> > >
> > > -Jeremiah
> > >
> > >> On Jan 16, 2023, at 3:21 AM, Benedict wrote:
> > >>
> > >> Hi Everyone, I hope you all had a lovely holiday period.
> > >>
> > >> Those who have been following along will have seen a steady drip of progress into the cep-15-accord feature branch over the past year. We originally discussed that feature branches would merge periodically into trunk, and we are long overdue. With the release of 4.1, it’s time to rectify that.
> > >>
> > >> Barring complaints, I hope to merge the current state

Re: Intra-project dependencies

2023-01-16 Thread Henrik Ingo
Hi Benedict

At least for my part, again, I'm not (yet) trying to argue for or against
a particular alternative. So I think you'll find that if you allow a few
more iterations of discussion, we can gravitate to some good consensus. Or
failing that, we can at least gravitate around a small number of
alternatives and then argue about those :-D

It seems also in your email, the strongest argument for keeping a separate
library, is your desire or expectation that Accord would attract
significant 3rd party interest. And - this is btw also some advice Magnus
Carlsen would give - your main argument therefore is, if we expect we need
to make a specific move in the future, it's usually best to just do it
immediately.

I didn't write in my previous email, but I did have in mind that one
drawback with the proposal of later extracting Accord out of Cassandra into
its own repository would be to lose the history of commits. (At least
without significant effort to keep/recreate the history.) For example,
there could be commits in the Accord history that also edit files in
Cassandra. So yes, I agree that if this is a major goal, then keeping
Accord development in its own repository is the right choice.

This then leads to the question should the link from Cassandra to Accord be
via git sub-modules or via some bash code in the build system. I now
remember something that was a major problem for years in the MongoDB CI
system, and I believe this is also a problem with our dtests? That the
nightly CI system would just check out HEAD of each module, and then
compile them and run tests. This had the problem that it was impossible to
return to a specific failure, say, a week later, and expect to rebuild and
retest the same combination, because the system would just check out and
build whatever the HEAD was at that date. (The only way to test the actual
SHA you had been bisecting or patching was to submit it as a patch to the
CI system. So if a test setup had 5 sub modules, and you were fixing a bug
in one of them, you had to "patch" the 4 other ones too, simply because
otherwise the CI system wouldn't check out the right position in their
history.)

So, whatever method we choose, it's important that our CI system and other
tools can know and track the correct and current SHA for each sub-module.
Presumably git sub-modules actually are the best answer to this need. How
have you dealt with this in Accord so far?
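
As an illustration of tracking the SHA of each sub-module, here is a minimal Python sketch that turns text in the layout printed by `git submodule status` into a path-to-SHA mapping which a CI job could store alongside each run. The function name, the sample paths and the sample SHAs are assumptions made for the example; nothing here is taken from the Cassandra or Accord build.

```python
# Sketch only: the function name and the sample data are hypothetical.
# It parses text in the layout printed by `git submodule status`:
#   <flag><sha> <path> (<describe>)


def parse_submodule_status(output: str) -> dict[str, str]:
    """Return {submodule path: commit SHA} for each line of status output."""
    pins: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # A leading '+', '-' or 'U' is a state flag (' ' means in sync);
        # drop it so that only the SHA and the path remain.
        if line[0] in " +-U":
            line = line[1:]
        sha, path = line.split()[:2]
        pins[path] = sha
    return pins


SAMPLE = (
    " 1111111111111111111111111111111111111111 modules/accord (heads/trunk)\n"
    "+2222222222222222222222222222222222222222 modules/dtest-api (v0.0.1)\n"
)

if __name__ == "__main__":
    # Print one "path sha" pair per line, suitable for archiving with a run.
    for path, sha in parse_submodule_status(SAMPLE).items():
        print(f"{path} {sha}")
```

Archiving this mapping with every nightly run would let a later rebuild check out the same combination of commits instead of whatever HEAD happens to be on that day.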


One point: I wouldn't directly compare dtest and Accord though. For a test
framework, it's the dtest framework that is consuming a Cassandra version,
while for Accord it's Cassandra that depends on a specific Accord version.
Because of this, the same solution may or may not be right for both of them.

henrik

On Mon, Jan 16, 2023 at 6:44 PM Benedict  wrote:

> How often have we modified Paxos?
>
> There are currently no proposals to develop Accord further after the
> initial release. So I think it is very likely that Accord development will
> decouple from Cassandra version, unless there is significant external
> interest that drives it.
>
> Furthermore, the idea of revisiting this later is problematic. We can’t
> easily decouple Accord if it becomes tightly coupled with Cassandra, which
> becomes quite likely when the builds are co-dependent. We have spent great
> effort developing them separately to avoid this.
>
> You can’t go back later and recover lost interest. How many projects have
> adopted ZAB, versus Raft?
>
> None of this also addresses the wider need for reform of our approach
> here, for both the dtest-api and the simulator.
>
> I’m still not clear on the concrete downsides of maintaining a separate
> tree here? Could somebody explain what they expect to go wrong? I respond
> to Mick’s points below, as I do not recognise them from our experience.
> We’ve been doing this for a year without incident.
>
> I will note we explicitly voted to develop Accord as a standalone library
> as part of the original CEP, and this was debated quite extensively, so to
> change that will require a new dedicated DISCUSS thread and vote.
>
> >  - you can no longer just `git clone …`  (and we clone automatically in a
> > number of places)
>
> Yes you can, if your build script updates the sub modules like we have
> been doing.
>
> >  - same with `git pull …` (easy to be left with out-of-sync submodules)
>
> Yes you can, again for the same reason. This is no different to ensuring
> your libraries are in sync, which must be done on every pull or checkout.
>
> >  - permanence from a git SHA no longer exists
>
> It is intact, if you link to a SHA.
>
> >  - our releases get more complicated (our source tarballs are the asf
> > releases)
>
> How?
>
> >  - handling patches cover submodules
>
> How is this different to patches affecting multiple versions in C*?
>
> >  - switching branches, and using git worktrees, during dv
>
> Elaborate? I don’t see any problem, but I might be missing something.
>
> On 16 Jan 2023, at 16:11, Henrik Ingo

Re: Intra-project dependencies

2023-01-16 Thread Benedict
We have a build script that is invoked by ant to grab a specific SHA (or HEAD of a branch). We were previously just grabbing HEAD but this has the problems mentioned elsewhere in the thread, amongst others. I don’t think it probably matters much if we use a build script or submodules.

I am driven in part by wanting to maintain the library status and not wanting to discard the work done to maintain this, but no less also by my expectation that tying Accord to C* version would entail additional maintenance burden (that might in the near term perhaps fall predominantly on me).

I could be wrong in this prediction of course, but it seems to be a one-sided trade. I don’t think there’s much extra work with separate repositories even in the worst case of a 1:1 mapping, and we can more easily reverse this decision if there’s no external interest and we really are just 1:1 for several releases.

That said, clearly we don’t want to pursue this approach for every subsystem. So perhaps one of the decisive reasons is indeed the broader utility, but the fact the library is fully decoupled is by itself a strong reason IMO.

I guess an interesting thought exercise to validate this is what other idealised subsystems I might want to apply this approach to. I’ll ponder that.

On 16 Jan 2023, at 18:32, Henrik Ingo wrote:

> Hi Benedict
>
> At least for my part, again, I'm not (yet) trying to argue for or against a particular alternative. So I think you'll find that if you allow a few more iterations of discussion, we can gravitate to some good consensus. Or failing that, we can at least gravitate around a small number of alternatives and then argue about those :-D
>
> It seems also in your email, the strongest argument for keeping a separate library, is your desire or expectation that Accord would attract significant 3rd party interest. And - this is btw also some advice Magnus Carlsen would give - your main argument therefore is, if we expect we need to make a specific move in the future, it's usually best to just do it immediately.
>
> I didn't write in my previous email, but I did have in mind that one drawback with the proposal of later extracting Accord out of Cassandra into its own repository would be to lose the history of commits. (At least without significant effort to keep/recreate the history.) For example, there could be commits in the Accord history that also edit files in Cassandra. So yes, I agree that if this is a major goal, then keeping Accord development in its own repository is the right choice.
>
> This then leads to the question should the link from Cassandra to Accord be via git sub-modules or via some bash code in the build system. I now remember something that was a major problem for years in the MongoDB CI system, and I believe this is also a problem with our dtests? That the nightly CI system would just check out HEAD of each module, and then compile them and run tests. This had the problem that it was impossible to return to a specific failure, say, a week later, and expect to rebuild and retest the same combination, because the system would just check out and build whatever the HEAD was at that date. (The only way to test the actual SHA you had been bisecting or patching was to submit it as a patch to the CI system. So if a test setup had 5 sub modules, and you were fixing a bug in one of them, you had to "patch" the 4 other ones too, simply because otherwise the CI system wouldn't check out the right position in their history.)
>
> So, whatever method we choose, it's important that our CI system and other tools can know and track the correct and current SHA for each sub-module. Presumably git sub-modules actually are the best answer to this need. How have you dealt with this in Accord so far?
>
> One point: I wouldn't directly compare dtest and Accord though. For a test framework, it's the dtest framework that is consuming a Cassandra version, while for Accord it's Cassandra that depends on a specific Accord version. Because of this, the same solution may or may not be right for both of them.
>
> henrik
>
> On Mon, Jan 16, 2023 at 6:44 PM Benedict wrote:
>
> > How often have we modified Paxos?
> >
> > There are currently no proposals to develop Accord further after the initial release. So I think it is very likely that Accord development will decouple from Cassandra version, unless there is significant external interest that drives it.
> >
> > Furthermore, the idea of revisiting this later is problematic. We can’t easily decouple Accord if it becomes tightly coupled with Cassandra, which becomes quite likely when the builds are co-dependent. We have spent great effort developing them separately to avoid this.
> >
> > You can’t go back later and recover lost interest. How many projects have adopted ZAB, versus Raft?
> >
> > None of this also addresses the wider need for reform of our approach here, for both the dtest-api and the simulator.
> >
> > I’m still not clear on the concrete downsides of maintaining a separate tree here? Could somebody explain what they expect to g

Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>  - permanence from a git SHA no longer exists
>
> With the caveat that I haven't worked w/submodules before and only know
> about them from a cursory search, it looks like git-submodule status would
> show us the sha for submodules and …
>


That isn't one SHA, but a collection of SHAs.

I'm thinking about reproducible builds, switching between branches, and git
bisecting, this stuff needs to just work. A build that fails fast if a
submodule is not on a specific SHA helps but introduces more problems.
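
To make the fail-fast idea concrete, here is a minimal Python sketch. It assumes the build already has two mappings: the SHA each submodule is expected to be on, and the SHA actually checked out. The function names and the mapping format are illustrative assumptions, not anything that exists in the Cassandra build today.

```python
# Sketch only: function names and the {path: sha} format are hypothetical.


def find_mismatches(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """List every submodule whose checked-out SHA differs from its pin."""
    problems = []
    for path, want in sorted(expected.items()):
        got = actual.get(path)
        if got is None:
            problems.append(f"{path}: not checked out, expected {want}")
        elif got != want:
            problems.append(f"{path}: at {got}, expected {want}")
    return problems


def require_pinned(expected: dict[str, str], actual: dict[str, str]) -> None:
    """Abort the build early instead of compiling against the wrong commit."""
    problems = find_mismatches(expected, actual)
    if problems:
        raise SystemExit("submodule check failed:\n" + "\n".join(problems))


if __name__ == "__main__":
    # Matching pins: the check passes silently and the build would continue.
    require_pinned({"modules/accord": "abc123"}, {"modules/accord": "abc123"})
```

A check like this only reports the mismatch. Deciding what to do after a branch switch or during a bisect (update the submodule, or stop) is the part that still needs a policy.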



> we could have parent projects reference specific shas to pull for
> submodules to build?
> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
> 
>


Yes, we can enforce a 1:1 relationship from parent SHA to submodule SHAs,
but then what's the point: you have both the headache of submodules and
having to always commit to multiple branches and forward merge.

That is, with fixed parent-to-submodule SHA relationships, these new
challenges are introduced:
- patches are off submodule SHAs, not the submodule's HEAD,
- you need to be making commits to all branches (and forward merging)
anyway to update submodule SHAs,
- if development is active on trunk, and then you need an update on an
older branch, you have to accommodate backporting all those trunk
changes (or introduce the same branching in the submodule).

IMHO submodules are just trading one set of problems for another. And
overall life is simpler if we reduce the cognitive burden to just what we
have today: forward merging.

Benedict, experience based on developing one feature against one branch
doesn't face the problems of working, and switching frequently, between
branches.

The problem of wanting an external repository for these libraries to
promote external non-cassandra consumers I would solve by exporting the
code out of cassandra (not trying to import it). Git history is easy to
keep/replicate. We were talking about doing this with the jamm library,
given its primary development is currently with C* but we want it to appear
as a standalone library (/github codebase).


Re: Intra-project dependencies

2023-01-16 Thread Benedict
> Benedict, experience based on developing one feature against one branch
> doesn't face the problems of working, and switching frequently, between
> branches.

Mick, please take a look at the ongoing development. Over the last week I
have been actively developing five separate PRs against each repository at
once (ten in total), with not insignificant changes between them. I am quite
experienced with actively developing against multiple branches, and of
extrapolating this experience to multiple C* versions, and your hypothetical
concerns do not invalidate that experience.

> - patches are off submodule SHAs, not the submodule's HEAD,

A SHA would point to the HEAD of a given branch, at the time of merge, just
by SHA? I’ve no idea what you imagine here, but this just ensures that a
given SHA of the importing project continues to compile correctly when it is
no longer HEAD. It does not mean there’s no HEAD that corresponds directly to
the SHA of the importing project’s HEAD.

> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,

Exactly as you would any library upgrade?

> - if development is active on trunk, and then you need an update on an
> older branch, you have to accommodate to backporting all those trunk
> changes (or introduce the same branching in the submodule),

If you do feature development against Accord then you will obviously branch
it? You would only make bug fixes to a bug fix branch. I’m not sure what you
think is wrong here.

On 16 Jan 2023, at 19:52, Mick Semb Wever wrote:

>> - permanence from a git SHA no longer exists
>>
>> With the caveat that I haven't worked w/submodules before and only know
>> about them from a cursory search, it looks like git-submodule status would
>> show us the sha for submodules and …
>
> That isn't one SHA, but a collection of SHAs.
>
> I'm thinking about reproducible builds, switching between branches, and git
> bisecting, this stuff needs to just work. A build that fails fast if a
> submodule is not on a specific SHA helps but introduces more problems.
>
>> we could have parent projects reference specific shas to pull for
>> submodules to build?
>> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
>
> Yes, we can enforce a 1:1 relationship from parent SHA to submodule SHAs,
> but then what's the point: you have both the headache of submodules and
> having to always commit to multiple branches and forward merge.
>
> That is, with fixed parent-to-submodule SHA relationships, these new
> challenges are introduced:
> - patches are off submodule SHAs, not the submodule's HEAD,
> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,
> - if development is active on trunk, and then you need an update on an
> older branch, you have to accommodate to backporting all those trunk
> changes (or introduce the same branching in the submodule),
>
> IMHO submodules are just trading one set of problems for another. And
> overall life is simpler if we reduce the cognitive burden to just what we
> have today: forward merging.
>
> Benedict, experience based on developing one feature against one branch
> doesn't face the problems of working, and switching frequently, between
> branches.
>
> The problem of wanting an external repository for these libraries to
> promote external non-cassandra consumers I would solve by exporting the
> code out of cassandra (not trying to import it). Git history is easy to
> keep/replicate. We were talking about doing this with the jamm library,
> given its primary development is currently with C* but we want it to appear
> as a standalone library (/github codebase).


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>
> … extrapolating this experience to multiple C* versions
>
>
Extrapolating that to include forward-merging, bisecting old history, etc.
is a leap of faith that I believe deserves discussion.

> - patches are off submodule SHAs, not the submodule's HEAD,
>
>
> A SHA would point to the HEAD of a given branch, at the time of merge,
> just by SHA? I’ve no idea what you imagine here, but this just ensures that
> a given SHA of the importing project continues to compile correctly when it
> is no longer HEAD. It does not mean there’s no HEAD that corresponds
> directly to the SHA of the importing project’s HEAD.
>


That wasn't my concern. Rather that you need to know in advance when the
SHA is not HEAD. You can't commit off a past SHA. Once you find out (and
how does this happen?) that the submodule code is not HEAD what do you then
do? What if fast-forwarding the submodule to HEAD's SHA breaks things, do
you now have to fix that or introduce branching in the submodule? If the
submodule doesn't have releases, is it doing versioning, and if not how are
branches distinguished?

Aren't these all fair enquiries to raise?
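
On the "how does this happen?" question, one possible way to detect that a pinned SHA has fallen behind is sketched below. It relies on `git ls-tree HEAD <path>` in the parent repository printing a submodule entry as `160000 commit <sha><TAB><path>`. The function names and the idea of comparing against a single tracked branch tip are assumptions made for illustration, not an agreed workflow.

```python
def pinned_sha(ls_tree_line):
    """Extract the submodule SHA from one line of `git ls-tree HEAD <path>`."""
    meta, _, path = ls_tree_line.rstrip("\n").partition("\t")
    mode, obj_type, sha = meta.split()
    # Submodules are stored as gitlinks: mode 160000, object type "commit".
    if mode != "160000" or obj_type != "commit":
        raise ValueError(f"{path!r} is not a submodule entry")
    return sha


def is_at_tip(ls_tree_line, branch_tip_sha):
    """True if the parent pins the submodule at the given branch tip."""
    return pinned_sha(ls_tree_line) == branch_tip_sha
```

The tip SHA could come from something like `git -C <submodule> rev-parse origin/<branch>` after a fetch. Which branch counts as "the" tip for a given parent branch is exactly the open question above.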

> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,
>
>
> Exactly as you would any library upgrade?
>


Correct. Submodules do not solve/remove the need to commit to multiple
branches and forward merge.
Furthermore, submodules mean at least one additional commit, and possibly
twice as many commits.


> - if development is active on trunk, and then you need an update on an
> older branch, you have to accommodate to backporting all those trunk
> changes (or introduce the same branching in the submodule),
>
>
> If you do feature development against Accord then you will obviously
> branch it? You would only make bug fixes to a bug fix branch. I’m not sure
> what you think is wrong here.
>


That's not obvious; you stated that a goal was to avoid maintaining
multiple branches. Sure, there are benefits to a lazy branching approach, but
it contradicts your initial motivations and introduces methodology changes
that are worth pointing out. What happens when there are multiple consumers
of Accord, and (like the situation we face with jamm) its HEAD is well in
front of anything C* is using?

As Henrik states, the underlying problem doesn't change, we're just
choosing between trade-offs. My concern is that we're not even doing a very
good job of choosing between the trade-offs. Based on past experiences with
submodules, which started with great excitement and led to tears and
frustration after a few years, I'm only pushing for a more thorough
discussion and proposal.


Re: Merging CEP-15 to trunk

2023-01-16 Thread Mick Semb Wever
> Could you file a bug report with more detail about which classes you think
> are lacking adequate documentation in each project, and what you would like
> to see?
>


I suggest instead that we open a task ticket for the merge.

I 100% agree with merging work incrementally, under a feature flag, but the
pre-commit bar here is higher than for the previous tickets being worked
on. API changes, pre-commit test results, and high (/entry) level comments
all deserve any extra eyeballs available.