Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-25 Thread bened...@apache.org
Hi Jeremiah,

My personal view is that work to modularise the codebase should be tied to 
specific use cases. If improved testing is the purpose of this work, I think it 
would help to include those improved tests that you plan to support as goals 
for the CEP.

If on the other hand some of this work is primarily intended to enable certain 
features, I personally think it would be preferable to tie them to those 
features - perhaps with their own CEP?


From: Jeremiah Jordan 
Date: Friday, 22 October 2021 at 16:24
To: Cassandra DEV 
Subject: [DISCUSS] CEP-18: Improving Modularity
Hi All,
As has been seen with the work already started in CEP-10, increasing the
modularity of our subsystems can improve their testability, and also the
ability to try new implementations without breaking things.

Our team has been working on doing this and CEP-18 has been created to
propose adding more modularity to a few different subsystems.
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-18%3A+Improving+Modularity

CASSANDRA-17044 has already been created for Schema Storage changes related
to this work and more JIRAs and PRs are to follow for the other subsystems
proposed in the CEP.

Thanks,
-Jeremiah Jordan


Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-25 Thread Henrik Ingo
Hi Benedict

This CEP is a bundle of APIs arising out of our recent work to re-architect
Cassandra into a more cloud-native architecture. What our product marketing
has chosen to call "Serverless" is a variant of Cassandra where we have
separated compute from storage (coordinator vs. data node), used S3-like
storage, and made various improvements to better support multi-tenancy in a
single Cassandra (Serverless) cluster. This whitepaper [1] explains this
work in detail for those of you interested in learning more. (Apologies that
it requires registration and that the first page may at times sound a bit
marketing-y, but it's really the most detailed report we have published so
far.)

[1] https://www.datastax.com/resources/whitepaper/astra-serverless

The above work was implemented so that, by default, a user can
continue to run Cassandra in the familiar "classic" way. The APIs
introduced by CEP-18, on the other hand, allow alternate or additional
functionality to be provided, which in our case we have used to create a
"serverless" way of deploying a Cassandra cluster.

The logic behind proposing this bundle of APIs separately is roughly as
follows:

First, the APIs touch existing code and functionality, so to minimize risk to the
next Cassandra release, it would make sense to complete merging this
work as early as possible in the development cycle. For the same reason,
keeping the new implementations out of this CEP allows us to focus review
(both of the CEP and of the eventual pull requests) on the APIs themselves,
whereas the related implementations (or plug-ins) would add to the scope
quite significantly. Non-default plugin functionality, on the other hand, can
be added later with much lower risk.
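The "classic by default, plugin on request" arrangement described above is commonly implemented by loading an implementation class by name and falling back to the built-in one. A minimal Java sketch, assuming a hypothetical `StorageProvider` interface and a hypothetical `example.storage_provider` system property; these names are illustrative only and are not CEP-18's actual API:

```java
import java.util.function.Supplier;

// Hypothetical extension point for illustration; not an actual Cassandra interface.
interface StorageProvider {
    String describe();
}

// The default ("classic") implementation, used when nothing else is configured.
class LocalDiskStorageProvider implements StorageProvider {
    @Override
    public String describe() {
        return "local disk";
    }
}

final class Plugins {
    private Plugins() {}

    /**
     * Instantiates the class named by the given system property, or falls back
     * to the default implementation when the property is unset or empty.
     */
    static <T> T load(String property, Class<T> type, Supplier<? extends T> defaultImpl) {
        String className = System.getProperty(property);
        if (className == null || className.isEmpty())
            return defaultImpl.get();
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            // type.cast throws ClassCastException if the configured class has the wrong type.
            return type.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className + " for " + property, e);
        }
    }
}

class PluginDemo {
    public static void main(String[] args) {
        // Hypothetical property name; with it unset, the classic implementation is used.
        StorageProvider provider =
            Plugins.load("example.storage_provider", StorageProvider.class, LocalDiskStorageProvider::new);
        System.out.println("Using storage: " + provider.describe());
    }
}
```

With the property unset, the demo prints `Using storage: local disk`; a misconfigured class name fails at startup rather than later at first use.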

Second, while it's completely fair to ask for context (why was this
particular refactoring or API done in the first place?), the assumption for a
CEP like this one is that better-defined interfaces, which are better
documented and come with better test coverage than the existing code, should
be able to stand on their own. Also, in the best case a good API will
enable implementations other than the one we had in mind when
developing it, so we wouldn't want to tie the discussion too closely to
the implementation that happened to come first. (As an example of this
working out nicely, your own work in CASSANDRA-16926 was motivated for you
by enabling a new kind of testing, but it also happens to be the
same work that enables someone to implement remote file storage, which we
could therefore drop from CEP-18.)

Conversely, it was our expectation when proposing this CEP that
"better modularity", at least at a high level, should be a fairly
straightforward conversation, while the actual plugins that make up our new
"serverless" architecture may reasonably ignite much more debate, or at
least questions as to how they work. As we have a backlog of several fairly
substantial CEPs lined up, we are trying to be very mindful of the
bandwidth of the developers on this list. For example, last week Jacek also
proposed CEP-17 for discussion, so we are trying to focus the discussion on
what's in CEP-17 and CEP-18 for now. (In addition, I remember at least two
CEPs that were discussed but not yet voted on. I don't know if this adds to
the cognitive load for anyone other than myself.)

henrik



-- 

Henrik Ingo



Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-25 Thread David Capwell
I am cool with pluggability and making things easier to test; my main
comment for this work is that we must also define semantics around the APIs
and can't just create interfaces. A simple example of this is CASSANDRA-17058
(linked from the CEP); I have made up my own interface below to explain.

interface Membership {
    ...
    void addMember(AddressAndPort address);
}

This interface is extremely problematic as expectations are not clearly
defined. Right now our membership changes are not atomic and depend on
propagation delay (even within the same JVM), so one implementation may
be atomic and we test against that (as that's much simpler in testing),
but then the actual implementation we ship isn't... i.e. none of our tests
using these mocks matter. Also, this doesn't work if we want to move to a
thread-per-core architecture in the future, as the concurrency isn't
defined.
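To illustrate the problem, here is a minimal Java sketch assuming a hypothetical `Membership` interface with an added `isMember` query, and with `java.net.InetSocketAddress` standing in for the address type: a test double that applies `addMember` immediately, next to a simplified model where the change only becomes visible after propagation. The same check passes against one and fails against the other, which is why the method signature alone is not a specification.

```java
import java.net.InetSocketAddress;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface; nothing says whether addMember is visible immediately.
interface Membership {
    void addMember(InetSocketAddress address);
    boolean isMember(InetSocketAddress address);
}

// Test double: the change is visible as soon as addMember returns.
class AtomicMembership implements Membership {
    private final Set<InetSocketAddress> members = ConcurrentHashMap.newKeySet();

    @Override public void addMember(InetSocketAddress address) { members.add(address); }
    @Override public boolean isMember(InetSocketAddress address) { return members.contains(address); }
}

// Simplified model of a propagated change: addMember only queues the update,
// which becomes visible after a later propagation round.
class PropagatedMembership implements Membership {
    private final Set<InetSocketAddress> members = ConcurrentHashMap.newKeySet();
    private final Queue<InetSocketAddress> pending = new ArrayDeque<>();

    @Override public synchronized void addMember(InetSocketAddress address) { pending.add(address); }
    @Override public boolean isMember(InetSocketAddress address) { return members.contains(address); }

    // Stands in for a gossip round; a real cluster would do this asynchronously.
    synchronized void propagate() {
        InetSocketAddress next;
        while ((next = pending.poll()) != null)
            members.add(next);
    }
}

class MembershipMismatchDemo {
    public static void main(String[] args) {
        InetSocketAddress node = InetSocketAddress.createUnresolved("10.0.0.1", 7000);
        for (Membership m : new Membership[] { new AtomicMembership(), new PropagatedMembership() }) {
            m.addMember(node);
            // A test asserting isMember(node) right here passes for one and fails for the other.
            System.out.println(m.getClass().getSimpleName() + " visible immediately: " + m.isMember(node));
        }
    }
}
```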

The simple way I think about pluggability: we should have the interfaces and
clearly define expectations around them, then write our tests against
those interfaces, and then swap in the implementations to make sure that they
comply. This is a massive effort for each interface, but it has good
long-term benefits.
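The approach just described (interfaces with written-down expectations, tests written once against the interface, then each implementation swapped in) is often called a contract test. A plain-Java sketch without a test framework, assuming a hypothetical `Membership` interface whose documented contract is visibility after a hypothetical `awaitPropagation()` call; all names are illustrative:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical membership API with its semantics written down:
 * addMember may return before the change is visible, but once
 * awaitPropagation() returns, every earlier addMember is visible.
 */
interface Membership {
    void addMember(String endpoint);
    boolean isMember(String endpoint);
    void awaitPropagation();
}

/** The contract, written once against the interface and run against every implementation. */
final class MembershipContract {
    private MembershipContract() {}

    static void verify(Supplier<Membership> factory) {
        Membership m = factory.get();
        m.addMember("10.0.0.1:7000");
        m.awaitPropagation();
        if (!m.isMember("10.0.0.1:7000"))
            throw new AssertionError("member not visible after awaitPropagation()");
        if (m.isMember("10.0.0.2:7000"))
            throw new AssertionError("endpoint that was never added reported as a member");
    }
}

// Immediately consistent; trivially satisfies the contract.
class InMemoryMembership implements Membership {
    private final Set<String> members = ConcurrentHashMap.newKeySet();

    @Override public void addMember(String endpoint) { members.add(endpoint); }
    @Override public boolean isMember(String endpoint) { return members.contains(endpoint); }
    @Override public void awaitPropagation() { /* nothing to wait for */ }
}

// Defers visibility until awaitPropagation(); also satisfies the contract.
class DeferredMembership implements Membership {
    private final Set<String> visible = new HashSet<>();
    private final Set<String> pending = new HashSet<>();

    @Override public synchronized void addMember(String endpoint) { pending.add(endpoint); }
    @Override public synchronized boolean isMember(String endpoint) { return visible.contains(endpoint); }
    @Override public synchronized void awaitPropagation() {
        visible.addAll(pending);
        pending.clear();
    }
}

class ContractDemo {
    public static void main(String[] args) {
        List<Supplier<Membership>> implementations =
            List.of(InMemoryMembership::new, DeferredMembership::new);
        for (Supplier<Membership> factory : implementations)
            MembershipContract.verify(factory);
        System.out.println("all implementations comply");
    }
}
```

Both implementations pass even though only one is immediately consistent, because the test relies only on what the contract promises. A real suite would more likely use JUnit's parameterized tests or an abstract test class than a hand-rolled `verify`.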



Re: [DISCUSS] CASSANDRA-15234

2021-10-25 Thread Ekaterina Dimitrova
Thank you, Benedict.

Considering there were no objections, I am closing the discussion and
getting back to work on the ticket itself. Thank you all, and have a great
week ahead.

On Wed, 20 Oct 2021 at 18:06, bened...@apache.org 
wrote:

> Thanks for moving this forwards Ekaterina.
>
> I think what we perhaps discovered is that there’s not really any
> consensus about how to best do config files. I think in this situation it’s
> best to defer to the one who’s actually putting in the time to _do_, so I
> am more than happy to defer to your decisions.
>
> I’m sure everyone is looking forward to the improved consistency of this
> work.
>
>
> From: Ekaterina Dimitrova 
> Date: Wednesday, 20 October 2021 at 22:27
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CASSANDRA-15234
> Hi everyone,
>
> I think it is time to summarize the discussion.
>
> First of all, thank you for all the valuable input, suggestions, concerns,
> and comments!
>
> The things that I believe we all agree on:
>
>- Simplicity for maintenance on our end: automation as much as possible,
>  so we don’t have to maintain more than one configuration file and our
>  config is less prone to human errors while adding new features
>- Simplicity for our users: as little confusion and as much simplicity as
>  possible, keeping the users’ toolset in mind
>- Simplicity for testing and verification of the different config file
>  formats
>
>
> It seems to me that most people want to see both proposed versions
> committed (feel free to correct me if I am wrong), with a revision of the
> default values and potentially with all parameters that do not really need
> to be changed commented out. Also, versions with stripped comments, plus a
> way to maintain everything automatically as much as possible.
>
> With that said, it seems to me the current patch in CASSANDRA-15234 can be
> committed after rebasing and addressing any outstanding review comments. The
> new version of cassandra.yaml, grouping the parameters, can be added in a
> new ticket by me or anyone with free cycles for that. It will require
> additional work on backward compatibility and on the ability of
> Cassandra to operate with all of the current versions, but it will be a new
> additional option which doesn’t disqualify the old ones, so it seems
> fair game to add it at any point in the future, as it won’t be a
> breaking change. We won’t replace anything. We will only add more options.
>
> If someone disagrees and wants to implement all possible options and
> functionalities at once, I will be happy to hand over the work and try to
> find the time to provide feedback/reviews later.
>
> Please do not hesitate to correct me if I misunderstood something.
>
> I will leave this discussion open until Monday and if there are no
> objections I will continue with CASSANDRA-15234 as per my proposal.
>
> Best regards,
>
> Ekaterina
>
> On Fri, 10 Sep 2021 at 20:18, Patrick McFadin  wrote:
>
> > Ah, I feel like cassandra.yaml discussions are such an evergreen topic.
> >
> > This was something brought up a while back, but I remember years ago we
> > talked about emulating the config options that some other databases have
> > done: providing different versions of the config for different approaches.
> > For instance, MySQL has had 'my-small.cnf' with just the bare minimum
> > config and restricted parameters for something like a laptop. A friendly
> > option for newcomers would be a clearly labeled 'cassandra-small.yaml'
> > with just the bare minimum and good comments. Then people new to Cassandra
> > wouldn't have a panic moment wondering if they have to know what concurrent
> > compactors are and how many they actually need. (Is there even a right
> > answer???) It's tackling the way operators approach config by the use case
> > they are trying to satisfy: run one node on my laptop, run a small cluster
> > on a budget cloud server, run any size cluster on a ginormous server.
> >
> > Unfortunately, the cleaner solution would be how Apache HTTPD solved it
> > back in the day with include files. It made config management much easier
> > and the overwhelm factor much lower. YAML doesn't support it, and it would
> > all have to be custom code in the Cassandra config loader. Not the best
> > option, really.
> >
> > Back to the original question, I think Ekaterina's sectioned version could
> > be used for new operators because there is a lot to learn looking at the
> > comments. Publish the following options:
> >
> > cassandra-small.yaml: Just the 'Quickstart' section
> > cassandra-medium.yaml: 'Quickstart' and 'Commonly used' with sane defaults
> > cassandra-advanced.yaml: Every section
> >
> > The addition would be a similarly named JVM properties file.
> >
> > As somebody who has been using Cassandra for a while and would like to have
> > a more verbose version (especially for config management), Benedict's
> > grouped version is fantastic. Just one

Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-25 Thread bened...@apache.org
Thanks Henrik for the additional context.

I’m not personally a fan of modularity only for modularity’s sake. Everything 
in software is a balancing act of competing priorities, and while pluggability 
supports certain use cases it can slow down development or prevent deeper 
integrations by preventing assumptions about how systems operate.

To be clear, I’m fully in favour of helping to enable your use cases, I just 
think it is important to make a decision for each refactor based on the merits 
and goals in question. If the justification is improved testing, then testing 
should be a core goal of the CEP. If it’s enabling a feature to be upstreamed 
later, I personally would prefer to tie the refactors to those features – which 
I hope will all find broad support for inclusion; certainly those I have heard 
of, I am eager to see arrive in Cassandra.

If the goal is to support entirely external features, we have to decide what 
kind of support we offer to these APIs, and this probably needs to be discussed 
on a per-API basis with the justification for pluggability weighed against any 
constraints this imposes on development. The most obvious example here is
membership and schema, which I think are primarily there to support an
external dependency, but we expect this area of the codebase to be
significantly revised over the coming months.



Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-25 Thread David Capwell
>
> ... I just think it is important to make a decision for each refactor
> based on the merits and goals in question. If the justification is improved
> testing, then testing should be a core goal of the CEP. If it’s enabling a
> feature to be upstreamed later, I personally would prefer to tie the
> refactors to those features – which I hope will all find broad support for
> inclusion; certainly those I have heard of, I am eager to see arrive in
> Cassandra.


+1

> If the goal is to support entirely external features, we have to decide
> what kind of support we offer to these APIs, and this probably needs to be
> discussed on a per-API basis with the justification for pluggability
> weighed against any constraints this imposes on development. The most
> obvious example here is membership and schema, which I think are primarily
> there to support an external dependency, but we expect this area of the
> codebase to be significantly revised over the coming months.


A similar topic was brought up with indexes... we do not have a good way of
knowing which internal Java APIs must be supported and can't break, so
before adding new ones we need to figure that out.
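One possible way to make the supported surface discoverable is to mark it explicitly in code. A minimal sketch, assuming a hypothetical `@SupportedApi` annotation and hypothetical `Stability` levels (nothing in this thread says Cassandra has either), with a reflective check that treats anything unannotated as internal:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stability levels an API could be tagged with.
enum Stability { STABLE, EVOLVING, INTERNAL }

/** Hypothetical marker for types that external implementations may depend on. */
@Documented
@Retention(RetentionPolicy.RUNTIME) // must be RUNTIME for getAnnotation to see it
@Target(ElementType.TYPE)
@interface SupportedApi {
    Stability value();
}

// Illustrative extension point declared as supported but still evolving.
@SupportedApi(Stability.EVOLVING)
interface IndexHook {
    void onWrite(String partitionKey);
}

// Illustrative internal type with no compatibility promise.
interface InternalHelper {
    void doSomething();
}

final class ApiAudit {
    private ApiAudit() {}

    /** Returns the declared stability, treating unannotated types as INTERNAL. */
    static Stability stabilityOf(Class<?> type) {
        SupportedApi api = type.getAnnotation(SupportedApi.class);
        return api == null ? Stability.INTERNAL : api.value();
    }
}

class ApiAuditDemo {
    public static void main(String[] args) {
        System.out.println("IndexHook: " + ApiAudit.stabilityOf(IndexHook.class));
        System.out.println("InternalHelper: " + ApiAudit.stabilityOf(InternalHelper.class));
    }
}
```

A build step or review checklist could then flag changes to types marked `STABLE`; which levels exist and what each one promises would still have to be agreed per API.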



Cassandra project biweekly status update 2021-10-25

2021-10-25 Thread Joshua McKenzie
I can't believe it's been two weeks already.

[New contributors getting started]
For new contributors, we recommend starting in one of two places: failing
tests or what we call "lhf" (low-hanging fruit).

Query for failing tests:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252
Query for unassigned lhf:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2162&quickFilter=2160

In Cassandra, failing tests often turn out to be incredibly interesting (and
tricky) to get to the bottom of. Right now we're at 18 unassigned failing
test tickets.

For unassigned lhf, we have 10 on 4.0.2 to pick from and 13 on 4.1.0;
please feel free to ping me directly if you want some help on where to get
started, or raise a flag in #cassandra-dev on slack and anyone will be
happy to help out!

[Dev list discussions in the past 14 days]
https://lists.apache.org/list.html?dev@cassandra.apache.org:lte=14d:

It's been a busy two weeks; probably why it's felt like time has flown.

Ekaterina brought the discussion about our configuration and
CASSANDRA-15234 to a good close, and Jacek sought some clarification on
whether CASSANDRA-11745 (paging by bytes) qualifies for a CEP. In
general, opening a [DISCUSS] thread here on the dev list and asking whether
the community thinks something needs a CEP is a fail-safe way to get clarity
if you're not sure. :)

There's been a little friction around the changes to the CircleCI config with
CASSANDRA-16882; it looks like we may have closed things out and committed
before we had fully reached consensus on the topic. All good actors here, and
this is definitely a "two-way door" style change, so let's keep working out the
best balance and/or scripting here to support multiple workflows (run on every
commit, require a manual trigger, etc.).

CEP-17, an SSTable format API (CASSANDRA-17056), came forth last
Friday, so I expect to see some interesting input on that one. If you have
some thoughts here, please chime in, as we've something of a history of
different storage engine perspectives on this project.

For CEP-18, there's a pretty large ongoing discussion around modularization
and whether we bundle CEPs for modularization with the artifacts that
prompt their creation or keep the API changes separate. Much like testing
(and maybe a little like vim vs. emacs), this has been one of those topics
where there have been multiple schools of thought and opinion in the project
for years, so working it out on a case-by-case basis is probably going to
continue to be our best bet. Please chime in if you have experience in this
domain (genericizing APIs to support multiple implementations in mature
projects, etc.) and have some wisdom to offer, or an opinion on the topic in
general.

And last but certainly not least, after some back and forth, the vote for
CEP-15 General Purpose Transactions passed! This comes with the stipulation
that the API for our distributed transactions will be made modular/pluggable
as part of that work, to allow for future experimentation with other
algorithms. Thanks to everyone involved for working through that; I think I
can safely speak for all of us when I say we're excited to see how the
project evolves in this space!

[Tickets in the past 14 days]
On the 4.0.2 front we've closed out 9 tickets, mostly relatively modest
bugfixes by the look of it (which is what we want to see on a .0.x line, and
a testament to the quality of 4.0).

For 4.1.0 we've closed out 13 issues; some more modest improvements and
adjustments, and some nodetool options to see stored hints and consistency
in output.

[Tickets that need attention]
No work is blocked on committers at this time.

We're up from 4 to 5 tickets on 4.0.2 that are in need of reviewers -
anyone with experience in any of these areas that has a few spare cycles
please take a look:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2161

Up from 16 to 17 on 4.1, and there are some repeat entries here from two
weeks ago. If anyone has any ideas on the best way to link up reviewers to
this outstanding work please chime in on this thread; I can say personally
that rebasing something waiting for review over the CEP-10 merge was an
educational experience and great way to get to know some of the code
changes. ;) But in general, the faster we can go from patch available to
merged the more efficient for the project in terms of rebasing costs.

We have a large number of tickets that are "stalled", meaning they haven't
been touched in 30 days. Please check this filter and, if any of these are
assigned to you and their status doesn't reflect the current state (i.e. on
backburner, back to backlog, etc), please update the tickets as needed:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2171&quickFilter=2155

It's been a fun two weeks everyone, and thanks for your professionalism and
effort on some of the debates and discussions we've had. It's challenging
to evolve *any* sof

Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-25 Thread Jeremiah D Jordan
As Henrik said, we have been refactoring access to these different internal APIs
as part of some larger work. For this CEP we pulled together a bunch of the
smaller ones into one place, similar to the refactoring proposed in CEP-10, as
we felt doing many small CEPs, one per module, would be less productive if
there was general support in the project for trying to standardize access to
different sections of the code and to start creating a more defined internal API.
If there is consensus that it would be better to propose each change as its own
CEP, or even just as single tickets without a CEP for these internal refactors,
we can do that as well. The CEP process is evolving as we go through these, so
we are just trying to figure out the best way forward.

The currently proposed changes in CEP-18 should all include improved test
coverage of the modules in question. We have been developing them all with a
requirement that all changes have at least 80% code coverage according to
SonarCloud JaCoCo reports. We have also found and fixed some bugs in the
existing code during this development work.

To me, having some defined interfaces for interacting with different sections of
the code is a huge boon for improving developer productivity going forward in
the project. Every place where we can reduce the amount of code reaching
inside another module to get at a random internal class is a positive, as it
prevents unknown side effects when changing that module, in cases where the
person developing the new feature did not realize other parts of the code were
depending on some current internal behavior that was not clearly part of the
module's interface.
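A minimal sketch of that idea, assuming hypothetical names (`SchemaQueries`, `InMemorySchema`) rather than anything proposed in CEP-18: the module exposes a small interface, keeps its implementation class package-private, and hands out read-only views, so code elsewhere cannot come to depend on internal state that was never part of the interface.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical module interface: the only type other modules should use.
interface SchemaQueries {
    Optional<String> tableDefinition(String keyspace, String table);
    Map<String, String> tablesIn(String keyspace);
}

// Implementation detail; package-private, so classes outside this package cannot reference it.
final class InMemorySchema implements SchemaQueries {
    // keyspace -> (table -> definition)
    private final Map<String, Map<String, String>> keyspaces = new HashMap<>();

    void define(String keyspace, String table, String definition) {
        keyspaces.computeIfAbsent(keyspace, k -> new HashMap<>()).put(table, definition);
    }

    @Override
    public Optional<String> tableDefinition(String keyspace, String table) {
        return Optional.ofNullable(keyspaces.getOrDefault(keyspace, Map.of()).get(table));
    }

    @Override
    public Map<String, String> tablesIn(String keyspace) {
        // Read-only view: callers cannot mutate internal state through it.
        return Collections.unmodifiableMap(keyspaces.getOrDefault(keyspace, Map.of()));
    }
}

class SchemaDemo {
    public static void main(String[] args) {
        InMemorySchema impl = new InMemorySchema();
        impl.define("ks", "t1", "CREATE TABLE ks.t1 (k int PRIMARY KEY)");
        SchemaQueries schema = impl; // other modules would only ever see this type
        System.out.println(schema.tableDefinition("ks", "t1").orElse("missing"));
    }
}
```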

On the question of changing internal interfaces, which I have seen raised in
some other venues, I do not think creating such interfaces should prevent us
from changing them as needed for future work. I think having the interfaces
actually improves our ability to do so without breaking other parts of the
code. My suggestion would be that we try not to make such changes in patch
releases if possible, but again I wouldn’t let that hold anything back.
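As an illustration of changing an interface without breaking its implementations, here is a sketch with hypothetical names: a Java default method lets a new operation be added with a fallback, so implementations written against the original interface still compile and behave as before.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface as originally defined, plus one later addition.
interface HintStore {
    void store(String endpoint, String mutation);
    List<String> hintsFor(String endpoint);

    // Added later: existing implementations inherit this default and keep compiling.
    // Implementations can override it with something cheaper.
    default int countFor(String endpoint) {
        return hintsFor(endpoint).size();
    }
}

// An implementation written before countFor existed; unchanged by the new method.
class ListHintStore implements HintStore {
    private final List<String[]> hints = new ArrayList<>();

    @Override
    public void store(String endpoint, String mutation) {
        hints.add(new String[] { endpoint, mutation });
    }

    @Override
    public List<String> hintsFor(String endpoint) {
        List<String> result = new ArrayList<>();
        for (String[] h : hints)
            if (h[0].equals(endpoint))
                result.add(h[1]);
        return result;
    }
}

class HintStoreDemo {
    public static void main(String[] args) {
        HintStore store = new ListHintStore();
        store.store("10.0.0.1", "m1");
        store.store("10.0.0.1", "m2");
        System.out.println("hints for 10.0.0.1: " + store.countFor("10.0.0.1"));
    }
}
```

Default methods only cover additions; removing or changing the meaning of an existing method still breaks implementers, which is where keeping such changes out of patch releases helps.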

So do people feel we should re-propose these as multiple CEPs or just tickets?
Or would people prefer to have a discussion/vote on the idea of improving the
modularity of the code base in general?

-Jeremiah
