[DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Jacek Lewandowski
Hi,

In short, we are discussing UUID-based sstable generation identifiers in 
https://issues.apache.org/jira/browse/CASSANDRA-17048. 

The question holding us back is support for downgrading. Long story 
short, when we generate new sstables with UUID-based IDs, they are not readable 
by older C* versions. 

1. Should we implement a downgrade tool? (It may be quite complex.)
2. Should we let users enable the new UUID-based IDs later, once they are sure 
they will not downgrade in the future?

Thanks,
Jacek



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
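A minimal Java sketch of why older versions cannot read the new files, assuming a simplified `<version>-<generation>-<format>-<component>` file name and a legacy reader that treats the generation as a decimal integer. All class and method names are hypothetical, and stripping the dashes from the UUID is an assumption for illustration; Cassandra's actual naming rules and the encoding chosen in CASSANDRA-17048 may differ.

```java
import java.util.Optional;
import java.util.UUID;

public class SSTableGenerationDemo {
    // Hypothetical, simplified layout: <version>-<generation>-<format>-<component>,
    // e.g. "nb-42-big-Data.db". Not Cassandra's exact naming rules.
    static String fileName(String generation) {
        return "nb-" + generation + "-big-Data.db";
    }

    // Sketch of a legacy reader that assumes the generation is a decimal integer.
    static Optional<Integer> legacyParseGeneration(String fileName) {
        String[] parts = fileName.split("-");
        if (parts.length != 4) {
            return Optional.empty();
        }
        try {
            return Optional.of(Integer.parseInt(parts[1]));
        } catch (NumberFormatException e) {
            // A non-numeric (e.g. UUID-based) generation is simply not understood.
            return Optional.empty();
        }
    }

    // Assumed UUID-based generation: dashes stripped so the '-'-separated
    // layout above still splits into exactly four parts.
    static String uuidGeneration(UUID id) {
        return id.toString().replace("-", "");
    }

    public static void main(String[] args) {
        String legacy = fileName("42");
        String uuidBased = fileName(uuidGeneration(UUID.fromString("123e4567-e89b-42d3-a456-426614174000")));
        System.out.println(legacy + " -> " + legacyParseGeneration(legacy));
        System.out.println(uuidBased + " -> " + legacyParseGeneration(uuidBased));
    }
}
```

A reader following the legacy rule accepts `nb-42-big-Data.db` but rejects the UUID-based name, which is why downgrades need a separate answer (a tool, or keeping the feature opt-in).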



Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Bowen Song
Personally, I would prefer a transition period in which the new feature 
is not enabled by default. This not only makes version upgrading easier, 
it also allows the user to stay on the old behaviour if they experience 
any issue with the new feature (e.g.: bugs in the new feature, or edge 
use cases / 3rd party tools depending on the old behaviour) until the 
issue is resolved.
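A rough sketch of such a transition period, assuming a hypothetical `uuidSSTableIdsEnabled` setting that defaults to `false`: until an operator opts in, new sstables keep receiving the old sequential generations, so nothing written after the upgrade blocks a downgrade. The names and the UUID encoding are illustrative only, not Cassandra's actual configuration.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

public class SSTableIdGeneratorSketch {
    // Hypothetical setting, off by default: until an operator opts in,
    // only the old, downgrade-safe sequential generations are written.
    static final class Settings {
        volatile boolean uuidSSTableIdsEnabled = false;
    }

    private final Settings settings;
    // Stand-in for the legacy sequential generation counter.
    private final AtomicInteger legacyGeneration;

    SSTableIdGeneratorSketch(Settings settings, int lastLegacyGeneration) {
        this.settings = settings;
        this.legacyGeneration = new AtomicInteger(lastLegacyGeneration);
    }

    // Returns the generation component for the next sstable to be written.
    String nextGeneration() {
        if (settings.uuidSSTableIdsEnabled) {
            // New format: a random UUID with dashes stripped (encoding assumed for illustration).
            return UUID.randomUUID().toString().replace("-", "");
        }
        // Old format: the next integer in the sequence.
        return Integer.toString(legacyGeneration.incrementAndGet());
    }
}
```

Flipping the setting only changes what is written from then on; files already written in the new format stay unreadable to older versions.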


On 26/10/2021 10:21, Jacek Lewandowski wrote:

Hi,

In short, we are discussing UUID-based sstable generation identifiers in 
https://issues.apache.org/jira/browse/CASSANDRA-17048.

The question holding us back is support for downgrading. Long story 
short, when we generate new sstables with UUID-based IDs, they are not readable 
by older C* versions.

1. Should we implement a downgrade tool? (It may be quite complex.)
2. Should we let users enable the new UUID-based IDs later, once they are sure 
they will not downgrade in the future?

Thanks,
Jacek




Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Jacek Lewandowski
However, the user cannot test the new feature without enabling it, and
once it is enabled, they cannot revert it.

- - -- --- -  -
Jacek Lewandowski


On Tue, Oct 26, 2021 at 12:54 PM Bowen Song  wrote:

> Personally, I would prefer a transition period in which the new feature
> is not enabled by default. This not only makes version upgrading easier,
> it also allows the user to stay on the old behaviour if they experience
> any issue with the new feature (e.g.: bugs in the new feature, or edge
> use cases / 3rd party tools depending on the old behaviour) until the
> issue is resolved.


Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Bowen Song
The user will be able to test the new feature in a testing environment 
and not push the changes to their production environment if they are not 
satisfied.


On 26/10/2021 12:00, Jacek Lewandowski wrote:

However, the user cannot test the new feature without enabling it, and
once it is enabled, they cannot revert it.

- - -- --- -  -
Jacek Lewandowski




Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread bened...@apache.org
I think it is probably acceptable to prevent downgrades once a new feature is 
enabled, as the exposure risk is limited to that one feature. The user can test 
the new version to ensure everything else works satisfactorily before 
committing to this one feature.

A downgrade tool would also be possible to produce, but its additional 
utility is probably limited.
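To give a sense of what even the simplest part of a downgrade tool would involve, here is a hypothetical sketch that only plans file renames: every non-numeric (UUID-based) generation is mapped to a fresh integer above the highest existing one. It assumes the same simplified `<version>-<generation>-<format>-<component>` naming as a stand-in for the real layout. A real tool would also have to rewrite any metadata or system-table state that refers to the old identifiers, which this sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class DowngradeRenamePlan {
    // Assumed layout: <version>-<generation>-<format>-<component>, e.g. "nb-3-big-Data.db".
    // Returns old name -> new name for every file whose generation is not a plain integer.
    static Map<String, String> plan(List<String> fileNames) {
        int maxLegacy = 0;
        // TreeSet keeps the non-numeric ids sorted, so the plan is deterministic.
        TreeSet<String> nonNumeric = new TreeSet<>();
        for (String name : fileNames) {
            String generation = name.split("-")[1];
            if (generation.chars().allMatch(Character::isDigit)) {
                maxLegacy = Math.max(maxLegacy, Integer.parseInt(generation));
            } else {
                nonNumeric.add(generation);
            }
        }

        // Assign each non-numeric id a fresh integer above every existing generation.
        Map<String, Integer> replacement = new LinkedHashMap<>();
        int next = maxLegacy;
        for (String id : nonNumeric) {
            replacement.put(id, ++next);
        }

        // All components of one sstable share its id, so they are renamed together.
        Map<String, String> renames = new LinkedHashMap<>();
        for (String name : fileNames) {
            String[] parts = name.split("-");
            Integer newGeneration = replacement.get(parts[1]);
            if (newGeneration != null) {
                parts[1] = newGeneration.toString();
                renames.put(name, String.join("-", parts));
            }
        }
        return renames;
    }
}
```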

I think this particular feature is probably easy enough to maintain as 
permanently optional, by simply maintaining two system tables: one for the old 
generation format, one for the new. So long as the user doesn’t use the new 
format, it remains forever downgradeable. Though perhaps one day we may want to 
force users to migrate, I don’t think there’s any rush. The important thing 
to avoid is leaving users no version buffer to escape new bugs – if we force 
the upgrade a major version later, they still have a whole range of major 
versions to downgrade to that support this feature (but perhaps avoid some 
other new issue).
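The two-table idea can be sketched roughly as follows, with in-memory maps standing in for two hypothetical system tables, one per identifier format. It illustrates the point above: as long as nothing has been recorded in the new-format table, an older version that only understands the legacy table is not missing anything after a downgrade. All names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class GenerationRegistrySketch {
    // In-memory stand-ins for two hypothetical system tables, keyed by table name.
    private final Map<String, Integer> legacyTable = new HashMap<>(); // old integer generations
    private final Map<String, UUID> uuidTable = new HashMap<>();      // new UUID-based ids

    // Remember the highest integer generation seen for a table.
    void recordLegacy(String table, int generation) {
        legacyTable.merge(table, generation, Integer::max);
    }

    // Remember the latest UUID-based id written for a table.
    void recordUuid(String table, UUID id) {
        uuidTable.put(table, id);
    }

    // Downgrade-safe as long as nothing has been written in the new format.
    boolean isDowngradeable() {
        return uuidTable.isEmpty();
    }

    Integer lastLegacyGeneration(String table) {
        return legacyTable.get(table);
    }
}
```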



From: Jacek Lewandowski 
Date: Tuesday, 26 October 2021 at 12:01
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
However, the user cannot test the new feature without enabling it, and
once it is enabled, they cannot revert it.

- - -- --- -  -
Jacek Lewandowski




Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-26 Thread Joshua McKenzie
>
> To me having some defined interfaces for interacting with different
> sections of the code is a huge boon for improving developer productivity
> going forward in the project.  Every place where we can reduce the amount
> of code reaching inside another module to get at a random internal class is
> a positive,

I've long been of the opinion that the benefits of having clear interface
points between major subsystems in a codebase outweigh the costs. I'm
not particularly sympathetic to the concerns about friction on making
changes to internal APIs, since modern IDE tooling makes this a trivial
exercise; however, I _am_ quite sympathetic to the concerns about
introducing friction against deeper integrations between subsystems.

That said, we have a history on the project of being somewhat hot and cold
when it comes to our approach to performance testing. I think our
low-hanging fruit as a project lies more in discipline and reproducibility:
knowing where our performance is today and making changes with an eye to
that, rather than keeping open the flexibility of tightly coupling
subsystems through their implementations.

With the modern runtime environment shifting so much toward
containerization, I can't help but think that smaller, clearly modularized
components are more resilient against a rapidly evolving runtime
environment, more sympathetic to the constrained resource environments
they run in, and more classically optimizable in their own right.

I air all this just to contribute perspective to the discussion. All that
said, I think refactoring APIs as a pure reflection of what the DB is doing
today just risks ossifying something that grew up organically, and probably
isn't going to do us any favors. Deriving an interface from a use case (or,
better yet, a few implementations), or targeting a more testable / mockable
structure and introducing those tests, should give us better guidance on
which route to take.
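As a small illustration of "targeting a more testable / mockable structure", here is a sketch of a narrow interface between two hypothetical components: the caller is coded only against the interface, and an in-memory fake stands in for the real implementation in a unit test. All names are invented for the example and are not part of CEP-18.

```java
import java.util.HashSet;
import java.util.Set;

public class InterfaceSeamSketch {
    // Hypothetical narrow interface between two components.
    interface TagRegistry {
        boolean contains(String tag);
        void add(String tag);
    }

    // Caller coded only against the interface, so it never reaches into the
    // registry's internals and can be tested against any implementation.
    static final class Tagger {
        private final TagRegistry registry;

        Tagger(TagRegistry registry) {
            this.registry = registry;
        }

        // Adds the tag unless it is already taken; returns whether it was added.
        boolean tag(String tag) {
            if (registry.contains(tag)) {
                return false;
            }
            registry.add(tag);
            return true;
        }
    }

    // In-memory fake for unit tests; a production implementation might persist tags.
    static final class InMemoryTagRegistry implements TagRegistry {
        private final Set<String> tags = new HashSet<>();

        @Override
        public boolean contains(String tag) { return tags.contains(tag); }

        @Override
        public void add(String tag) { tags.add(tag); }
    }
}
```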

 ~Josh


On Mon, Oct 25, 2021 at 4:22 PM Jeremiah D Jordan 
wrote:

> As Henrik said, we have been refactoring access to these different internal
> APIs as part of some larger work.  For this CEP we pulled together a bunch
> of the smaller ones into one place, similar to the refactoring proposed in
> CEP-10, as we felt doing many small CEPs, one per module, would be less
> productive if there was support in the project in general for trying to
> standardize access to different sections of the code and start creating a
> more defined internal API.  If there is consensus that it would be better
> to propose each change as its own CEP, or even just as single tickets
> without a CEP for these internal refactors, we can do that as well.  The
> CEP process is evolving as we go through these, so just trying to figure
> out the best way forward.
>
> The currently proposed changes in CEP-18 should all include improved test
> coverage of the modules in question.  We have been developing them all with
> a requirement that all changes have at least 80% code coverage from SonarCloud
> JaCoCo reports.  We have also found and fixed some bugs in the
> existing code during this development work.
>
> To me having some defined interfaces for interacting with different
> sections of the code is a huge boon for improving developer productivity
> going forward in the project.  Every place where we can reduce the amount
> of code reaching inside another module to get at a random internal class is
> a positive, as it prevents unknown side effects when changing that module
> when the person developing the new feature did not realize other parts of
> the code were depending on some current internal behavior that was not
> clearly part of the module's interface.
>
> On the question of changing internal interfaces that I have seen in some
> other venues, I do not think creating such interfaces should prevent us
> from changing them as needed for future work.  I think having the
> interfaces actually improves on our ability to do so without breaking other
> parts of the code.  My suggestion would be that we try not to make such
> changes in patch releases if possible, but again I wouldn’t let that hold
> anything back.
>
> So do people feel we should re-propose these as multiple CEPs or just
> tickets?  Or do people prefer to have a discussion/vote on the idea of
> improving the modularity of the code base in general?
>
> -Jeremiah
>
> > On Oct 25, 2021, at 9:26 AM, bened...@apache.org wrote:
> >
> > Thanks Henrik for the additional context.
> >
> > I’m not personally a fan of modularity only for modularity’s sake.
> Everything in software is a balancing act of competing priorities, and
> while pluggability supports certain use cases it can slow down development
> or prevent deeper integrations by preventing assumptions about how systems
> operate.
> >
> > To be clear, I’m fully in favour of helping to enable your use cases, I
> just think it is important to make a decision for each refactor based on

Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Joshua McKenzie
+1 to Benedict's perspective here. Supporting both sstable ID paradigms
should be relatively trivial and low cost to maintain going forward.

On Tue, Oct 26, 2021 at 7:54 AM bened...@apache.org 
wrote:

> I think it is probably acceptable to prevent downgrades once a new feature
> is enabled, as the exposure risk is limited to that one feature. The user
> can test the new version to ensure everything else works satisfactorily
> before committing to this one feature.
>
> A downgrade tool would also be possible to produce, but its additional
> utility is probably limited.
>
> I think this particular feature is probably easy enough to maintain as
> permanently optional, by simply maintaining two system tables: one for the
> old generation format, one for the new. So long as the user doesn’t use the
> new format, it remains forever downgradeable. Though perhaps one day we may
> want to force users to migrate, I don’t think there’s any rush. The
> important thing to avoid is leaving users no version buffer to escape new
> bugs – if we force the upgrade a major version later, they still have a
> whole range of major versions to downgrade to that support this feature
> (but perhaps avoid some other new issue).


Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-26 Thread bened...@apache.org
> I'm not particularly sympathetic to the concerns about friction on making 
> changes to internal APIs since modern IDE tooling makes this a trivial 
> exercise

We’re getting abstract here, so this isn’t a rebuttal, or even strongly tied to 
this particular discussion; it’s just an attempt to express my point more clearly.

We don’t abstract everything in the codebase, and in fact in general we (or at 
least, I) try to keep things concrete as long as there’s no reason to abstract 
them, because this is usually easier to reason about and lower overhead to 
modify. This is true even on the single class level, so of course it happens at 
the module level. This isn’t about the IDE refactoring, but the cognitive 
burden of reasoning simultaneously about the concrete class and the 
abstraction, and how they relate.

The problem with premature abstraction, and particularly when multiple 
implementations start appearing, is that you have to start formalising the 
abstractions in ways that permit you to reason only about the abstraction. This 
necessarily means eschewing some knowledge of how the concrete 
implementation(s) work. This may prevent very useful simplifications for how 
you interact with a specific concrete implementation, as we have to code to the 
API. This may prevent optimisations. This may also introduce additional 
complexity when either implementing the abstraction or when reasoning about the 
actions you are performing against it, where often you may not entirely ignore 
the concrete implementation (due to imperfect or ambiguous API specifications), 
so you must now consider if you are compatible with both the abstraction and 
any known concrete implementations.
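A deliberately contrived sketch of the optimisation point, with hypothetical names: the concrete class tracks its total size as entries are added, but code written against the abstraction can only iterate, so it pays O(n) for something the implementation already knows in O(1).

```java
import java.util.ArrayList;
import java.util.List;

public class AbstractionCost {
    // Hypothetical abstraction: callers may only iterate over entries.
    interface EntrySource {
        Iterable<byte[]> entries();
    }

    // Hypothetical concrete implementation that happens to track its total size.
    static final class ArrayEntrySource implements EntrySource {
        private final List<byte[]> entries = new ArrayList<>();
        private long totalBytes; // maintained incrementally on add

        void add(byte[] entry) {
            entries.add(entry);
            totalBytes += entry.length;
        }

        // O(1), but not part of EntrySource, so abstraction-level callers cannot use it.
        long totalBytes() {
            return totalBytes;
        }

        @Override
        public Iterable<byte[]> entries() {
            return entries;
        }
    }

    // Coded against the abstraction: must walk every entry, O(n).
    static long totalBytesViaAbstraction(EntrySource source) {
        long sum = 0;
        for (byte[] e : source.entries()) {
            sum += e.length;
        }
        return sum;
    }

    // Coded against the concrete type: can use the cached value.
    static long totalBytesViaConcrete(ArrayEntrySource source) {
        return source.totalBytes();
    }
}
```

Widening the interface to expose `totalBytes()` would fix this case, but then every other implementation has to provide it too, which is exactly the kind of trade-off described above.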

These are all additional burdens, but we often pay the cost for perceived 
benefits.

It seems to me though that this discussion is conflating 
modularisation/pluggability with decoupling, which is a benefit we might gain 
in return for these additional costs. To me this is a distinct problem, 
however. It’s quite possible to modularise and yet tightly couple, though 
usually it will break tight coupling. But breaking tight coupling doesn’t 
require modularisation, and certainly doesn’t require pluggability.
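To illustrate that decoupling need not imply modularisation or pluggability, here is a hypothetical sketch. The first class reaches into a global singleton. The second receives the value through a plain constructor argument: no new interface, plugin loader or separate module is involved, yet it no longer depends on the other component's internals.

```java
import java.util.function.IntSupplier;

public class DecouplingSketch {
    // Hypothetical global singleton, standing in for "reaching inside another module".
    static final class GlobalConfig {
        static final GlobalConfig INSTANCE = new GlobalConfig();
        int maxConcurrentTasks = 4;
    }

    // Tightly coupled: reads another component's internal state directly.
    static final class CoupledTask {
        int limit() {
            return GlobalConfig.INSTANCE.maxConcurrentTasks;
        }
    }

    // Decoupled, but neither modularised nor pluggable: the dependency is a plain
    // constructor argument, so the class no longer knows where the value comes from.
    static final class DecoupledTask {
        private final IntSupplier maxConcurrentTasks;

        DecoupledTask(IntSupplier maxConcurrentTasks) {
            this.maxConcurrentTasks = maxConcurrentTasks;
        }

        int limit() {
            return maxConcurrentTasks.getAsInt();
        }
    }
}
```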

To bring it back to this discussion, the intent of a piece of work always 
drives the outcome, and in my opinion it is best to always consider a work in 
its actual context. The primary purpose of this work is pluggability, and so 
this will inform the API modifications. A straightforward goal of reducing 
tight coupling in the codebase would likely approach this problem differently. 
None of this is a bad thing, just in my opinion the nature of development.

That said, I’m broadly happy to see this work go ahead. I would prefer to split 
the conversations out into their driving projects for the aforementioned 
reasons, but I wouldn’t veto the proposal on that basis. It would be nice to 
see others’ opinions about this.

The only sub-proposal I’m particularly unsure about is 17059, which doesn’t 
seem to increase modularity at all. It looks to be a kind of plugin hook, and 
IMO should definitely be addressed separately. Perhaps a simple DISCUSS thread 
and its Jira will suffice?


From: Joshua McKenzie 
Date: Tuesday, 26 October 2021 at 19:16
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-18: Improving Modularity
>
> To me having some defined interfaces for interacting with different
> sections of the code is a huge boon for improving developer productivity
> going forward in the project.  Every place where we can reduce the amount
> of code reaching inside another module to get at a random internal class is
> a positive,

I've long been of the opinion that the benefits outweigh the costs of
having clear interface points between major subsystems in a codebase. I'm
not particularly sympathetic to the concerns about friction on making
changes to internal APIs since modern IDE tooling makes this a trivial
exercise, however I _am_ quite sympathetic to the concerns about
introducing friction against deeper integrations between subsystems.

That said, we have a history on the project of being somewhat hot and cold
when it comes to our approach to performance testing; I think our low
hanging fruit as a project revolves more around discipline and
reproducibility on knowing where our performance is today and making
changes with an eye to that rather than keeping open the flexibility of
tightly coupling subsystems through their implementations.

With the modern runtime environment shifting so much toward
containerization I can't help but think smaller, clearly modularized
components are more resilient against a rapidly evolving runtime
environment and more sympathetic to the constrained resource environments
they run in, as well as more classically optimizable in their own right.

I air all this just to contribute perspective to the discussion; all that
said, I think refactoring APIs as a pure reflection of what the DB is doing
today just ris

Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-26 Thread Stefan Miklosovic
I am all for good extensibility / interfaces and so on; however, I am
afraid that this might actually break a lot of things if enough
attention is not paid. For example, over all these years, the
community around Cassandra tooling has got used to the "mess":
placing one fat jar on the classpath, and it somehow works. Then we
just cherry-pick what we want, and we are all (reasonably) happy as long
as we do not find ourselves resorting to reflection because we need some
private final field to be public and non-final, and for some reason a
developer thought it was actually a good idea to do it like that
...
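The reflection workaround described above typically looks something like this hypothetical sketch: the tool looks up a private field by name and forces access. It works until a release renames, removes or retypes the field, at which point the lookup fails at runtime rather than at compile time (and writing to `final` fields this way is more fragile still). The class and field names are invented for the example.

```java
import java.lang.reflect.Field;
import java.util.Optional;

public class ReflectionAccessSketch {
    // Hypothetical internal class from "the fat jar" that a tool depends on.
    static final class InternalState {
        private final int pendingTasks;

        InternalState(int pendingTasks) {
            this.pendingTasks = pendingTasks;
        }
    }

    // What external tooling often ends up doing: reach into a private field by name.
    // If a later release renames or removes the field, this silently returns empty.
    static Optional<Object> readPrivateField(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return Optional.ofNullable(f.get(target));
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return Optional.empty();
        }
    }
}
```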

Even if these CEPs are not about modularity at the build-system level
(where Cassandra would logically consist of different jars), if I
understand that correctly, when changes are introduced in e.g. 4.1, then
4.2, then 4.3 and so on, tooling that is expected to work across all
point releases might have to accommodate each of these releases,
which is quite a bummer. There is not always the bandwidth to support
each individual version of a tool; maybe one for 4, 3.11 and 3.0, and
that's it. I just want to stress that, from the users' and
integrators' perspective, it has to be a smooth transition. So yes,
extend, but do not break, please.
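One common Java technique for "extend, but do not break" is adding a `default` method to an existing interface: implementations written against the older interface keep compiling and running unchanged. A minimal sketch with hypothetical names, not an existing Cassandra API:

```java
public class ExtendDontBreakSketch {
    // Hypothetical interface that external tools implement against an older release.
    interface CompactionObserver {
        void onCompactionFinished(String table);

        // Added in a later release as a default method: existing implementations
        // keep compiling and running unchanged.
        default void onCompactionFailed(String table, Throwable cause) {
            // No-op by default.
        }
    }

    // A tool written before onCompactionFailed existed; it still satisfies the interface.
    static final class LegacyToolObserver implements CompactionObserver {
        int finished;

        @Override
        public void onCompactionFinished(String table) {
            finished++;
        }
    }
}
```

This only covers additions; renaming or removing a method, or changing its signature, still breaks existing implementers.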

Before any big refactoring, I would actually spend some time on
removing what is not necessary. If one digs deeper, Cassandra is
living with a lot of legacy code. For example, I was removing support
for Windows, which takes a lot of stuff away with it. I believe
there are many places where we are just carrying a lot of baggage with
us because ...

The snapshot subsystem, which we are looking into together with Paulo Motta, is
another example of how weirdly wired a subsystem can be. It is all
over the place, and it is quite discouraging to implement something new
without cleaning it all up first, because it just does not make sense
to add on top of that anymore.

The way I see it is that while working on this "extensibility and
interfaces work" we should probably also focus on getting rid of what
is obsolete and simplify and unify the codebase where it smells.

I am pretty confident that extending / interfacing would be way easier too.

If this is a side effect of these CEPs I am all over it.

On Tue, 26 Oct 2021 at 20:16, Joshua McKenzie  wrote:
>
> >
> > To me having some defined interfaces for interacting with different
> > sections of the code is a huge boon for improving developer productivity
> > going forward in the project.  Every place where we can reduce the amount
> > of code reaching inside another module to get at a random internal class is
> > a positive,
>
> I've long been of the opinion that the benefits outweigh the costs of
> having clear interface points between major subsystems in a codebase. I'm
> not particularly sympathetic to the concerns about friction on making
> changes to internal APIs since modern IDE tooling makes this a trivial
> exercise, however I _am_ quite sympathetic to the concerns about
> introducing friction against deeper integrations between subsystems.
>
> That said, we have a history on the project of being somewhat hot and cold
> when it comes to our approach to performance testing; I think our low
> hanging fruit as a project revolves more around discipline and
> reproducibility on knowing where our performance is today and making
> changes with an eye to that rather than keeping open the flexibility of
> tightly coupling subsystems through their implementations.
>
> With the modern runtime environment shifting so much toward
> containerization I can't help but think smaller, clearly modularized
> components are more resilient against a rapidly evolving runtime
> environment and more sympathetic to the constrained resource environments
> they run in, as well as more classically optimizable in their own right.
>
> I air all this just to contribute perspective to the discussion; all that
> said, I think refactoring APIs as a pure reflection of what the DB is doing
> today just risks ossifying something that grew up organically and probably
> isn't going to do us any favors, so having a use-case (or better yet a few
> implementations) we're deriving an interface from, or targeting a more
> testable / mockable structure plus introducing those tests should give us
> guidance to improve the route we go.
>
>  ~Josh

Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-26 Thread Jeremiah D Jordan
> The only sub-proposal I’m particularly unsure about is 17059, which doesn’t 
> seem to increase modularity at all. It looks to be a kind of plugin hook, and 
> IMO should definitely be addressed separately. Perhaps a simple DISCUSS 
> thread and its Jira will suffice?

OK. I will remove that one from the CEP and discuss it separately.

> On Oct 26, 2021, at 2:32 PM, bened...@apache.org wrote:
> 
>> I'm not particularly sympathetic to the concerns about friction on making 
>> changes to internal APIs since modern IDE tooling makes this a trivial 
>> exercise
> 
> We’re getting abstract here, so this isn’t a rebuttal, or even strongly tied 
> to this particular discussion; it’s just an attempt to express my point more clearly.
> 
> We don’t abstract everything in the codebase, and in fact in general we (or 
> at least, I) try to keep things concrete as long as there’s no reason to 
> abstract them, because this is usually easier to reason about and lower 
> overhead to modify. This is true even on the single class level, so of course 
> it happens at the module level. This isn’t about the IDE refactoring, but the 
> cognitive burden of reasoning simultaneously about the concrete class and the 
> abstraction, and how they relate.
> 
> The problem with premature abstraction, and particularly when multiple 
> implementations start appearing, is that you have to start formalising the 
> abstractions in ways that permit you to reason only about the abstraction. 
> This necessarily means eschewing some knowledge of how the concrete 
> implementation(s) work. This may prevent very useful simplifications for how 
> you interact with a specific concrete implementation, as we have to code to 
> the API. This may prevent optimisations. This may also introduce additional 
> complexity when either implementing the abstraction or when reasoning about 
> the actions you are performing against it, where often you may not entirely 
> ignore the concrete implementation (due to imperfect or ambiguous API 
> specifications), so you must now consider if you are compatible with both the 
> abstraction and any known concrete implementations.
> 
> These are all additional burdens, but we often pay the cost for perceived 
> benefits.
> 
> It seems to me though that this discussion is conflating 
> modularisation/pluggability with decoupling, which is a benefit we might gain 
> in return for these additional costs. To me this is a distinct problem, 
> however. It’s quite possible to modularise and yet tightly couple, though 
> usually it will break tight coupling. But breaking tight coupling doesn’t 
> require modularisation, and certainly doesn’t require pluggability.
> 
> To bring it back to this discussion, the intent of a piece of work always 
> drives the outcome, and in my opinion it is best to always consider a work in 
> its actual context. The primary purpose of this work is pluggability, and so 
> this will inform the API modifications. A straightforward goal of reducing 
> tight coupling in the codebase would likely approach this problem 
> differently. None of this is a bad thing, just in my opinion the nature of 
> development.
> 
> That said, I’m broadly happy to see this work go ahead. I would prefer to 
> split the conversations out into their driving projects for the 
> aforementioned reasons, but I wouldn’t veto the proposal on that basis. It 
> would be nice to see others’ opinions about this.
> 
> The only sub-proposal I’m particularly unsure about is 17059, which doesn’t 
> seem to increase modularity at all. It looks to be a kind of plugin hook, and 
> IMO should definitely be addressed separately. Perhaps a simple DISCUSS 
> thread and its Jira will suffice?

Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-26 Thread bened...@apache.org
> we should probably also focus on getting rid of what is obsolete and simplify 
> and unify the codebase where it smells.

Agreed. Lots of the codebase has had a spring clean over the past couple of 
years, but lots hasn’t. Some areas are very long in the tooth and could do with 
some heavy pruning.

From: Stefan Miklosovic 
Date: Tuesday, 26 October 2021 at 20:40
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-18: Improving Modularity
I am all for good extensibility / interfaces and so on; however, I am
afraid that this might actually break a lot of things if not enough
attention is paid. For example, over all these years, the community
around Cassandra tooling has grown used to the "mess": we put one fat
jar on the classpath and it somehow works. Then we just cherry-pick
what we want, and we are all (reasonably) happy as long as we do not
find ourselves resorting to reflection because we just need some
private final field to be public and non-final, and for some reason a
developer thought it was actually a good idea to make it that way
...

If I understand correctly, these CEPs are not about modularity at the
build-system level (where Cassandra would logically consist of separate
jars). Even so, if changes are introduced in, e.g., 4.1, then 4.2, then
4.3 and so on, tooling that is expected to work with all point releases
might have to accommodate each of these releases, which is quite a
bummer. There is not always the bandwidth to support each individual
version of a tool; maybe one for 4, 3.11 and 3.0, and that's it. I just
want to stress that from the users' and integrators' perspective it has
to be a smooth transition. So yes, extend, but do not break, please.
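
One common Java technique for "extend, but do not break", offered only as an illustrative sketch and not as anything the CEPs propose: when a public extension interface gains a method, give it a `default` body so implementations written against an earlier release keep compiling and, in the usual case, keep linking. `CompactionListener` and `LegacyToolListener` are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: these types are illustrative, not Cassandra APIs.
public class ExtendDontBreak {

    /** An extension point that an external tool implemented against release N. */
    public interface CompactionListener {
        void onCompactionStarted(String table);

        /**
         * Added in release N+1. The default body means implementations written
         * for release N do not have to change to keep compiling.
         */
        default void onCompactionFinished(String table, long bytesWritten) {
            // Safe fallback: do nothing.
        }
    }

    /** Stand-in for a third-party listener written before the new method existed. */
    public static final class LegacyToolListener implements CompactionListener {
        public final List<String> seen = new ArrayList<>();

        @Override
        public void onCompactionStarted(String table) {
            seen.add(table);
        }
    }

    public static void main(String[] args) {
        LegacyToolListener tool = new LegacyToolListener();
        CompactionListener listener = tool;
        listener.onCompactionStarted("ks.tbl");
        listener.onCompactionFinished("ks.tbl", 1024L); // runs the default no-op
        System.out.println(tool.seen); // prints [ks.tbl]
    }
}
```

This only works when a harmless fallback exists (here a no-op); where none does, adding a separate new interface may be the gentler option.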

Before any big refactoring, I would actually spend some time removing
what is not necessary. If one digs deeper, Cassandra is living with a
lot of legacy code. For example, I was removing support for Windows,
which takes a lot of code with it. I believe there are many places
where we are just carrying a lot of baggage with us because ...

The snapshot subsystem, which Paulo Motta and I are looking into, is
another example of how weirdly wired a subsystem can be. It is all
over the place, and it is quite discouraging to implement something new
without cleaning it all up first, because it no longer makes sense to
keep adding on top of it.

The way I see it, while working on this "extensibility and interfaces"
work, we should probably also focus on getting rid of what is obsolete
and on simplifying and unifying the codebase where it smells.

I am pretty confident that extending / interfacing would be way easier too.

If this is a side effect of these CEPs, I am all over it.

On Tue, 26 Oct 2021 at 20:16, Joshua McKenzie  wrote:
>
> >
> > To me having some defined interfaces for interacting with different
> > sections of the code is a huge boon for improving developer productivity
> > going forward in the project.  Every place where we can reduce the amount
> > of code reaching inside another module to get at a random internal class is
> > a positive,
>
> I've long been of the opinion that the benefits outweigh the costs of
> having clear interface points between major subsystems in a codebase. I'm
> not particularly sympathetic to the concerns about friction on making
> changes to internal APIs since modern IDE tooling makes this a trivial
> exercise; however, I _am_ quite sympathetic to the concerns about
> introducing friction against deeper integrations between subsystems.
>
> That said, we have a history on the project of being somewhat hot and cold
> when it comes to our approach to performance testing; I think our
> low-hanging fruit as a project revolves more around discipline and
> reproducibility in knowing where our performance is today and making
> changes with an eye to that rather than keeping open the flexibility of
> tightly coupling subsystems through their implementations.
>
> With the modern runtime environment shifting so much toward
> containerization I can't help but think smaller, clearly modularized
> components are more resilient against a rapidly evolving runtime
> environment and more sympathetic to the constrained resource environments
> they run in, as well as more classically optimizable in their own right.
>
> I air all this just to contribute perspective to the discussion. All that
> said, I think refactoring APIs as a pure reflection of what the DB is doing
> today just risks ossifying something that grew up organically and probably
> isn't going to do us any favors. Having a use case (or better yet, a few
> implementations) that we derive an interface from, or targeting a more
> testable / mockable structure and introducing those tests, should give us
> guidance on the route we take.
>
>  ~Josh
>
>
> On Mon, Oct 25, 2021 at 4:22 PM Jeremiah D Jordan 
> wrote:
>
> > As Henrik said we have been refactoring access to these different internal
> > APIs as part of some larger work.  For thi

Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Jacek Lewandowski
Yes, those explanations sound very reasonable to me as well, and I'll push the 
implementation soon.

Thank you guys

On 2021/10/26 18:21:44, Joshua McKenzie  wrote: 
> +1 to Benedict's perspective here. Supporting both sstable ID paradigms
> should be relatively trivial and low-cost to maintain going forward.
> 
> On Tue, Oct 26, 2021 at 7:54 AM bened...@apache.org 
> wrote:
> 
> > I think it is probably acceptable to prevent downgrades once a new feature
> > is enabled, as the exposure risk is limited to that one feature. The user
> > can test the new version to ensure everything else works satisfactorily
> > before committing to this one feature.
> >
> > A downgrade tool would also be possible to produce, but its additional
> > utility is probably limited.
> >
> > I think this particular feature is probably easy enough to maintain as
> > permanently optional, simply by maintaining two system tables: one for the
> > old generation format, one for the new. So long as the user doesn’t use the
> > new format, it remains forever downgradeable. Though perhaps one day we may
> > want to force users to migrate, I don’t think there’s any rush. The
> > important thing to avoid is leaving users no version buffer to escape new
> > bugs: if we force the upgrade a major version later, they still have a
> > whole range of major versions to downgrade to that support this feature
> > (but perhaps avoid some other new issue).
> >
> >
> >
> > From: Jacek Lewandowski 
> > Date: Tuesday, 26 October 2021 at 12:01
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] How to implement backward compatibility
> > (CASSANDRA-17048)
> > However, the user is unable to test the new feature without enabling it,
> > and once it is enabled, the user is unable to revert it.
> >
> > - - -- --- -  -
> > Jacek Lewandowski
> >
> >
> > On Tue, Oct 26, 2021 at 12:54 PM Bowen Song  wrote:
> >
> > > Personally, I would prefer a transition period in which the new feature
> > > is not enabled by default. This not only makes version upgrading easier,
> > > it also allows the user to stay on the old behaviour if they experience
> > > any issue with the new feature (e.g.: bugs in the new feature, or edge
> > > use cases / 3rd party tools depending on the old behaviour) until the
> > > issue is resolved.
> >
> 
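
A minimal Java sketch of the approach Benedict and Josh describe above, assuming hypothetical names (`SSTableId`, `SequenceId`, `UuidId`, `IdGenerator`, and an opt-in boolean passed to the generator) rather than Cassandra's actual classes or configuration: readers always accept both identifier formats, writers produce UUID-based IDs only when the flag is set (off by default), and a node stays downgradeable as long as no UUID-based sstable has been written.

```java
import java.util.List;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: every name here is illustrative, not Cassandra's real API.
public class SSTableIdSketch {

    /** Callers depend only on this type, never on a concrete ID format. */
    public interface SSTableId {
        String asString();
    }

    /** Legacy format: a monotonically increasing integer generation. */
    public static final class SequenceId implements SSTableId {
        private final int generation;
        public SequenceId(int generation) { this.generation = generation; }
        public String asString() { return Integer.toString(generation); }
    }

    /** New format: a UUID as 32 hex chars, with no '-' to clash with file-name separators. */
    public static final class UuidId implements SSTableId {
        private final UUID uuid;
        public UuidId(UUID uuid) { this.uuid = uuid; }
        public String asString() { return uuid.toString().replace("-", ""); }
    }

    /** Writers use the new format only when the hypothetical opt-in flag is set. */
    public static final class IdGenerator {
        private final boolean uuidIdsEnabled;
        private final AtomicInteger lastGeneration;

        public IdGenerator(boolean uuidIdsEnabled, int lastGeneration) {
            this.uuidIdsEnabled = uuidIdsEnabled;
            this.lastGeneration = new AtomicInteger(lastGeneration);
        }

        public SSTableId next() {
            // Random UUIDs keep the sketch JDK-only; a real design might prefer time-based ones.
            return uuidIdsEnabled
                    ? new UuidId(UUID.randomUUID())
                    : new SequenceId(lastGeneration.incrementAndGet());
        }
    }

    /** Readers accept both formats, so old sstables stay readable after opting in. */
    public static SSTableId parse(String s) {
        if (s.length() == 32 && s.chars().allMatch(c -> Character.digit(c, 16) >= 0)) {
            long msb = Long.parseUnsignedLong(s.substring(0, 16), 16);
            long lsb = Long.parseUnsignedLong(s.substring(16), 16);
            return new UuidId(new UUID(msb, lsb));
        }
        return new SequenceId(Integer.parseInt(s));
    }

    /** An older version can still read the data only if no UUID-based sstable exists. */
    public static boolean downgradeSafe(Iterable<SSTableId> onDisk) {
        for (SSTableId id : onDisk) {
            if (id instanceof UuidId) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SSTableId legacy = new IdGenerator(false, 41).next();
        SSTableId fresh = new IdGenerator(true, 0).next();
        System.out.println(legacy.asString());                                    // prints 42
        System.out.println(parse(fresh.asString()).asString().equals(fresh.asString())); // prints true
        System.out.println(downgradeSafe(List.of(legacy)));                       // prints true
        System.out.println(downgradeSafe(List.of(legacy, fresh)));                // prints false
    }
}
```

Keeping the flag off by default matches the transition period suggested earlier in the thread: a node that never opts in never writes an ID that an older version cannot parse.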


Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-26 Thread Dinesh Joshi
> On Oct 25, 2021, at 1:22 PM, Jeremiah D Jordan  wrote:
> 
> The currently proposed changes in CEP-18 should all include improved test
> coverage of the modules in question.  We have been developing them all with a
> requirement that all changes have at least 80% code coverage from SonarCloud
> JaCoCo reports.  We have also found and fixed some bugs in the existing code
> during this development work.

This is great! We, as a project, should encourage improved test code coverage. 
So I welcome this change.

> So do people feel we should re-propose these as multiple CEPs or just
> tickets?  Or do people prefer to have a discussion/vote on the idea of
> improving the modularity of the code base in general?

My personal preference would be to see this work appear as individual CEPs or
even JIRA tickets with discussions, but definitely not as one giant CEP that
pulls together a lot of different changes.

I really like the idea of building pluggable modular components. However, I am
concerned about a few things.

1. Performance regression.
2. Breaking backward compatibility for our users & tools.
3. Interfaces with single implementation.

I would like to ensure that we are mindful of these concerns while making big 
refactors.

Thanks,

Dinesh