[DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
Hi,

In short, we are discussing UUID-based sstable generation identifiers in https://issues.apache.org/jira/browse/CASSANDRA-17048.

The question that is holding us back is support for downgrading. Long story short, when we generate new sstables with UUID-based ids, they are not readable by older C* versions.

1. Should we implement a downgrade tool? (It may be quite complex.)
2. Should we let users enable the new UUID ids later, when they are sure they will not downgrade in the future?

Thanks,
Jacek

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
Personally, I would prefer a transition period in which the new feature is not enabled by default. This not only makes version upgrading easier, it also allows the user to stay on the old behaviour if they experience any issue with the new feature (e.g.: bugs in the new feature, or edge use cases / 3rd party tools depending on the old behaviour) until the issue is resolved.

On 26/10/2021 10:21, Jacek Lewandowski wrote:
> Hi,
>
> In short, we are discussing UUID based sstable generation identifiers in https://issues.apache.org/jira/browse/CASSANDRA-17048.
>
> The question which somehow hold us is support for downgrading. Long story short, when we generate new sstables with uuid based ids, they are not readable by older C* versions.
>
> 1. should we implement a downgrade tool? (it may be quite complex)
> 2. should we let users enable the new uuid ids later when they are sure they will not downgrade in the future?
>
> Thanks,
> Jacek
Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
Though, the user is unable to test the new feature without enabling it. And when it is enabled, the user is unable to revert it.

- - -- --- - -
Jacek Lewandowski

On Tue, Oct 26, 2021 at 12:54 PM Bowen Song wrote:
> Personally, I would prefer a transition period in which the new feature is not enabled by default. This not only makes version upgrading easier, it also allows the user to stay on the old behaviour if they experience any issue with the new feature (e.g.: bugs in the new feature, or edge use cases / 3rd party tools depending on the old behaviour) until the issue is resolved.
Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
The user will be able to test the new feature in a testing environment and not push the changes to their production environment if they are not satisfied.

On 26/10/2021 12:00, Jacek Lewandowski wrote:
> Though, the user is unable to test the new feature without enabling it. And when it is enabled, the user is unable to revert it.
Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
I think it is probably acceptable to prevent downgrades once a new feature is enabled, as the exposure risk is limited to that one feature. The user can test the new version to ensure everything else works satisfactorily before committing to this one feature.

A downgrade tool would also be possible to produce, but probably the additional utility is limited.

I think this particular feature is probably easy enough to maintain as permanently optional, simply maintaining two system tables: one for the old generation format, one for the new. So long as the user doesn’t use the new format, it remains forever downgradeable. Though perhaps one day we may want to force users to migrate, I don’t think there’s any rush, and the important thing to avoid is providing users no version buffer to escape new bugs – if a major version later we force upgrade, then they have a whole range of major versions to downgrade to that still support this feature (but perhaps avoid some other new issue)

From: Jacek Lewandowski
Date: Tuesday, 26 October 2021 at 12:01
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
> Though, the user is unable to test the new feature without enabling it. And when it is enabled, the user is unable to revert it.
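To make the "permanently optional" idea above a little more concrete, here is a minimal Java sketch, under stated assumptions, of how a node might keep reading both identifier formats while only writing the new one after an explicit opt-in. Every name in it (`GenerationId`, `SequenceId`, `UuidId`, `GenerationIds`, the `uuidIdsEnabled` flag) is hypothetical and is not taken from the Cassandra codebase or from the CASSANDRA-17048 patch; the two system tables mentioned in the message are not modelled at all.

```java
import java.util.List;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical common type for both sstable generation id formats. */
interface GenerationId {
    String asString();
}

/** Old format: a plain increasing integer, readable by older versions. */
final class SequenceId implements GenerationId {
    final int value;
    SequenceId(int value) { this.value = value; }
    public String asString() { return Integer.toString(value); }
}

/** New format: a UUID, assumed unreadable by older versions. */
final class UuidId implements GenerationId {
    final UUID value;
    UuidId(UUID value) { this.value = value; }
    public String asString() { return value.toString(); }
}

final class GenerationIds {
    private final boolean uuidIdsEnabled;      // hypothetical opt-in flag, off by default
    private final AtomicInteger lastSequence;  // highest sequence id already in use

    GenerationIds(boolean uuidIdsEnabled, int lastUsedSequence) {
        this.uuidIdsEnabled = uuidIdsEnabled;
        this.lastSequence = new AtomicInteger(lastUsedSequence);
    }

    /** New sstables only get a UUID once the operator has opted in. */
    GenerationId next() {
        return uuidIdsEnabled
             ? new UuidId(UUID.randomUUID())
             : new SequenceId(lastSequence.incrementAndGet());
    }

    /** Reading accepts both formats regardless of the flag. */
    static GenerationId parse(String text) {
        if (!text.isEmpty() && text.chars().allMatch(Character::isDigit))
            return new SequenceId(Integer.parseInt(text));
        return new UuidId(UUID.fromString(text));
    }

    /** A data set stays downgradeable while no UUID-based id has been written. */
    static boolean downgradeable(List<GenerationId> ids) {
        return ids.stream().allMatch(id -> id instanceof SequenceId);
    }
}
```

With the flag off, `new GenerationIds(false, 41).next()` yields the sequence id 42, and a directory containing only such ids passes `downgradeable`; once a single `UuidId` has been written, the check fails, which is the point at which a downgrade would need either a conversion tool or a restore.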
Re: [DISCUSS] CEP-18: Improving Modularity
> To me having some defined interfaces for interacting with different sections of the code is a huge boon for improving developer productivity going forward in the project. Every place where we can reduce the amount of code reaching inside another module to get at a random internal class is a positive,

I've long been of the opinion that the benefits outweigh the costs of having clear interface points between major subsystems in a codebase. I'm not particularly sympathetic to the concerns about friction on making changes to internal APIs since modern IDE tooling makes this a trivial exercise; however, I _am_ quite sympathetic to the concerns about introducing friction against deeper integrations between subsystems.

That said, we have a history on the project of being somewhat hot and cold when it comes to our approach to performance testing; I think our low hanging fruit as a project revolves more around discipline and reproducibility on knowing where our performance is today and making changes with an eye to that rather than keeping open the flexibility of tightly coupling subsystems through their implementations.

With the modern runtime environment shifting so much toward containerization I can't help but think smaller, clearly modularized components are more resilient against a rapidly evolving runtime environment and more sympathetic to the constrained resource environments they run in, as well as more classically optimizable in their own right.

I air all this just to contribute perspective to the discussion; all that said, I think refactoring APIs as a pure reflection of what the DB is doing today just risks ossifying something that grew up organically and probably isn't going to do us any favors, so having a use-case (or better yet a few implementations) we're deriving an interface from, or targeting a more testable / mockable structure plus introducing those tests should give us guidance to improve the route we go.

~Josh

On Mon, Oct 25, 2021 at 4:22 PM Jeremiah D Jordan wrote:
> As Henrik said we have been refactoring access to these different internal APIs as part of some larger work. For this CEP we pulled together a bunch of the smaller ones into one place, similar to the refactoring proposed in CEP-10, as we felt doing many small CEPs, one per module, would be less productive if there was support in the project in general for trying to standardize access to different sections of the code and start creating a more defined internal API. If there is consensus that it would be better to propose each change as its own CEP, or even just as single tickets without a CEP for these internal refactors, we can do that as well. The CEP process is evolving as we go through these, so just trying to figure out the best way forward.
>
> The currently proposed changes in CEP-18 should all include improved test coverage of the modules in question. We have been developing them all with a requirement that all changes have at least 80% code coverage from SonarCloud JaCoCo reports. We have also found and fixed some bugs in the existing code during this development work.
>
> To me having some defined interfaces for interacting with different sections of the code is a huge boon for improving developer productivity going forward in the project. Every place where we can reduce the amount of code reaching inside another module to get at a random internal class is a positive, as it prevents unknown side effects when changing that module when the person developing the new feature did not realize other parts of the code were depending on some current internal behavior that was not clearly part of the module's interface.
>
> On the question of changing internal interfaces that I have seen in some other venues, I do not think creating such interfaces should prevent us from changing them as needed for future work. I think having the interfaces actually improves on our ability to do so without breaking other parts of the code. My suggestion would be that we try not to make such changes in patch releases if possible, but again I wouldn’t let that hold anything back.
>
> So do people feel we should re-propose these as multiple CEPs or just tickets? Or do people prefer to have a discussion/vote on the idea of improving the modularity of the code base in general?
>
> -Jeremiah
>
> > On Oct 25, 2021, at 9:26 AM, bened...@apache.org wrote:
> >
> > Thanks Henrik for the additional context.
> >
> > I’m not personally a fan of modularity only for modularity’s sake. Everything in software is a balancing act of competing priorities, and while pluggability supports certain use cases it can slow down development or prevent deeper integrations by preventing assumptions about how systems operate.
> >
> > To be clear, I’m fully in favour of helping to enable your use cases, I just think it is important to make a decision for each refactor based on
Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
+1 to Benedict's perspective here. Supporting both sstable ID paradigms should be relatively trivial and low cost to maintain going forward.

On Tue, Oct 26, 2021 at 7:54 AM bened...@apache.org wrote:
> I think it is probably acceptable to prevent downgrades once a new feature is enabled, as the exposure risk is limited to that one feature. The user can test the new version to ensure everything else works satisfactorily before committing to this one feature.
>
> A downgrade tool would also be possible to produce, but probably the additional utility is limited.
>
> I think this particular feature is probably easy enough to maintain as permanently optional, simply maintaining two system tables: one for the old generation format, one for the new. So long as the user doesn’t use the new format, it remains forever downgradeable. Though perhaps one day we may want to force users to migrate, I don’t think there’s any rush, and the important thing to avoid is providing users no version buffer to escape new bugs – if a major version later we force upgrade, then they have a whole range of major versions to downgrade to that still support this feature (but perhaps avoid some other new issue)
Re: [DISCUSS] CEP-18: Improving Modularity
> I'm not particularly sympathetic to the concerns about friction on making changes to internal API's since modern IDE tooling makes this a trivial exercise

We’re getting abstract here, so this isn’t a rebuttal or even tied too strongly to this particular discussion, but to express my point more clearly.

We don’t abstract everything in the codebase, and in fact in general we (or at least, I) try to keep things concrete as long as there’s no reason to abstract them, because this is usually easier to reason about and lower overhead to modify. This is true even on the single class level, so of course it happens at the module level. This isn’t about the IDE refactoring, but the cognitive burden of reasoning simultaneously about the concrete class and the abstraction, and how they relate.

The problem with premature abstraction, and particularly when multiple implementations start appearing, is that you have to start formalising the abstractions in ways that permit you to reason only about the abstraction. This necessarily means eschewing some knowledge of how the concrete implementation(s) work. This may prevent very useful simplifications for how you interact with a specific concrete implementation, as we have to code to the API. This may prevent optimisations. This may also introduce additional complexity when either implementing the abstraction or when reasoning about the actions you are performing against it, where often you may not entirely ignore the concrete implementation (due to imperfect or ambiguous API specifications), so you must now consider if you are compatible with both the abstraction and any known concrete implementations.

These are all additional burdens, but we often pay the cost for perceived benefits.

It seems to me though that this discussion is conflating modularisation/pluggability with decoupling, which is a benefit we might gain in return for these additional costs. To me this is a distinct problem, however. It’s quite possible to modularise and yet tightly couple, though usually it will break tight coupling. But breaking tight coupling doesn’t require modularisation, and certainly doesn’t require pluggability.

To bring it back to this discussion, the intent of a piece of work always drives the outcome, and in my opinion it is best to always consider a work in its actual context. The primary purpose of this work is pluggability, and so this will inform the API modifications. A straightforward goal of reducing tight coupling in the codebase would likely approach this problem differently. None of this is a bad thing, just in my opinion the nature of development.

That said, I’m broadly happy to see this work go ahead. I would prefer to split the conversations out into their driving projects for the aforementioned reasons, but I wouldn’t veto the proposal on that basis. It would be nice to see others’ opinions about this.

The only sub-proposal I’m particularly unsure about is 17059, which doesn’t seem to increase modularity at all. It looks to be a kind of plugin hook, and IMO should definitely be addressed separately. Perhaps a simple DISCUSS thread and its Jira will suffice?

From: Joshua McKenzie
Date: Tuesday, 26 October 2021 at 19:16
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-18: Improving Modularity
> I've long been of the opinion that the benefits outweigh the costs of having clear interface points between major subsystems in a codebase. [...]
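As a small illustration of the point that coding to an abstraction can rule out an optimisation, here is a hedged Java sketch. It is not drawn from Cassandra; `KeySource`, `SortedKeys` and `Lookups` are hypothetical names, and the example only assumes that one concrete implementation happens to keep its keys sorted while the interface promises nothing of the kind.

```java
import java.util.Arrays;

/** Hypothetical abstraction: callers may assume nothing beyond iteration. */
interface KeySource {
    Iterable<Long> keys();
}

/** Hypothetical concrete implementation that happens to keep its keys sorted. */
final class SortedKeys implements KeySource {
    private final long[] sorted;

    SortedKeys(long... keys) {
        sorted = keys.clone();
        Arrays.sort(sorted);
    }

    public Iterable<Long> keys() {
        return () -> Arrays.stream(sorted).boxed().iterator();
    }

    /** O(log n), but only reachable by callers that know the concrete type. */
    boolean containsFast(long key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }
}

final class Lookups {
    /** Coded to the abstraction: correct for any implementation, but O(n). */
    static boolean contains(KeySource source, long key) {
        for (long candidate : source.keys())
            if (candidate == key)
                return true;
        return false;
    }
}
```

Both lookups return the same answers; the difference is that `Lookups.contains` must scan because the interface says nothing about ordering, whereas a caller coupled to `SortedKeys` can use the binary search. Whether that trade is worth making is exactly the case-by-case judgement described above.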
Re: [DISCUSS] CEP-18: Improving Modularity
I am all for good extensibility / interfaces and so on; however, I am afraid that this might actually break a lot of things if enough attention is not paid.

For example, over all these years, the community around Cassandra tooling has somehow got used to the "mess": placing one fat jar on the class path, and it somehow works. Then we just cherry-pick what we want and we are all (reasonably) happy if we do not find ourselves doing some reflection because we just need this private final field to be public and non-final and for some reason a developer was thinking it is actually a good idea to do it like that ...

Even if these CEPs are not about modularity at the build-system level (where Cassandra would logically consist of different jars), if I understand that correctly, if changes are introduced e.g. in 4.1, then 4.2, then 4.3 and so on, the tooling which expects to work for all point releases might have to accommodate each of these releases, which is quite a bummer. There is not always the bandwidth to support each individual version of a tool. Maybe one for 4, 3.11, 3.0 and that's it.

I just want to stress the fact that from the users' and integrators' perspective it has to be a smooth transition. So yes, extend, but do not break, please.

Before any big refactoring, I would actually spend some time on removing what is not necessary. If one digs deeper, Cassandra is living with a lot of legacy code. For example, I was removing support for Windows, which is taking away a lot of stuff with it. I believe there are many places where we are just taking a lot of baggage with us because ... The snapshot subsystem we are looking into together with Paulo Motta is another example of how weirdly wired a subsystem might be. It is all over the place and it is quite discouraging to implement something new without cleaning it all up first because it just does not make sense to add on top of that anymore.

The way I see it is that while working on this "extensibility and interfaces work" we should probably also focus on getting rid of what is obsolete and on simplifying and unifying the codebase where it smells. I am pretty confident that extending / interfacing would then be way easier too. If this is a side effect of these CEPs I am all over it.

On Tue, 26 Oct 2021 at 20:16, Joshua McKenzie wrote:
> I've long been of the opinion that the benefits outweigh the costs of having clear interface points between major subsystems in a codebase. [...]
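The fragility of the reflection workaround mentioned above can be sketched in a few lines of Java. This is only an illustration under assumptions: `InternalState` is a made-up stand-in for whatever internal class a tool reaches into, and `ToolingAccess` is a hypothetical helper, neither of which exists in Cassandra.

```java
import java.lang.reflect.Field;

/** Hypothetical internal class; stands in for anything a tool reaches into. */
final class InternalState {
    private final String dataDirectory;

    InternalState(String dataDirectory) { this.dataDirectory = dataDirectory; }

    /** A supported accessor keeps working if the field is later renamed. */
    String dataDirectory() { return dataDirectory; }
}

final class ToolingAccess {
    /**
     * Reflection-based access: nothing is checked at compile time, so a
     * rename of the private field only surfaces as a run-time failure.
     */
    static String readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return String.valueOf(field.get(target));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("internal layout changed: " + fieldName, e);
        }
    }
}
```

A tool calling `ToolingAccess.readPrivateField(state, "dataDirectory")` works until the field name changes in some later release, at which point it throws; a tool calling `state.dataDirectory()` is protected by the compiler. That asymmetry is one way to read the request to "extend, but do not break".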
Re: [DISCUSS] CEP-18: Improving Modularity
> The only sub-proposal I’m particularly unsure about is 17059, which doesn’t seem to increase modularity at all. It looks to be a kind of plugin hook, and IMO should definitely be addressed separately. Perhaps a simple DISCUSS thread and its Jira will suffice?

Ok. I will remove that one from the CEP to discuss separately.

On Oct 26, 2021, at 2:32 PM, bened...@apache.org wrote:
> We’re getting abstract here, so this isn’t a rebuttal or even tied too strongly to this particular discussion, but to express my point more clearly. [...]
Re: [DISCUSS] CEP-18: Improving Modularity
> we should probably also focus on getting rid of what is obsolete and simplify > and unify the codebase where it smells. Agreed. Lots of the codebase has had a spring clean over the past couple of years, but lots hasn’t. Some areas are very long in the tooth and could do with some heavy pruning. From: Stefan Miklosovic Date: Tuesday, 26 October 2021 at 20:40 To: dev@cassandra.apache.org Subject: Re: [DISCUSS] CEP-18: Improving Modularity I am all for good extensibility / interfaces and so on, however I am afraid that this might actually break a lot of things if enough attention is not paid. For example, over all these years, the community around Cassandra tooling is somehow used to the "mess", placing one fat jar to the class path and it somehow works. Then we just cherry-pick what we want and we are all (reasonably) happy if we do not find ourselves doing some reflection because we just need this private final field to be public and non-final and for some reason a developer was thinking it is actually a good idea to do it like that ... Even these ceps are not about modularity on a build system level (as Cassandra would logically consist of different jars) (if I understand that correctly), if changes are introduced e.g. in 4.1, then 4.2 then 4.3 and so on, the tooling which expects that it will work for all point releases might have to accommodate to each of these releases which is quite a bummer. There is not always a bandwidth to support each individual version of a tool. Maybe one for 4, 3.11, 3.0 and that's it. I just want to stress the fact that from the users' and integrators' perspective it has to be a smooth transition. So yes, extend, but do not break, please. Before any big refactoring, I would actually spend some time on removing what is not necessary. If one digs deeper, Cassandra is living with a lot of legacy code. For example, I was removing support for Windows which is taking away a lot of stuff with it. 
I believe there are many places where we are just taking a lot of baggage with us because ... The snapshot subsystem we are looking into together with Paulo Motta is another example of how weirdly wired a subsystem might be. It is all over the place and it is quite discouraging to implement something new without cleaning it all up first, because it just does not make sense to add on top of that anymore.

The way I see it is that while working on this "extensibility and interfaces work" we should probably also focus on getting rid of what is obsolete and simplify and unify the codebase where it smells. I am pretty confident that extending / interfacing would be way easier too. If this is a side effect of these CEPs I am all over it.

On Tue, 26 Oct 2021 at 20:16, Joshua McKenzie wrote:
>
> >
> > To me having some defined interfaces for interacting with different
> > sections of the code is a huge boon for improving developer productivity
> > going forward in the project. Every place where we can reduce the amount
> > of code reaching inside another module to get at a random internal class is
> > a positive,
>
> I've long been of the opinion that the benefits outweigh the costs of
> having clear interface points between major subsystems in a codebase. I'm
> not particularly sympathetic to the concerns about friction on making
> changes to internal API's since modern IDE tooling makes this a trivial
> exercise, however I _am_ quite sympathetic to the concerns about
> introducing friction against deeper integrations between subsystems.
>
> That said, we have a history on the project of being somewhat hot and cold
> when it comes to our approach to performance testing; I think our low
> hanging fruit as a project revolves more around discipline and
> reproducibility on knowing where our performance is today and making
> changes with an eye to that rather than keeping open the flexibility of
> tightly coupling subsystems through their implementations.
>
> With the modern runtime environment shifting so much toward
> containerization I can't help but think smaller, clearly modularized
> components are more resilient against a rapidly evolving runtime
> environment and more sympathetic to the constrained resource environments
> they run in, as well as more classically optimizable in their own right.
>
> I air all this just to contribute perspective to the discussion; all that
> said, I think refactoring APIs as a pure reflection of what the DB is doing
> today just risks ossifying something that grew up organically and probably
> isn't going to do us any favors, so having a use-case (or better yet a few
> implementations) we're deriving an interface from, or targeting a more
> testable / mockable structure plus introducing those tests should give us
> guidance to improve the route we go.
>
> ~Josh
>
>
> On Mon, Oct 25, 2021 at 4:22 PM Jeremiah D Jordan
> wrote:
>
> > As Henrik said we have been refactoring access to these different internal
> > APIs as part of some larger work. For thi
Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)
Yes, those explanations sound very reasonable to me as well and I'll push the implementation soon. Thank you guys

On 2021/10/26 18:21:44, Joshua McKenzie wrote:
> +1 to Benedict's perspective here. Supporting both sstable ID paradigms
> should be relatively trivial and low cost to maintain going forward.
>
> On Tue, Oct 26, 2021 at 7:54 AM bened...@apache.org
> wrote:
>
> > I think it is probably acceptable to prevent downgrades once a new feature
> > is enabled, as the exposure risk is limited to that one feature. The user
> > can test the new version to ensure everything else works satisfactorily
> > before committing to this one feature.
> >
> > A downgrade tool would also be possible to produce, but probably the
> > additional utility is limited.
> >
> > I think this particular feature is probably easy enough to maintain as
> > permanently optional, simply maintaining two system tables: one for the old
> > generation format, one for the new. So long as the user doesn’t use the new
> > format, it remains forever downgradeable. Though perhaps one day we may
> > want to force users to migrate, I don’t think there’s any rush, and the
> > important thing to avoid is providing users no version buffer to escape new
> > bugs – if a major version later we force upgrade, then they have a whole
> > range of major versions to downgrade to that still support this feature
> > (but perhaps avoid some other new issue)
> >
> > From: Jacek Lewandowski
> > Date: Tuesday, 26 October 2021 at 12:01
> > To: dev@cassandra.apache.org
> > Subject: Re: [DISCUSS] How to implement backward compatibility
> > (CASSANDRA-17048)
> > Though, the user is unable to test the new feature without enabling it. And
> > when it is enabled, the user is unable to revert it.
> >
> > - - -- --- - -
> > Jacek Lewandowski
> >
> > On Tue, Oct 26, 2021 at 12:54 PM Bowen Song wrote:
> >
> > > Personally, I would prefer a transition period in which the new feature
> > > is not enabled by default. This not only makes version upgrading easier,
> > > it also allows the user to stay on the old behaviour if they experience
> > > any issue with the new feature (e.g.: bugs in the new feature, or edge
> > > use cases / 3rd party tools depending on the old behaviour) until the
> > > issue is resolved.
> > >
> > > On 26/10/2021 10:21, Jacek Lewandowski wrote:
> > > > Hi,
> > > >
> > > > In short, we are discussing UUID based sstable generation identifiers in
> > > > https://issues.apache.org/jira/browse/CASSANDRA-17048.
> > > >
> > > > The question which somehow hold us is support for downgrading. Long
> > > > story short, when we generate new sstables with uuid based ids, they are
> > > > not readable by older C* versions.
> > > >
> > > > 1. should we implement a downgrade tool? (it may be quite complex)
> > > > 2. should we let users enable the new uuid ids later when they are sure
> > > > they will not downgrade in the future?
> > > >
> > > > Thanks,
> > > > Jacek
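
To make the "support both sstable ID paradigms" idea from this thread a bit more concrete, here is a minimal sketch of how an opt-in switch between the two formats could look. Everything in it is an assumption made for illustration: the names `SSTableId`, `SequenceId`, `UuidId`, `SSTableIds` and the `uuidIdsEnabled` flag are hypothetical and are not taken from CASSANDRA-17048, and the standard UUID text form stands in for whatever encoding the real patch uses. It only illustrates the rule discussed above: new ids are generated in the old format unless the operator opts in, both formats stay readable, and a node remains downgradeable as long as no UUID-based sstable exists.

```java
import java.util.Collection;
import java.util.UUID;

// Hypothetical sketch only: none of these names are Cassandra's real classes.
interface SSTableId {
    /** Text form of the id; the real on-disk encoding may well differ. */
    String asString();
}

/** Legacy identifier: an increasing integer, readable by older versions. */
final class SequenceId implements SSTableId {
    final int generation;

    SequenceId(int generation) { this.generation = generation; }

    @Override public String asString() { return Integer.toString(generation); }
}

/** New identifier: globally unique, but unknown to older versions. */
final class UuidId implements SSTableId {
    final UUID uuid;

    UuidId(UUID uuid) { this.uuid = uuid; }

    @Override public String asString() { return uuid.toString(); }
}

final class SSTableIds {
    private final boolean uuidIdsEnabled; // assumed opt-in flag, off by default
    private int lastGeneration;

    SSTableIds(boolean uuidIdsEnabled, int lastGeneration) {
        this.uuidIdsEnabled = uuidIdsEnabled;
        this.lastGeneration = lastGeneration;
    }

    /** New sstables get a UUID id only when the operator has opted in. */
    SSTableId next() {
        if (uuidIdsEnabled) {
            return new UuidId(UUID.randomUUID());
        }
        return new SequenceId(++lastGeneration);
    }

    /** Both formats stay readable regardless of the flag. */
    static SSTableId parse(String text) {
        boolean allDigits = !text.isEmpty() && text.chars().allMatch(Character::isDigit);
        if (allDigits) {
            return new SequenceId(Integer.parseInt(text));
        }
        return new UuidId(UUID.fromString(text));
    }

    /** A node can be downgraded only while no UUID-based sstable exists. */
    static boolean isDowngradeable(Collection<? extends SSTableId> live) {
        return live.stream().allMatch(id -> id instanceof SequenceId);
    }
}
```

With the flag off, `new SSTableIds(false, 41).next()` yields the sequence id `42` and the set of live sstables stays downgradeable; once the flag is on, the first UUID-based id makes `isDowngradeable` return false, which matches the "no downgrade once the feature is enabled" trade-off described above.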
Re: [DISCUSS] CEP-18: Improving Modularity
> On Oct 25, 2021, at 1:22 PM, Jeremiah D Jordan wrote:
>
> The currently proposed changes in CEP-18 should all include improved test
> coverage of the modules in question. We have been developing them all with a
> requirement that all changes have at least 80% code coverage from SonarCloud
> JaCoCo reports. We have also found and fixed some bugs in the existing code
> during this development work.

This is great! We, as a project, should encourage improved test code coverage. So I welcome this change.

> So do people feel we should re-propose these as multiple CEP’s or just
> tickets? Or do people prefer to have a discussion/vote on the idea of
> improving the modularity of the code base in general?

My personal preference would be to see this work appear as individual CEPs or even JIRA tickets with discussions, but definitely not one giant CEP that pulls together a lot of different changes.

I really like the idea of building pluggable modular components. However, I am concerned about a few things:

1. Performance regression.
2. Breaking backward compatibility for our users & tools.
3. Interfaces with a single implementation.

I would like to ensure that we are mindful of these concerns while making big refactors.

Thanks,
Dinesh