> I do think adding the ability to do “Cluster and Code Simulations” is a new feature.
I don’t. I understand a feature to be a user-visible change, such as new functionality, and it was on this basis that I endorsed the release lifecycle document. I do not believe that all improvement to patch releases should stop, as I do not believe this produces the highest quality outcome.

From: Jeremiah D Jordan <jerem...@datastax.com>
Date: Tuesday, 13 July 2021 at 14:41
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations

I do not think fixing CASSANDRA-12126 is a new feature. I do think adding the ability to do “Cluster and Code Simulations” is a new feature.

-Jeremiah

> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>
> Nothing we’re discussing constitutes a feature. We’re discussing stability enhancements, and important bug fixes.
>
> I think this disagreement is to some extent founded on our different premises about what a patch release should contain, and this seems to be the fault of incompletely specified documentation.
>
> 1. The release lifecycle only forbids feature work from being developed in a patch release, and only expressly includes bug fixes. Note that the document even has a comment by the author suggesting that features may be backported to a patch release from trunk (not something I agree with, but it demonstrates the ambiguity of the definition).
> 2. There seems to be some conflation of size-of-change with the admissibility wrt release lifecycle – I don’t think there are any criteria here, and it’s open to the community’s case-by-case assessment. Whatever we do to fix the bug in question will necessarily be a very significant piece of work itself, for instance.
>
> My interpretation of the release lifecycle document is that it is acceptable to include this work in a patch release. My belief about its impact is that it would contribute positively to the stability of the project’s 4.0 releases over the lifecycle, and improve project velocity.
>
> With respect to whether we can ship a fix to 12126 without validation, I would be strongly opposed to this, and certainly would not produce a patch myself in this way. Not only would it be burdensome (given the divergences in the codebase), but I would not consider it acceptably safe (given the divergence).
>
>
> From: Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> Date: Tuesday, 13 July 2021 at 14:15
> To: Cassandra DEV <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I tend to agree with Paulo that a major refactoring of some internal interfaces sounds like something to be explicitly avoided in a patch release. I thought this was the type of change we all agreed we should stop letting in to patch releases, and that we would attempt to release more often (once a year) so changes that only go to trunk would get out faster? Are we really wanting to break that promise to ourselves before we even release 4.0? To me “I think we need this feature released faster” is not a reason to put it in 4.0, it could be a reason to release 4.1 sooner. This is where having a releasable trunk helps, as if we decided as a project that some change was worth a new major being released early the effort of doing that release is much smaller when trunk is releasable.
>
> Any fix we make in 4.0 would be merged forward into trunk and could be fully verified there? Probably not the best, but would give more confidence in a fix than otherwise without adding other major changes to 4.0?
>
> -Jeremiah
>
>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>>>
>>> Furthermore, we introduced a significant performance regression in all lines of the software by increasing the number of LWT round-trips. Unless we intend to leave this regression for a further year without _any_ release offering a solution, we will need suitable verification mechanisms for whatever fixes we deliver.
>>>
>>> My view is that it is unacceptable to leave such a significant regression unaddressed in all lines of software we intend to release for the foreseeable future.
>>
>>
>> I would like to expand a bit on this as I believe it might be important for people to have the full picture. The fix for CASSANDRA-12126 <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a regression by increasing the number of LWT round-trips. Nevertheless, the patch introduced a flag to allow users to revert to the previous behavior (previous performance + consistency issue).
>>
>> Also the patch did not address all paxos consistency issues. There are still some issues during topology changes (and maybe in some other scenarios).
>>
>> My understanding of Benedict's proposal is to fix paxos once and for all without any performance regression.
>>
>> That goal makes total sense to me. "Where do we do that?" is a trickier question.
>>
>> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org <bened...@apache.org> wrote:
>>
>>> Hmm. It occurs to me I’m not entirely sure how our new release process is going to work.
>>>
>>> Will we be releasing 4.1 builds immediately, as part of shippable trunk? Or will 4.0 be our only active line of software for the next year?
>>>
>>> Either way, I bet my bottom dollar there will come some regret if we introduce such divergence between the two most active branches we maintain, so early in their lifecycles. If we invest significant resources in improved testing using this framework (which I very much expect) then branches that are not compatible will not benefit, likely reducing their quality; and the risk of backports will increase, due to divergence.
>>>
>>> Altogether, I think it would be a huge mistake. But if we will be shipping releases soon that can fix these aforementioned regressions, I won’t campaign for it.
>>>
>>>
>>> From: bened...@apache.org <bened...@apache.org>
>>> Date: Tuesday, 13 July 2021 at 13:31
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>> No change is without risk; we have introduced serious regressions with bug fixes to patch releases. The overall risk to the release lifecycle is reduced significantly in my opinion, as we reduce the likelihood of introducing regressions, and can use the same test infrastructure across all of the actively developed releases, increasing our confidence in 4.0.x releases.
>>>
>>> Furthermore, we introduced a significant performance regression in all lines of the software by increasing the number of LWT round-trips. Unless we intend to leave this regression for a further year without _any_ release offering a solution, we will need suitable verification mechanisms for whatever fixes we deliver.
>>>
>>> My view is that it is unacceptable to leave such a significant regression unaddressed in all lines of software we intend to release for the foreseeable future.
>>>
>>>
>>> From: Paulo Motta <pauloricard...@gmail.com>
>>> Date: Tuesday, 13 July 2021 at 13:21
>>> To: Cassandra DEV <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>> No, in my opinion the target should be 4.0.x. We are reaching for a shippable trunk and this has no public API impacts. This work is IMO central to achieving a shippable trunk, either way. The only reason I do not target 3.x is that it would be too burdensome.
>>>
>>> In my limited view of the proposal, a major refactor of internal concurrency APIs to support the testing facility potentially risks the stability of a minor release, something we've been wanting to avoid with our focus on stability.
>>> So I'd prefer this to go in trunk/4.1, otherwise we will set a precedent for including non-bugfix changes in minor versions, something I think we should avoid.
>>>
>>> In the past we've been lenient about including seemingly harmless internal changes that caused client impact and we should be careful to avoid this in the future. To prevent this I think we should take a strict approach and only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.
>>>
>>> I'd go one step further and propose that any CEPs, which are generally about new features, major API changes or internal refactorings, should only be allowed in subsequent major versions, unless an explicit exception is granted.
>>>
>>> On Tue, 13 Jul 2021 at 07:11, bened...@apache.org <bened...@apache.org> wrote:
>>>
>>>> Perhaps it’s worth looking forward at the roadmap that we plan to develop, and consider whether such a facility would be welcome for proving their safety, and we can then worry about evolving the specifics of any API(s) together as we deploy the capability? Looking ahead, there are very few major features I wouldn’t want to see exercised with this approach, given the choice.
>>>>
>>>> The LWT Verifier by itself is an integration test that covers many of the affected subsystems, including sstables, memtables and repair. But we will have the ability to introduce dedicated verification for each of these features and systems, and we will necessarily produce more robust code (repair is a great example of a brittle system that would be impossible to produce with such an adversarial test system)
>>>>
>>>>
>>>> *Query side improvements:*
>>>>
>>>> * Storage Attached Index or SAI.
>>>>   The CEP can be found at https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>>>> * Add support for OR predicates in the CQL where clause
>>>> * Allow to aggregate by time intervals (CASSANDRA-11871) and allow UDFs in GROUP BY clause
>>>> * Ability to read the TTL and WRITE TIME of an element in a collection (CASSANDRA-8877)
>>>> * Multi-Partition LWTs
>>>> * Materialized views hardening: Addressing the different Materialized Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
>>>>
>>>> *Security improvements:*
>>>>
>>>> * SSTables encryption (CASSANDRA-9633)
>>>> * Add support for Dynamic Data Masking (CEP pending)
>>>> * Allow the creation of roles that have the ability to assign arbitrary privileges, or scoped privileges without also granting those roles access to database objects.
>>>> * Filter rows from system and system_schema based on users permissions (CASSANDRA-15871)
>>>>
>>>> *Performance improvements:*
>>>>
>>>> * Trie-based index format (CEP pending)
>>>> * Trie-based memtables (CEP pending)
>>>> * Paxos improvements: Paxos / LWT implementation that would enable the database to serve serial writes with two round-trips and serial reads with one round-trip in the uncontended case
>>>>
>>>> *Safety/Usability improvements:*
>>>>
>>>> * Guardrails.
>>>>   The CEP can be found at https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
>>>> * Add ability to track state in repair (CASSANDRA-15399)
>>>> * Repair coordinator improvements (CASSANDRA-15399)
>>>> * Make incremental backup configurable per keyspace and table (CASSANDRA-15402)
>>>> * Add ability to blacklist a CQL partition so all requests are ignored (CASSANDRA-12106)
>>>> * Add default and required keyspace replication options (CASSANDRA-14557)
>>>> * Transactional Cluster Metadata: Use of transactions to propagate cluster metadata
>>>> * Downgrade-ability: Ability to downgrade in the event that a serious issue has been identified
>>>>
>>>> *Pluggability improvements:*
>>>>
>>>> * Pluggable schema manager (CEP pending)
>>>> * Pluggable filesystem (CEP pending)
>>>> * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be found at https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
>>>> * Memtable API (CEP pending). The goal being to allow improvements such as CASSANDRA-13981 to be easily plugged into Cassandra
>>>>
>>>> *Memtable pluggable implementation:*
>>>>
>>>> * Enable Cassandra for Persistent Memory (CASSANDRA-13981)
>>>>
>>>>
>>>> From: bened...@apache.org <bened...@apache.org>
>>>> Date: Tuesday, 13 July 2021 at 10:51
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>> Ach, editing code in the email editor isn’t smart when editors all have different meanings for key combinations (accidentally hit send), but you get the idea. The simulator would intercept these thread executions, the memory accesses for the annotated field, and evaluate them so that in some cases the assertions would fail.
>>>>
>>>> This is obviously a toy example that is not very interesting, but the main real example we have is too complicated to produce a snippet to demonstrate. In my view, the long term outcome of this work is likely the enablement of many unit tests that are a little more complicated than this, on less obvious code.
>>>>
>>>> But the headline goal of the CEP is not. By itself, the LWT Verifier demonstrates the power and utility of the work. I don’t believe it is terribly helpful to focus on secondary justifications like the example I gave. For me, the _ability_ to prove the correctness of difficult but critical systems is justification enough, whether or not we deliver a simple API as part of the CEP.
>>>>
>>>>
>>>> From: bened...@apache.org <bened...@apache.org>
>>>> Date: Tuesday, 13 July 2021 at 10:43
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>> Should target release be 4.1. (not 4.0.x) ?
>>>>
>>>> No, in my opinion the target should be 4.0.x. We are reaching for a shippable trunk and this has no public API impacts. This work is IMO central to achieving a shippable trunk, either way. The only reason I do not target 3.x is that it would be too burdensome.
>>>>
>>>>> My concern is that changing code and tests at the same time risks regressions…
>>>>
>>>> I’ve never heard this position before. Would you care to elaborate? It is quite normal for us to update tests alongside changes to the code.
>>>>
>>>>> And seconding Benjamin's comments… some documentation on how to write a test, and a simple test example, that this CEP then allows us to write would help a lot (a la "working backwards").
>>>>
>>>> 1) This work is to _enable_ the development of tests, with the only test originally planned to arrive alongside it the fairly sophisticated LWT Verifier.
>>>> This is something we have sorely needed as a project, as we have had serious correctness violations for multiple years. This broad category of integrated test for verifying correctness is the main goal of the work and is not easily condensed into an example snippet.
>>>> 2) It is _possible_ that some simple and fluid APIs will be introduced in a later phase of this work, but they haven’t been designed yet, so I cannot share snippets.
>>>>
>>>> In principle, however, you would be able to do something like:
>>>>
>>>>     @Nemesis volatile int x = 0;
>>>>     int foo() {
>>>>         // x = x + 1 is a non-atomic read-modify-write on the annotated field
>>>>         x = x + 1;
>>>>         return x;
>>>>     }
>>>>
>>>>     @Test
>>>>     void test() throws Exception {
>>>>         Future<Integer> f1 = executor.submit(() -> foo());
>>>>         Future<Integer> f2 = executor.submit(() -> foo());
>>>>         Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
>>>>     }
>>>>
>>>>
>>>> From: Mick Semb Wever <m...@apache.org>
>>>> Date: Tuesday, 13 July 2021 at 10:28
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>>
>>>>> To achieve this, significant modifications will be required to the codebase, mostly cleaning up existing abstractions. Specifically, we will need to be able to mock executors, any blocking concurrency primitives, time, filesystem access and internode streaming.
>>>>>
>>>>> The work is – in large part – already complete, with JIRA and PRs to follow in the coming weeks. Of course, the work is subject to the usual community input and review, so this does not preclude changes to the work (even significant ones, if they are warranted). I know a lot of incoming CEPs are likely to be backed up by significant off-list development as a result of the focus on a shippable 4.0. Hopefully this is just a temporary growing pain, particularly as we move towards a shippable trunk.
>>>>>
>>>>> I hope this work will be of huge value to the project, particularly as we race to catch up on years of limited feature development.
>>>>>
>>>>> JIRA and PRs will follow, but I wanted to kick-off discussion in advance.
>>>>>
>>>>
>>>> Should target release be 4.1. (not 4.0.x) ?
>>>>
>>>> I'd be interested in seeing a rough timeline/plan of how the proposed changes are to be defined in JIRAs and ordered.
>>>>
>>>> I'd like to hear a bit more about the test plan. Not so much about how the CEP itself improves testability of the project, but for example the testing required to be in place to introduce the changes of the CEP (and if it already exists, where). My concern is that changing code and tests at the same time risks regressions…
>>>>
>>>> And seconding Benjamin's comments… some documentation on how to write a test, and a simple test example, that this CEP then allows us to write would help a lot (a la "working backwards").
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org