Re: [DISCUSS] CEP-10: Cluster and Code Simulations

bened...@apache.org Tue, 13 Jul 2021 06:47:38 -0700

> I do think adding the ability to do “Cluster and Code Simulations” is a new 
> feature.


I don’t. I understand a feature to be a user-visible change, such as new 
functionality, and it was on this basis that I endorsed the release lifecycle 
document. I do not believe that patch releases should exclude all improvement, 
as I do not believe this produces the highest-quality outcome.




From: Jeremiah D Jordan <jerem...@datastax.com>
Date: Tuesday, 13 July 2021 at 14:41
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
I do not think fixing CASSANDRA-12126 is a new feature.  I do think adding 
the ability to do “Cluster and Code Simulations” is a new feature.

-Jeremiah

> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>
> Nothing we’re discussing constitutes a feature. We’re discussing stability 
> enhancements, and important bug fixes.
>
> I think this disagreement is to some extent founded on our different premises 
> about what a patch release should contain, and this seems to be the fault of 
> incompletely specified documentation.
>
> 1. The release lifecycle only forbids feature work from being developed in a 
> patch release, and only expressly includes bug fixes. Note that the document 
> even has a comment by the author suggesting that features may be backported 
> to a patch release from trunk (not something I agree with, but it 
> demonstrates the ambiguity of the definition).
> 2. There seems to be some conflation of size-of-change with admissibility 
> wrt the release lifecycle; I don’t think there are any criteria here, and it’s 
> open to the community’s case-by-case assessment. Whatever we do to fix the 
> bug in question will necessarily be a very significant piece of work itself, 
> for instance.
>
> My interpretation of the release lifecycle document is that it is acceptable 
> to include this work in a patch release. My belief about its impact is that 
> it would contribute positively to the stability of the project’s 4.0 releases 
> over the lifecycle, and improve project velocity.
>
> With respect to whether we can ship a fix to 12126 without validation, I 
> would be strongly opposed to this, and certainly would not produce a patch 
> myself in this way. Not only would it be burdensome (given the divergences in 
> the codebase), but I would not consider it acceptably safe (given the 
> divergence).
>
>
> From: Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> Date: Tuesday, 13 July 2021 at 14:15
> To: Cassandra DEV <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I tend to agree with Paulo that a major refactoring of some internal 
> interfaces sounds like something to be explicitly avoided in a patch release. 
> I thought this was the type of change we all agreed we should stop letting 
> into patch releases, and that we would attempt to release more often (once a 
> year) so changes that only go to trunk would get out faster?  Are we really 
> wanting to break that promise to ourselves before we even release 4.0?  To me 
> “I think we need this feature released faster” is not a reason to put it in 
> 4.0, it could be a reason to release 4.1 sooner.  This is where having a 
> releasable trunk helps, as if we decided as a project that some change was 
> worth a new major being released early the effort of doing that release is 
> much smaller when trunk is releasable.
>
> Any fix we make in 4.0 would be merged forward into trunk and could be fully 
> verified there?  Probably not the best, but would give more confidence in a 
> fix than otherwise without adding other major changes to 4.0?
>
> -Jeremiah
>
>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>>>
>>> Furthermore, we introduced a significant performance regression in all
>>> lines of the software by increasing the number of LWT round-trips. Unless
>>> we intend to leave this regression for a further year without _any_ release
>>> offering a solution, we will need suitable verification mechanisms for
>>> whatever fixes we deliver.
>>>
>>> My view is that it is unacceptable to leave such a significant regression
>>> unaddressed in all lines of software we intend to release for the
>>> foreseeable future.
>>
>>
>> I would like to expand a bit on this as I believe it might be important for
>> people to have the full picture. The fix for  CASSANDRA-12126
>> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
>> regression by increasing the number of LWT round-trips. Nevertheless, the
>> patch introduced a flag to allow users to revert to the previous behavior
>> (previous performance + consistency issue).
>>
>> Also, the patch did not address all Paxos consistency issues. There are
>> still some issues during topology changes (and maybe in some other scenarios).
>>
>> My understanding of Benedict's proposal is to fix paxos once and for all
>> without any performance regression.
>>
>> That goal makes total sense to me. "Where do we do that?" is a more tricky
>> question.
>>
>> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org <bened...@apache.org>
>> wrote:
>>
>>> Hmm. It occurs to me I’m not entirely sure how our new release process is
>>> going to work.
>>>
>>> Will we be releasing 4.1 builds immediately, as part of shippable trunk?
>>> Or will 4.0 be our only active line of software for the next year?
>>>
>>> Either way, I bet my bottom dollar there will come some regret if we
>>> introduce such divergence between the two most active branches we maintain,
>>> so early in their lifecycles. If we invest significant resources in
>>> improved testing using this framework (which I very much expect) then
>>> branches that are not compatible will not benefit, likely reducing their
>>> quality; and the risk of backports will increase, due to divergence.
>>>
>>> Altogether, I think it would be a huge mistake. But if we will be shipping
>>> releases soon that can fix these aforementioned regressions, I won’t
>>> campaign for it.
>>>
>>>
>>>
>>> From: bened...@apache.org <bened...@apache.org>
>>> Date: Tuesday, 13 July 2021 at 13:31
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>> No change is without risk; we have introduced serious regressions with bug
>>> fixes to patch releases. The overall risk to the release lifecycle is
>>> reduced significantly in my opinion, as we reduce the likelihood of
>>> introducing regressions, and can use the same test infrastructure across
>>> all of the actively developed releases, increasing our confidence in 4.0.x
>>> releases.
>>>
>>> Furthermore, we introduced a significant performance regression in all
>>> lines of the software by increasing the number of LWT round-trips. Unless
>>> we intend to leave this regression for a further year without _any_ release
>>> offering a solution, we will need suitable verification mechanisms for
>>> whatever fixes we deliver.
>>>
>>> My view is that it is unacceptable to leave such a significant regression
>>> unaddressed in all lines of software we intend to release for the
>>> foreseeable future.
>>>
>>>
>>> From: Paulo Motta <pauloricard...@gmail.com>
>>> Date: Tuesday, 13 July 2021 at 13:21
>>> To: Cassandra DEV <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>> No, in my opinion the target should be 4.0.x. We are reaching for a
>>>> shippable trunk and this has no public API impacts. This work is IMO
>>>> central to achieving a shippable trunk, either way. The only reason I do
>>>> not target 3.x is that it would be too burdensome.
>>>
>>> In my limited view of the proposal, a major refactor of internal
>>> concurrency APIs to support the testing facility potentially risks the
>>> stability of a minor release, something we've been wanting to avoid with
>>> our focus on stability. So I'd prefer this to go in trunk/4.1; otherwise
>>> we will set a precedent for including non-bugfix changes in minor
>>> versions, something I think we should avoid.
>>>
>>> In the past we've been lenient about including seemingly harmless internal
>>> changes that caused client impact, and we should be careful to avoid this in
>>> the future. To prevent this, I think we should take a strict approach and
>>> only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.
>>>
>>> I'd go one step further and propose that any CEPs, which are generally
>>> about new features, major API changes or internal refactorings, should only
>>> be allowed in subsequent major versions, unless an explicit exception is
>>> granted.
>>>
>>> On Tue, 13 Jul 2021 at 07:11, bened...@apache.org <
>>> bened...@apache.org> wrote:
>>>
>>>> Perhaps it’s worth looking forward at the roadmap that we plan to
>>>> develop, and consider whether such a facility would be welcome for
>>>> proving their safety, and we can then worry about evolving the specifics
>>>> of any API(s) together as we deploy the capability? Looking ahead, there
>>>> are very few major features I wouldn’t want to see exercised with this
>>>> approach, given the choice.
>>>>
>>>> The LWT Verifier by itself is an integration test that covers many of the
>>>> affected subsystems, including sstables, memtables and repair. But we
>>>> will have the ability to introduce dedicated verification for each of
>>>> these features and systems, and we will necessarily produce more robust
>>>> code (repair is a great example of a brittle system that would be
>>>> impossible to produce with such an adversarial test system).
>>>>
>>>>
>>>> *Query side improvements:*
>>>>
>>>> * Storage Attached Index or SAI. The CEP can be found at
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>>>> * Add support for OR predicates in the CQL where clause
>>>> * Allow to aggregate by time intervals (CASSANDRA-11871) and allow UDFs
>>>> in GROUP BY clause
>>>> * Ability to read the TTL and WRITE TIME of an element in a collection
>>>> (CASSANDRA-8877)
>>>> * Multi-Partition LWTs
>>>> * Materialized views hardening: Addressing the different Materialized
>>>> Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
>>>>
>>>> *Security improvements:*
>>>>
>>>> * SSTables encryption (CASSANDRA-9633)
>>>> * Add support for Dynamic Data Masking (CEP pending)
>>>> * Allow the creation of roles that have the ability to assign arbitrary
>>>> privileges, or scoped privileges without also granting those roles access
>>>> to database objects.
>>>> * Filter rows from system and system_schema based on users permissions
>>>> (CASSANDRA-15871)
>>>>
>>>> *Performance improvements:*
>>>>
>>>> * Trie-based index format (CEP pending)
>>>> * Trie-based memtables (CEP pending)
>>>> * Paxos improvements: Paxos / LWT implementation that would enable the
>>>> database to serve serial writes with two round-trips and serial reads
>>>> with one round-trip in the uncontended case
>>>>
>>>> *Safety/Usability improvements:*
>>>>
>>>> * Guardrails. The CEP can be found at
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
>>>> * Add ability to track state in repair (CASSANDRA-15399)
>>>> * Repair coordinator improvements (CASSANDRA-15399)
>>>> * Make incremental backup configurable per keyspace and table
>>>> (CASSANDRA-15402)
>>>> * Add ability to blacklist a CQL partition so all requests are ignored
>>>> (CASSANDRA-12106)
>>>> * Add default and required keyspace replication options
>>>> (CASSANDRA-14557)
>>>> * Transactional Cluster Metadata: Use of transactions to propagate
>>>> cluster metadata
>>>> * Downgrade-ability: Ability to downgrade in the event that a serious
>>>> issue has been identified
>>>>
>>>> *Pluggability improvements:*
>>>>
>>>> * Pluggable schema manager (CEP pending)
>>>> * Pluggable filesystem (CEP pending)
>>>> * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can
>>>> be found at
>>>> https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
>>>> * Memtable API (CEP pending). The goal being to allow improvements such
>>>> as CASSANDRA-13981 to be easily plugged into Cassandra
>>>>
>>>> *Memtable pluggable implementation:*
>>>>
>>>> * Enable Cassandra for Persistent Memory (CASSANDRA-13981)
>>>>
>>>>
>>>>
>>>>
>>>> From: bened...@apache.org <bened...@apache.org>
>>>> Date: Tuesday, 13 July 2021 at 10:51
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>> Ach, editing code in the email editor isn’t smart when editors all have
>>>> different meanings for key combinations (accidentally hit send), but you
>>>> get the idea. The simulator would intercept these thread executions, the
>>>> memory accesses for the annotated field, and evaluate them so that in
>>>> some cases the assertions would fail.
>>>>
>>>> This is obviously a toy example that is not very interesting, but the
>>>> main real example we have is too complicated to produce a snippet to
>>>> demonstrate. In my view, the long term outcome of this work is likely the
>>>> enablement of many unit tests that are a little more complicated than
>>>> this, on less obvious code.
>>>>
>>>> But that is not the headline goal of the CEP. By itself, the LWT Verifier
>>>> demonstrates the power and utility of the work. I don’t believe it is
>>>> terribly helpful to focus on secondary justifications like the example I
>>>> gave. For me, the _ability_ to prove the correctness of difficult but
>>>> critical systems is justification enough, whether or not we deliver a
>>>> simple API as part of the CEP.
>>>>
>>>>
>>>>
>>>> From: bened...@apache.org <bened...@apache.org>
>>>> Date: Tuesday, 13 July 2021 at 10:43
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>> Should the target release be 4.1 (not 4.0.x)?
>>>>
>>>>
>>>>
>>>> No, in my opinion the target should be 4.0.x. We are reaching for a
>>>> shippable trunk and this has no public API impacts. This work is IMO
>>>> central to achieving a shippable trunk, either way. The only reason I do
>>>> not target 3.x is that it would be too burdensome.
>>>>
>>>>> My concern is that changing code and tests at the same time risks
>>>>> regressions…
>>>>
>>>>
>>>>
>>>> I’ve never heard this position before. Would you care to elaborate? It is
>>>> quite normal for us to update tests alongside changes to the code.
>>>>
>>>>> And seconding Benjamin's comments… some documentation on how to write a
>>>>> test, and a simple test example, that this CEP then allows us to write
>>>>> would help a lot (a la "working backwards").
>>>>
>>>> 1) This work is to _enable_ the development of tests, with the only test
>>>> originally planned to arrive alongside it being the fairly sophisticated
>>>> LWT Verifier. This is something we have sorely needed as a project, as we
>>>> have had serious correctness violations for multiple years. This broad
>>>> category of integrated test for verifying correctness is the main goal of
>>>> the work and is not easily condensed into an example snippet.
>>>> 2) It is _possible_ that some simple and fluid APIs will be introduced in
>>>> a later phase of this work, but they haven’t been designed yet, so I
>>>> cannot share snippets.
>>>>
>>>> In principle, however, you would be able to do something like:
>>>>
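>>>> // Hypothetical annotation: asks the simulator to interleave accesses to x
>>>> @Nemesis volatile int x = 0;
>>>> int foo() {
>>>>   x = x + 1; // read-modify-write, not atomic even though x is volatile
>>>>   return x;
>>>> }
>>>>
>>>> @Test
>>>> void test() throws Exception {
>>>>   Future<Integer> f1 = executor.submit(() -> foo());
>>>>   Future<Integer> f2 = executor.submit(() -> foo());
>>>>   // Fails only if x reaches 2 before either thread's return reads it
>>>>   Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
>>>> }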
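
The race in the toy example above can be made concrete without any simulator. The following is a minimal sketch, not the CEP's implementation: it assumes foo() breaks down into three steps per thread (load x, store x + 1, load x for the return) and enumerates every interleaving of two such threads. The class InterleavingSketch and all of its members are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Cassandra or of CEP-10.
class InterleavingSketch {

    // Values returned by foo() on each simulated thread under one schedule.
    static final class Outcome {
        final String schedule; // e.g. "001101": which thread takes each step
        final int r0, r1;
        Outcome(String schedule, int r0, int r1) {
            this.schedule = schedule; this.r0 = r0; this.r1 = r1;
        }
        // Mirrors Assert.assertTrue(f1.get() == 1 || f2.get() == 1)
        boolean assertionHolds() { return r0 == 1 || r1 == 1; }
    }

    // Replays foo() on two simulated threads in the order given by schedule.
    static Outcome run(String schedule) {
        int x = 0;                  // the shared field
        int[] step = new int[2];    // next step of each thread
        int[] read = new int[2];    // value each thread loaded in step 0
        int[] ret = new int[2];     // value each thread returns
        for (char c : schedule.toCharArray()) {
            int t = c - '0';
            switch (step[t]++) {
                case 0: read[t] = x; break;      // load x
                case 1: x = read[t] + 1; break;  // store x + 1
                case 2: ret[t] = x; break;       // load x for the return
                default: throw new IllegalArgumentException(schedule);
            }
        }
        return new Outcome(schedule, ret[0], ret[1]);
    }

    // Builds every schedule in which each thread takes exactly three steps.
    static List<Outcome> allOutcomes() {
        List<Outcome> out = new ArrayList<>();
        enumerate("", 3, 3, out);
        return out;
    }

    private static void enumerate(String prefix, int left0, int left1, List<Outcome> out) {
        if (left0 == 0 && left1 == 0) { out.add(run(prefix)); return; }
        if (left0 > 0) enumerate(prefix + "0", left0 - 1, left1, out);
        if (left1 > 0) enumerate(prefix + "1", left0, left1 - 1, out);
    }

    // Prints each schedule under which the toy assertion would fail.
    public static void main(String[] args) {
        for (Outcome o : allOutcomes())
            if (!o.assertionHolds())
                System.out.println(o.schedule + " -> " + o.r0 + ", " + o.r1);
    }
}
```

Sequential schedules such as 000111 satisfy the assertion, while 001101 lets both threads return 2. This hand-written step model only stands in for the idea; the proposal describes intercepting real thread executions and memory accesses instead.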
>>>>
>>>>
>>>> From: Mick Semb Wever <m...@apache.org>
>>>> Date: Tuesday, 13 July 2021 at 10:28
>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>>>>
>>>>> To achieve this, significant modifications will be required to the
>>>>> codebase, mostly cleaning up existing abstractions. Specifically, we will
>>>>> need to be able to mock executors, any blocking concurrency primitives,
>>>>> time, filesystem access and internode streaming.
>>>>>
>>>>> The work is, in large part, already complete, with JIRA and PRs to
>>>>> follow in the coming weeks. Of course, the work is subject to the usual
>>>>> community input and review, so this does not preclude changes to the work
>>>>> (even significant ones, if they are warranted). I know a lot of incoming
>>>>> CEPs are likely to be backed up by significant off-list development as a
>>>>> result of the focus on a shippable 4.0. Hopefully this is just a
>>>>> temporary growing pain, particularly as we move towards a shippable trunk.
>>>>>
>>>>> I hope this work will be of huge value to the project, particularly as
>>>>> we race to catch up on years of limited feature development.
>>>>>
>>>>> JIRA and PRs will follow, but I wanted to kick-off discussion in
>>>>> advance.
>>>>>
>>>>
>>>>
>>>>
>>>> Should the target release be 4.1 (not 4.0.x)?
>>>>
>>>> I'd be interested in seeing a rough timeline/plan of how the proposed
>>>> changes are to be defined in JIRAs and ordered.
>>>>
>>>> I'd like to hear a bit more about the test plan. Not so much about how
>>>> the CEP itself improves testability of the project, but for example
>>>> the testing required to be in place to introduce the changes of the
>>>> CEP (and if it already exists, where). My concern is that changing
>>>> code and tests at the same time risks regressions…
>>>>
>>>> And seconding Benjamin's comments… some documentation on how to write
>>>> a test, and a simple test example, that this CEP then allows us to
>>>> write would help a lot (a la "working backwards").
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>
>>>
>
>

