Re: What branches should perf fixes be targeting

Dmitry Konstantinov Thu, 23 Jan 2025 06:45:23 -0800

>> That is ... 6 branches at once. We were there, 3.0, 3.11, 4.0, 4.1, 5.0,
trunk. If there was a bug in 3.0, because we were supporting that, we had
to put this into 6 branches
My idea is not to increase the number of support branches (that is
definitely not what I want; I am more a fan of release-ready trunk-based
development with a faster feedback loop, but it is not always applicable).
The option was about releasing minor versions without long-term support,
the way the JDK released JDK 9/10 as short-term and then JDK 11 as
long-term, then 12/13 as short-term, and so on.
So, in the case of Cassandra, for example, we now have 5.0.x as a long-term
support version with a branch; we can release 5.1/5.2 from trunk (without
any new support branches for them) and then 5.3 as a long-term version again,
with a bug-fix branch. The only overhead here is from more frequent releases
(for example, once every 3 or 6 months); there is no overhead for branches/merges.



On Thu, 23 Jan 2025 at 14:31, Štefan Miklošovič <[email protected]>
wrote:

>
>
> On Thu, Jan 23, 2025 at 3:20 PM Dmitry Konstantinov <[email protected]>
> wrote:
>
>> Hi Stefan,
>>
>> Thank you a lot for the detailed feedback! A few comments:
>>
>> >> I think this is already the case, more or less. We are not doing perf
>> changes in older branches.
>> Yes, I understand the idea about stability of older branches. The primary
>> issue for me is that if I contribute even a small improvement to trunk, I
>> cannot really use it for a long time (except by keeping it in my own
>> fork), because there is no release that brings it back to me or anybody else.
>>
>> >> Maybe it would be better to make the upgrading process as smooth as
>> possible so respective businesses are open to upgrade their clusters in a
>> more frequent manner.
>> About the upgrade process: in my personal experience (3.0.x -> 3.11.x ->
>> 4.0.x -> 4.1.x), upgrading Cassandra has been positive (I suppose the
>> autotests which cover it are really helpful); I have not experienced any
>> serious issues with it. I suppose that most of the time when people have
>> issues with upgrades, it is due to delaying them for too long and staying on
>> very old unsupported versions till the last moment.
>>
>> >>  Cassandra is not JDK. We need to fix bugs in older branches we said
>> we support
>> Regarding the necessity to support the older branches it is the same
>> story for JDK: they now support and fix bugs in JDK8, JDK11, JDK17 and JDK
>> 21 as LTS versions and JDK23 as the latest release while developing and
>> releasing JDK24 now.
>>
>
> That is ... 6 branches at once. We were there, 3.0, 3.11, 4.0, 4.1, 5.0,
> trunk. If there was a bug in 3.0, because we were supporting that, we had
> to put this into 6 branches. That means 6 builds in CI. Each CI takes a
> couple hours ... If there is something wrong or the patch is changed we
> need to rebuild. So what looks like "just merge up from 3.0 and that's it"
> becomes a multi-day odyssey somebody needs to invest resources into. As we
> dropped 3.0 and 3.11 and we took care of 4.0+ that is better but still not
> fun when done "at scale".
>
>
>> Another example, Postgres does a major release every year:
>> https://www.postgresql.org/support/versioning/ and supports the last 5
>> major versions.
>>
>
> Yeah, but they have most probably way more man-power as well etc ...
>
>
>>
>> >> please keep in mind that there are people behind the releases who are
>> spending time on that.
>> Yes, as I already mentioned, I really want to thank Brandon and Mick for
>> doing it! It is hard, exhausting and not the most exciting work to do.
>> Please contact me if I can help somehow with it, like checking and fixing
>> CI test failures (I've already been doing it for a while), doing some
>> scripting, etc.
>> I have a hypothesis (maybe I am completely wrong here) that the
>> low interest in the release process is somehow related to many
>> contributors having a Cassandra fork, so there is no big demand for regular
>> mainline releases if you already have the changes in a fork.
>>
>> Regards,
>> Dmitry
>>
>>
>> On Thu, 23 Jan 2025 at 12:30, Štefan Miklošovič <[email protected]>
>> wrote:
>>
>>> I think the current guidelines are sensible.
>>>
>>> Going through your suggestions:
>>>
>>> 1) I think this is already the case, more or less. We are not doing perf
>>> changes in older branches. This is what we see in CASSANDRA-19429, a user
>>> reported that it is a performance improvement, and most probably he is
>>> right, but I am hesitant to refactor / introduce changes into older
>>> branches.
>>>
>>> Cassandra has a lot of inertia; we cannot mess with what works even if
>>> performance improvements are appealing. Maybe it would be better to make
>>> the upgrading process as smooth as possible so respective businesses are
>>> open to upgrade their clusters in a more frequent manner.
>>>
>>> 2) Well, but Cassandra is not JDK. We need to fix bugs in older branches
>>> we said we support. This is again related to inertia Cassandra has as a
>>> database. Bug fixes are always welcome, especially if there is 0 risk
>>> deploying it.
>>>
>>> What particularly resonates with me is your wording "more frequent and
>>> predictable". Well ... I understand it would be the most ideal outcome, but
>>> please keep in mind that there are people behind the releases who are
>>> spending time on that. I have been following this project for a couple
>>> years and the only people who are taking care of releases are Brandon and
>>> Mick. I was helping here and there to at least stage it and I am willing to
>>> continue to do so, but that is basically it. "two and a half" people are
>>> doing releases. For all these years.
>>>
>>> So if you ask for more frequent releases, that is something which is
>>> going to directly affect respective people involved in them. I guess they
>>> are doing it basically out of courtesy and it would be great to see more
>>> PMCs involved in release processes. As of now, it looks like everybody just
>>> assumes that "it will be somehow released" and "releases just happen" but
>>> that is not the case. Releases are not "just happening". There are people
>>> behind them who need to plan when it is going to happen and they need to
>>> find time for that etc. There are a lot of things not visible behind the
>>> scenes and doing releases is a job in itself.
>>>
>>> So if we ask for more frequent releases, it is a good question to ask
>>> who would be actually releasing that.
>>>
>>> On Wed, Jan 22, 2025 at 12:17 PM Dmitry Konstantinov <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am one of the contributors for the recent perf changes, like:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-20165
>>>> https://issues.apache.org/jira/browse/CASSANDRA-20226
>>>> https://issues.apache.org/jira/browse/CASSANDRA-19557
>>>> ...
>>>>
>>>> My motivation: I am currently using 4.1.x and planning to adopt 5.0.x
>>>> in the next quarter. Of course, I want to have it in the best possible
>>>> shape from a performance point of view; performance is one of the important
>>>> selling points for upgrades. In general, performance is one of the key reasons
>>>> why people select NoSQL, and Cassandra in particular, so any improvement here
>>>> should be appreciated by users, especially in the current cloud-oriented
>>>> world where every such improvement is a potential cost saving.
>>>>
>>>> For me the question is tightly related to the release scheduling. We
>>>> have periodic and quite frequent patch releases now, thank you a lot to the
>>>> people who spend their time to do it. When we speak about minor releases -
>>>> it looks like the release process is much slower and not so predictable, it
>>>> can be a year or even more before I can get any minor release which
>>>> includes a change, and nobody can say even a preliminary date for it.
>>>> As a result, when I have a performance patch and it is suggested that it
>>>> be merged only to trunk, I will not get the improvement back to use for a
>>>> long time.
>>>> So, I have 2 options in this case:
>>>> 1) relax and wait (potentially losing interest due to the delayed
>>>> feedback)
>>>> 2) keep my own private fork to accumulate such changes, with the
>>>> corresponding overhead (which is what I actually do now)
>>>>
>>>> As a guy who supports Cassandra in production for systems with 99.999%
>>>> availability requirements, of course I am concerned about stability too, but
>>>> I think we need some balance here and we should rely more on things like
>>>> test coverage and different policies for different branches, so that we do
>>>> not stagnate due to fear of any change. I am not talking about massive
>>>> breaking changes, especially ones which modify (even in a compatible way)
>>>> network communication protocols or disk data formats; those should get a
>>>> separate individual discussion.
>>>>
>>>> The situation reminds me of the story of the JDK prior to Java 9. There
>>>> were also some big-bang minor releases (1.5/1.6/1.7/1.8) for which we
>>>> waited a very long time, and Java was evolving very slowly. Now we have a
>>>> model where a new release is available every half year and some of them are
>>>> supported long term. So the people who prefer stability select and use
>>>> LTS versions, the people who want access to new
>>>> features/improvements can take the latest release, and all are happy. Similar
>>>> models with stable/latest releases are available for other products.
>>>>
>>>> So, my suggestion is one of the following options:
>>>> 1) Classify the current release branches as more and less stable, like:
>>>> -- 4.0.x/4.1.x - avoid perf changes unless the change is really bug-like
>>>> -- 5.0.x - more relaxed rules
>>>>
>>>> 2) Do something similar to the JDK with LTS versions: make minor releases
>>>> for the latest major version (like 5.1/5.2) more frequent and predictable,
>>>> like a release train; do not create a fix branch for every one; and
>>>> periodically, for some selected minor versions, establish fix branches and
>>>> release patch versions for them.
>>>>
>>>> Thank you,
>>>> Dmitry
>>>>
>>>>
>>>> On Wed, 22 Jan 2025 at 09:02, Jeff Jirsa <[email protected]> wrote:
>>>>
>>>>> I think the status quo is fine - perf goes to trunk, if you think
>>>>> something is special, it goes to the mailing list to justify exceptions
>>>>>
>>>>>
>>>>> On Jan 22, 2025, at 3:36 AM, Jordan West <[email protected]> wrote:
>>>>>
>>>>>
>>>>> Thanks for the initial feedback. I hear a couple different themes /
>>>>> POVs.
>>>>>
>>>>> David/Paulo, it sounds like maybe a guide for perf backports + mailing
>>>>> list consensus when necessary + clear documentation of this could be a way
>>>>> forward. I agree that each change comes with stability risks but at the
>>>>> same time the greatest stability risk with Cassandra historically has been
>>>>> major version upgrades (although we have made great improvements here).
>>>>> For folks who want only the performance improvements, we are asking them to
>>>>> take greater risk by upgrading a major version or to maintain a fork. The
>>>>> fork is reasonable for some of the larger operators but not others. That
>>>>> said, I do agree we need to use judgement. Not all changes are worth
>>>>> backporting and some may incur too much risk. We could also add to the
>>>>> guide suggestions of how to de-risk a change (e.g. code is isolated,
>>>>> config to turn it off / off by default, etc).
>>>>>
>>>>> Jeff, I agree 1% wins aren't worth it if they are invasive and in
>>>>> risky areas. Not all of the improvements are that minor.
>>>>>
>>>>> Jordan
>>>>>
>>>>> On Tue, Jan 21, 2025 at 1:57 PM Jeff Jirsa <[email protected]> wrote:
>>>>>
>>>>>> We expect users to treat patch and minor releases as low risk.
>>>>>> Changing something deep in the storage engine to be 1% faster is not
>>>>>> worth the risk, because most users will skip the type of qualification
>>>>>> that finds those one in a billion regressions.
>>>>>>
>>>>>> Patch releases are for bug fixes not perf improvements.
>>>>>>
>>>>>>
>>>>>> On Jan 21, 2025, at 9:10 PM, Jordan West <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> A topic that’s come up recently is what branches are valid targets
>>>>>> for performance improvements. Should they only go into trunk? This has
>>>>>> come up in the context of BTI improvements, Dmitry’s work on reducing object
>>>>>> overhead, and my work on CASSANDRA-15452.
>>>>>>
>>>>>> We currently have guidelines published:
>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530302#Patching,versioning,andLTSreleases-Wheretoapplypatches.
>>>>>> But there’s no explicit discussion of how to handle performance
>>>>>> improvements. We tend to discuss whether they’re “bugfixes”.
>>>>>>
>>>>>> I’d like to discuss whether performance improvements should target
>>>>>> more than just trunk. I believe they should target every active branch
>>>>>> because performance is a major selling point of Cassandra. It’s not
>>>>>> practical to ask users to upgrade major versions for simple performance
>>>>>> wins. A major version can be deployed for years, especially when the next
>>>>>> one has major changes. But we shouldn’t target non-supported major
>>>>>> versions, either. Also, there will be exceptions: patches that are too
>>>>>> large, invasive, risky, or complicated to backport. For these, we rely on
>>>>>> the contributor and reviewer’s judgment and the mailing list. So, I’m
>>>>>> proposing an allowance to backport to active branches, not a requirement
>>>>>> to merge them.
>>>>>>
>>>>>> I’m curious to hear your thoughts.
>>>>>> Jordan
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Dmitry Konstantinov
>>>>
>>>
>>
>> --
>> Dmitry Konstantinov
>>
>

-- 
Dmitry Konstantinov
