Re: What branches should perf fixes be targeting

2025-01-23 Thread Štefan Miklošovič
I think the current guidelines are sensible.

Going through your suggestions:

1) I think this is already the case, more or less. We are not doing perf
changes in older branches. This is what we see in CASSANDRA-19429: a user
reported a performance improvement, and most probably he is right, but I am
hesitant to refactor / introduce changes into older branches.

Cassandra has a lot of inertia; we cannot mess with what works, even if
performance improvements are appealing. Maybe it would be better to make
the upgrading process as smooth as possible, so that businesses are open
to upgrading their clusters more frequently.

2) Well, but Cassandra is not the JDK. We need to fix bugs in the older
branches we said we would support. This is again related to the inertia
Cassandra has as a database. Bug fixes are always welcome, especially when
there is zero risk in deploying them.

What particularly resonates with me is your wording "more frequent and
predictable". Well ... I understand that would be the ideal outcome, but
please keep in mind that there are people behind the releases who are
spending their time on that. I have been following this project for a couple
of years, and the only people taking care of releases are Brandon and
Mick. I was helping here and there, at least to stage them, and I am willing
to continue to do so, but that is basically it: "two and a half" people are
doing releases. For all these years.

So if you ask for more frequent releases, that is something which will
directly affect the people involved in them. I guess they are doing it
basically out of courtesy, and it would be great to see more PMC members
involved in the release process. As of now, it looks like everybody just
assumes that "it will be somehow released" and "releases just happen", but
that is not the case. Releases are not "just happening". There are people
behind them who need to plan when a release is going to happen, find time
for it, and so on. There is a lot of work that is not visible behind the
scenes; doing releases is a job in itself.

So if we ask for more frequent releases, it is a good question to ask who
would actually be doing the releasing.

On Wed, Jan 22, 2025 at 12:17 PM Dmitry Konstantinov 
wrote:

> Hi all,
>
> I am one of the contributors for the recent perf changes, like:
> https://issues.apache.org/jira/browse/CASSANDRA-20165
> https://issues.apache.org/jira/browse/CASSANDRA-20226
> https://issues.apache.org/jira/browse/CASSANDRA-19557
> ...
>
> My motivation: I am currently using 4.1.x and planning to adopt 5.0.x in
> the next quarter. Of course, I want to have it in the best possible shape
> from a performance point of view; performance is one of the important
> selling points for upgrades. In general, performance is one of the key
> reasons why people select NoSQL and Cassandra in particular, so any
> improvement here should be appreciated by users, especially in the current
> cloud-oriented world where every such improvement is a potential cost saving.
>
> For me, the question is tightly related to release scheduling. We have
> periodic and quite frequent patch releases now; thanks a lot to the
> people who spend their time on them. When we speak about minor releases,
> the release process looks much slower and less predictable: it can be a
> year or even more before I can get a minor release which includes a
> change, and nobody can give even a preliminary date for it.
> As a result, when I have a performance patch and it is suggested to merge
> it only to trunk, I will not get the improvement back to use for a long
> time. So, I have 2 options in this case:
> 1) relax and wait (potentially losing interest due to the delayed
> feedback)
> 2) keep my own private fork to accumulate such changes, with the
> corresponding overhead (which is what I actually do now)
>
> As a guy who supports Cassandra in production for systems with 99.999%
> availability requirements, of course I am concerned about stability too, but
> I think we need some balance here: we should rely more on things like
> test coverage and different policies for different branches, so we do not
> stagnate out of fear of any change. I am not talking about massive breaking
> changes, especially ones which modify (even in a compatible way) network
> communication protocols or on-disk data formats; those deserve a separate,
> individual discussion.
>
> The situation reminds me of the story of the JDK prior to Java 9. There
> were big-bang releases (1.5/1.6/1.7/1.8) for which we waited a very long
> time, and Java was evolving very slowly. Now we have a model where a new
> release is available every six months and some of them are supported long
> term. So, the people who prefer stability select and use the LTS versions,
> the people who want access to new features/improvements can take the
> latest release, and all are happy. Similar stable/latest release models
> are available for other products.
>
> So, my suggestion is one of the follow

Re: [VOTE] Release Apache Cassandra Java Driver 3.12.1

2025-01-23 Thread Josh McKenzie
+1

On Thu, Jan 23, 2025, at 9:58 AM, Štefan Miklošovič wrote:
> +1
> 
> On Sat, Jan 18, 2025 at 10:54 PM Bret McGuire  wrote:
>> Greetings all!
>> 
>> 
>>I’m proposing the Cassandra Java Driver 3.12.1 for release.
>> 
>> 
>> sha1: 873e6f764a499bd9c5a42cafa53dc77184711eea
>> 
>> git: https://github.com/apache/cassandra-java-driver/tree/3.12.1
>> 
>> Maven Artifacts: 
>> https://repository.apache.org/content/repositories/orgapachecassandra-1355
>> 
>> 
>>The Source release is available here:
>> 
>> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/3.12.1/
>> 
>> 
>>This is the first release of new functionality for the 3.x Java driver 
>> since its donation.  Our recent 3.12.0 release was intended to provide an 
>> ASF-branded baseline for the 3.x Java driver while this release is intended 
>> to get any changes that might have been waiting in the 3.x branch out into 
>> the wild.  The full changelog can be found at 
>> https://github.com/apache/cassandra-java-driver/tree/3.12.1/changelog#3121
>> 
>> 
>>The vote will be open for 120 hours (longer if needed) due to the 
>> upcoming holiday weekend. Everyone who has tested the build is invited to 
>> vote. Votes by PMC members are considered binding. A vote passes if there 
>> are at least three binding +1s and no -1's.
>> 
>> 
>>Thanks!
>> 

Re: What branches should perf fixes be targeting

2025-01-23 Thread Dmitry Konstantinov
Hi Stefan,

Thank you very much for the detailed feedback! A few comments:

>> I think this is already the case, more or less. We are not doing perf
changes in older branches.
Yes, I understand the idea about the stability of older branches; the primary
issue for me is that if I contribute even a small improvement to trunk, I
cannot really use it for a long time (except by keeping it in my own
fork), because there is no release that gets it back to me or anybody else.

>> Maybe it would be better to make the upgrading process as smooth as
possible so respective businesses are open to upgrade their clusters in a
more frequent manner.
About the upgrade process: in my personal experience (3.0.x -> 3.11.x -> 4.0.x
-> 4.1.x), upgrading Cassandra is a positive experience (I suppose the
autotests which cover upgrades are really helpful); I have not experienced any
serious issues with it. I suppose that most of the time when people have an
issue with upgrades, it is because they delayed them for too long and stayed
on very old, unsupported versions till the last moment.

>>  Cassandra is not JDK. We need to fix bugs in older branches we said we
support
Regarding the necessity to support older branches, it is the same story
for the JDK: they now support and fix bugs in JDK 8, JDK 11, JDK 17 and JDK 21
as LTS versions, and in JDK 23 as the latest release, while developing and
releasing JDK 24 now.
Another example: Postgres does a major release every year
(https://www.postgresql.org/support/versioning/) and supports the last 5
major versions.

>> please keep in mind that there are people behind the releases who are
spending time on that.
Yes, as I already mentioned, I am really thankful to Brandon and Mick for
doing it! It is hard, exhausting and not the most exciting work to do.
Please contact me if I can help somehow with it, like checking and fixing
CI test failures (I've already done that for a while), doing some scripting,
etc.
I have a hypothesis (maybe I am completely wrong here) that the low interest
in the release process is somehow related to many contributors having their
own Cassandra fork, so there is no big demand for regular mainline releases
if you already have the changes in a fork.

Regards,
Dmitry

On Thu, 23 Jan 2025 at 12:30, Štefan Miklošovič 
wrote:

> I think the current guidelines are sensible.
>
> Going through your suggestions:
>
> 1) I think this is already the case, more or less. We are not doing perf
> changes in older branches. This is what we see in CASSANDRA-19429: a user
> reported a performance improvement, and most probably he is right, but I am
> hesitant to refactor / introduce changes into older branches.
>
> Cassandra has a lot of inertia; we cannot mess with what works, even if
> performance improvements are appealing. Maybe it would be better to make
> the upgrading process as smooth as possible, so that businesses are open
> to upgrading their clusters more frequently.
>
> 2) Well, but Cassandra is not the JDK. We need to fix bugs in the older
> branches we said we would support. This is again related to the inertia
> Cassandra has as a database. Bug fixes are always welcome, especially when
> there is zero risk in deploying them.
>
> What particularly resonates with me is your wording "more frequent and
> predictable". Well ... I understand that would be the ideal outcome, but
> please keep in mind that there are people behind the releases who are
> spending their time on that. I have been following this project for a couple
> of years, and the only people taking care of releases are Brandon and
> Mick. I was helping here and there, at least to stage them, and I am willing
> to continue to do so, but that is basically it: "two and a half" people are
> doing releases. For all these years.
>
> So if you ask for more frequent releases, that is something which will
> directly affect the people involved in them. I guess they are doing it
> basically out of courtesy, and it would be great to see more PMC members
> involved in the release process. As of now, it looks like everybody just
> assumes that "it will be somehow released" and "releases just happen", but
> that is not the case. Releases are not "just happening". There are people
> behind them who need to plan when a release is going to happen, find time
> for it, and so on. There is a lot of work that is not visible behind the
> scenes; doing releases is a job in itself.
>
> So if we ask for more frequent releases, it is a good question to ask who
> would actually be doing the releasing.
>
> On Wed, Jan 22, 2025 at 12:17 PM Dmitry Konstantinov 
> wrote:
>
>> Hi all,
>>
>> I am one of the contributors for the recent perf changes, like:
>> https://issues.apache.org/jira/browse/CASSANDRA-20165
>> https://issues.apache.org/jira/browse/CASSANDRA-20226
>> https://issues.apache.org/jira/browse/CASSANDRA-19557
>> ...
>>
>> My motivation: I am currently using 4.1.x and planning to adopt 5.0.x in
>> the next quarter. Of course, I want to have it in the best possible shape
>> from a performance point

Re: What branches should perf fixes be targeting

2025-01-23 Thread Dmitry Konstantinov
>> That is ... 6 branches at once. We were there, 3.0, 3.11, 4.0, 4.1, 5.0,
trunk. If there was a bug in 3.0, because we were supporting that, we had
to put this into 6 branches
My idea is not to increase the number of supported branches (that is
definitely not what I want; I am more a fan of release-ready trunk-based
development with a faster feedback loop, but it is not always applicable).
The option was about releasing non-LTS minor versions, like the JDK
released 9/10 as short-term and then 11 as long-term, then 12/13 as
short-term, and so on.
So, in the case of Cassandra, for example: we now have 5.0.x as a long-term
support version with a branch; we could release 5.1/5.2 from trunk (without
any new support branches for them) and then 5.3 as long-term again, with a
bug-fix branch. The overhead here is only the more frequent releases (like
once per 3 or 6 months); there is no overhead for branches/merges.


On Thu, 23 Jan 2025 at 14:31, Štefan Miklošovič 
wrote:

>
>
> On Thu, Jan 23, 2025 at 3:20 PM Dmitry Konstantinov 
> wrote:
>
>> Hi Stefan,
>>
>> Thank you very much for the detailed feedback! A few comments:
>>
>> >> I think this is already the case, more or less. We are not doing perf
>> changes in older branches.
>> Yes, I understand the idea about the stability of older branches; the primary
>> issue for me is that if I contribute even a small improvement to trunk, I
>> cannot really use it for a long time (except by keeping it in my own
>> fork), because there is no release that gets it back to me or anybody else.
>>
>> >> Maybe it would be better to make the upgrading process as smooth as
>> possible so respective businesses are open to upgrade their clusters in a
>> more frequent manner.
>> About the upgrade process: in my personal experience (3.0.x -> 3.11.x ->
>> 4.0.x -> 4.1.x), upgrading Cassandra is a positive experience (I suppose the
>> autotests which cover upgrades are really helpful); I have not experienced
>> any serious issues with it. I suppose that most of the time when people have
>> an issue with upgrades, it is because they delayed them for too long and
>> stayed on very old, unsupported versions till the last moment.
>>
>> >>  Cassandra is not JDK. We need to fix bugs in older branches we said
>> we support
>> Regarding the necessity to support older branches, it is the same
>> story for the JDK: they now support and fix bugs in JDK 8, JDK 11, JDK 17
>> and JDK 21 as LTS versions, and in JDK 23 as the latest release, while
>> developing and releasing JDK 24 now.
>>
>
> That is ... 6 branches at once. We were there: 3.0, 3.11, 4.0, 4.1, 5.0,
> trunk. If there was a bug in 3.0, because we were supporting it, we had
> to put the fix into 6 branches. That means 6 builds in CI, and each CI run
> takes a couple of hours ... If something is wrong or the patch changes, we
> need to rebuild. So what looks like "just merge up from 3.0 and that's it"
> becomes a multi-day odyssey somebody needs to invest resources into. Now that
> we have dropped 3.0 and 3.11 and take care of 4.0+, it is better, but still
> not fun when done "at scale".
>
>
>> Another example, Postgres does a major release every year:
>> https://www.postgresql.org/support/versioning/ and supports the last 5
>> major versions.
>>
>
> Yeah, but they most probably have way more manpower as well etc ...
>
>
>>
>> >> please keep in mind that there are people behind the releases who are
>> spending time on that.
>> Yes, as I already mentioned, I am really thankful to Brandon and Mick for
>> doing it! It is hard, exhausting and not the most exciting work to do.
>> Please contact me if I can help somehow with it, like checking and fixing
>> CI test failures (I've already done that for a while), doing some scripting,
>> etc.
>> I have a hypothesis (maybe I am completely wrong here) that the low interest
>> in the release process is somehow related to many contributors having their
>> own Cassandra fork, so there is no big demand for regular mainline releases
>> if you already have the changes in a fork.
>>
>> Regards,
>> Dmitry
>>
>> On Thu, 23 Jan 2025 at 12:30, Štefan Miklošovič 
>> wrote:
>>
>>> I think the current guidelines are sensible.
>>>
>>> Going through your suggestions:
>>>
>>> 1) I think this is already the case, more or less. We are not doing perf
>>> changes in older branches. This is what we see in CASSANDRA-19429: a user
>>> reported a performance improvement, and most probably he is right, but I am
>>> hesitant to refactor / introduce changes into older branches.
>>>
>>> Cassandra has a lot of inertia; we cannot mess with what works, even if
>>> performance improvements are appealing. Maybe it would be better to make
>>> the upgrading process as smooth as possible, so that businesses are open
>>> to upgrading their clusters more frequently.
>>>
>>> 2) Well, but Cassandra is not the JDK. We need to fix bugs in the older
>>> branches we said we would support. This is again related to the inertia
>>> Cassandra has as a
>>> database. Bug fixes ar

Re: What branches should perf fixes be targeting

2025-01-23 Thread Štefan Miklošovič
On Thu, Jan 23, 2025 at 3:20 PM Dmitry Konstantinov 
wrote:

> Hi Stefan,
>
> Thank you very much for the detailed feedback! A few comments:
>
> >> I think this is already the case, more or less. We are not doing perf
> changes in older branches.
> Yes, I understand the idea about the stability of older branches; the primary
> issue for me is that if I contribute even a small improvement to trunk, I
> cannot really use it for a long time (except by keeping it in my own
> fork), because there is no release that gets it back to me or anybody else.
>
> >> Maybe it would be better to make the upgrading process as smooth as
> possible so respective businesses are open to upgrade their clusters in a
> more frequent manner.
> About the upgrade process: in my personal experience (3.0.x -> 3.11.x ->
> 4.0.x -> 4.1.x), upgrading Cassandra is a positive experience (I suppose the
> autotests which cover upgrades are really helpful); I have not experienced
> any serious issues with it. I suppose that most of the time when people have
> an issue with upgrades, it is because they delayed them for too long and
> stayed on very old, unsupported versions till the last moment.
>
> >>  Cassandra is not JDK. We need to fix bugs in older branches we said we
> support
> Regarding the necessity to support older branches, it is the same story
> for the JDK: they now support and fix bugs in JDK 8, JDK 11, JDK 17 and JDK
> 21 as LTS versions, and in JDK 23 as the latest release, while developing
> and releasing JDK 24 now.
>

That is ... 6 branches at once. We were there: 3.0, 3.11, 4.0, 4.1, 5.0,
trunk. If there was a bug in 3.0, because we were supporting it, we had
to put the fix into 6 branches. That means 6 builds in CI, and each CI run
takes a couple of hours ... If something is wrong or the patch changes, we
need to rebuild. So what looks like "just merge up from 3.0 and that's it"
becomes a multi-day odyssey somebody needs to invest resources into. Now
that we have dropped 3.0 and 3.11 and take care of 4.0+, it is better, but
still not fun when done "at scale".


> Another example, Postgres does a major release every year:
> https://www.postgresql.org/support/versioning/ and supports the last 5
> major versions.
>

Yeah, but they most probably have way more manpower as well etc ...


>
> >> please keep in mind that there are people behind the releases who are
> spending time on that.
> Yes, as I already mentioned, I am really thankful to Brandon and Mick for
> doing it! It is hard, exhausting and not the most exciting work to do.
> Please contact me if I can help somehow with it, like checking and fixing
> CI test failures (I've already done that for a while), doing some scripting,
> etc.
> I have a hypothesis (maybe I am completely wrong here) that the low interest
> in the release process is somehow related to many contributors having their
> own Cassandra fork, so there is no big demand for regular mainline releases
> if you already have the changes in a fork.
>
> Regards,
> Dmitry
>
> On Thu, 23 Jan 2025 at 12:30, Štefan Miklošovič 
> wrote:
>
>> I think the current guidelines are sensible.
>>
>> Going through your suggestions:
>>
>> 1) I think this is already the case, more or less. We are not doing perf
>> changes in older branches. This is what we see in CASSANDRA-19429: a user
>> reported a performance improvement, and most probably he is right, but I am
>> hesitant to refactor / introduce changes into older branches.
>>
>> Cassandra has a lot of inertia; we cannot mess with what works, even if
>> performance improvements are appealing. Maybe it would be better to make
>> the upgrading process as smooth as possible, so that businesses are open
>> to upgrading their clusters more frequently.
>>
>> 2) Well, but Cassandra is not the JDK. We need to fix bugs in the older
>> branches we said we would support. This is again related to the inertia
>> Cassandra has as a database. Bug fixes are always welcome, especially when
>> there is zero risk in deploying them.
>>
>> What particularly resonates with me is your wording "more frequent and
>> predictable". Well ... I understand that would be the ideal outcome, but
>> please keep in mind that there are people behind the releases who are
>> spending their time on that. I have been following this project for a couple
>> of years, and the only people taking care of releases are Brandon and
>> Mick. I was helping here and there, at least to stage them, and I am willing
>> to continue to do so, but that is basically it: "two and a half" people are
>> doing releases. For all these years.
>>
>> So if you ask for more frequent releases, that is something which will
>> directly affect the people involved in them. I guess they are doing it
>> basically out of courtesy, and it would be great to see more PMC members
>> involved in the release process. As of now, it looks like everybody just
>> assumes that "it will be somehow released" and "releases just happen", but
>> that is not the case. Releases are not "just happening". There are people
>> beh

Re: [VOTE] Release Apache Cassandra Java Driver 3.12.1

2025-01-23 Thread Štefan Miklošovič
+1

On Sat, Jan 18, 2025 at 10:54 PM Bret McGuire 
wrote:

> Greetings all!
>
>I’m proposing the Cassandra Java Driver 3.12.1 for release.
>
> sha1: 873e6f764a499bd9c5a42cafa53dc77184711eea
>
> git: https://github.com/apache/cassandra-java-driver/tree/3.12.1
>
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1355
>
>The Source release is available here:
>
>
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/3.12.1/
>
>This is the first release of new functionality for the 3.x Java driver
> since its donation.  Our recent 3.12.0 release was intended to provide an
> ASF-branded baseline for the 3.x Java driver while this release is intended
> to get any changes that might have been waiting in the 3.x branch out into
> the wild.  The full changelog can be found at
> https://github.com/apache/cassandra-java-driver/tree/3.12.1/changelog#3121
>
>The vote will be open for 120 hours (longer if needed) due to the
> upcoming holiday weekend. Everyone who has tested the build is invited to
> vote. Votes by PMC members are considered binding. A vote passes if there
> are at least three binding +1s and no -1's.
>
>Thanks!
>


Re: What branches should perf fixes be targeting

2025-01-23 Thread Josh McKenzie
> Of note, it's been 13 months since 5.0 GA. :)
On a scale of 1-10, I'm a 10 out of 10 for being wrong here. It's been 13 
months *since we initially intended to release 5.0*. Stabilization of CI and 
some bugs took us to mid 2024. So it's not as bad as all that. Thanks to those 
who pointed this out to me; brain derped.

So keeping things constrained to this thread: I think "bugfix only to 
non-trunk, ML for consensus otherwise" is a very workable solution. We can 
augment our wiki to reflect that, since it's not there yet, assuming 
consensus on the thread here.

On Thu, Jan 23, 2025, at 9:45 AM, Dmitry Konstantinov wrote:
> >> That is ... 6 branches at once. We were there, 3.0, 3.11, 4.0, 4.1, 5.0, 
> >> trunk. If there was a bug in 3.0, because we were supporting that, we had 
> >> to put this into 6 branches
> My idea is not to increase the number of supported branches (that is
> definitely not what I want; I am more a fan of release-ready trunk-based
> development with a faster feedback loop, but it is not always applicable).
> The option was about releasing non-LTS minor versions, like the JDK
> released 9/10 as short-term and then 11 as long-term, then 12/13 as
> short-term, and so on.
> So, in the case of Cassandra, for example: we now have 5.0.x as a long-term
> support version with a branch; we could release 5.1/5.2 from trunk (without
> any new support branches for them) and then 5.3 as long-term again, with a
> bug-fix branch. The overhead here is only the more frequent releases (like
> once per 3 or 6 months); there is no overhead for branches/merges.
> 
> 
> On Thu, 23 Jan 2025 at 14:31, Štefan Miklošovič  
> wrote:
>> 
>> 
>> On Thu, Jan 23, 2025 at 3:20 PM Dmitry Konstantinov  
>> wrote:
>>> Hi Stefan,
>>> 
>>> Thank you very much for the detailed feedback! A few comments:
>>> 
>>> >> I think this is already the case, more or less. We are not doing perf 
>>> >> changes in older branches.
>>> Yes, I understand the idea about the stability of older branches; the
>>> primary issue for me is that if I contribute even a small improvement to
>>> trunk, I cannot really use it for a long time (except by keeping it in my
>>> own fork), because there is no release that gets it back to me or anybody else.
>>> 
>>> >> Maybe it would be better to make the upgrading process as smooth as 
>>> >> possible so respective businesses are open to upgrade their clusters in 
>>> >> a more frequent manner.
>>> About the upgrade process: in my personal experience (3.0.x -> 3.11.x ->
>>> 4.0.x -> 4.1.x), upgrading Cassandra is a positive experience (I suppose the
>>> autotests which cover upgrades are really helpful); I have not experienced
>>> any serious issues with it. I suppose that most of the time when people have
>>> an issue with upgrades, it is because they delayed them for too long and
>>> stayed on very old, unsupported versions till the last moment.
>>> 
>>> >>  Cassandra is not JDK. We need to fix bugs in older branches we said we 
>>> >> support
>>> Regarding the necessity to support older branches, it is the same story
>>> for the JDK: they now support and fix bugs in JDK 8, JDK 11, JDK 17 and
>>> JDK 21 as LTS versions, and in JDK 23 as the latest release, while
>>> developing and releasing JDK 24 now.
>> 
>> That is ... 6 branches at once. We were there: 3.0, 3.11, 4.0, 4.1, 5.0,
>> trunk. If there was a bug in 3.0, because we were supporting it, we had to
>> put the fix into 6 branches. That means 6 builds in CI, and each CI run
>> takes a couple of hours ... If something is wrong or the patch changes, we
>> need to rebuild. So what looks like "just merge up from 3.0 and that's it"
>> becomes a multi-day odyssey somebody needs to invest resources into. Now
>> that we have dropped 3.0 and 3.11 and take care of 4.0+, it is better, but
>> still not fun when done "at scale".
>>  
>>> Another example, Postgres does a major release every year: 
>>> https://www.postgresql.org/support/versioning/ and supports the last 5 
>>> major versions.
>> 
>> Yeah, but they most probably have way more manpower as well etc ...
>>  
>>> 
>>> >> please keep in mind that there are people behind the releases who are 
>>> >> spending time on that.
>>> Yes, as I already mentioned, I am really thankful to Brandon and Mick for
>>> doing it! It is hard, exhausting and not the most exciting work to do.
>>> Please contact me if I can help somehow with it, like checking and fixing
>>> CI test failures (I've already done that for a while), doing some
>>> scripting, etc.
>>> I have a hypothesis (maybe I am completely wrong here) that the low
>>> interest in the release process is somehow related to many contributors
>>> having their own Cassandra fork, so there is no big demand for regular
>>> mainline releases if you already have the changes in a fork.
>>> 
>>> Regards,
>>> Dmitry
>>> 
>

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-23 Thread Jeremiah Jordan
For commit log archiving we already have the concept of “commands” to be
executed.  Maybe a similar concept would be useful for snapshots?  Maybe a
new “user snapshot with command” nodetool action could be added.  The
server would make its usual hard links inside a snapshot folder and then it
could shell off a new process running the “snapshot archiving command”,
passing it the directory just made.  Then whatever logic is wanted could be
implemented in the command script, be that copying to S3, copying to a
folder on another mount point, or whatever the operator wants to happen.
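
For reference, the commit log side of this already works through
conf/commitlog_archiving.properties (archive_command, with %path / %name
substitution). A rough sketch of what the snapshot-side hook might look
like; the option name, class and method below are made up for illustration,
not an existing Cassandra API:

    // Hypothetical post-snapshot hook, modeled on commit log archiving's
    // archive_command, e.g.: snapshot_archive_command=/usr/local/bin/backup.sh %path
    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.concurrent.TimeUnit;

    public final class SnapshotArchiver
    {
        private final String archiveCommand; // operator-provided command line

        public SnapshotArchiver(String archiveCommand)
        {
            this.archiveCommand = archiveCommand;
        }

        // Called after the server has created its usual hard links in the
        // snapshot directory; shells off the operator's command with %path
        // replaced by the directory just made.
        public void archive(Path snapshotDir) throws IOException, InterruptedException
        {
            String cmd = archiveCommand.replace("%path", snapshotDir.toString());
            // Naive whitespace split; good enough for a sketch.
            Process process = new ProcessBuilder(cmd.split("\\s+")).inheritIO().start();
            if (!process.waitFor(1, TimeUnit.HOURS) || process.exitValue() != 0)
                throw new IOException("snapshot archive command failed: " + cmd);
        }
    }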

-Jeremiah

On Jan 23, 2025 at 7:54:20 AM, Štefan Miklošovič 
wrote:

> Interesting, I will need to think about it more. Thanks for chiming in.
>
> On Wed, Jan 22, 2025 at 8:10 PM Blake Eggleston 
> wrote:
>
>> Somewhat tangential, but I’d like to see Cassandra provide a backup story
>> that doesn’t involve making copies of sstables. They’re constantly
>> rewritten by compaction, and intelligent backup systems often need to be
>> able to read sstable metadata to optimize storage usage.
>>
>> An interface purpose built to support incremental backup and restore
>> would almost definitely be more efficient since it could account for
>> compaction, and would separate operational requirements from storage layer
>> implementation details.
>>
>> On Jan 22, 2025, at 2:33 AM, Štefan Miklošovič 
>> wrote:
>>
>>
>>
>> On Wed, Jan 22, 2025 at 2:21 AM James Berragan 
>> wrote:
>>
>>> I think this is an idea worth exploring, my guess is that even if the
>>> scope is confined to just "copy if not exists" it would still largely be
>>> used as a cloud-agnostic backup/restore solution, and so will be shaped
>>> accordingly.
>>>
>>> Some thoughts:
>>>
>>> - I think it would be worth exploring more what the directory structure
>>> looks like. You mention a flat directory hierarchy, but it seems to me it
>>> would need to be delimited by node (or token range) in some way as the
>>> SSTable identifier will not be unique across the cluster. If we do need to
>>> delimit by node, is the configuration burden then on the user to mount
>>> individual drives to S3/Azure/wherever to unique per node paths? What do
>>> they do in the event of a host replacement, backup to a new empty
>>> directory?
>>>
>>
>> It will be unique when "uuid_sstable_identifiers_enabled: true", even
>> across the cluster. If we worked with "old identifiers" too, these are
>> indeed not unique (even across different tables in the same node). I am not
>> completely sure how far we want to go with this, I don't have a problem
>> saying that we support this feature only with
>> "uuid_sstable_identifiers_enabled: true". If we were to support the older
>> SSTable identifier naming as well, that would complicate it more. Esop's
>> directory structure of a remote destination is here:
>>
>>
>> https://github.com/instaclustr/esop?tab=readme-ov-file#directory-structure-of-a-remote-destination
>>
>> and how the content of the snapshot's manifest looks just below it.
>>
>> We may go with a hierarchical structure as well if this is evaluated to be
>> a better approach. I just find a flat hierarchy simpler. We cannot have a
>> flat hierarchy with old / non-unique identifiers, so we would need to find a
>> way to differentiate one SSTable from another, which naturally leads to
>> them being placed in a keyspace/table/sstable hierarchy, but I do not want to
>> complicate it more by having flat and non-flat hierarchies supported
>> simultaneously (where a user could pick which one he wants). We should go
>> with just one solution.
>>
>> When it comes to node replacement, I think that it would be just up to an
>> operator to rename the whole directory to reflect a new path for that
>> particular node. Imagine an operator has a bucket in Azure which is empty
>> (/) and it is mounted to /mnt/nfs/cassandra in every node. Then on node1,
>> Cassandra would automatically start to put SSTables into
>> /mnt/nfs/cassandra/cluster-name/dc-name/node-id-1 and node 2 would put
>> that into /mnt/nfs/cassandra/cluster-name/dc-name/node-id-2.
>>
>> The part of "cluster-name/dc-name/node-id" would be automatically done by
>> Cassandra itself. It would just append it to /mnt/nfs/cassandra, under which
>> the bucket is mounted.
>>
>> If you replaced the node, the data would stay; only the node's ID would
>> change. In that case, all that would be necessary would be to rename the
>> "node-id-1" directory to "node-id-3" (id-3 being a host id of the replaced
>> node). Snapshot manifest does not know anything about host id so content of
>> the manifest would not need to be changed. If you don't rename the node id
>> directory, then snapshots would be indeed made under a new host id
>> directory which would be empty at first.
>>
>>
>>> - The challenge often with restore is restoring from snapshots created
>>> before a cluster topology change (node replacements, token moves,
>>> cluster expansions/shrinks etc). This could be solved by

Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-23 Thread Štefan Miklošovič
I feel uneasy about executing scripts from Cassandra. Jon was talking about
this here (1) as well. I would not base this on any shell script / command
execution. I think nothing beats pure Java copying files to a directory ...

(1) https://lists.apache.org/thread/jcr3mln2tohbckvr8fjrr0sq0syof080
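
To make "pure Java" concrete, here is a minimal sketch of the kind of
copy-if-not-exists I have in mind (class and method names are made up for
illustration; plain java.nio, no shell involved):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    public final class SnapshotCopier
    {
        // Copies every file of a snapshot directory (hard links, so a flat
        // listing suffices) into the mounted destination, skipping anything
        // that is already there ("copy if not exists").
        public static void copySnapshot(Path snapshotDir, Path destinationDir) throws IOException
        {
            Files.createDirectories(destinationDir);
            try (DirectoryStream<Path> files = Files.newDirectoryStream(snapshotDir))
            {
                for (Path file : files)
                {
                    Path target = destinationDir.resolve(file.getFileName());
                    if (Files.notExists(target))
                        Files.copy(file, target, StandardCopyOption.COPY_ATTRIBUTES);
                }
            }
        }
    }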

On Thu, Jan 23, 2025 at 5:16 PM Jeremiah Jordan 
wrote:

> For commit log archiving we already have the concept of “commands” to be
> executed.  Maybe a similar concept would be useful for snapshots?  Maybe a
> new “user snapshot with command” nodetool action could be added.  The
> server would make its usual hard links inside a snapshot folder and then it
> could shell off a new process running the “snapshot archiving command”
> passing it the directory just made.  Then whatever logic is wanted could be
> implemented in the command script, be that copying to S3, copying to a
> folder on another mount point, or whatever the operator wants to happen.
>
> -Jeremiah
>
> On Jan 23, 2025 at 7:54:20 AM, Štefan Miklošovič 
> wrote:
>
>> Interesting, I will need to think about it more. Thanks for chiming in.
>>
>> On Wed, Jan 22, 2025 at 8:10 PM Blake Eggleston 
>> wrote:
>>
>>> Somewhat tangential, but I’d like to see Cassandra provide a backup
>>> story that doesn’t involve making copies of sstables. They’re constantly
>>> rewritten by compaction, and intelligent backup systems often need to be
>>> able to read sstable metadata to optimize storage usage.
>>>
>>> An interface purpose built to support incremental backup and restore
>>> would almost definitely be more efficient since it could account for
>>> compaction, and would separate operational requirements from storage layer
>>> implementation details.
>>>
>>> On Jan 22, 2025, at 2:33 AM, Štefan Miklošovič 
>>> wrote:
>>>
>>>
>>>
>>> On Wed, Jan 22, 2025 at 2:21 AM James Berragan 
>>> wrote:
>>>
 I think this is an idea worth exploring, my guess is that even if the
 scope is confined to just "copy if not exists" it would still largely be
 used as a cloud-agnostic backup/restore solution, and so will be shaped
 accordingly.

 Some thoughts:

 - I think it would be worth exploring more what the directory structure
 looks like. You mention a flat directory hierarchy, but it seems to me it
 would need to be delimited by node (or token range) in some way as the
 SSTable identifier will not be unique across the cluster. If we do need to
 delimit by node, is the configuration burden then on the user to mount
 individual drives to S3/Azure/wherever to unique per node paths? What do
 they do in the event of a host replacement, backup to a new empty
 directory?

>>>
>>> It will be unique when "uuid_sstable_identifiers_enabled: true", even
>>> across the cluster. If we worked with "old identifiers" too, these are
>>> indeed not unique (even across different tables in the same node). I am not
>>> completely sure how far we want to go with this, I don't have a problem
>>> saying that we support this feature only with
>>> "uuid_sstable_identifiers_enabled: true". If we were to support the older
>>> SSTable identifier naming as well, that would complicate it more. Esop's
>>> directory structure of a remote destination is here:
>>>
>>>
>>> https://github.com/instaclustr/esop?tab=readme-ov-file#directory-structure-of-a-remote-destination
>>>
>>> and how the content of the snapshot's manifest looks just below it.
>>>
>>> We may go with a hierarchical structure as well if this is evaluated to be
>>> a better approach. I just find a flat hierarchy simpler. We cannot have a
>>> flat hierarchy with old / non-unique identifiers, so we would need to find
>>> a way to differentiate one SSTable from another, which naturally leads to
>>> them being placed in a keyspace/table/sstable hierarchy, but I do not want
>>> to complicate it more by having flat and non-flat hierarchies supported
>>> simultaneously (where a user could pick which one he wants). We should go
>>> with just one solution.
>>>
>>> When it comes to node replacement, I think that it would be just up to
>>> an operator to rename the whole directory to reflect a new path for that
>>> particular node. Imagine an operator has a bucket in Azure which is empty
>>> (/) and it is mounted to /mnt/nfs/cassandra in every node. Then on node1,
>>> Cassandra would automatically start to put SSTables into
>>> /mnt/nfs/cassandra/cluster-name/dc-name/node-id-1 and node 2 would put
>>> that into /mnt/nfs/cassandra/cluster-name/dc-name/node-id-2.
>>>
>>> The part of "cluster-name/dc-name/node-id" would be automatically done
>>> by Cassandra itself. It would just append it to /mnt/nfs/cassandra, under
>>> which the bucket is mounted.
>>>
>>> If you replaced the node, the data would stay; only the node's ID would
>>> change. In that case, all that would be necessary would be to rename the
>>> "node-id-1" directory to "node-id-3" (id-3 being a host id of the replaced
>>> node). Snapshot m

Re: [VOTE] Release Apache Cassandra Java Driver 3.12.1

2025-01-23 Thread Maxim Muzafarov
+1 (nb)

On Thu, 23 Jan 2025 at 16:35, Josh McKenzie  wrote:
>
> +1
>
> On Thu, Jan 23, 2025, at 9:58 AM, Štefan Miklošovič wrote:
>
> +1
>
> On Sat, Jan 18, 2025 at 10:54 PM Bret McGuire  wrote:
>
> Greetings all!
>
>
>I’m proposing the Cassandra Java Driver 3.12.1 for release.
>
>
> sha1: 873e6f764a499bd9c5a42cafa53dc77184711eea
>
> git: https://github.com/apache/cassandra-java-driver/tree/3.12.1
>
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1355
>
>
>The Source release is available here:
>
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/3.12.1/
>
>
>This is the first release of new functionality for the 3.x Java driver 
> since its donation.  Our recent 3.12.0 release was intended to provide an 
> ASF-branded baseline for the 3.x Java driver while this release is intended 
> to get any changes that might have been waiting in the 3.x branch out into 
> the wild.  The full changelog can be found at 
> https://github.com/apache/cassandra-java-driver/tree/3.12.1/changelog#3121
>
>
>The vote will be open for 120 hours (longer if needed) due to the upcoming 
> holiday weekend. Everyone who has tested the build is invited to vote. Votes 
> by PMC members are considered binding. A vote passes if there are at least 
> three binding +1s and no -1's.
>
>
>Thanks!


Re: Patrick McFadin joins the PMC

2025-01-23 Thread Joseph Lynch
Wahoo! Congratulations Patrick!

-Joey

On Wed, Jan 22, 2025 at 11:06 AM Jordan West  wrote:

> The PMC's members are pleased to announce that Patrick McFadin has accepted
> an invitation to become a PMC member.
>
> Thanks a lot, Patrick, for everything you have done for the project all
> these years.
>
> Congratulations and welcome!!
>
> The Apache Cassandra PMC
>


Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-23 Thread Štefan Miklošovič
Interesting, I will need to think about it more. Thanks for chiming in.

On Wed, Jan 22, 2025 at 8:10 PM Blake Eggleston 
wrote:

> Somewhat tangential, but I’d like to see Cassandra provide a backup story
> that doesn’t involve making copies of sstables. They’re constantly
> rewritten by compaction, and intelligent backup systems often need to be
> able to read sstable metadata to optimize storage usage.
>
> An interface purpose built to support incremental backup and restore would
> almost definitely be more efficient since it could account for compaction,
> and would separate operational requirements from storage layer
> implementation details.
>
> On Jan 22, 2025, at 2:33 AM, Štefan Miklošovič 
> wrote:
>
>
>
> On Wed, Jan 22, 2025 at 2:21 AM James Berragan 
> wrote:
>
>> I think this is an idea worth exploring, my guess is that even if the
>> scope is confined to just "copy if not exists" it would still largely be
>> used as a cloud-agnostic backup/restore solution, and so will be shaped
>> accordingly.
>>
>> Some thoughts:
>>
>> - I think it would be worth exploring more what the directory structure
>> looks like. You mention a flat directory hierarchy, but it seems to me it
>> would need to be delimited by node (or token range) in some way as the
>> SSTable identifier will not be unique across the cluster. If we do need to
>> delimit by node, is the configuration burden then on the user to mount
>> individual drives to S3/Azure/wherever to unique per node paths? What do
>> they do in the event of a host replacement, backup to a new empty
>> directory?
>>
>
> It will be unique when "uuid_sstable_identifiers_enabled: true", even
> across the cluster. If we worked with "old identifiers" too, these are
> indeed not unique (even across different tables in the same node). I am not
> completely sure how far we want to go with this, I don't have a problem
> saying that we support this feature only with
> "uuid_sstable_identifiers_enabled: true". If we were to support the older
> SSTable identifier naming as well, that would complicate it more. Esop's
> directory structure of a remote destination is here:
>
>
> https://github.com/instaclustr/esop?tab=readme-ov-file#directory-structure-of-a-remote-destination
>
> and how the content of the snapshot's manifest looks just below it.
>
> We may go with a hierarchical structure as well if this is evaluated to be a
> better approach. I just find a flat hierarchy simpler. We cannot have a flat
> hierarchy with old / non-unique identifiers, so we would need to find a way
> to differentiate one SSTable from another, which naturally leads to
> them being placed in a keyspace/table/sstable hierarchy, but I do not want to
> complicate it more by having flat and non-flat hierarchies supported
> simultaneously (where a user could pick which one he wants). We should go
> with just one solution.
>
> When it comes to node replacement, I think that it would be just up to an
> operator to rename the whole directory to reflect a new path for that
> particular node. Imagine an operator has a bucket in Azure which is empty
> (/) and it is mounted to /mnt/nfs/cassandra in every node. Then on node1,
> Cassandra would automatically start to put SSTables into
> /mnt/nfs/cassandra/cluster-name/dc-name/node-id-1 and node 2 would put
> that into /mnt/nfs/cassandra/cluster-name/dc-name/node-id-2.
>
> The part of "cluster-name/dc-name/node-id" would be automatically done by
> Cassandra itself. It would just append it to /mnt/nfs/cassandra, under which
> the bucket is mounted.
>
> If you replaced the node, the data would stay; only the node's ID would
> change. In that case, all that would be necessary would be to rename the
> "node-id-1" directory to "node-id-3" (id-3 being a host id of the replaced
> node). Snapshot manifest does not know anything about host id so content of
> the manifest would not need to be changed. If you don't rename the node id
> directory, then snapshots would be indeed made under a new host id
> directory which would be empty at first.
>
>
>> - The challenge often with restore is restoring from snapshots created
>> before a cluster topology change (node replacements, token moves,
>> cluster expansions/shrinks etc). This could be solved by storing the
> snapshot token information in the manifest somewhere. Ideally the user
> shouldn't have to scan token information across all SSTables snapshot-wide
> to determine which ones to restore.
>>
>
> Yes, see the content of the snapshot manifest as I mentioned already
> (a couple of lines below the example of the directory hierarchy). We are
> storing "tokens" and "schemaVersion". Each snapshot manifest also contains
> "schemaContent" with a CQL representation of the schema that all SSTables in
> a logical snapshot belong to, so an operator knows what the schema was at
> the time that snapshot was taken, plus what the tokens were and what the
> schema version was.
>
>
>>
>> - I didn't understand the TTL mechanism. If we only copy SSTables that
>> haven't b

Re: [VOTE] Release Apache Cassandra Java Driver 3.12.1

2025-01-23 Thread Bret McGuire
With four +1 votes (3 binding) and zero -1 votes the vote passes.
Thanks all!

- Bret -

On Thu, Jan 23, 2025 at 11:42 AM Maxim Muzafarov  wrote:

> +1 (nb)
>
> On Thu, 23 Jan 2025 at 16:35, Josh McKenzie  wrote:
> >
> > +1
> >
> > On Thu, Jan 23, 2025, at 9:58 AM, Štefan Miklošovič wrote:
> >
> > +1
> >
> > On Sat, Jan 18, 2025 at 10:54 PM Bret McGuire 
> wrote:
> >
> > Greetings all!
> >
> >
> >I’m proposing the Cassandra Java Driver 3.12.1 for release.
> >
> >
> > sha1: 873e6f764a499bd9c5a42cafa53dc77184711eea
> >
> > git: https://github.com/apache/cassandra-java-driver/tree/3.12.1
> >
> > Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1355
> >
> >
> >The Source release is available here:
> >
> >
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/3.12.1/
> >
> >
> >This is the first release of new functionality for the 3.x Java
> driver since its donation.  Our recent 3.12.0 release was intended to
> provide an ASF-branded baseline for the 3.x Java driver while this release
> is intended to get any changes that might have been waiting in the 3.x
> branch out into the wild.  The full changelog can be found at
> https://github.com/apache/cassandra-java-driver/tree/3.12.1/changelog#3121
> >
> >
> >The vote will be open for 120 hours (longer if needed) due to the
> upcoming holiday weekend. Everyone who has tested the build is invited to
> vote. Votes by PMC members are considered binding. A vote passes if there
> are at least three binding +1s and no -1's.
> >
> >
> >Thanks!
>