Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread Benedict
Another option would be to introduce a second class with the same fields as the first where we simply specify final for immutable fields, and construct it after parsing the Config.

We could even generate the non-final version from the one with final fields.

Not sure this would be nicer, but it is an alternative.

On 1 Mar 2023, at 02:10, Ekaterina Dimitrova wrote:

I agree with David that the annotations seem a bit too many but if I have to choose from the three approaches - the annotations one seems most reasonable to me and I didn’t have the chance to consider any others. Volatile seems fragile and unclear as a differentiator. I agree

On Tue, 28 Feb 2023 at 17:47, Maxim Muzafarov wrote:
Folks,

If there are no objections to the approach described in this thread,
I'd like to proceed with this change. The change seems to be valuable
for the upcoming release, so any comments are really appreciated.

On Wed, 22 Feb 2023 at 21:51, David Capwell  wrote:
>
> I guess back to the point of the thread, we need a way to know what configs are mutable for the settings virtual table, so need some way to denote that the config replica_filtering_protection.cached_rows_fail_threshold is mutable.  Given the way that the yaml config works, we can’t rely on the presence of “final” or not, so need some way to mark a config as mutable for that table. Does anyone want to offer feedback on what works best for them?
>
> Out of all proposals given so far “volatile” is the least verbose but also not explicit (as this thread is showing there is debate on if this should be present), new annotations are a little more verbose but would be explicit (no surprises), and getter/setters in different classes (such as DD) is the most verbose and suffers from not being explicit and ambiguity for mapping back to Config.
>
> Given the above, annotations sounds like the best option, but do we really want our config to look as follows?
>
> @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converters.MILLIS_DURATION_LONG, deprecated = true)
> @Mutable
> public DurationSpec.LongMillisecondsBound native_transport_idle_timeout = new DurationSpec.LongMillisecondsBound("0ms");
> @Mutable
> public DurationSpec.LongMillisecondsBound transaction_timeout = new DurationSpec.LongMillisecondsBound("30s");
> @Mutable
> public double phi_convict_threshold = 8.0;
> public String partitioner; // assume immutable by default?
>
>
> > On Feb 22, 2023, at 6:20 AM, Benedict  wrote:
> >
> > Could you describe the issues? Config that is globally exposed should ideally be immutable with final members, in which case volatile is only necessary if you’re using the config parameter in a tight loop in which you need to witness a new value - which shouldn’t apply to any of our config.
> >
> > There are some weird niches, like updating long values on some (unsupported by us) JVMs that may tear. Technically you also require it for visibility with the JMM. But in practice it is mostly unnecessary. Often what seems to be a volatile issue is really something else.
> >
> >> On 22 Feb 2023, at 13:18, Benjamin Lerer  wrote:
> >>
> >> I have seen issues with some updatable parameters which were missing the volatile keyword.
> >>
> >> On Wed, 22 Feb 2023 at 11:36, Aleksey Yeshchenko wrote:
> >> FWIW most of those volatile fields, if not in fact all of them, should NOT be volatile at all. Someone started the trend and most folks have been copycatting or doing the same for consistency with the rest of the codebase.
> >>
> >> Please definitely don’t rely on that.
> >>
> >>> On 21 Feb 2023, at 21:06, Maxim Muzafarov  wrote:
> >>>
> >>> 1. Rely on the volatile keyword in front of fields in the Config class;
> >>>
> >>> I would say this is the most confusing option for me because it
> >>> doesn't give us all the guarantees we need, and also:
> >>> - We have no explicit control over what exactly we expose to a user.
> >>> When we modify the JMX API, we're implementing a new method for the
> >>> MBean, which in turn makes this action an explicit exposure;
> >>> - The volatile keyword is not the only way to achieve thread safety,
> >>> and looks strange from a public API design point of view;
> >>> - A good example is the setEnableDropCompactStorage method, which
> >>> changes the volatile field, but is only visible for testing purposes;
> >>
> >>
>
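Benedict's point that globally exposed config should ideally be immutable with final members, so that per-field volatile is rarely needed, can be illustrated with a minimal sketch. It assumes a copy-on-write design that is not in the thread: a hypothetical `ConfigSnapshot` with only final fields, published through a single `AtomicReference` (which has volatile read/write semantics). `ConfigSnapshot` and `ConfigHolder` are illustrative names, not Cassandra classes.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot: all fields are final, so a thread that
// obtains the reference sees fully initialised values.
final class ConfigSnapshot
{
    final double phiConvictThreshold;
    final long transactionTimeoutMillis;

    ConfigSnapshot(double phiConvictThreshold, long transactionTimeoutMillis)
    {
        this.phiConvictThreshold = phiConvictThreshold;
        this.transactionTimeoutMillis = transactionTimeoutMillis;
    }

    // Returns a modified copy; the existing snapshot is never mutated.
    ConfigSnapshot withPhiConvictThreshold(double value)
    {
        return new ConfigSnapshot(value, transactionTimeoutMillis);
    }
}

class ConfigHolder
{
    // The single shared reference has volatile semantics (AtomicReference),
    // which is what publishes each new snapshot to other threads.
    private static final AtomicReference<ConfigSnapshot> CURRENT =
        new AtomicReference<>(new ConfigSnapshot(8.0, 30_000L));

    static ConfigSnapshot current()
    {
        return CURRENT.get();
    }

    // Copy-on-write update; updateAndGet retries if another writer races us.
    static ConfigSnapshot setPhiConvictThreshold(double value)
    {
        return CURRENT.updateAndGet(c -> c.withPhiConvictThreshold(value));
    }

    public static void main(String[] args)
    {
        ConfigSnapshot before = current();
        setPhiConvictThreshold(12.0);
        // The old snapshot is unchanged; new readers see the new value.
        System.out.println(before.phiConvictThreshold + " -> " + current().phiConvictThreshold);
    }
}
```

A reader that needs several related values would call `current()` once and read them all from that snapshot, so it never mixes old and new values.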



Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread Miklosovic, Stefan
I am fine with annotations. I am not a big fan of the generation. From my 
experience, whenever we wanted to generate something, we had to take care of the 
generator itself and then we had to live with what it generated (yeah, that is 
also a thing) instead of writing it by hand once and having some freedom to tweak 
it however we wanted. Splitting this into the second class ... well, I would 
say that just increases the entropy.

We can parse the Config class for these annotations and produce the documentation 
easily. I would probably go so far as to put the annotation on all 
fields. We could have two - Mutable, and Immutable. But that is really optional.
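Stefan's idea of annotating every field as either mutable or immutable and then parsing those annotations to produce documentation could look roughly like the sketch below. The `@Mutable`/`@Immutable` annotations and the toy `ToyConfig` class are assumptions for illustration (the field names come from David's example in this thread). The key point is that annotations with `RUNTIME` retention are visible via `java.lang.reflect`, so a documentation generator and the settings table could read the same metadata.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotations; RUNTIME retention makes them visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Mutable {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Immutable {}

// Toy stand-in for the real Config class.
class ToyConfig
{
    @Mutable
    public double phi_convict_threshold = 8.0;

    @Immutable
    public String partitioner;

    // Deliberately unannotated: the generator flags it as a mistake.
    public String cluster_name = "Test Cluster";
}

class ConfigDocGenerator
{
    // Produces one documentation line per public instance field.
    static List<String> document(Class<?> configClass)
    {
        List<String> lines = new ArrayList<>();
        for (Field field : configClass.getFields())
        {
            if (Modifier.isStatic(field.getModifiers()))
                continue;
            String kind;
            if (field.isAnnotationPresent(Mutable.class))
                kind = "mutable";
            else if (field.isAnnotationPresent(Immutable.class))
                kind = "immutable";
            else
                kind = "UNANNOTATED";
            lines.add(field.getName() + ": " + kind);
        }
        // getFields() does not guarantee declaration order, so sort for stable output.
        lines.sort(null);
        return lines;
    }

    public static void main(String[] args)
    {
        document(ToyConfig.class).forEach(System.out::println);
    }
}
```

Flagging unannotated fields is one way to make "annotate everything" enforceable, for example from a unit test that fails when any line reads `UNANNOTATED`.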


From: Benedict 
Sent: Wednesday, March 1, 2023 9:09
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Allow UPDATE on settings virtual table to change running 
configuration


Re: [DISCUSS] Next release date

2023-03-01 Thread Mick Semb Wever
>
> My thoughts don't touch on CEPs inflight.
>



For the sake of broadening the discussion, additional questions I think
worthwhile to raise are…

1. What third parties, or other initiatives, are invested and/or working
against the May deadline? and what are their views on changing it?
  1a. If we push branching back to September, how confident are we that
we'll get to GA before the December Summit?
2. What CEPs look like not landing by May that we consider a must-have this
year?
  2a. Is it just tail-end commits in those CEPs that won't make it? Can
these land (with or without a waiver) during the alpha phase?
  2b. If the final components to specified CEPs are not
approved/appropriate to land during alpha, would it be better if the
project commits to a one-off half-year release later in the year?


Re: [DISCUSS] Next release date

2023-03-01 Thread Benedict
It doesn’t look like we agreed to a policy of annual branch dates, only annual 
releases and that we would schedule this for 4.1 based on 4.0’s branch date. 
Given this was the reasoning proposed I can see why folk would expect this 
would happen for the next release. I don’t think there was a strong enough 
commitment here to be bound by it, if we think different maths would work 
better.

I recall the goal for an annual cadence was to ensure we don’t have lengthy 
periods between releases like 3.x to 4.0, and to try to reduce the pressure 
certain contributors might feel to hit a specific release with a given feature.

I think it’s better to revisit these underlying reasons and check how they 
apply than to pick a mechanism and stick to it too closely. 

The last release was quite recent, so we aren’t at risk of slow releases here. 
Similarly, there are some features that the *project* would probably benefit 
from landing prior to release, if this doesn’t push release back too far.




> On 1 Mar 2023, at 13:38, Mick Semb Wever  wrote:


Re: [DISCUSS] Next release date

2023-03-01 Thread Molly Monroy

In the interest of broadening perspectives, thoughts here from two angles: 
community engagement and marketing. We will be discussing what’s coming in 
Cassandra 5.0 at Cassandra Forward in 2 weeks. This is meant to build 
excitement for the next version so having technology for folks to get their 
hands on soon after, while the news is fresh, would be advantageous for a few 
reasons...

1. From a community engagement perspective: We want to both deepen community 
engagement and build a more direct, engaged feedback loop around the release. 
In the absence of an in-person event (until December), we hope to arrange things like 
live (virtual) forums for contributors and users to weigh in on features. A May 
release would give us a runway to create these forums and time to make sure 
voices are heard through them in the lead to GA.

2. From a marketing perspective: We want to not just excite our existing 
community, but also grow as we welcome newcomers to the project. Having a new 
release out there (even in its early version) allows us to continue momentum, 
show consistent innovation, and work on bringing new users and contributors 
into the fold in the runway to GA.

All that said, #1 could also be achieved if the project is landing features that 
will ultimately benefit the 5.0 release. These forums could be built around new 
feature updates.

> On Mar 1, 2023, at 6:59 AM, Benedict  wrote:
> 
> It doesn’t look like we agreed to a policy of annual branch dates, only 
> annual releases and that we would schedule this for 4.1 based on 4.0’s branch 
> date. Given this was the reasoning proposed I can see why folk would expect 
> this would happen for the next release. I don’t think there was a strong 
> enough commitment here to be bound by, it if we think different maths would 
> work better.
> 
> I recall the goal for an annual cadence was to ensure we don’t have lengthy 
> periods between releases like 3.x to 4.0, and to try to reduce the pressure 
> certain contributors might feel to hit a specific release with a given 
> feature.
> 
> I think it’s better to revisit these underlying reasons and check how they 
> apply than to pick a mechanism and stick to it too closely. 
> 
> The last release was quite recent, so we aren’t at risk of slow releases 
> here. Similarly, there are some features that the *project* would probably 
> benefit from landing prior to release, if this doesn’t push release back too 
> far.
> 
> 
> 
> 
>> On 1 Mar 2023, at 13:38, Mick Semb Wever  wrote:
>> 
>>> My thoughts don't touch on CEPs inflight. 
>> 
>> 
>> 
>> For the sake of broadening the discussion, additional questions I think 
>> worthwhile to raise are…
>> 
>> 1. What third parties, or other initiatives, are invested and/or working 
>> against the May deadline? and what are their views on changing it?
>>   1a. If we push branching back to September, how confident are we that 
>> we'll get to GA before the December Summit?
>> 2. What CEPs look like not landing by May that we consider a must-have this 
>> year?
>>   2a. Is it just tail-end commits in those CEPs that won't make it? Can 
>> these land (with or without a waiver) during the alpha phase?
>>   2b. If the final components to specified CEPs are not approved/appropriate 
>> to land during alpha, would it be better if the project commits to a one-off 
>> half-year release later in the year?


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread Maxim Muzafarov
Thank you all for your replies. Let me add some comments too,


From a public API perspective, we have three types of fields in the
Config class: internal use only (e.g. logger, PROPERTY_PREFIX),
read-only (e.g. cluster_name), and read-write fields that are
currently mutated via JMX. So a single @Mutable annotation is not
enough to clearly separate Config's fields. Adding two annotations,
@Mutable and @Immutable, might solve the problem, but such an approach
leads to code duplication if we want to extend our solution in the future
with additional parameters such as "description". Besides, having two
different annotations for the same thing might confuse developers who
are not familiar with this discussion.

So, from my point of view, the best way for us might be the one
mentioned in the PR (the annotation name needs to reflect that the
fields are available to the public API and to users; we can change
the name):
@Exposure(policy = Exposure.Policy.READ_WRITE)
@Exposure(policy = Exposure.Policy.READ_ONLY)

Some other names come into my mind: APIAvailable, APIExposed,
UserAvailable, UserExposed etc.


Stefan mentioned that these annotations could be used to create
documentation pages. That's true, I have the same thoughts in mind, and
you can see what it will look like at the link below (the description
annotation field will be removed from the final version of the PR, but
may still be added as part of another issue):

https://github.com/apache/cassandra/pull/2133/files#diff-e966f41bc2a418becfe687134ec8cf542eb051eead7fb4917e65a3a2e7c9bce3R392

The SettingsTable may have the following columns and be truly
self-descriptive for a user: name, value, default_value, policy, and
description.
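The `@Exposure(policy = ...)` proposal and the self-descriptive SettingsTable columns can be combined into one sketch: the annotation carries the policy, a freshly constructed config instance supplies `default_value`, unannotated fields are not exposed at all, and an UPDATE is accepted only for `READ_WRITE` fields. Everything here (`Exposure`, `ExposedConfig`, `SettingsTableSketch`, and the pipe-separated row format) is a hypothetical stand-in rather than the PR's code, and the `description` column is omitted, as the PR also plans for now.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation modelled on the @Exposure proposal.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Exposure
{
    enum Policy { READ_ONLY, READ_WRITE }

    Policy policy();
}

// Toy config: field initialisers double as default values.
class ExposedConfig
{
    @Exposure(policy = Exposure.Policy.READ_ONLY)
    public String cluster_name = "Test Cluster";

    @Exposure(policy = Exposure.Policy.READ_WRITE)
    public double phi_convict_threshold = 8.0;

    // Not annotated, so it is never shown in the table.
    public String internal_state = "hidden";
}

class SettingsTableSketch
{
    // One row per annotated field: name | value | default_value | policy.
    static List<String> rows(ExposedConfig current)
    {
        ExposedConfig defaults = new ExposedConfig(); // initialisers give defaults
        List<String> rows = new ArrayList<>();
        try
        {
            for (Field field : ExposedConfig.class.getFields())
            {
                Exposure exposure = field.getAnnotation(Exposure.class);
                if (exposure == null)
                    continue; // not part of the public API
                rows.add(field.getName() + " | " + field.get(current) + " | "
                         + field.get(defaults) + " | " + exposure.policy());
            }
        }
        catch (IllegalAccessException e)
        {
            throw new IllegalStateException(e);
        }
        rows.sort(null); // getFields() order is unspecified
        return rows;
    }

    // Accepts an UPDATE only for READ_WRITE fields.
    static void update(ExposedConfig current, String name, Object value)
    {
        try
        {
            Field field = ExposedConfig.class.getField(name);
            Exposure exposure = field.getAnnotation(Exposure.class);
            if (exposure == null || exposure.policy() != Exposure.Policy.READ_WRITE)
                throw new IllegalArgumentException(name + " is not updatable");
            field.set(current, value); // unboxes Double into the double field
        }
        catch (NoSuchFieldException e)
        {
            throw new IllegalArgumentException("unknown setting " + name, e);
        }
        catch (IllegalAccessException e)
        {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        ExposedConfig config = new ExposedConfig();
        update(config, "phi_convict_threshold", 12.0);
        rows(config).forEach(System.out::println);
    }
}
```

Using a single annotation with an enum, rather than separate `@Mutable`/`@Immutable` markers, keeps room for extra attributes (such as a description) in one place, which is the duplication argument made above.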


Benedict mentioned that we could create a second class to hold such
information. The best candidate for this is the ConfigFields class,
which is based on the Config class and contains all the field names as
constants (I used a small utility class to generate it). But it will
still require some manual work, as there is no rule to distinguish
which config field is mutable and which isn't. So we would have to
update two classes instead of one (the Config class) when adding new
configuration fields, which we don't want to do.

Here it is in the PR:
https://github.com/apache/cassandra/pull/2133/files#diff-fcb4c5bc59d4bb127ffbe9f1ce566b2238c5bb92622da430a4ff879781093d3fR31
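The PR's actual generator for `ConfigFields` is not shown in this thread. The following is a hypothetical minimal version that emits one `public static final String` constant per public, non-static field of a config class. The upper-case constant naming and the toy `SampleConfig` are assumptions for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Locale;

// Toy stand-in for Config.
class SampleConfig
{
    public static final String PROPERTY_PREFIX = "cassandra."; // static: skipped
    public String cluster_name;
    public double phi_convict_threshold = 8.0;
}

class ConfigFieldsGenerator
{
    // Emits Java source for a class holding one constant per config field name.
    static String generate(Class<?> configClass, String generatedClassName)
    {
        StringBuilder out = new StringBuilder();
        out.append("public final class ").append(generatedClassName).append("\n{\n");
        Arrays.stream(configClass.getFields())
              .filter(f -> !Modifier.isStatic(f.getModifiers()))
              .map(Field::getName)
              .sorted()
              .forEach(name -> out.append("    public static final String ")
                                  .append(name.toUpperCase(Locale.ROOT))
                                  .append(" = \"").append(name).append("\";\n"));
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args)
    {
        System.out.print(generate(SampleConfig.class, "ConfigFields"));
    }
}
```

Nothing in the reflected field metadata says whether `phi_convict_threshold` may change at runtime, which is the manual classification step Maxim points out.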

On Wed, 1 Mar 2023 at 09:21, Miklosovic, Stefan
 wrote:

Re: [DISCUSS] Next release date

2023-03-01 Thread Henrik Ingo
Hi

Those are great questions Mick. It's good to recognize this discussion
impacts a broad range of contributors and users, and not all of them might
be aware of the discussion in the first place.

More generally I would say that your questions brought to mind two
fundamental principles with a "train model" release schedule:

  1. If a feature isn't ready by the cut-off date, there's no reason to
delay the release, because the next release is guaranteed to be just around
the corner.
  2. If there is a really important feature that won't make it, rather than
delaying the planned release, we should (also) consider the opposite: we
can do the next release earlier if there is a compelling feature ready to
go. (Answers question 2b from Mick.)

I have arguments both for and against moving the release date:


The argument to stick with the current plan is that we have a lot of big features
now landing in trunk. If we delay the release for one feature, it will
delay the GA of all the other features that were ready by May. For example,
while SAI code is still being massaged based on review comments, we fully
expect it to merge before May. Same for the work on tries, which is on its
final stretch. Arguably Java 17 support can't come soon enough either. And
so on... For some users it can be a simple feature, like just one specific
guardrail, that they are waiting on. So just as we are excited to wait for
that one feature to be ready and make it, I'm just as unexcited about the
prospect of delaying the GA of several other features. If we had just one
big feature that everyone was working on, this would be easier to decide...

Note also that postponing the release for a single feature that is still in
development is a risky bet, because you never know what unknowns are still
ahead once the work is code complete and put to more serious testing. At
first it might sound reasonable to delay 1-3 months, but what if on that
3rd month some unforeseen work is discovered, and now we need to discuss
delaying another 3 months. Such a risk is inherent to any software project,
and we should anticipate it to happen. Scott's re-telling
of CASSANDRA-18110 is a great example: These delays can happen due to a
single issue, and it can be hard to speed things up by e.g. assigning more
engineers to the work. So, when we say that we'd like to move the branching
date from May to August, and specifically in order for some feature to be
ready by then, what do we do if it's not ready in August? It's presumably
closer to being ready at that point, so the temptation to wait just a
little bit more is always there. (And this is also my answer to Mick's
question 1a.)



Now, let me switch to arguing the opposite opinion:

My instinct here would be to stick to early May as the cut-off date, but
also allow for exceptions. I'm curious to hear how this proposal is
received? If this were a startup, there could be a CEO or, let's say, a build
manager who could make these kinds of case-by-case decisions expediently.
But I realize in a consensus-based open source project like Cassandra, we
may also have to consider issues like fairness: Why would some feature be
allowed a later date than others? How do we choose which work gets such
exceptions?

Anyway, the fact is that we have several huge bodies of work in flight now.
The Accord patch was about 28k lines of code when I looked at it, and note
that this doesn't even include "accord itself", which is in a different
repository. SAI, Tries (independently for memtable and sstables) and UCS
are all in the 10k range too. And I presume the Java 17 support and
transactional metadata are the same. Each of these pieces of code represent
alone years of engineering work. For context, Cassandra as a whole has
about 1 million lines of code. So each of these features is replacing or
adding about 1-3% of  the codebase.

With that in mind, I feel like having a hard deadline on a single day
doesn't really do justice to these features. In fact, most of them are
not merged in a single PR either, but as a series of PRs, each of which
independently is huge too. This makes me ask, what if some feature already
merged 3 patches, but still has 2 to go? Can we allow extra time to merge
the last two, or do we work on reverting the first 3? (Obviously not the
latter...)

So it seems to me we should keep May Xth as the beginning of the cutoff,
but where the actual cutoff is a fuzzy deadline rather than hard. For most
work it would be early May, but for the big features a few weeks or even
months of a window is ok.

This kind of flexible approach would still help advancing toward a release,
since it would quiet down the release branch significantly, and for most
contributors focus would shift to testing. (Alternatively, focus could
shift to help review and test the features that are still being worked on.)

Mick and Benjamin have been good at reminding me that we can't expect to merge
all of this work the last week of April anyway. So from my point 

Re: [DISCUSS] Next release date

2023-03-01 Thread J. D. Jordan
We have been talking a lot about the branch cutting date, but I agree with Benedict here: I think we should actually be talking about the expected release date. If we truly believe that we can release within 1-2 months of cutting the branch, and many people I have talked to think that is possible, then a May branch cut means we release by July. That would be only 7 months after the 4.1 release, which seems a little fast to me. IIRC the last time we had release cadence discussions most people were for keeping to a release cadence of around 12 months, and many were against a 6-month cadence.

So if we want to have a goal of “around 12 months” and also have a goal of “release before summit in December”, I would suggest we put our release date goal in October to give some runway for being late and still getting out by December.

So if the release date goal is October, we can also hedge with the longer 2-month estimate on “time after branching” to again make sure we make our goals. This would put the branching in August. So if we do release in October, that gives us 10 months since 4.1, which, while still shorter than 12, is much closer than only 7 months would be.

If people feel 1 month post branch cut is feasible, we could cut the branch in September.

-Jeremiah

On Mar 1, 2023, at 10:34 AM, Henrik Ingo wrote:

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread David Capwell
> Another option would be to introduce a second class with the same fields as 
> the first where we simply specify final for immutable fields, and construct 
> it after parsing the Config.
> 
> We could even generate the non-final version from the one with final fields.
> 
> Not sure this would be nicer, but it is an alternative.


Correct me if I misunderstood, but I read this as Config acts like a Builder, 
so after creating Config we do .build (logically) to get a new TheRealConfig 
which “may” have “final” fields and non-final fields (mutable)?  This logically 
would be a perfect place to merge the DatabaseDescriptor logic that normalizes, 
implements some form of backwards compatibility (one of the annoying issues 
with settings table), and validation (also annoying for settings table)…. Would 
allow DD to be a simple POJO (though still static) class….
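David's reading of Benedict's proposal, with `Config` acting as a builder that is normalised and validated once and then turned into a separate runtime class, might look like the sketch below. `TheRealConfig` is David's placeholder name; the fields, the positive-value validation rule, the default cluster name, and the choice to keep only `phi_convict_threshold` updatable (through a volatile field) are assumptions for illustration. The classes are hand-written here, not generated.

```java
import java.util.Objects;

// Mutable, YAML-friendly holder: plays the role of a builder.
class YamlConfig
{
    public String cluster_name;
    public String partitioner;
    public double phi_convict_threshold = 8.0;

    // Normalises and validates once, then produces the runtime view.
    TheRealConfig build()
    {
        String name = cluster_name == null ? "Test Cluster" : cluster_name.trim();
        Objects.requireNonNull(partitioner, "partitioner must be set");
        TheRealConfig.validatePhi(phi_convict_threshold);
        return new TheRealConfig(name, partitioner, phi_convict_threshold);
    }
}

// Runtime view: final fields cannot change, the non-final one is updatable.
final class TheRealConfig
{
    final String clusterName;
    final String partitioner;
    private volatile double phiConvictThreshold;

    TheRealConfig(String clusterName, String partitioner, double phiConvictThreshold)
    {
        this.clusterName = clusterName;
        this.partitioner = partitioner;
        this.phiConvictThreshold = phiConvictThreshold;
    }

    // Hypothetical rule, shared by startup parsing and runtime updates.
    static void validatePhi(double value)
    {
        if (value <= 0)
            throw new IllegalArgumentException("phi_convict_threshold must be positive: " + value);
    }

    double phiConvictThreshold()
    {
        return phiConvictThreshold;
    }

    void setPhiConvictThreshold(double value)
    {
        validatePhi(value); // rejects bad values before touching state
        phiConvictThreshold = value;
    }
}

class BuilderDemo
{
    public static void main(String[] args)
    {
        YamlConfig yaml = new YamlConfig();
        yaml.partitioner = "org.apache.cassandra.dht.Murmur3Partitioner";
        TheRealConfig config = yaml.build();
        config.setPhiConvictThreshold(12.0);
        System.out.println(config.clusterName + " " + config.phiConvictThreshold());
    }
}
```

Because the same `validatePhi` check runs in `build()` and in the setter, the settings table would get validation for free, which is one of the pain points David mentions.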

Assuming TheRealConfig is auto generated and not human maintained this could be 
another solution, we would just have to be very careful not to fall into the 
trap that Lombok put people in and make upgrading JDKs a nightmare…
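The builder idea can be sketched roughly as follows, assuming Config stays the mutable bean that SnakeYaml populates and a build() step validates it and produces TheRealConfig, whose immutable settings are final fields. Everything here (field names, default values, build(), setCachedRowsFailThreshold()) is hypothetical and hand-written rather than generated, and is not how Config or DatabaseDescriptor work today:

```java
import java.util.Objects;

// Hypothetical sketch: Config acts as a builder for the runtime config.
public class ConfigBuilderSketch
{
    /** Loader-facing class: public, non-final fields so YAML can populate it. */
    public static class Config
    {
        public String cluster_name = "Test Cluster";      // illustrative default
        public int cached_rows_fail_threshold = 32000;    // illustrative default

        /** Validate once, then freeze into the runtime representation. */
        public TheRealConfig build()
        {
            if (cached_rows_fail_threshold <= 0)
                throw new IllegalArgumentException("cached_rows_fail_threshold must be positive");
            return new TheRealConfig(Objects.requireNonNull(cluster_name, "cluster_name"),
                                     cached_rows_fail_threshold);
        }
    }

    /** Runtime config: immutable settings are final, mutable ones are not. */
    public static final class TheRealConfig
    {
        public final String cluster_name;                  // fixed after startup
        private volatile int cached_rows_fail_threshold;   // may change at runtime

        TheRealConfig(String clusterName, int cachedRowsFailThreshold)
        {
            this.cluster_name = clusterName;
            this.cached_rows_fail_threshold = cachedRowsFailThreshold;
        }

        public int cachedRowsFailThreshold()
        {
            return cached_rows_fail_threshold;
        }

        /** Runtime updates reuse the same validation rule as startup. */
        public void setCachedRowsFailThreshold(int value)
        {
            if (value <= 0)
                throw new IllegalArgumentException("cached_rows_fail_threshold must be positive");
            cached_rows_fail_threshold = value;
        }
    }

    public static void main(String[] args)
    {
        Config config = new Config();
        config.cached_rows_fail_threshold = 1000;
        TheRealConfig real = config.build();
        real.setCachedRowsFailThreshold(2000);
        System.out.println(real.cluster_name + " " + real.cachedRowsFailThreshold());
    }
}
```

The point of the split is that the compiler, not a convention, enforces which settings can never change once the node is up, and validation lives in one place for both startup and runtime updates.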

> Splitting this into the second class ... well, I would say that just 
> increases the entropy.

Yeah, this is a real issue with that solution, something we would have to be 
cautious of if we go down that route.

> From a public API perspective, we have three types of fields in the
> Config class: internal use only (e.g. logger, PROPERTY_PREFIX prefix)


You do not have this in Config, but you do in subtypes; transient.  SnakeYaml 
ignores transient fields, and this is used in types such as EncryptionOptions 
for state that must be exposed internally (normally derived from user config) 
but must not be exposed to users.  In 3.11 and earlier (I think) you could define 
sslContextFactoryInstance in yaml (though it would always fail, as SnakeYaml 
couldn’t figure out how to build that type), and in 4.0+ you get an unknown 
field exception if you try to define it (as we marked it transient to hide it).

In normalizing yaml loading and settings table (in the first release they had 
different logic, which meant settings table would show things not allowed in 
yaml), I also added support for com.fasterxml.jackson.annotation.JsonIgnore, so 
users may define that to also hide things as well.

So, from our config point of view, if your fields/methods match any of the 
following rules, you are “internal” and not accessible from yaml or settings 
table

1) static
2) have com.fasterxml.jackson.annotation.JsonIgnore
3) have transient
4) not public (private, protected, or package-private)
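As a rough illustration, the four rules can be checked with plain reflection. This is a sketch, not the actual loader code: it only looks at declared fields (the real logic also considers methods and inherited properties), it matches JsonIgnore by class name so it compiles without Jackson on the classpath, and InternalFieldRules and ExampleOptions are hypothetical names:

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InternalFieldRules
{
    // Matched by name so this sketch does not need Jackson at compile time.
    private static final String JSON_IGNORE = "com.fasterxml.jackson.annotation.JsonIgnore";

    /** True if the field matches any of rules 1-4, i.e. it is hidden from yaml and the settings table. */
    public static boolean isInternal(Field field)
    {
        int mods = field.getModifiers();
        return Modifier.isStatic(mods)                   // 1) static
               || hasAnnotationNamed(field, JSON_IGNORE) // 2) @JsonIgnore
               || Modifier.isTransient(mods)             // 3) transient
               || !Modifier.isPublic(mods);              // 4) not public
    }

    private static boolean hasAnnotationNamed(Field field, String name)
    {
        for (Annotation annotation : field.getAnnotations())
            if (annotation.annotationType().getName().equals(name))
                return true;
        return false;
    }

    /** Sorted names of the declared fields that remain user-visible. */
    public static List<String> exposedFields(Class<?> type)
    {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields())
            if (!isInternal(field))
                names.add(field.getName());
        Collections.sort(names);
        return names;
    }

    /** Hypothetical options type mixing the patterns described above. */
    public static class ExampleOptions
    {
        public static final String PROPERTY_PREFIX = "cassandra."; // static: internal
        public transient Object sslContextFactoryInstance;          // transient: internal
        String derivedState;                                        // package-private: internal
        public String keystore = "conf/.keystore";                  // exposed
        public boolean enabled = false;                             // exposed
    }

    public static void main(String[] args)
    {
        System.out.println(exposedFields(ExampleOptions.class)); // [enabled, keystore]
    }
}
```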

> @Exposure(policy = Exposure.Policy.READ_WRITE)
> @Exposure(policy = Exposure.Policy.READ_ONLY)


I believe this is trying to address the concern you listed on internal configs, 
correct?  My issue here is verbosity and how easy it would be for authors to 
forget to add, causing dead code (if you are not read_write or read_only then 
logically yaml/settings table shouldn’t touch you).  I feel that no annotation 
should mean “immutable” by default, as that is the common case: the majority of 
configs are immutable and a small subset are mutable.
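The "no annotation means immutable" default (the single @Mutable annotation Maxim mentions later in the thread) could be sketched as an opt-in runtime marker that an UPDATE on the settings table checks before writing. The annotation, the Config fields and update() below are hypothetical and only illustrate the default; they are not the actual Cassandra implementation:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class MutableByOptIn
{
    /** Hypothetical marker: only fields carrying it may be changed at runtime. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Mutable {}

    /** Hypothetical config: no annotation means immutable, the common case. */
    public static class Config
    {
        public String cluster_name = "Test Cluster";
        @Mutable
        public volatile int cached_rows_fail_threshold = 32000;
    }

    /** What an UPDATE on the settings table might do: reject anything not opted in. */
    public static void update(Config config, String name, Object value) throws ReflectiveOperationException
    {
        Field field = Config.class.getField(name); // NoSuchFieldException for unknown names
        if (!field.isAnnotationPresent(Mutable.class))
            throw new IllegalArgumentException(name + " is not mutable at runtime");
        field.set(config, value); // unboxes an Integer into the int field
    }

    public static void main(String[] args) throws ReflectiveOperationException
    {
        Config config = new Config();
        update(config, "cached_rows_fail_threshold", 1000);
        System.out.println(config.cached_rows_fail_threshold);
        try
        {
            update(config, "cluster_name", "other");
        }
        catch (IllegalArgumentException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, forgetting the annotation fails safe: the setting simply stays read-only instead of becoming silently writable.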

> Stefan mentioned that these annotations could be used to create documentation 
> pages, it's true


We could go this route, but I wonder if it's better to use JavaDoc here (we 
already build/publish, though there were questions in slack recently to stop 
publishing…)… Majority of docs are in conf/cassandra.yaml, so if we wish to 
move to code we would need to address how that file is maintained and how to 
document “groups” of configs (rather than documenting each config, we have a 
pattern of documenting a feature or pair of configs (such as min/max targets) 
and showing the configs that can be tweaked).  We did talk about moving to a 
“nested” config model, but there were concerns about nesting at a feature level 
as some features are cross cutting (so the “group” of configs may be in 
different areas), so how we define these “groups” isn’t too clear to me… it's 
also not the common case so maybe less of a concern (if we document 
row_index_read_size_warn_threshold but not row_index_read_size_fail_threshold, 
is this still clear?)?

> The SettingsTable may have the following columns and be truly 
> self-descriptive for a user: name, value, default_value, policy, and 
> description.


If we wish to expose docs in the settings table, this would push us to define 
these in code and no longer in conf/cassandra.yaml… I am ok with this, but this 
does increase the scope as it needs to address the existing models.  We also 
need better clarity on compatibility with column additions… there is another 
dev@ thread pointing out that durable tables cause downgrade issues… but do 
vtables?  Is it safe to add columns?  I should really bring this question to 
another thread and have us document….

> On Mar 1, 2023, at 6:33 AM, Maxim Muzafarov  wrote:
> 
> Thank you all for your replies. Let me add some comments too,
> 
> 
> From a public API

Re: [DISCUSS] Next release date

2023-03-01 Thread David Capwell
I am cool with defining a target release date and working backwards from there.  
If we do want to go this route, I think we do need to answer why the 4.1 cut -> 
release took so long, and if people could start validation “before” we 
branch?  If we know trunk is stable today then we could release today, but I 
don’t believe we have this level of testing today, so I don’t know if I could 
say we can release in 1-4 months.
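The month arithmetic in Jeremiah's quoted message can be sanity-checked with java.time. A small sketch, assuming 4.1 shipped in December 2022 (which is what the "7 months" figure implies); ReleaseCadence and its methods are hypothetical helpers:

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

public class ReleaseCadence
{
    // Assumption: 4.1 GA in December 2022, consistent with "7 months" for a July release.
    public static final YearMonth RELEASE_4_1 = YearMonth.of(2022, 12);

    /** Whole months between the 4.1 release and a candidate release month. */
    public static long monthsSince41(YearMonth release)
    {
        return ChronoUnit.MONTHS.between(RELEASE_4_1, release);
    }

    /** Branch month = target release minus the expected stabilization time. */
    public static YearMonth branchDate(YearMonth release, int monthsToStabilize)
    {
        return release.minusMonths(monthsToStabilize);
    }

    public static void main(String[] args)
    {
        // May cut with a 2-month stabilization window.
        YearMonth julyRelease = YearMonth.of(2023, 5).plusMonths(2);
        System.out.println("May cut -> release " + julyRelease + ", "
                           + monthsSince41(julyRelease) + " months after 4.1");

        // Work backwards from an October target.
        YearMonth october = YearMonth.of(2023, 10);
        System.out.println("October target -> branch " + branchDate(october, 2)
                           + " (2-month estimate) or " + branchDate(october, 1)
                           + " (1-month estimate), " + monthsSince41(october) + " months after 4.1");
    }
}
```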

> On Mar 1, 2023, at 9:21 AM, J. D. Jordan  wrote:
> 
> We have been talking a lot about the branch cutting date, but I agree with 
> Benedict here, I think we should actually be talking about the expected 
> release date. 
> 
> If we truly believe that we can release within 1-2 months of cutting the 
> branch, and many people I have talked to think that is possible, then a May 
> branch cut means we release by July. That would only be 7 months post 4.1 
> release, which seems a little fast to me.  IIRC the last time we had release 
> cadence discussions most people were for keeping to a release cadence of 
> around 12 months, and many were against a 6 month cadence.
> 
> So if we want to have a goal of “around 12 months” and also have a goal of 
> “release before summit in December”, I would suggest we put our release date 
> goal in October to give some runway for being late and still getting out by 
> December.
> 
> So if the release date goal is October, we can also hedge with the longer 2 
> month estimate on “time after branching” to again make sure we make our 
> goals. This would put the branching in August. So if we do release in 
> October, that gives us 10 months since 4.1, which, while still shorter than 
> 12, is much closer than 7 months would be.
> 
> If people feel 1 month post branch cut is feasible we could cut the branch in 
> September.
> 
> -Jeremiah
> 
>> On Mar 1, 2023, at 10:34 AM, Henrik Ingo  wrote:
>> 
>> 
>> Hi 
>> 
>> Those are great questions Mick. It's good to recognize this discussion 
>> impacts a broad range of contributors and users, and not all of them might 
>> be aware of the discussion in the first place.
>> 
>> More generally I would say that your questions brought to mind two 
>> fundamental principles with a "train model" release schedule:
>> 
>>   1. If a feature isn't ready by the cut-off date, there's no reason to 
>> delay the release, because the next release is guaranteed to be just around 
>> the corner.
>>   2. If there is a really important feature that won't make it, rather than 
>> delaying the planned release, we should (also) consider the opposite: we can 
>> do the next release earlier if there is a compelling feature ready to go. 
>> (Answers question 2b from Mick.)
>> 
>> I have arguments both for and against moving the release date:
>> 
>> 
>> The argument for sticking with the current plan is that we have a lot of big features 
>> now landing in trunk. If we delay the release for one feature, it will delay 
>> the GA of all the other features that were ready by May. For example, while 
>> SAI code is still being massaged based on review comments, we fully expect 
>> it to merge before May. Same for the work on tries, which is on its final 
>> stretch. Arguably Java 17 support can't come soon enough either. And so 
>> on... For some users it can be a simple feature, like just one specific 
>> guardrail, that they are waiting on. So just as we are excited to wait for 
>> that one feature to be ready and make it, I'm just as unexcited about the 
>> prospect of delaying the GA of several other features. If we had just one 
>> big feature that everyone was working on, this would be easier to decide...
>> 
>> Note also that postponing the release for a single feature that is still in 
>> development is a risky bet, because you never know what unknowns are still 
>> ahead once the work is code complete and put to more serious testing. At 
>> first it might sound reasonable to delay 1-3 months, but what if on that 3rd 
>> month some unforeseen work is discovered, and now we need to discuss 
>> delaying another 3 months. Such a risk is inherent to any software project, 
>> and we should anticipate it to happen. Scott's re-telling of CASSANDRA-18110 
>> is a great example: These delays can happen due to a single issue, and it 
>> can be hard to speed things up by e.g. assigning more engineers to the work. 
>> So, when we say that we'd like to move the branching date from May to 
>> August, and specifically in order for some feature to be ready by then, what 
>> do we do if it's not ready in August? It's presumably closer to being ready 
>> at that point, so the temptation to wait just a little bit more is always 
>> there. (And this is also my answer to Mick's question 1b.)
>> 
>> 
>> 
>> Now, let me switch to arguing the opposite opinion:
>> 
>> My instinct here would be to stick to early May as the cut-off date, but 
>> also allow for exceptions. I'm curious to hear how this proposal is 
>> received. If this were a startup, there could be a CEO or l

New episode of The Apache Cassandra (R) Corner podcast!

2023-03-01 Thread Aaron Ploetz
Link to the next episode:
https://drive.google.com/file/d/1nvHs7o4JJC2P18mtR5MrbnNoeW5f44j1/view?usp=sharing

s2Ep1 - Patrick McFadin

(You may have to download it to listen)

It will remain in staging for 72 hours, going live (assuming no objections)
by Saturday, March 4th (19:00 UTC).

If anyone should have any questions or comments, or if you want to be a
guest, please reach out to me.

For my guest pipeline, I have recording sessions scheduled with:
- Aaron Morton
- Loren Sands-Ramshaw (Temporal)

And I'm currently trying to nail down a time with Valeri:
- Valeri Karpov (MeanIT Software)

Thanks, everyone!

Aaron Ploetz


[DISCUSSION] Cassandra + Java 17

2023-03-01 Thread Ekaterina Dimitrova
Hi everyone,
Some updates and questions around JDK 17 below.
First of all, I wanted to let people know that Cassandra trunk can
already be compiled and run with J8 + J11 + J17. This is the product of
the realization that the feature branch made it harder to work on
JDK17-related tickets because too many moving parts were involved.
Agreement was reached in [1] that a new JDK can be introduced
incrementally. Scripted UDFs were removed; hooks will be added in a
follow-up ticket.
What does this mean?
- Currently you can compile and run Cassandra trunk with JDK 17 (in addition to
8+11). You can already run unit and Java distributed tests with JDK17.
- CASSANDRA-18106 is in progress, enabling CCM to handle JDK8, 11 and 17 with
trunk; when that is ready we will also be able to run the Python tests.
After that one lands comes CASSANDRA-18247; its goal is to add a CircleCI
config (separate from the one we have for 8+11) for 11+17 which can be used
by people who work on JDK17-related issues. A patch proposal is already in the
ticket. We will have the final version when we switch from 8+11 to 11+17;
things will go through evolution.

What does this mean? Anyone who is interested in testing or helping with the JDK17
effort can easily do it directly from trunk. Jenkins and CircleCI will not be
switched from 8+11 to 11+17 until we are ready. Only an additional, experimental
CircleCI config will be added, temporarily, to make testing easier.

To remind you - the umbrella ticket for the JDK17 builds is CASSANDRA-16895.
A good outstanding candidate that is still not assigned is CASSANDRA-18180; if
anyone has cycles, please take a look at it. CASSANDRA-18263 might also be of
interest to someone.

In other news, I have already added to the JDK17 JVM options certain
opens/exports which are needed at this point, but as we agreed in the past,
it would be good to try to eliminate as many of them as we can. In my
opinion, consider those experimental.
Some of them are related to:
-- some were already added for 11; thoughts?
-- some will be eliminated with some maintenance in progress
-- some are related to
https://chronicle.software/chronicle-support-java-17. I guess we are
cornered with those until Chronicle eliminates the need for them
(CASSANDRA-18049)
-- finding a way to get FileDescriptor.fd and sun.nio.ch.FileChannelImpl.fd
without opening internals (CASSANDRA-17850)
-- we also use setAccessible in numerous places.
And I am sure our CI will tell me I am missing something, especially when
trunk is alive...
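To illustrate what CASSANDRA-17850 is about, here is a small sketch (not Cassandra code) of reading the private FileDescriptor.fd field via reflection. It uses AccessibleObject.trySetAccessible(), which was added in Java 9, so it will not compile on JDK 8. On JDK 17, where deep reflection into JDK internals is denied by default, it should return empty unless the JVM is started with something like --add-opens java.base/java.io=ALL-UNNAMED. FdAccessProbe and readFd are hypothetical names:

```java
import java.io.FileDescriptor;
import java.lang.reflect.Field;
import java.util.OptionalInt;

public class FdAccessProbe
{
    /** Reads the private FileDescriptor.fd field if the module system allows it. */
    public static OptionalInt readFd(FileDescriptor descriptor)
    {
        try
        {
            Field fd = FileDescriptor.class.getDeclaredField("fd");
            // Returns false instead of throwing when java.io is not opened to us.
            if (!fd.trySetAccessible())
                return OptionalInt.empty();
            return OptionalInt.of(fd.getInt(descriptor));
        }
        catch (NoSuchFieldException | IllegalAccessException e)
        {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args)
    {
        OptionalInt fd = readFd(FileDescriptor.in);
        System.out.println(fd.isPresent() ? "fd = " + fd.getAsInt()
                                          : "java.io is not opened to this module");
    }
}
```

Every such call site needs a matching --add-opens entry on JDK 17, which is why removing them (or finding a public API alternative) shrinks the list of JVM options.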

A few other questions:
- Thoughts around the usage/future of Unsafe? Is there history around the
choice of using it in C*, or future plans I might not know of?
- ECJ - It seems the compiler artifacts were moved from here to here, and
there is a change of license from EPL1.0 to EPL2.0 too. But if I read
correctly here, that should not affect us. I am dealing with this in
CASSANDRA-18190. Please let me know if you see any problem with this that I
might be missing.
- Looking at the history of tickets around the JMXServerUtils class, I guess
it was accepted that we might have breakages (and we already had
CASSANDRA-14173) - JmxRegistry extends sun.rmi.registry.RegistryImpl?

Best regards,
Ekaterina

[1] https://lists.apache.org/thread/c39yrbdszgz9s34vr17wpjdhv6h2oo61


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread Maxim Muzafarov
David,

I think we're getting a bit off-topic now, but it seems we already
have a consensus on the main question. You're right about column
descriptions, adding new columns to vtables, and moving field
descriptions from yaml to the source code - all that will be part of
other changes anyway, so let's keep it all in mind for further
discussion for now.

I still think that relying on keywords is bad when we are trying to
expose something as a public API, and we should explicitly mark all
fields with "something", but you are right about the following - if we
have a single @Mutable (or similar) annotation:
- are we able to distinguish mutable fields in the Config class - yes;
- are we able to make the SettingsTable updatable with it - yes;
- can we change/extend it in the future without backward compatibility
issues - I think, yes;

So, for now, we have everything we need to continue with SettingsTable,
and we can implement our other ideas iteratively later, discussing the
details first.

On Wed, 1 Mar 2023 at 19:26, David Capwell  wrote:
>
> > Another option would be to introduce a second class with the same fields as 
> > the first where we simply specify final for immutable fields, and construct 
> > it after parsing the Config.
> >
> > We could even generate the non-final version from the one with final fields.
> >
> > Not sure this would be nicer, but it is an alternative.
>
>
> Correct me if I misunderstood, but I read this as Config acting like a Builder, 
> so after creating Config we do .build (logically) to get a new TheRealConfig 
> which “may” have “final” fields and non-final fields (mutable)?  This 
> logically would be a perfect place to merge the DatabaseDescriptor logic that 
> normalizes, implements some form of backwards compatibility (one of the 
> annoying issues with settings table), and validation (also annoying for 
> settings table). It would allow DD to be a simple POJO (though still static) 
> class.
>
> Assuming TheRealConfig is auto generated and not human maintained this could 
> be another solution, we would just have to be very careful not to fall into 
> the trap that Lombok put people in and make upgrading JDKs a nightmare…
>
> > Splitting this into the second class ... well, I would say that just 
> > increases the entropy.
>
> Yeah, this is a real issue with that solution, something we would have to be 
> cautious of if we go down that route.
>
> > From a public API perspective, we have three types of fields in the
> > Config class: internal use only (e.g. logger, PROPERTY_PREFIX prefix)
>
>
> You do not have this in Config, but you do in subtypes; transient.  SnakeYaml 
> ignores transient fields, and this is used in types such as EncryptionOptions 
> for state that must be exposed internally (normally derived from user config) 
> but must not be exposed to users.  In 3.11 and earlier (I think) you could 
> define sslContextFactoryInstance in yaml (though it would always fail, as 
> SnakeYaml couldn’t figure out how to build that type), and in 4.0+ you get an 
> unknown field exception if you try to define it (as we marked it transient to 
> hide it).
>
> In normalizing yaml loading and settings table (in the first release they had 
> different logic, which meant settings table would show things not allowed in 
> yaml), I also added support for com.fasterxml.jackson.annotation.JsonIgnore, 
> so users may define that to also hide things as well.
>
> So, from our config point of view, if your fields/methods match any of the 
> following rules, you are “internal” and not accessible from yaml or settings 
> table
>
> 1) static
> 2) have com.fasterxml.jackson.annotation.JsonIgnore
> 3) have transient
> 4) not public (private, protected, or package-private)
>
> > @Exposure(policy = Exposure.Policy.READ_WRITE)
> > @Exposure(policy = Exposure.Policy.READ_ONLY)
>
>
> I believe this is trying to address the concern you listed on internal 
> configs, correct?  My issue here is verbosity and how easy it would be for 
> authors to forget to add, causing dead code (if you are not read_write or 
> read_only then logically yaml/settings table shouldn’t touch you).  I feel 
> that no annotation should mean “immutable” by default, as that is the common 
> case: the majority of configs are immutable and a small subset are mutable.
>
> > Stefan mentioned that these annotations could be used to create 
> > documentation pages, it's true
>
>
> We could go this route, but I wonder if it's better to use JavaDoc here (we 
> already build/publish, though there were questions in slack recently to stop 
> publishing…)… Majority of docs are in conf/cassandra.yaml, so if we wish to 
> move to code we would need to address how that file is maintained and how to 
> document “groups” of configs (rather than documenting each config, we have a 
> pattern of documenting a feature or pair of configs (such as min/max targets) 
> and showing the configs that can be tweaked).  We did talk about moving to a 
> “nested” config model, but th