Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-12-12 Thread Maxim Muzafarov
> Technically it can be two commits which would be merged / pushed at once.

I'll prepare a new pull request containing both of the changes. My
previous experience tells me that it's really hard to find a reviewer
who is able to go through huge pull requests, which is why I
initially split this into the AvoidStarImport and CustomImportOrder
rules. So, if you'll help with the review, I'm happy to proceed the way
you suggested :-)

> One thing which needs extra care for ordering imports is that if you order it 
> in IDEA by right-clicking on a package and choosing organising imports, it 
> will remove special comments

You're right, but this behaviour is quite unusual to me and it
seems to be a bug that hasn't been fixed for a long time [1]. I've
tested the same thing in Eclipse and NetBeans, and `optimize imports`
works there as we expect (no comments are removed), so the issue exists
only in IntelliJ IDEA.
Despite all of that, we are still on the safe side here: if these
comments are removed by the `optimize imports` procedure, the build
with checkstyle will fail.

> I think this is a great time to revisit this ordering.

I would say that the current import order is pretty good (except,
probably, for the blank lines). The particular order matters less
than having the same order in all files and being able to apply it
automatically with `optimize imports`.
I suggest following a "minimum change" strategy here. IntelliJ
IDEA has the following import order configuration, which most
of the classes already fit:

import java
import javax
[blank line]
import com.google.common
import org.apache.log4j
import org.apache.commons
import org.cliffc.high_scale_lib
import org.junit
import org.slf4j
[blank line]
import all other imports
[blank line]
import static all other imports

We can update the documentation page [2] with this order and implement
the same for NetBeans and Eclipse IDE configuration files as well as
for the checkstyle config.
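
To make the layout above concrete, here is a minimal sketch of how an import statement maps to one of the four groups. The class and method names (`IdeaImportGroups`, `groupOf`) are made up for illustration and are not part of any IDE or checkstyle API; only the package prefixes come from the list above.

```java
import java.util.List;

/** Hypothetical helper: maps an import statement to its group in the layout listed above. */
final class IdeaImportGroups
{
    // Prefixes copied from the list above.
    private static final List<String> JDK = List.of("java.", "javax.");
    private static final List<String> THIRD_PARTY = List.of("com.google.common.",
                                                            "org.apache.log4j.",
                                                            "org.apache.commons.",
                                                            "org.cliffc.high_scale_lib.",
                                                            "org.junit.",
                                                            "org.slf4j.");

    /** Returns 0 for java/javax, 1 for the listed third-party packages, 2 for all other imports, 3 for static imports. */
    static int groupOf(String importLine)
    {
        String s = importLine.trim();
        if (!s.startsWith("import "))
            throw new IllegalArgumentException("not an import: " + importLine);
        s = s.substring("import ".length()).trim();
        if (s.startsWith("static "))
            return 3; // all static imports go last, whatever their package
        for (String prefix : JDK)
            if (s.startsWith(prefix))
                return 0;
        for (String prefix : THIRD_PARTY)
            if (s.startsWith(prefix))
                return 1;
        return 2;
    }
}
```

With this sketch `import org.slf4j.Logger;` lands in group 1, while `import org.apache.cassandra.db.Keyspace;` falls through to "all other imports".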


If everyone is OK with the plan above I'll prepare everything for it.

Suggested summary:
- use current IntelliJ IDEA imports order as defaults for other IDEs;
- update the documentation page;
- prepare a single pull request with AvoidStarImport and CustomImportOrder;



[1] https://youtrack.jetbrains.com/issue/IDEA-128133/Optimize-Imports-disregards-line-comments
[2] https://cassandra.apache.org/_/development/code_style.html

On Sun, 11 Dec 2022 at 00:03, Miklosovic, Stefan
 wrote:
>
> Should the source code obey the AvoidStarImport rule?
>
> yes
>
>  Should we implement AvoidStarImport and CustomImportOrder in a
> single pull request or do it one by one?
>
> Technically it can be two commits which would be merged / pushed at once.
>
> One thing which needs extra care for ordering imports is that if you order it 
> in IDEA by right-clicking on a package and choosing organising imports, it 
> will remove special comments which are put at the end of the import 
> statement. We need to be sure you put them back.  Look at changes in 
> CASSANDRA-17055. We need to preserve this.
>
> Also, we need to be sure that the importing style can be (roughly) set in 
> each major IDE. (eclipse / netbeans / idea) so if we require some import 
> style it can be set in IDE like that. I do not know if we have any strong 
> preference when it comes to this but it definitely does not hurt.
>
> Also, I see that the current import style is
>
> java
> [blank line]
> com.google.common
> org.apache.commons
> org.junit
> org.slf4j
> [blank line]
> everything else alphabetically
>
> I think this is a great time to revisit this ordering. I am not particularly 
> persuaded on this order and why it was chosen. Where has that decision come 
> from?
>
> 
> From: Maxim Muzafarov 
> Sent: Wednesday, December 7, 2022 18:29
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>
> Dear community,
>
>
> I have created the epic with code-style activities to track the progress:
> https://issues.apache.org/jira/browse/CASSANDRA-18090
>
> In my understanding, there is no need to format the whole code base at
> once according to the code style described on the page [1], and the
> best strategy here is to go forward with small evolutionary changes.
> Thus eventually we will come up with a set of rules convenient for all
> members of the community. In my mind, having one commit per added
> code style rule should be easy to look at for a reviewer, the git
> commits history as well as rebasing/m

Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-12-22 Thread Maxim Muzafarov
Hello everyone, have a great vacation and happy holidays to all!


I've completed a small piece of research into how class import order
rules are applied across Apache projects. Some of the projects don't
have any restrictions on imports even though they use a
checkstyle configuration. Others have only an informal consensus
on imports that is not reflected in checkstyle yet
(e.g. Kafka). The only conclusion here is that there is very
large variability in import order, so we have to agree on
the order on our own.

You can find the projects, IDEs and frameworks and their corresponding
import orders below:
https://mmuzaf.github.io/blog/Java_Import_Orders.html


Most of the time during development in an IDE the import section
remains collapsed, so from my point of view the following things
should be considered first:

- PR review: newly added imports must be clearly visible;
- the total number of affected files should be minimized;
- the import order rule must be simple to implement and well
supported by IDEs and their plugins;

In addition to the last point, checkstyle itself has
some limitations. For instance, ImportOrder is limited by design to
enforcing an empty line between groups (e.g. "java", "javax"),
and CustomImportOrder may have only up to 4 custom groups separated by
a blank line.



Based on all of the above, I can propose the following import orders.
All of them were tested on the latest changes from the trunk branch
(commit hash: b171b4ba294126e985d0ee629744516f19c8644e)


1. Total 2 groups, 3072 files to change

```
all other imports
[blank line]
static all other imports
```

2. Total 3 groups, 2345 files to change

```
java.*
javax.*
[blank line]
all other imports
[blank line]
static all other imports
```

3. Total 5 groups, 2968 files to change

```
org.apache.cassandra.*
[blank line]
java.*
[blank line]
javax.*
[blank line]
all other imports
[blank line]
static all other imports
```

4. Total 5 groups, 1792 files to change

```
java.*
javax.*
[blank line]
com.*
net.*
org.*
[blank line]
org.apache.cassandra.*
[blank line]
all other imports
[blank line]
static all other imports
```

5. Total 2 groups, 3114 files to change

```
java.*
javax.*
org.apache.cassandra.*
all other imports
[blank line]
static all other imports
```


Of course, any suggestions are really appreciated.
Please, share your thoughts.

On Thu, 15 Dec 2022 at 17:48, Mick Semb Wever  wrote:
>>
>> Another angle I forgot to mention is that this is quite a big patch and 
>> there are quite big pieces of work coming, being it CEP-15, for example. So 
>> I am trying to figure out if we are ok to just merge this work first and 
>> devs doing CEP-15 will need to rework their imports or we merge this after 
>> them so we will fix their stuff. I do not know what is more preferable.
>
>
>
> Thank you for bringing this point up Stefan.
>
> I would be actively reaching out to all those engaged with current CEPs, 
> asking them the rebase impact this would cause and if they are ok with it. 
> The CEPs are our priority, and we have a significant amount of them in 
> progress compared to anything we've had for many years.
>
>
>


Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-01-03 Thread Maxim Muzafarov
Folks,

Let me update the voting status and put together everything we have so
far. We definitely need more votes to have a solid foundation for this
change, so I encourage everyone to consider the options above and
share their preference in this thread.


Total for each applicable option:

4-th option -- 4 votes
3-rd option -- 3 votes
5-th option -- 1 vote
1-st option -- 0 votes
2-nd option -- 0 votes

On Thu, 22 Dec 2022 at 22:06, Mick Semb Wever  wrote:
>>
>>
>> 3. Total 5 groups, 2968 files to change
>>
>> ```
>> org.apache.cassandra.*
>> [blank line]
>> java.*
>> [blank line]
>> javax.*
>> [blank line]
>> all other imports
>> [blank line]
>> static all other imports
>> ```
>
>
>
> 3, then 5.
> There's lots under com.*, net.*, org.* that is essentially the same as "all 
> other imports", what's the reason to separate those?
>
> My preference for 3 is simply that imports are by default collapsed, and if I 
> expand them it's the dependencies on other cassandra stuff I'm first 
> grokking. It's also our only imports that lead to cyclic dependencies (which 
> we're not good at).


Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-01-16 Thread Maxim Muzafarov
Stefan,

Thank you for bringing this topic up. I'll prepare the PR shortly with
option 4, so everyone can take a look at the amount of changes. This
does not commit us to exactly this path, but it may shed light on
the changes in general.

What exactly we're planning to do in the PR:

1. Checkstyle AvoidStarImport rule, so no star imports will be allowed.
2. Checkstyle ImportOrder rule, for controlling the order.
3. The IDE code style configuration for IntelliJ IDEA, NetBeans, and
Eclipse (it doesn't exist for Eclipse yet).
4. The import order according to option 4:

```
java.*
javax.*
[blank line]
com.*
net.*
org.*
[blank line]
org.apache.cassandra.*
[blank line]
all other imports
[blank line]
static all other imports
```
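
To illustrate what item 1 is meant to catch, here is a rough sketch that flags star imports in a piece of source text. It is only a regex approximation written for this message: the real AvoidStarImport check is configured in checkstyle and does not work this way, and the names `StarImportScan` and `findStarImports` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical line-based scan for star imports; an approximation, not the checkstyle rule. */
final class StarImportScan
{
    // Matches "import a.b.*;" as well as "import static a.b.C.*;"
    private static final Pattern STAR =
        Pattern.compile("^\\s*import\\s+(static\\s+)?[\\w.]+\\.\\*\\s*;.*$");

    /** Returns the trimmed text of every line that is a star import. */
    static List<String> findStarImports(String source)
    {
        List<String> hits = new ArrayList<>();
        for (String line : source.split("\\R"))
            if (STAR.matcher(line).matches())
                hits.add(line.trim());
        return hits;
    }
}
```

A line such as `import java.util.*;` is reported, while `import java.util.List;` passes.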



On Mon, 16 Jan 2023 at 12:39, Miklosovic, Stefan
 wrote:
>
> Based on the voting we should go with option 4?
>
> Two weeks passed without anybody joining so I guess folks are all happy with 
> that or this just went unnoticed?
>
> Let's give it time until the end of this week (Friday 12:00 UTC).
>
> Regards
>


Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-01-23 Thread Maxim Muzafarov
Hello everyone,

You can find the changes here:
https://issues.apache.org/jira/browse/CASSANDRA-17925

While preparing the code style configuration for the Eclipse IDE, I
discovered that there was no easy way to have complex grouping options
for the set of packages. So we need to add extra blank lines between
each group of packages so that all the configurations for Eclipse,
NetBeans, IntelliJ IDEA and checkstyle are aligned. I should have
checked this earlier for sure, but I only did it for static imports
and some groups, my bad. The resultant configuration looks like this:

java.*
[blank line]
javax.*
[blank line]
com.*
[blank line]
net.*
[blank line]
org.*
[blank line]
org.apache.cassandra.*
[blank line]
all other imports
[blank line]
static all other imports
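
As a quick illustration of the resulting layout, the sketch below groups a list of import statements in the order shown above and separates the groups with blank lines. All names here (`ImportLayoutSketch`, `groupOf`, `render`) are invented for the example, and sorting inside a group by plain string order is my assumption; the IDEs and checkstyle may order entries within a group differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical formatter for the eight-group layout listed above. */
final class ImportLayoutSketch
{
    static int groupOf(String name, boolean isStatic)
    {
        if (isStatic) return 7;
        if (name.startsWith("java.")) return 0;
        if (name.startsWith("javax.")) return 1;
        if (name.startsWith("com.")) return 2;
        if (name.startsWith("net.")) return 3;
        // org.apache.cassandra.* must be tested before the generic org.* prefix
        if (name.startsWith("org.apache.cassandra.")) return 5;
        if (name.startsWith("org.")) return 4;
        return 6; // all other imports
    }

    /** Takes statements such as "import java.util.List;" and returns them grouped and blank-line separated. */
    static String render(List<String> imports)
    {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String line : imports)
        {
            String statement = line.trim();
            String rest = statement.substring("import ".length()).trim();
            boolean isStatic = rest.startsWith("static ");
            String name = isStatic ? rest.substring("static ".length()).trim() : rest;
            groups.computeIfAbsent(groupOf(name, isStatic), k -> new ArrayList<>()).add(statement);
        }
        List<String> blocks = new ArrayList<>();
        for (List<String> group : groups.values())
        {
            group.sort(Comparator.naturalOrder()); // assumption: plain string order inside a group
            blocks.add(String.join("\n", group));
        }
        return String.join("\n\n", blocks);
    }
}
```

Empty groups simply produce no block, so a class that imports only `java.*` and `org.apache.cassandra.*` ends up with two blocks and one blank line between them.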

The pull request is here:
https://github.com/apache/cassandra/pull/2108

The configuration-related changes are placed in a dedicated commit, so
they should be easy to review:
https://github.com/apache/cassandra/pull/2108/commits/84e292ddc9671a0be76ceb9304b2b9a051c2d52a



Another important thing to mention is that the total amount of changes
for organising imports is really big (more than 2000 files!), so we
need to decide the right time to merge this PR. Although rebasing or
merging changes to development branches should become much easier
("Accept local" + "Organize imports"), we still need to pay extra
attention here to minimise the impact on major patches for the next
release.



[DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

2023-01-25 Thread Maxim Muzafarov
Hello Cassandra Community,


I've come across a number of inconsistencies in the user APIs through
which the Cassandra monitoring interfaces expose internal data
collections; these need to be fully aligned from an
operator perspective. First of all, I mean the JMX, Dropwizard
Metrics, and Virtual Tables user interfaces. In order to address all
these inconsistencies, I have created a draft enhancement proposal
that describes everything I have found and how we can fix it once and
for all.

I'd like to hear your opinion and thoughts on it. Please take a look:
https://docs.google.com/document/d/1j4J3bPWjQkAU9x4G-zxKObxPrKg36jLRT6xpUoNJa8Q


-- 
Maxim Muzafarov


Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

2023-01-30 Thread Maxim Muzafarov
>>>> I took a look and I see the result is an interface that looks like the 
>>>> vtable interface, that is then used by vtables and JMX?  My first thought 
>>>> is why not just use the vtable logic?
>>>>
>>>> I also wonder about if we should care about JMX?  I know many wish to 
>>>> migrate (its going to be a very long time) away from JMX, so do we need a 
>>>> wrapper to make JMX and vtables consistent?  I am cool with something like 
>>>> the following
>>>>
>>>> registerWithJMX(jmxName, query(“SELECT * FROM system_views.streaming”));
>>>>
>>>>
>>>> So if we want to have a JMX view that matches the table then that’s cool 
>>>> by me, but one thing that has been brought up in reviews is backwards 
>>>> compatibility with regard to adding columns… If we add a column to the end 
>>>> of the JMX row did we just break users?
>>>>
>>>> Considering that JMX is usually not used and disabled in production 
>>>> environments for various performance and security reasons, the operator 
>>>> may not see the same picture from various of Dropwizard's metrics exporters
>>>>
>>>> If this is a real problem people are hitting, we can always add the 
>>>> ability to push metrics to common systems with a pluggable way to add 
>>>> non-standard solutions.  Dropwizard already support this so would be low 
>>>> hanging fruit to address this.
>>>>
>>>> To make the proposed changes backwards compatible with the previous 
>>>> version of Cassandra, all MBeans and Virtual Tables we already have will 
>>>> remain unchanged
>>>>
>>>>
>>>> If this is for new JMX endpoints moving forward, I am not sure of the 
>>>> benefit for the same reason listed above; we wish to move away from JMX


Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Maxim Muzafarov
Hello everyone,


I would say that having a CEP and a well-defined set of major public
API changes is a must, and the corresponding discussion of CEP is also
well-defined here [1]. This ensures that we do not miss any important
changes. Everything related to the public API is also described in the
CEP template [2].

However, if a patch adds, say, a single JMX method to expose a
metric, having an ML thread for it may seem redundant and may shift
the focus away from the really important issues on the dev list. In
this case, I think we can add the `public API changed` label to the
JIRA issue and mention all such issues on a weekly or monthly
basis in a Cassandra status update e-mail. This will help keep the
balance between important changes and routine ones.


[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652201#CassandraEnhancementProposals(CEP)-TheProcess
[2] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-Template#CEPTemplate-NeworChangedPublicInterfaces

On Thu, 2 Feb 2023 at 16:56, Jeremiah D Jordan
 wrote:
>
> I think we need a DISCUSS thread at minimum for API changes.  And for 
> anything changing CQL syntax, I think a CEP is warranted.  Even if it is only 
> a small change to the syntax.
>
> On Feb 2, 2023, at 9:32 AM, Patrick McFadin  wrote:
>
> API changes are near and dear to my world. The scope of changes could be 
> minor or major, so I think B is the right way forward.
>
> Not to throw off the momentum, but could this even warrant a separate CEP in 
> some cases? For example, CEP-15 is a huge change, but the CQL syntax will 
> continuously evolve with more use. Being judicious in those changes is good 
> for end users. It's also a good reference to point back to after the fact.
>
> Patrick
>
> On Thu, Feb 2, 2023 at 6:01 AM Ekaterina Dimitrova  
> wrote:
>>
>> “ Only that it locks out of the conversation anyone without a Jira login”
>> Very valid point I forgot about - since recently people need invitation in 
>> order to create account…
>> Then I would say C until we clarify the scope. Thanks
>>
>> On Thu, 2 Feb 2023 at 8:54, Benedict  wrote:
>>>
>>> I think lazy consensus is fine for all of these things. If a DISCUSS thread 
>>> is crickets, or just positive responses, then definitely it can proceed 
>>> without further ceremony.
>>>
>>> I think “with heads-up to the mailing list” is very close to B? Only that 
>>> it locks out of the conversation anyone without a Jira login.
>>>
>>> On 2 Feb 2023, at 13:46, Ekaterina Dimitrova  wrote:
>>>
>>> 
>>>
>>> While I do agree with you, I am thinking that if we include many things 
>>> that we would expect lazy consensus on I would probably have different 
>>> preference.
>>>
>>> I definitely don’t mean to stall this though so in that case:
>>> I’d say combination of A+C (jira with heads up on the ML if someone is 
>>> interested into the jira) and regular log on API changes separate from 
>>> CHANGES.txt or we can just add labels to entries in CHANGES.txt as some 
>>> other projects. (I guess this is a detail we can agree on later on, how to 
>>> implement it, if we decide to move into that direction)
>>>
>>> On Thu, 2 Feb 2023 at 8:12, Benedict  wrote:

>>>> I think it’s fine to separate the systems from the policy? We are agreeing 
>>>> a policy for systems we want to make guarantees about to our users 
>>>> (regarding maintenance and compatibility)
>>>>
>>>> For me, this is (at minimum) CQL and virtual tables. But I don’t think the 
>>>> policy differs based on the contents of the list, and given how long this 
>>>> topic stalled for. Given the primary point of contention seems to be the 
>>>> *policy* and not the list, I think it’s time to express our opinions 
>>>> numerically so we can move the conversation forwards.
>>>>
>>>> This isn’t binding, it just reifies the community sentiment.
>>>>
>>>> On 2 Feb 2023, at 13:02, Ekaterina Dimitrova  wrote:
>>>>
>>>> “ So we can close out this discussion, let’s assume we’re only discussing 
>>>> any interfaces we want to make promises for. We can have a separate 
>>>> discussion about which those are if there is any disagreement.”
>>>> May I suggest we first clear this topic and then move to voting? I would 
>>>> say I see confusion, not that much of a disagreement. Should we raise a 
>>>> discussion for every feature flag for example? In another thread virtual 
>>>> tables were brought in. I saw also other examples where people expressed 
>>>> uncertainty. I personally feel I’ll be able to take a more informed 
>>>> decision and vote if I first see this clarified.
>>>>
>>>> I will be happy to put down a document and bring it for discussion if 
>>>> people agree with that
>>>>
>>>>
>>>>
>>>> On Thu, 2 Feb 2023 at 7:33, Aleksey Yeshchenko  wrote:
>>>>>
>>>>> Bringing light to new proposed APIs no less important - if not more, for 
>>>>> reasons already mentioned in this thread. For it’s not easy to change 
>>>>> them later.
>>>>>
>>>>> Voting B.
>>>>>

Re: Implicitly enabling ALLOW FILTERING on virtual tables

2023-02-03 Thread Maxim Muzafarov
Hello Stefan,

Regarding the decision to implicitly enable ALLOW FILTERING for
virtual tables, which also makes sense to me, it may be necessary to
consider changing the clustering columns in the virtual table metadata
to regular columns as well. The reasons are the same as mentioned
earlier: the virtual tables hold their data in memory, thus we do not
benefit from the advantages of ordered data (e.g. the ClientsTable and
its ClusteringColumn(PORT)).

Changing the clustering column to a regular column may simplify the
virtual table data model, but I'm afraid it may affect users who rely
on the table metadata.
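
To show what I mean by "no benefit from ordered data", here is a toy model of a table whose rows already live in memory. It is not Cassandra's virtual table API: the class `InMemoryRows`, the row shape and the `select` helper are all made up, and only the `port` column is taken from the ClientsTable example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of a handful of in-memory rows; not Cassandra's virtual table API. */
final class InMemoryRows
{
    /** Hypothetical row shape; only "port" is named in the message above. */
    static final class ClientRow
    {
        final String address;
        final int port;
        final String username;

        ClientRow(String address, int port, String username)
        {
            this.address = address;
            this.port = port;
            this.username = username;
        }
    }

    /** Linear scan with an arbitrary predicate: the cost is the same for key and non-key columns. */
    static List<ClientRow> select(List<ClientRow> rows, Predicate<ClientRow> where)
    {
        List<ClientRow> out = new ArrayList<>();
        for (ClientRow row : rows)
            if (where.test(row))
                out.add(row);
        return out;
    }
}
```

Filtering by `username` here costs exactly as much as filtering by `port`, which is why the clustering order buys nothing for such a table.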



On Fri, 3 Feb 2023 at 12:32, Andrés de la Peña  wrote:
>
> I think removing the need for ALLOW FILTERING on virtual tables makes sense 
> and would be quite useful for operators.
>
> That guard exists for performance issues that shouldn't occur on virtual 
> tables. We also have a flag in case some future virtual table implementation 
> has limitations regarding filtering, although it seems it's not the case with 
> any of the existing virtual tables.
>
> It is not like we would promote bad habits because virtual tables are meant 
> to be queried by operators / administrators only.
>
>
> It might even be quite the opposite, since in the current situation users 
> might get used to routinely use ALLOW FILTERING for querying their virtual 
> tables.
>
> It has been mentioned on the #cassandra-dev Slack thread where this started 
> (1) that it's kind of an API inconsistency to allow querying by non-primary 
> keys on virtual tables without ALLOW FILTERING, whereas it's required for 
> regular tables. I think that a simple doc update saying that virtual tables, 
> which are not regular tables, support filtering would be enough. Virtual 
> tables are well identified by both the keyspace they belong to and doc, so 
> users shouldn't have trouble knowing whether a table is virtual. It would be 
> similar to the current exception for ALLOW FILTERING, where one needs to use 
> it unless the table has an index for the queried column.
>
> (1) https://the-asf.slack.com/archives/CK23JSY2K/p1675352759267329
>
> On Fri, 3 Feb 2023 at 09:09, Miklosovic, Stefan 
>  wrote:
>>
>> Hi list,
>>
>> the content of virtual tables is held in memory (and / or is fetched every 
>> time upon request). While doing queries against such table for a column 
>> outside of primary key, normally, users are required to specify ALLOW 
>> FILTERING. This makes total sense for "ordinary tables" for applications to 
>> have performant and effective queries but it kind of loses the 
>> applicability for virtual tables when it literally holds just handful of 
>> entries in memory and it just does not matter, does it?
>>
>> What do you think about implicitly allowing filtering for virtual tables so 
>> we save ourselves from these pesky errors when we want to query arbitrary 
>> column and we need to satisfy CQL spec just to do that?
>>
>> It is not like we would promote bad habits because virtual tables are meant 
>> to be queried by operators / administrators only.
>>
>> We can also explicitly document this behavior.
>>
>> Among other options, we may try to implement secondary indices on virtual 
>> tables but I am not completely sure this is what we want because its 
>> complexity etc. Is it even necessary to put such complex logic in place just 
>> to be able to select any column on few entries in memory?
>>
>> I put together a draft here (1). It would be ever possible to implicitly 
>> allow filtering on virtual tables only and it would be implementer's 
>> responsibility to decide that, per table.
>>
>> For all virtual tables we currently have, I would enable this everywhere. I 
>> do not think there is any virtual table where we would not want to enable it 
>> or where people HAVE TO specify that.
>>
>> (1) https://github.com/apache/cassandra/pull/2131


[DISCUSS] Moving system property names to the CassandraRelevantProperties

2023-02-08 Thread Maxim Muzafarov
Hello everyone,


We are trying to clean up the source code around the direct use of
system properties and make this use more manageable and transparent.
To achieve this, I have prepared a patch that moves all system
property names to the CassandraRelevantProperties, which in turn makes
some of the properties visible to a user through the
SystemPropertiesTable virtual table.
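
For those who have not looked at this part of the code, the idea is roughly the following. This is a cut-down sketch and not the actual CassandraRelevantProperties class: the enum name `SketchProperty`, its accessors and the default values are assumptions made for illustration; only the property keys are taken from the list further down.

```java
/** Hypothetical, cut-down sketch of an enum that centralizes system property names. */
enum SketchProperty
{
    // Keys are real property names from this thread; the defaults are assumptions.
    NETTY_EVENT_LOOP_THREADS("io.netty.eventLoopThreads", null),
    LOG4J2_DISABLE_JMX("log4j2.disable.jmx", "false"),
    LOGBACK_CONFIGURATION_FILE("logback.configurationFile", null);

    private final String key;
    private final String defaultValue;

    SketchProperty(String key, String defaultValue)
    {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    String getKey()
    {
        return key;
    }

    /** Reads the system property, falling back to the default declared above (possibly null). */
    String getString()
    {
        return System.getProperty(key, defaultValue);
    }

    /** Boolean.parseBoolean treats null and anything but "true" as false. */
    boolean getBoolean()
    {
        return Boolean.parseBoolean(getString());
    }
}
```

Callers then write `SketchProperty.LOG4J2_DISABLE_JMX.getBoolean()` instead of spelling the key out, so every property name is declared in exactly one place and can be listed from there.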

The patch has passed a few rounds of review, but we still need another
pair of eyes to make sure we are not missing anything valuable.
Please, take a look at the patch.

You can find all the changes here:
https://issues.apache.org/jira/browse/CASSANDRA-17797


I'd also like to share the names of the properties that will appear in
the SystemPropertiesTable, since their appearance is one of the
public API changes we agreed to discuss on the dev list.


The public API changes

Newly added production system properties:

io.netty.eventLoopThreads
io.netty.transport.estimateSizeOnSubmit
java.security.auth.login.config
javax.rmi.ssl.client.enabledCipherSuites
javax.rmi.ssl.client.enabledProtocols
ssl.enable
log4j2.disable.jmx
log4j2.shutdownHookEnabled
logback.configurationFile

Newly added and used for tests only:

invalid-legacy-sstable-root
legacy-sstable-root
org.apache.cassandra.tools.UtilALLOW_TOOL_REINIT_FOR_TEST
org.caffinitas.ohc.segmentCount
suitename
sun.stderr.encoding
sun.stdout.encoding
test.bbfailhelper.enabled
write_survey


Re: [DISCUSS] Moving system property names to the CassandraRelevantProperties

2023-02-09 Thread Maxim Muzafarov
David,

> “Should” we detect this was violated and fail the build?

You are asking a good question! Sure, a new checkstyle rule was added
to address this case for production and test classes.
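
Just to show the kind of violation such a check looks for, here is a rough line-based sketch. It is not the checkstyle rule from the patch: the class name `DirectPropertyUseScan`, the regex and the set of flagged methods are my assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical scan for direct System property access; an illustration, not the actual checkstyle rule. */
final class DirectPropertyUseScan
{
    // Assumption: these three accessors are the ones worth flagging.
    private static final Pattern DIRECT =
        Pattern.compile("\\bSystem\\s*\\.\\s*(getProperty|setProperty|clearProperty)\\s*\\(");

    /** Returns the 1-based numbers of the lines that access system properties directly. */
    static List<Integer> findViolations(String source)
    {
        List<Integer> lines = new ArrayList<>();
        String[] split = source.split("\\R");
        for (int i = 0; i < split.length; i++)
            if (DIRECT.matcher(split[i]).find())
                lines.add(i + 1);
        return lines;
    }
}
```

A build step built on the same idea would fail as soon as the returned list is non-empty for any production or test class.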

On Thu, 9 Feb 2023 at 19:40, David Capwell  wrote:
>
> All properties meant to be used only for tests would have a prefix like 
> "cassandra.test.name.of.property" and production properties would be 
> "cassandra.xyx". Once this is done, we can filter them out in vtable so there 
> would not be any test-related properties in production. Test properties 
> should be visible only when developing / testing Cassandra, in my opinion.
>
>
> Good point, I wouldn’t want to expose properties meant to break C* for 
> testing… that implies to users we should be using it!
>
> I understand that there is a lot of legacy in place and we can not rename 
> properties just like that for people.
>
>
> We could always look to do things like we did in Config (which you call out), 
> add a way to “migrate” to the new naming/format.  I am not sure if there is 
> enough configs to justify this, but with 5.0 happening it may be a good time 
> to think about normalizing these.
>
> We are trying to clean up the source code around the direct use of system 
> properties and make this use more manageable and transparent.
>
>
> I didn’t review, but one question I have is inline with "I would like to 
> describe the ideal state / end goal”, but from the point of view of 
> maintaining things…. If someone adds a new system property “should” they use 
> this enum?  “Should” we detect this was violated and fail the build?
>
> If we migrate now, nothing stops us from adding new, causing someone else to 
> be forced to migrate after…. It's one of the issues with the current enum, 
> once it was created authors didn’t add to it always causing patches like this 
> to try to migrate…
>
> I am not trying to block your patch, if you don’t deal with this I am +0… 
> just saying that maintenance is something we must think about
>
> On Feb 9, 2023, at 6:37 AM, Miklosovic, Stefan  
> wrote:
>
> Hi Maxim,
>
> I would like to describe the ideal state / end goal, from my perspective.
>
> All properties meant to be used only for tests would have a prefix like 
> "cassandra.test.name.of.property" and production properties would be 
> "cassandra.xyx". Once this is done, we can filter them out in vtable so there 
> would not be any test-related properties in production. Test properties 
> should be visible only when developing / testing Cassandra, in my opinion.
>
> All other system properties should also have some consistent naming in place.
>
> I understand that there is a lot of legacy in place and we can not rename 
> properties just like that for people.
>
> The approach I like is what was done to properties in cassandra.yaml. There 
> is @Replaces annotation put on properties in Config which enables users to 
> still use the old names.
>
> I can imagine that something like this would be used here. If an old name is 
> specified, it would internally translate to a new name and only new names 
> would be returned by vtable. There might be also a column for old names so 
> people would know what new property the old one translates to and we should 
> also emit warning for users that the system properties they are using are in 
> the old format and they should move to the new ones.
>
> Anyway, I am glad this is happening and we are making progress. It will be 
> also way easier to dump all properties to the website when everything is 
> centralized in one place.
>
> Regards
>
> 
> From: Maxim Muzafarov 
> Sent: Wednesday, February 8, 2023 19:48
> To: dev@cassandra.apache.org
> Subject: [DISCUSS] Moving system property names to the 
> CassandraRelevantProperties
>
> Hello everyone,
>
>
> We are trying to clean up the source code around the direct use of
> system properties and make this use more manageable and transparent.
> To achieve this, I have prepared a patch that moves all system
> property names to the CassandraRelevantProperties, which in turn makes
> some of the properties visible to a user through the
> SystemPropertiesTable virtual table.
>
> The patch has passed a few rounds of review, but we still need another
> pair of eyes to make sure we are not missing anything valuable.
> Please, take a look at the patch.
>
> You can find all the changes here:
> https://issues.apache.org/jira/browse/CASSANDRA-17797

[DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-02-21 Thread Maxim Muzafarov
Hello everyone,


Before I finalise a pull request that still needs a number of tedious
changes, I would like to share and discuss the key points of the
solution design with you, so that we are all on the same page about the
changes to the Config class and its accessors.

Here is the issue I'm working on:
"Allow UPDATE on settings virtual table to change running configurations".
https://issues.apache.org/jira/browse/CASSANDRA-15254

Below is a condensed, high-level view of the solution design; all the
details have been discussed in the related JIRA issue.


= What we have now =

- We use JMX MBeans to mutate the runtime configuration while the
node is running, or to view the configuration values. Some of the JMX
MBean methods use camel case to match configuration field names;
- We use the SettingsTable only to view configuration values at
runtime, but we are not able to mutate the configuration through it;
- We load the configuration from cassandra.yaml into the Config class
instance during the node bootstrap (it is accessed through
DatabaseDescriptor and GuardrailsOptions);
- The Config class itself has nested configurations such as
ReplicaFilteringProtectionOptions (it is important to always keep this
in mind).


= What we want to achieve =

We want to use the SettingsTable virtual table to change the runtime
configuration, as we do it now with JMX MBeans, and:
- If the affected component is updated (or the component's logic is
executed) before or after the property change, we want to keep this
behaviour for the virtual table for the same configuration property;
- We want to ensure consistency of such changes between the virtual
table API and the JMX API used;


= The main question =

To enable configuration management with the virtual table, we need to
know the answer to the following key question:
- How can we reliably determine at runtime which of the properties
we can change and which we can't?


= Options for an answer to the question above =

1. Rely on the volatile keyword in front of fields in the Config class;

I would say this is the most confusing option for me because it
doesn't give us all the guarantees we need, and also:
- We have no explicit control over what exactly we expose to a user.
When we modify the JMX API, we're implementing a new method for the
MBean, which in turn makes this action an explicit exposure;
- The volatile keyword is not the only way to achieve thread safety,
and it looks strange from a public API design point of view;
- A good example is the setEnableDropCompactStorage method, which
changes the volatile field, but is only visible for testing purposes;

2. Annotation-based exposure.

I have created the Exposure(Exposure.Policy.READ_ONLY) and
Exposure(Exposure.Policy.READ_WRITE) annotations to mark all the
configuration fields we are going to expose to the public API (JMX, as
well as the SettingsTable) in the Config class. All the configuration
fields (in the Config class and any nested classes) that we want to
expose (and that are already used by JMX) need to be tagged with an
annotation of the appropriate type.

The most confusing thing here, apart from the number of tedious
changes, is that we are using reflection to mutate configuration field
values at runtime, which makes some of the fields look "unused" in the
IDE. This can be unpleasant for developers looking at the Config
class for the first time.

You can find the PR related to this type of change here (only a few
configuration fields have been annotated for the visibility of all
changes):
https://github.com/apache/cassandra/pull/2133/files


3. Enforce setter/getter method name rules by converting these methods
in camel case to the field name with underscores.

To rely on setter methods, we need to enforce the naming rules of the
setters. I have collected information about which field names match
their camel case getter/setter methods:

total: 345
setters: 109, missed 236
volatile setters: 90, missed 255
jmx setters: 35, missed 310
getters: 139, missed 206
volatile getters: 107, missed 238
jmx getters: 63, missed 282

The most confusing part of this type of change is the number of
changes in additional classes according to the calculation above and
some difficulties with enforcing this rule for nested configuration
classes.

You can see what this change looks like here:
https://github.com/apache/cassandra/pull/2172/files
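
To illustrate what the naming rule in option 3 implies, here is a minimal sketch of a camel-case-to-underscore conversion. The class and method names are hypothetical and this is not the code from the PR; it only shows the kind of mapping that would have to hold for every accessor.

```java
// Hypothetical sketch of the name mapping behind option 3; not the PR's code.
final class SetterNameSketch
{
    /**
     * Converts an accessor name such as "setPhiConvictThreshold" into the
     * field name "phi_convict_threshold". Returns null when the name does
     * not look like a getter or a setter.
     */
    static String toFieldName(String methodName)
    {
        if (!methodName.startsWith("set") && !methodName.startsWith("get"))
            return null;
        String camel = methodName.substring(3);
        if (camel.isEmpty() || !Character.isUpperCase(camel.charAt(0)))
            return null;

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camel.length(); i++)
        {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c))
            {
                // Every upper-case letter starts a new underscore-separated word.
                if (sb.length() > 0)
                    sb.append('_');
                sb.append(Character.toLowerCase(c));
            }
            else
            {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(toFieldName("setPhiConvictThreshold"));
        System.out.println(toFieldName("getClusterName"));
    }
}
```

A naive rule like this splits acronyms letter by letter and says nothing about nested configuration classes, which is part of why so many accessors end up in the "missed" columns above.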


= Summary =

In summary, from my point of view, the annotation approach will be the
most robust solution for us, so I'd like to continue with it. It also
provides an easy way to extend the SettingsTable with additional
columns such as runtime type (READ_ONLY, READ_WRITE) and a description
column. This ends up looking much more user-friendly.
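
As a rough illustration of the annotation approach, the sketch below guards a reflective update with the declared policy. It is a simplified stand-in, not the code from the PR: DemoConfig, its default values and the update method are hypothetical, and the annotation is reduced to a single policy attribute.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical sketch of an annotation-guarded update; not the PR's code.
final class ExposureSketch
{
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Exposure
    {
        Policy policy();

        enum Policy { READ_ONLY, READ_WRITE }
    }

    /** Stand-in for the Config class; the default values are placeholders. */
    static final class DemoConfig
    {
        @Exposure(policy = Exposure.Policy.READ_ONLY)
        public String cluster_name = "Test Cluster";

        @Exposure(policy = Exposure.Policy.READ_WRITE)
        public double phi_convict_threshold = 8.0;

        // No annotation: not exposed through the public API at all.
        public String internal_only = "hidden";
    }

    /** Updates a field only when it is explicitly marked READ_WRITE. */
    static void update(Object config, String name, Object value)
    {
        try
        {
            Field field = config.getClass().getField(name);
            Exposure exposure = field.getAnnotation(Exposure.class);
            if (exposure == null || exposure.policy() != Exposure.Policy.READ_WRITE)
                throw new IllegalArgumentException("Property is not writable: " + name);
            field.set(config, value);
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalArgumentException("Unknown or inaccessible property: " + name, e);
        }
    }

    public static void main(String[] args)
    {
        DemoConfig config = new DemoConfig();
        update(config, "phi_convict_threshold", 12.0);
        System.out.println(config.phi_convict_threshold);
    }
}
```

The point of the sketch is that exposure becomes an explicit decision per field: an unannotated or READ_ONLY field is rejected regardless of whether it happens to be volatile.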

Another advantage of the annotation approach is that we can rely on
this annotation to generate dedicated dynamic JMX beans that are
responsible only for node configuration management, avoiding any
inconsistencies like those mentioned here [2] (I have described a
similar approach
here [1], but for metrics). But al

Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-02-22 Thread Maxim Muzafarov
Hello everyone,

I have created an issue, CASSANDRA-18277, that may help us move forward
with the code style changes. It only affects the way we store the
IntelliJ code style configuration and has no effect on any releases,
so it should be safe to merge. Once the issue is resolved, every
developer who checks out a release branch will use the same code style
stored in that branch. This in turn makes rebasing a big change like
the import order [1] a really straightforward matter (by pressing
Ctrl + Opt + O in their local branch to organize imports).

See:

Move the IntelliJ Idea code style and inspections configuration to the
project's root .idea directory
https://issues.apache.org/jira/browse/CASSANDRA-18277



[1] https://issues.apache.org/jira/browse/CASSANDRA-17925

On Wed, 25 Jan 2023 at 13:05, Miklosovic, Stefan
 wrote:
>
> Thank you Maxim for doing this.
>
> It is nice to see this effort materialized in a PR.
>
> I would wait until bigger chunks of work are committed to trunk (like CEP-15) 
> to not collide too much. I would say we can postpone doing this until the 
> actual 5.0 release, last weeks before it so we would not clash with any work 
> people would like to include in 5.0. This can go in anytime, basically.
>
> Are people on the same page?
>
> Regards
>
> ____
> From: Maxim Muzafarov 
> Sent: Monday, January 23, 2023 19:46
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>
> Hello everyone,
>
> You can find the changes here:
> https://issues.apache.org/jira/browse/CASSANDRA-17925
>
> While preparing the code style configuration for the Eclipse IDE, I
> discovered that there was no easy way to have complex grouping options
> for the set of packages. So we need to add extra blank lines between
> each group of packages so that all the configurations for Eclipse,
> NetBeans, IntelliJ IDEA and checkstyle are aligned. I should have
> checked this earlier for sure, but I only did it for static imports
> and some groups, my bad. The resultant configuration looks like this:
>
> java.*
> [blank line]
> javax.*
> [blank line]
> com.*
> [blank line]
> net.*
> [blank line]
> org.*
> [blank line]
> org.apache.cassandra.*
> [blank line]
> all other imports
> [blank line]
> static all other imports
>
> The pull request is here:
> https://github.com/apache/cassandra/pull/2108
>
> The configuration-related changes are placed in a dedicated commit, so
> it should be easy to make a review:
> https://github.com/apache/cassandra/pull/2108/commits/84e292ddc9671a0be76ceb9304b2b9a051c2d52a
>
> 
>
> Another important thing to mention is that the total amount of changes
> for organising imports is really big (more than 2000 files!), so we
> need to decide the right time to merge this PR. Although rebasing or
> merging changes to development branches should become much easier
> ("Accept local" + "Organize imports"), we still need to pay extra
> attention here to minimise the impact on major patches for the next
> release.
>
> On Mon, 16 Jan 2023 at 13:16, Maxim Muzafarov  wrote:
> >
> > Stefan,
> >
> > Thank you for bringing this topic up. I'll prepare the PR shortly with
> > option 4, so everyone can take a look at the amount of changes. This
> > does not force us to go exactly this path, but it may shed light on
> > changes in general.
> >
> > What exactly we're planning to do in the PR:
> >
> > 1. Checkstyle AvoidStarImport rule, so no star imports will be allowed.
> > 2. Checkstyle ImportOrder rule, for controlling the order.
> > 3. The IDE code style configuration for Intellij IDEA, NetBeans, and
> > Eclipse (it doesn't exist for Eclipse yet).
> > 4. The import order according to option 4:
> >
> > ```
> > java.*
> > javax.*
> > [blank line]
> > com.*
> > net.*
> > org.*
> > [blank line]
> > org.apache.cassandra.*
> > [blank line]
> > all other imports
> > [blank line]
> > static all other imports
> > ```
> >
> >
> >
> > On Mon, 16 Jan 2023 at 12:39, Miklosovic, Stefan
> >  wrote:
> > >
> > > Based on the voting we should go with option 4?
> > >
> > > Two weeks passed without anybody joining so I guess folks are all happy 
> > > with that or this just went unnoticed?
> > >
> > > Let

Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-02-27 Thread Maxim Muzafarov
> I suppose it can be easy for the existing feature branches if they have a 
> single commit. Don't we need to adjust each commit for multi-commit feature 
> branches?

It depends on how feature branches are maintained and developed, I
guess. My thoughts here are that the IDE's hotkeys should just work to
resolve any code-style issues that arise during rebase/maintenance.
I'm not talking about enforcing all our code-style rules but giving
developers good flexibility. The classes import order rule might be a
good example here.
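
To make the import order rule concrete, the grouping from the quoted message below (java, javax, com, net, org, org.apache.cassandra, all other imports, static imports) can be sketched as a small sorter. This is only an illustration of the ordering, with hypothetical class and method names; it is not how checkstyle or any of the IDEs implement it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the agreed grouping; not checkstyle's or an IDE's algorithm.
final class ImportOrderSketch
{
    /** Returns the position of the group an import statement belongs to. */
    static int group(String importLine)
    {
        String s = importLine.trim();
        if (s.startsWith("import static ")) return 7;
        String name = s.substring("import ".length());
        if (name.startsWith("java.")) return 0;
        if (name.startsWith("javax.")) return 1;
        if (name.startsWith("com.")) return 2;
        if (name.startsWith("net.")) return 3;
        // Must be checked before the generic "org." prefix.
        if (name.startsWith("org.apache.cassandra.")) return 5;
        if (name.startsWith("org.")) return 4;
        return 6; // all other imports
    }

    /** Sorts by group, then alphabetically, with a blank line between groups. */
    static List<String> organize(List<String> imports)
    {
        List<String> sorted = new ArrayList<>(imports);
        sorted.sort((a, b) -> {
            int byGroup = Integer.compare(group(a), group(b));
            return byGroup != 0 ? byGroup : a.compareTo(b);
        });

        List<String> out = new ArrayList<>();
        int previous = -1;
        for (String line : sorted)
        {
            int g = group(line);
            if (previous != -1 && g != previous)
                out.add("");
            out.add(line);
            previous = g;
        }
        return out;
    }

    public static void main(String[] args)
    {
        List<String> imports = new ArrayList<>();
        imports.add("import org.slf4j.Logger;");
        imports.add("import static org.junit.Assert.assertTrue;");
        imports.add("import java.util.List;");
        imports.add("import org.apache.cassandra.config.Config;");
        for (String line : organize(imports))
            System.out.println(line);
    }
}
```

Because the result depends only on the import lines themselves, re-running "organize imports" after a rebase should give the same output on every developer's machine, which is what makes the hotkey approach workable.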

On Wed, 22 Feb 2023 at 21:27, Jacek Lewandowski
 wrote:
>
> I suppose it can be easy for the existing feature branches if they have a 
> single commit. Don't we need to adjust each commit for multi-commit feature 
> branches?
>
> On Wed, 22 Feb 2023 at 19:48, Maxim Muzafarov wrote:
>>
>> Hello everyone,
>>
>> I have created an issue, CASSANDRA-18277, that may help us move forward
>> with the code style changes. It only affects the way we store the
>> IntelliJ code style configuration and has no effect on any releases,
>> so it should be safe to merge. Once the issue is resolved, every
>> developer who checks out a release branch will use the same code style
>> stored in that branch. This in turn makes rebasing a big change like
>> the import order [1] a really straightforward matter (by pressing
>> Ctrl + Opt + O in their local branch to organize imports).
>>
>> See:
>>
>> Move the IntelliJ Idea code style and inspections configuration to the
>> project's root .idea directory
>> https://issues.apache.org/jira/browse/CASSANDRA-18277
>>
>>
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-17925
>>
>> On Wed, 25 Jan 2023 at 13:05, Miklosovic, Stefan
>>  wrote:
>> >
>> > Thank you Maxim for doing this.
>> >
>> > It is nice to see this effort materialized in a PR.
>> >
>> > I would wait until bigger chunks of work are committed to trunk (like 
>> > CEP-15) to not collide too much. I would say we can postpone doing this 
>> > until the actual 5.0 release, last weeks before it so we would not clash 
>> > with any work people would like to include in 5.0. This can go in anytime, 
>> > basically.
>> >
>> > Are people on the same page?
>> >
>> > Regards
>> >
>> > 
>> > From: Maxim Muzafarov 
>> > Sent: Monday, January 23, 2023 19:46
>> > To: dev@cassandra.apache.org
>> > Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>> >
>> > Hello everyone,
>> >
>> > You can find the changes here:
>> > https://issues.apache.org/jira/browse/CASSANDRA-17925
>> >
>> > While preparing the code style configuration for the Eclipse IDE, I
>> > discovered that there was no easy way to have complex grouping options
>> > for the set of packages. So we need to add extra blank lines between
>> > each group of packages so that all the configurations for Eclipse,
>> > NetBeans, IntelliJ IDEA and checkstyle are aligned. I should have
>> > checked this earlier for sure, but I only did it for static imports
>> > and some groups, my bad. The resultant configuration looks like this:
>> >
>> > java.*
>> > [blank line]
>> > javax.*
>> > [blank line]
>> > com.*
>> > [blank line]
>> > net.*
>> > [blank line]
>> > org.*
>> > [blank line]
>> > org.apache.cassandra.*
>> > [blank line]
>> > all other imports
>> > [blank line]
>> > static all other imports
>> >
>> > The pull request is here:
>> > https://github.com/apache/cassandra/pull/2108
>> >
>> > The configuration-related changes are placed in a dedicated commit, so
>> > it should be easy to make a review:
>> > https://github.com/apache/cassandra/pull/2108/commits/84e292ddc9671a0be76ceb9304b2b9a051c2d52a
>> >
>> > 
>> >
>> > Another important thing to mention is that the total amount of changes
>> > for organising imports is really big (more than 2000 files!), so we
>> > need to decide the right time to merge this PR. Although rebasing or
>

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-02-28 Thread Maxim Muzafarov
Folks,

If there are no objections to the approach described in this thread,
I'd like to proceed with this change. The change seems to be valuable
for the upcoming release, so any comments are really appreciated.

On Wed, 22 Feb 2023 at 21:51, David Capwell  wrote:
>
> I guess back to the point of the thread, we need a way to know what configs 
> are mutable for the settings virtual table, so need some way to denote that 
> the config replica_filtering_protection.cached_rows_fail_threshold is 
> mutable.  Given the way that the yaml config works, we can’t rely on the 
> presence of “final” or not, so we need some way to mark a config as mutable for 
> that table, does anyone want to offer feedback on what works best for them?
>
> Out of all proposals given so far “volatile” is the least verbose but also 
> not explicit (as this thread is showing there is debate on if this should be 
> present), new annotations are a little more verbose but would be explicit (no 
> surprises), and getter/setters in different classes (such as DD) is the most 
> verbose and suffers from not being explicit and ambiguity for mapping back to 
> Config.
>
> Given the above, annotations sounds like the best option, but do we really 
> want our config to look as follows?
>
> @Replaces(oldName = "native_transport_idle_timeout_in_ms", converter = Converters.MILLIS_DURATION_LONG, deprecated = true)
> @Mutable
> public DurationSpec.LongMillisecondsBound native_transport_idle_timeout = new DurationSpec.LongMillisecondsBound("0ms");
> @Mutable
> public DurationSpec.LongMillisecondsBound transaction_timeout = new DurationSpec.LongMillisecondsBound("30s");
> @Mutable
> public double phi_convict_threshold = 8.0;
> public String partitioner; // assume immutable by default?
>
>
> > On Feb 22, 2023, at 6:20 AM, Benedict  wrote:
> >
> > Could you describe the issues? Config that is globally exposed should 
> > ideally be immutable with final members, in which case volatile is only 
> > necessary if you’re using the config parameter in a tight loop that you 
> > need to witness a new value - which shouldn’t apply to any of our config.
> >
> > There are some weird niches, like updating long values on some (unsupported 
> > by us) JVMs that may tear. Technically you also require it for visibility 
> > with the JMM. But in practice it is mostly unnecessary. Often what seems to 
> > be a volatile issue is really something else.
> >
> >> On 22 Feb 2023, at 13:18, Benjamin Lerer  wrote:
> >>
> >> I have seen issues with some updatable parameters which were missing the 
> >> volatile keyword.
> >>
> >> On Wed, 22 Feb 2023 at 11:36, Aleksey Yeshchenko wrote:
> >> FWIW most of those volatile fields, if not in fact all of them, should NOT 
> >> be volatile at all. Someone started the trend and most folks have been 
> >> copycatting or doing the same for consistency with the rest of the 
> >> codebase.
> >>
> >> Please definitely don’t rely on that.
> >>
> >>> On 21 Feb 2023, at 21:06, Maxim Muzafarov  wrote:
> >>>
> >>> 1. Rely on the volatile keyword in front of fields in the Config class;
> >>>
> >>> I would say this is the most confusing option for me because it
> >>> doesn't give us all the guarantees we need, and also:
> >>> - We have no explicit control over what exactly we expose to a user.
> >>> When we modify the JMX API, we're implementing a new method for the
> >>> MBean, which in turn makes this action an explicit exposure;
> >>> - The volatile keyword is not the only way to achieve thread safety,
> >>> and looks strange for the public API design point;
> >>> - A good example is the setEnableDropCompactStorage method, which
> >>> changes the volatile field, but is only visible for testing purposes;
> >>
> >>
>


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread Maxim Muzafarov
Thank you all for your replies. Let me add some comments too,


From a public API perspective, we have three types of fields in the
Config class: internal use only (e.g. logger, PROPERTY_PREFIX),
read-only (e.g. cluster_name), and read-write fields that are
currently mutated through JMX. So a single @Mutable annotation is not
enough to separate the Config fields clearly. Adding two annotations,
@Mutable and @Immutable, might solve the problem, but such an approach
leads to code duplication if we want to extend the solution in the
future with additional parameters such as "description". Besides,
having two different annotations for the same thing might confuse
developers who are not familiar with this discussion.

So, from my point of view, the best way for us might be the one used
in the PR (the annotation name needs to reflect that the fields are
exposed to the public API and to users; the name itself can still be
changed):
@Exposure(policy = Exposure.Policy.READ_WRITE)
@Exposure(policy = Exposure.Policy.READ_ONLY)

Some other names that come to mind: APIAvailable, APIExposed,
UserAvailable, UserExposed etc.


Stefan mentioned that these annotations could be used to create
documentation pages. That's true, I have the same thought in mind, and
you can see what it will look like at the link below (the description
annotation field will be removed from the final version of the PR, but
may still be added as part of another issue):

https://github.com/apache/cassandra/pull/2133/files#diff-e966f41bc2a418becfe687134ec8cf542eb051eead7fb4917e65a3a2e7c9bce3R392

The SettingsTable may have the following columns and be truly
self-descriptive for a user: name, value, default_value, policy, and
description.
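
As a rough sketch of how such rows could be derived from the annotated fields, the example below builds name, value, default_value and policy for each exposed field and skips the internal ones. Everything here is hypothetical (DemoConfig, its placeholder defaults, the row format), and the description column is left out because it is not part of the current PR.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of deriving table rows from annotations; not the PR's code.
final class SettingsRowsSketch
{
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Exposure
    {
        Policy policy();

        enum Policy { READ_ONLY, READ_WRITE }
    }

    /** Stand-in for the Config class; values are placeholders. */
    static final class DemoConfig
    {
        // Internal use only: no annotation, so it never becomes a row.
        public static final String PROPERTY_PREFIX = "cassandra.";

        @Exposure(policy = Exposure.Policy.READ_ONLY)
        public String cluster_name = "Test Cluster";

        @Exposure(policy = Exposure.Policy.READ_WRITE)
        public double phi_convict_threshold = 8.0;
    }

    /** Builds "name | value | default_value | policy" rows, sorted by name. */
    static List<String> rows(DemoConfig current)
    {
        DemoConfig defaults = new DemoConfig(); // a fresh instance holds the defaults
        List<String> rows = new ArrayList<>();
        for (Field field : DemoConfig.class.getFields())
        {
            Exposure exposure = field.getAnnotation(Exposure.class);
            if (exposure == null)
                continue;
            try
            {
                rows.add(field.getName() + " | " + field.get(current) + " | "
                         + field.get(defaults) + " | " + exposure.policy());
            }
            catch (IllegalAccessException e)
            {
                throw new IllegalStateException(e);
            }
        }
        // getFields() gives no ordering guarantee, so sort for stable output.
        Collections.sort(rows);
        return rows;
    }

    public static void main(String[] args)
    {
        DemoConfig config = new DemoConfig();
        config.phi_convict_threshold = 12.0;
        for (String row : rows(config))
            System.out.println(row);
    }
}
```

The same annotation scan could feed a documentation generator, since the field name, its default and its policy are all available in one pass.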


Benedict mentioned that we could create a second class to hold such
information. The best candidate for this is the ConfigFields class,
which is based on the Config class and contains all the field names as
constants (I used a small utility class to generate it). But it will
still require some manual work, as there is no rule to distinguish
which config field is mutable and which isn't. So we would have to
update two classes instead of one (the Config class) when adding new
configuration fields, which we don't want to do.

Here it is in the PR:
https://github.com/apache/cassandra/pull/2133/files#diff-fcb4c5bc59d4bb127ffbe9f1ce566b2238c5bb92622da430a4ff879781093d3fR31

On Wed, 1 Mar 2023 at 09:21, Miklosovic, Stefan
 wrote:
>
> I am fine with annotations. I am not a big fan of the generation. From my 
> experience whenever we wanted to generate something we had to take care of 
> the generator itself and then we had to live with what it generated (yeah, 
> that is also a thing) instead of writing it by hand once and have some 
> freedom to tweak it however we wanted. Splitting this into the second class 
> ... well, I would say that just increases the entropy.
>
> We can parse config class on these annotations and produce the documentation 
> easily. I would probably go so far that I would put that annotation on all 
> fields. We could have two - Mutable, and Immutable. But that is really 
> optional.
>
> 
> From: Benedict 
> Sent: Wednesday, March 1, 2023 9:09
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Allow UPDATE on settings virtual table to change 
> running configuration
>
> Another option would be to introduce a second class with the same fields as 
> the first where we simply specify final for immutable fields, and construct 
> it after parsing the Config.
>
> We could even generate the non-final version from the one with final fields.
>
> Not sure this would be nicer, but it is an alternative.
>
> On 1 Mar 2023, at 02:10, Ekaterina Dimitrova  wrote:
>
> 
> I agree with David that the annotations seem a bit too many but if I have to 
> choose from the three approaches - the annotations one seems most reasonable 
> to me and I didn’t have the chance to consider any others. Volatile seems 
> fragile and unclear as a differentiator. I agree
>
> On Tue, 28 Feb 2023 at 17:47, Maxim Muzafarov wrote:
> Folks,
>
> If there are no objections to the approach described in this thread,
> I'd like to proceed with this change. The change seems to be valuable
> for the upcoming release, so any comments are really appreciated.
>
> On Wed, 22 Feb 2023 at 21:51, David Capwell wrote:
> >
> > I guess back to the point of the thread, we need a way to know what configs 
> > are mutable for the settings virtual table, so need some way 

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread Maxim Muzafarov
file is maintained and how to 
> document “groups” of configs (rather than documenting each config, we have a 
> pattern of documenting a feature or pair of configs (such as min/max targets) 
> and showing the configs that can be tweaked).  We did talk about moving to a 
> “nested” config model, but there was concerns about nesting at a feature 
> level as some features are cross cutting (so the “group” of configs may be in 
> different areas), so how we define these “groups” isn’t too clear to me… its 
> also not the common case so maybe less of a concern (if we document 
> row_index_read_size_warn_threshold but not 
> row_index_read_size_fail_threshold, is this still clear?)?
>
> > The SettingsTable may have the following columns and be truly 
> > self-descriptive for a user: name, value, default_value, policy, and 
> > description.
>
>
> If we wish to expose docs in the settings table, this would push us to define 
> these in code and no longer in conf/cassandra.yaml… I am ok with this, but 
> this does increase the scope as it needs to address the existing models.  We 
> also need better clarity on compatibility with column additions… there is 
> another dev@ thread pointing out that durable tables cause downgrade issues… 
> but do vtables?  Is it safe to add columns?  I should really bring this 
> question to another thread and have us document….
>
> > On Mar 1, 2023, at 6:33 AM, Maxim Muzafarov  wrote:
> >
> > Thank you all for your replies. Let me add some comments too,
> >
> >
> > From a public API perspective, we have three types of fields in the
> > Config class: internal use only (e.g. logger, PROPERTY_PREFIX prefix),
> > read-only use (e.g. cluster_name), and read-write fields that are
> > currently mutated with JMX. So a single @Mutable annotation is not
> > enough to have clear Config's field separation. Adding two annotations
> > @Mutable and @Immutable might solve the problem, but such an approach
> > leads to code duplication if we want to extend our solution in future
> > with additional parameters such as "description", besides having two
> > different annotations for the same thing might confuse developers who
> > are not familiar with this discussion.
> >
> > So, from my point of view, the best way for us might be as follows
> > mentioned in the PR (the annotation name needs to reflect that the
> > fields are available to the public API and for a user, we can change
> > the name):
> > @Exposure(policy = Exposure.Policy.READ_WRITE)
> > @Exposure(policy = Exposure.Policy.READ_ONLY)
> >
> > Some other names come into my mind: APIAvailable, APIExposed,
> > UserAvailable, UserExposed etc.
> >
> >
> > Stefan mentioned that these annotations could be used to create
> > documentation pages, it's true, I have the same thoughts in mind, and
> > you can see what it will look like at the link below (the description
> > annotation field will be removed from the final version of the PR, but
> > may still be added as part of another issue):
> >
> > https://github.com/apache/cassandra/pull/2133/files#diff-e966f41bc2a418becfe687134ec8cf542eb051eead7fb4917e65a3a2e7c9bce3R392
> >
> > The SettingsTable may have the following columns and be truly
> > self-descriptive for a user: name, value, default_value, policy, and
> > description.
> >
> >
> > Benedict mentioned that we could create a second class to hold such
> > information. The best candidate for this is the ConfigFields class,
> > which is based on the Config class and contains all the field names as
> > constants (I used a small utility class to generate it). But it will
> > still require some manual work, as there is no rule to distinguish
> > which config field is mutable and which isn't. So we would have to
> > update two classes instead of one (the Config class) when adding new
> > configuration fields, which we don't want to do.
> >
> > Here it is in the PR:
> > https://github.com/apache/cassandra/pull/2133/files#diff-fcb4c5bc59d4bb127ffbe9f1ce566b2238c5bb92622da430a4ff879781093d3fR31
> >
> > On Wed, 1 Mar 2023 at 09:21, Miklosovic, Stefan
> >  wrote:
> >>
> >> I am fine with annotations. I am not a big fan of the generation. From 
> >> my experience whenever we wanted to generate something we had to take care 
> >> of the generator itself and then we had to live with what it generated 
> >> (yeah, that is also a thing) instead of writing it by hand once and have 
> >> some freedom to tweak it however we wanted. Splitting this into the second 
> >> class ... 

Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-09 Thread Maxim Muzafarov
When I was a release manager for another Apache project, I found it
useful to create Confluence pages for the upcoming release, both for
transparency of release dates and for benchmarks. Of course, the dates
can be updated once we have a better understanding of the scope
of the release.
Do we want something similar?

Here is an example:
https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.10

I've also found some useful Cassandra JIRA dashboards for previous
releases that track progress and scope, but we don't have anything
similar for the next release. Should we create one?
Cassandra 4.0 GA scope
Cassandra 4.1 GA scope

Example:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=546

On Thu, 9 Mar 2023 at 10:13, Branimir Lambov  wrote:
>
> CEPs 25 (trie-indexed sstables) and 26 (unified compaction strategy) should 
> both be ready for review by mid-April.
>
> Both are around 10k LOC, fairly isolated, and in need of a committer to 
> review.
>
> Regards,
> Branimir
>
> On Mon, Mar 6, 2023 at 11:25 AM Benjamin Lerer  wrote:
>>
>> Sorry, I realized that when I started the discussion I probably did not 
>> frame it enough as I see that it is now going into different directions.
>> The concerns I am seeing are:
>> 1) A too small amount of time between releases  is inefficient from a 
>> development perspective and from a user perspective. From a development 
>> point of view because we are missing time to deliver some features. From a 
>> user perspective because they cannot follow with the upgrade.
>> 2) Some features are so anticipated (Accord being the one mentioned) that 
>> people would prefer to delay the release to make sure that it is available 
>> as soon as possible.
>> 3) We do not know how long we need to go from the freeze to GA. We hope for 
>> 2 months but our last experience was 6 months. So delaying the release could 
>> mean not releasing this year.
>> 4) For people doing marketing it is really hard to promote a product when 
>> you do not know when the release will come and what features might be there.
>>
>> All those concerns are probably even made worse by the fact that we do not 
>> have a clear visibility on where we are.
>>
>> Should we clarify that part first by getting an idea of the status of the 
>> different CEPs and other big pieces of work? From there we could agree on 
>> some timeline for the freeze. We could then discuss how to make predictable 
>> the time from freeze to GA.
>>
>>
>>
>> Le sam. 4 mars 2023 à 18:14, Josh McKenzie  a écrit :
>>>
>>> (for convenience sake, I'm referring to both Major and Minor semver 
>>> releases as "major" in this email)
>>>
>>> The big feature from our perspective for 5.0 is ACCORD (CEP-15) and I would 
>>> advocate to delay until this has sufficient quality to be in production.
>>>
>>> This approach can be pretty unpredictable in this domain; often unforeseen 
>>> things come up in implementation that can give you a long tail on something 
>>> being production ready. For the record - I don't intend to single Accord 
>>> out at all on this front, quite the opposite given how much rigor's gone 
>>> into the design and implementation. I'm just thinking from my personal 
>>> experience: everything I've worked on, overseen, or followed closely on 
>>> this codebase always has a few tricks up its sleeve along the way to having 
>>> edge-cases stabilized.
>>>
>>> Much like on some other recent topics, I think there's a nuanced middle 
>>> ground where we take things on a case-by-case basis. Some factors that have 
>>> come up in this thread that resonated with me:
>>>
>>> For a given potential release date 'X':
>>> 1. How long has it been since the last release?
>>> 2. How long do we expect qualification to take from a "freeze" (i.e. no new 
>>> improvement or features, branch) point?
>>> 3. What body of merged production ready work is available?
>>> 4. What body of new work do we have high confidence will be ready within Y 
>>> time?
>>>
>>> I think it's worth defining a loose "minimum bound and upper bound" on 
>>> release cycles we want to try and stick with barring extenuating 
>>> circumstances. For instance: try not to release sooner than maybe 10 months 
>>> out from a prior major, and try not to release later than 18 months out 
>>> from a prior major. Make exceptions if truly exceptional things land, are 
>>> about to land, or bugs are discovered around those boundaries.
>>>
>>> Applying the above framework to what we have in flight, our last release 
>>> date, expectations on CI, etc - targeting an early fall freeze (pending CEP 
>>> status) and mid to late fall or December release "feels right" to me.
>>>
>>> With the exception, of course, that if something merges earlier, is stable, 
>>> and we feel is valuable enough to cut a major based on that, we do it.
>>>
>>> ~Josh
>>>
>>> On Fri, Mar 3, 2023, at 7:37 PM, German Eichberger via dev wrote:
>>>
>>> Hi,
>>>
>>> We shouldn't release just for releases sake.

Re: [DISCUSS] Moving system property names to the CassandraRelevantProperties

2023-03-21 Thread Maxim Muzafarov
Hello everyone,

This is a friendly reminder that some help is still needed with the review :-)
I have resolved all the conflicts that have arisen in the last month or two.

If you'd like to invest some time in code clarity, please take a look:
https://github.com/apache/cassandra/pull/2046/files

On Wed, 8 Feb 2023 at 19:48, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
>
> We are trying to clean up the source code around the direct use of
> system properties and make this use more manageable and transparent.
> To achieve this, I have prepared a patch that moves all system
> property names to the CassandraRelevantProperties, which in turn makes
> some of the properties visible to a user through the
> SystemPropertiesTable virtual table.
>
> The patch has passed a few rounds of review, but we still need another
> pair of eyes to make sure we are not missing anything valuable.
> Please, take a look at the patch.
>
> You can find all the changes here:
> https://issues.apache.org/jira/browse/CASSANDRA-17797
>
>
> I'd also like to share the names of the properties that will appear in
> the SystemPropertiesTable, the appearance of which is related to the
> public API changes we agreed to discuss on the dev list.
>
>
> The public API changes
>
> Newly added production system properties:
>
> io.netty.eventLoopThreads
> io.netty.transport.estimateSizeOnSubmit
> java.security.auth.login.config
> javax.rmi.ssl.client.enabledCipherSuites
> javax.rmi.ssl.client.enabledProtocols
> ssl.enable
> log4j2.disable.jmx
> log4j2.shutdownHookEnabled
> logback.configurationFile
>
> Newly added and used for tests only:
>
> invalid-legacy-sstable-root
> legacy-sstable-root
> org.apache.cassandra.tools.UtilALLOW_TOOL_REINIT_FOR_TEST
> org.caffinitas.ohc.segmentCount
> suitename
> sun.stderr.encoding
> sun.stdout.encoding
> test.bbfailhelper.enabled
> write_survey


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-29 Thread Maxim Muzafarov
Hello everyone,


It seems to me that we need another consensus to make the
SettingsTable virtual table updatable. There is an issue with
validating configuration properties that blocks our implementation
with the virtual table.

A short example of validating the values loaded from the YAML file:
- the DurationSpec.LongMillisecondsBound itself requires input quantity >= 0;
- the read_request_timeout Config field with type
DurationSpec.LongMillisecondsBound requires quantity >=
LOWEST_ACCEPTED_TIMEOUT (10ms);

When the read_request_timeout field is modified using JMX, only a
DurationSpec.LongMillisecondsBound type validation is performed and
there is no LOWEST_ACCEPTED_TIMEOUT validation. If we implement the
SettingsTable properties validation in the same way, we just add
another discrepancy.
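
A minimal sketch of the discrepancy described above, assuming two
separate checks. All class and method names here are hypothetical and
are not Cassandra's actual code; only the 10ms bound and the ">= 0"
rule come from the example above.

```java
/** Hypothetical illustration; not Cassandra's actual validation code. */
final class TimeoutValidationSketch
{
    // Assumed value, taken from the "(10ms)" mentioned in the example.
    static final long LOWEST_ACCEPTED_TIMEOUT_MS = 10;

    /** Type-level check: the quantity itself must be >= 0. */
    static void validateType(long millis)
    {
        if (millis < 0)
            throw new IllegalArgumentException("quantity must be >= 0, got " + millis);
    }

    /** Field-level check: read_request_timeout has its own lower bound. */
    static void validateField(long millis)
    {
        if (millis < LOWEST_ACCEPTED_TIMEOUT_MS)
            throw new IllegalArgumentException("read_request_timeout must be >= "
                                               + LOWEST_ACCEPTED_TIMEOUT_MS + "ms, got " + millis);
    }

    /** YAML path: both checks are applied. */
    static long setFromYaml(long millis)
    {
        validateType(millis);
        validateField(millis);
        return millis;
    }

    /** JMX path: only the type check is applied, so 5ms is accepted. */
    static long setFromJmx(long millis)
    {
        validateType(millis);
        return millis;
    }
}
```

With this shape, setFromJmx(5) succeeds while setFromYaml(5) throws,
which is the kind of inconsistency a shared validation path would
remove.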


If we go a little deeper, we are currently validating a configuration
property in the following parts of the code, which makes things even
worse:
- in a property type itself if it's not primitive, e.g.
DataStorageSpec#validateQuantity;
- rarely in nested configuration classes e.g.
AuditLogOptions#validateCategories;
- during the configuration load from yaml-file for null, and non-null,
see YamlConfigurationLoader.PropertiesChecker#check;
- during applying the configuration, e.g. DatabaseDescriptor#applySimpleConfig;
- in DatabaseDescriptor setter methods e.g.
DatabaseDescriptor#setDenylistMaxKeysTotal;
- inside custom classes e.g. SSLFactory#validateSslContext;
- rarely inside JMX methods itself, e.g. StorageService#setRepairRpcTimeout;


To use the same validation path for configuration properties that are
going to be changed through SettingsTable, we need to arrange a common
validation process for each property to rely on, so that the
validation path will be the same regardless of the public interface
used (YAML, JMX, or Virtual Table).

In general, I'd like to avoid building a complex validation framework
for Cassandra's configuration, as the purpose of the project is not to
maintain the configuration itself, so the simpler the validation of
the properties will be, the easier the configuration will be to
maintain.


We might have the following options for building the validation
process, and each of them has its pros and cons:

== 1. ==

Add new annotations to build the property's validation rules (as it
was suggested by David)
@Max, @Min, @NotNull, @Size, @Nullable (already have this one), as
well as custom validators etc.

@Min(5.0) @Max(16.0)
public volatile double phi_convict_threshold = 8.0;

An example of such an approach is the Dropwizard Configuration library
(or Hibernate, Spring)
https://www.dropwizard.io/en/latest/manual/validation.html#annotations
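
A rough sketch of how option 1 could be evaluated with reflection.
The @Min and @Max annotations below are declared locally for
illustration (they are not the Bean Validation ones), and SketchConfig
and validate() are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical illustration of annotation-driven bounds checking. */
final class AnnotationValidationSketch
{
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Min { double value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Max { double value(); }

    /** Stand-in for Config with one annotated field. */
    static final class SketchConfig
    {
        @Min(5.0) @Max(16.0)
        public volatile double phi_convict_threshold = 8.0;
    }

    /** Checks newValue against the bounds declared on the named field. */
    static void validate(Class<?> configClass, String fieldName, double newValue)
    {
        Field field;
        try
        {
            field = configClass.getField(fieldName);
        }
        catch (NoSuchFieldException e)
        {
            throw new IllegalArgumentException("Unknown property " + fieldName, e);
        }
        Min min = field.getAnnotation(Min.class);
        Max max = field.getAnnotation(Max.class);
        if (min != null && newValue < min.value())
            throw new IllegalArgumentException(fieldName + " must be >= " + min.value());
        if (max != null && newValue > max.value())
            throw new IllegalArgumentException(fieldName + " must be <= " + max.value());
    }
}
```

The same validate() call could be made from any entry point before
the field is assigned, so the rule lives next to the field itself.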


== 2. ==

Add to the @Mutable the set (or single implementation) of validations
it performs, which is closer to what we have now.
As an alternative to having a single class for each constraint, we can
have an enumeration list that stores the same implementations.

public @interface Mutable {
    Class<? extends ConfigurationConstraint<?>>[] constraints() default {};
}

public class NotNullConstraint implements ConfigurationConstraint<Object> {
    public void validate(Object newValue) {
        if (newValue == null)
            throw new IllegalArgumentException("Value cannot be null");
    }
}

public class PositiveConstraint implements ConfigurationConstraint<Object> {
    public void validate(Object newValue) {
        if (newValue instanceof Number && ((Number) newValue).intValue() <= 0)
            throw new IllegalArgumentException("Value must be positive");
    }
}

@Mutable(constraints = { NotNullConstraint.class, PositiveConstraint.class })
public volatile Integer concurrent_compactors;

Something similar is performed for Custom Constraints in Hibernate.
https://docs.jboss.org/hibernate/stable/validator/reference/en-US/html_single/#section-constraint-validator
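
To make option 2 more concrete, here is a sketch of how the
constraints listed on @Mutable might be applied before a field is
updated. The shape of ConfigurationConstraint (a single validate
method, generic in the value type) is an assumption, and SketchConfig
and update() are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical illustration of constraint classes attached to @Mutable. */
final class MutableConstraintSketch
{
    /** Assumed interface shape: one method that throws on invalid input. */
    interface ConfigurationConstraint<T>
    {
        void validate(T newValue);
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Mutable
    {
        Class<? extends ConfigurationConstraint<?>>[] constraints() default {};
    }

    static final class NotNullConstraint implements ConfigurationConstraint<Object>
    {
        public void validate(Object newValue)
        {
            if (newValue == null)
                throw new IllegalArgumentException("Value cannot be null");
        }
    }

    static final class PositiveConstraint implements ConfigurationConstraint<Object>
    {
        public void validate(Object newValue)
        {
            if (newValue instanceof Number && ((Number) newValue).intValue() <= 0)
                throw new IllegalArgumentException("Value must be positive");
        }
    }

    /** Stand-in for Config. */
    static final class SketchConfig
    {
        @Mutable(constraints = { NotNullConstraint.class, PositiveConstraint.class })
        public volatile Integer concurrent_compactors = 2;
    }

    /** Runs every declared constraint, then assigns the field. */
    static void update(Object config, String name, Object newValue)
    {
        try
        {
            Field field = config.getClass().getField(name);
            Mutable mutable = field.getAnnotation(Mutable.class);
            if (mutable == null)
                throw new IllegalArgumentException(name + " is not mutable");
            for (Class<? extends ConfigurationConstraint<?>> type : mutable.constraints())
            {
                @SuppressWarnings("unchecked")
                ConfigurationConstraint<Object> constraint =
                    (ConfigurationConstraint<Object>) type.getDeclaredConstructor().newInstance();
                constraint.validate(newValue);
            }
            field.set(config, newValue);
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalStateException("Cannot update " + name, e);
        }
    }
}
```

A failing constraint throws before field.set() is reached, so the
old value stays in place when the input is rejected.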


== 3. ==

Enforce setter method names to match the corresponding property name.
This will allow us to call the setter method with reflection, and the
method itself will do all the validation we need.

public volatile int key_cache_keys_to_save;

public static void setKeyCacheKeysToSave(int keyCacheKeysToSave)
{
    if (keyCacheKeysToSave < 0)
        throw new IllegalArgumentException("key_cache_keys_to_save must be positive");
    conf.key_cache_keys_to_save = keyCacheKeysToSave;
}
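
A sketch of the reflection step that option 3 relies on, assuming a
snake_case property maps to a "set" + CamelCase static setter that
takes an int. SketchDescriptor, setterName() and set() are
hypothetical names used only for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical illustration of calling a setter derived from a property name. */
final class SetterByNameSketch
{
    /** Stand-in for DatabaseDescriptor with one validating setter. */
    static final class SketchDescriptor
    {
        static volatile int keyCacheKeysToSave = Integer.MAX_VALUE;

        public static void setKeyCacheKeysToSave(int keyCacheKeysToSave)
        {
            if (keyCacheKeysToSave < 0)
                throw new IllegalArgumentException("key_cache_keys_to_save must be positive");
            SketchDescriptor.keyCacheKeysToSave = keyCacheKeysToSave;
        }
    }

    /** Maps key_cache_keys_to_save to setKeyCacheKeysToSave. */
    static String setterName(String property)
    {
        StringBuilder name = new StringBuilder("set");
        for (String part : property.split("_"))
        {
            if (!part.isEmpty())
                name.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return name.toString();
    }

    /** Finds the setter by name and lets it do the validation. */
    static void set(Class<?> owner, String property, int value)
    {
        try
        {
            Method setter = owner.getMethod(setterName(property), int.class);
            setter.invoke(null, value);
        }
        catch (InvocationTargetException e)
        {
            // Rethrow the setter's own validation failure unchanged.
            if (e.getCause() instanceof RuntimeException)
                throw (RuntimeException) e.getCause();
            throw new IllegalStateException(e.getCause());
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalArgumentException("No setter for " + property, e);
        }
    }
}
```

The naming convention is the whole contract here: a property without
a matching setter simply cannot be updated through this path.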


I think the options above are the most interesting for us, but if you
have something more appropriate, please share. From my point of view,
option 2 is the most appropriate here, as it fits with everything we
have discussed in this thread. However, they are all fine to go with.

I'm looking forward to hearing your thoughts.

On Tue, 21 Feb 2023 at 22:06, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
>
> I would like to share and discuss the key point of the solution design
> with you before I finalise a pull request with tedious changes
> remaining so that we are 

Re: [COMPRESSION PARAMETERS] Question

2023-04-19 Thread Maxim Muzafarov
Hello Claude,

I have seen two options, and the option you mentioned is probably a
third way of disabling a feature :-)

So, we have

1.
public class TransparentDataEncryptionOptions
{
    public boolean enabled = false;
    public ParameterizedClass key_provider;
}

2.
public boolean cdc_enabled = false;
public boolean materialized_views_enabled = false;


So, in my humble opinion, both approaches are in use for now, and as
the discussion [1] is not finished we can probably use either of them
for the case you mentioned: either create a nested wrapper class or
keep it plain with the right prefix, e.g. hints_compression_enabled.


Move cassandra.yaml toward a nested structure around major database concepts
[1] https://issues.apache.org/jira/browse/CASSANDRA-17292

On Wed, 19 Apr 2023 at 14:07, Claude Warren, Jr via dev
 wrote:
>
> Currently the compression parameters has an option called enable.  When 
> enable=false all the other options have to be removed.  But it seems to me 
> that we should support enabled=false without removing all the other 
> parameters so that users can disable the compression for testing or problem 
> resolution without losing any of the other parameter settings.  So to be clear:
>
> The following is valid:
> hints_compression:
> - class_name: foo
>   parameters:
>- chunk_length_in_kb : 16 ;
>
> But this is not:
> hints_compression:
> - class_name: foo
>   parameters:
>- chunk_length_in_kb : 16 ;
>   enabled : false ;
>
> Currently when enabled is set to false it constructs a default CompressionParam
> object with the class name set to null.
>
> Is there a reason to keep this or should we accept the parameters and 
> construct a CompressionParam with the parameters while continuing to set the 
> class name to null?
>
> Claude


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-05-01 Thread Maxim Muzafarov
- We have custom configuration datatypes such as DataStorageSpec,
DataStorageSpec;
- We have custom DurationSpec, so we either move them to Duration,
preserving backwards compatibility for all supported APIs (yaml, JMX),
or extend a considered framework with new types, we have to provide
data type converters in the latter case;
- An additional dependency, so the key component (configuration) of
the project becomes dependent on an external library version;
- We have to deal with configuration defaults calculated during
initialisation to maintain backward compatibility;

The frameworks I have looked at:
- commons-configuration
https://github.com/apache/commons-configuration
- lightbend config
https://github.com/lightbend/config
- Netflix archaius
https://github.com/Netflix/archaius


The Apache Commons configuration from this list sounds less risky to
us as we already have dependencies like commons-codec, commons-cli
etc. The approach of how configuration fields are used in the
Cassandra project is closer to the way the commons-configuration
library maintains them, so we can replace the ConfigurationSource
layer from the design with AbstractConfiguration
(commons-configuration), keeping the same properties validation design
concept.

The Apache Commons configuration provides Duration configuration types
that look similar to the DurationSpec in Cassandra. If we adopt this
library, having both types for the same abstraction would confuse
those who deal with the configuration API in the internal code, so
some kind of migration is still required here, as well as custom
adapters to support backwards compatibility. This is a HUGE change
that helps to create an API for
internal configuration usage for both Cassandra and sub-projects (e.g.
Accord), but still does not solve the problem of availability of
custom configuration datatypes (DataStorageSpec, DataStorageSpec) for
sub-projects.

As a result of trying to implement commons-configuration as an
internal API, I have come to the conclusion that the number of changes
and compromises that need to be agreed upon will be very large in this
case. So unless I'm missing something, the proposed design is our
best option.


Thoughts?

On Thu, 30 Mar 2023 at 01:42, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
>
> It seems to me that we need another consensus to make the
> SettingsTable virtual table updatable. There is an issue with
> validating configuration properties that blocks our implementation
> with the virtual table.
>
> A short example of validating the values loaded from the YAML file:
> - the DurationSpec.LongMillisecondsBound itself requires input quantity >= 0;
> - the read_request_timeout Config field with type
> DurationSpec.LongMillisecondsBound requires quantity >=
> LOWEST_ACCEPTED_TIMEOUT (10ms);
>
> When the read_request_timeout field is modified using JMX, only a
> DurationSpec.LongMillisecondsBound type validation is performed and
> there is no LOWEST_ACCEPTED_TIMEOUT validation. If we implement the
> SettingsTable properties validation in the same way, we just add
> another discrepancy.
>
>
> If we go a little deeper, we are currently validating a configuration
> property in the following parts of the code, which makes things even
> worse:
> - in a property type itself if it's not primitive, e.g.
> DataStorageSpec#validateQuantity;
> - rarely in nested configuration classes e.g.
> AuditLogOptions#validateCategories;
> - during the configuration load from yaml-file for null, and non-null,
> see YamlConfigurationLoader.PropertiesChecker#check;
> - during applying the configuration, e.g. 
> DatabaseDescriptor#applySimpleConfig;
> - in DatabaseDescriptor setter methods e.g.
> DatabaseDescriptor#setDenylistMaxKeysTotal;
> - inside custom classes e.g. SSLFactory#validateSslContext;
> - rarely inside JMX methods itself, e.g. StorageService#setRepairRpcTimeout;
>
>
> To use the same validation path for configuration properties that are
> going to be changed through SettingsTable, we need to arrange a common
> validation process for each property to rely on, so that the
> validation path will be the same regardless of the public interface
> used (YAML, JMX, or Virtual Table).
>
> In general, I'd like to avoid building a complex validation framework
> for Cassandra's configuration, as the purpose of the project is not to
> maintain the configuration itself, so the simpler the validation of
> the properties will be, the easier the configuration will be to
> maintain.
>
>
> We might have the following options for building the validation
> process, and each of them has its pros and cons:
>
> == 1. ==
>
> Add new annotations to build the property's validation rules (as it
> was suggested by David)
> @Max, @Min, @NotNull, @Size

Re: [DISCUSS] Moving system property names to the CassandraRelevantProperties

2023-05-18 Thread Maxim Muzafarov
Hello everyone,


Thanks for following this thread and the review, all the system
properties have been moved to CassandraRelevantProperties.
So you can find out what it looks like from the following link:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/CassandraRelevantProperties.java#L38


I would like to show you a few more steps in this thread so that the
solution is generally complete. As you may have noticed, we have three
types of system properties: cassandra properties used in production
environments, cassandra properties used for testing only, and
non-cassandra properties. I would like to reuse the @Replaces
annotation to rename cassandra-related properties according to the
following pattern: the 'cassandra.' prefix is for production
properties, and the 'cassandra.test' prefix is for testing properties.
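
A minimal sketch of what such a rename could look like from the
reader's side, assuming each property carries its new key plus the old
key it replaces. The enum, its entries and the key strings are
hypothetical and are not real Cassandra properties; the actual
@Replaces mechanics may differ.

```java
/** Hypothetical illustration of renamed properties with old-name fallback. */
final class RenamedPropertySketch
{
    enum SketchProperty
    {
        // Illustrative keys only.
        EXAMPLE_PRODUCTION("cassandra.example_setting", "example.setting"),
        EXAMPLE_TEST("cassandra.test.example_flag", "example.test.flag");

        final String key;
        final String oldKey;

        SketchProperty(String key, String oldKey)
        {
            this.key = key;
            this.oldKey = oldKey;
        }

        /** Prefers the new name and falls back to the old one. */
        String getString()
        {
            String value = System.getProperty(key);
            return value != null ? value : System.getProperty(oldKey);
        }

        /** Test-only properties are recognised by their prefix. */
        boolean isTestOnly()
        {
            return key.startsWith("cassandra.test.");
        }
    }
}
```

Existing deployments that still pass the old name keep working, while
the virtual table can report the new, consistently prefixed name.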

This makes the results of the SystemPropertiesTable virtual table look
more natural to users. I think we should include this change for the
5.0 release.
WDYT?


Other minor code clarity improvements to do:

1.
Use WithProperties to ensure that system properties are handled
https://issues.apache.org/jira/browse/CASSANDRA-18453

2.
As a draft agreement, the CassandraRelevantProperties and
CassandraRelevantEnv (and probably DatabaseDescriptor) could share the
same interface to access the system properties, configuration
properties, and/or environment variables. The idea is still in draft
form, so I'm mentioning it here to keep it in context. Will come back
to it when more details are available.
This is what it might look like:
https://github.com/apache/cassandra/pull/2300/files#diff-6b7db8438314143a1b6b1c8c58901a4e3954af8cdd294ca8853a1001c1f4R70

On Fri, 31 Mar 2023 at 07:08, Jacek Lewandowski
 wrote:
>
> I'll do
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> czw., 30 mar 2023 o 22:09 Miklosovic, Stefan  
> napisał(a):
>>
>> Hi list,
>>
>> we are looking for one more committer to take a look at this patch (1, 2).
>>
>> It looks like there is a lot to go through because of number of files 
>> modified (around 200) but changes are really just about moving everything to 
>> CassandraRelevantProperties. I do not think that it should take more than 1 
>> hour of dedicated effort and we are done!
>>
>> Thanks in advance to whoever reviews this.
>>
>> I want to especially thank Maxim for his perseverance in this matter and I 
>> hope we will eventually deliver this work to trunk.
>>
>> (1) https://github.com/apache/cassandra/pull/2046
>> (2) https://issues.apache.org/jira/browse/CASSANDRA-17797
>>
>> Regards
>>
>> Regards
>>
>> 
>> From: Miklosovic, Stefan 
>> Sent: Wednesday, March 22, 2023 14:34
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Moving system property names to the 
>> CassandraRelevantProperties
>>
>>
>> Hi Maxim,
>>
>> thanks for letting us know.
>>
>> I reviewed it couple months ago but I can revisit it to double check. We 
>> need the second reviewer. Until we find somebody, we can not merge this.
>>
>> If anybody wants to take a look, it would be awesome. It seems like a lot of 
>> changes / files touched but it is just about centralizing all properties 
>> scattered around the code base into one place.
>>
>> Regards
>>
>> 
>> From: Maxim Muzafarov 
>> Sent: Tuesday, March 21, 2023 22:59
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Moving system property names to the 
>> CassandraRelevantProperties
>>
>>
>> Hello everyone,
>>
>> This is a friendly reminder that some help is still needed with the review :-)
>> I have resolved all the conflicts that have arisen in the last month or two.
>>
>> If you'd like to invest some time in code clarity, please take a look:
>> https://github.com/apache/cassandra/pull/2046/files
>>
>> On Wed, 8 Feb 2023 at 19:48, Maxim Muzafarov  wrote:
>> >
>> > Hello everyone,
>> >
>> >
>> > We are trying to clean up the source code around the direct use of
>> > system properties and make this use more manageable and t

[DISCUSSION] Adding sonar report analysis to the Cassandra project

2023-06-12 Thread Maxim Muzafarov
Hello everyone,

I would like to make the source code of the Cassandra project more
visible to people outside of the Cassandra Community and highlight the
typical known issues in new contributions in the GitHub pull-request
interface as well. This makes it easier for those who are unfamiliar
with the accepted code style and just want to be part of a large and
friendly community to add new contributions.

The ASF provides [1] SonarCloud facilities for open source
projects, which are free to use, and we can also easily add the process
of building and uploading reports to the build using GitHub actions
with almost no maintenance costs for us. Of course, as a
recommendation quality tool, it doesn't reject any changes/pull
requests, so nothing will change from that perspective.

I've prepared everything we need to do this here (we also need to
modify the default Sonar Way profile to suit our needs, which I can't
do as I don't have sufficient privileges):
https://issues.apache.org/jira/browse/CASSANDRA-18390

I look forward to hearing your thoughts on this.


Examples:

I did the same for the Apache Ignite project, and here is what it
may look like in the end.
For the pull-requests queue:
https://sonarcloud.io/project/pull_requests_list?id=apache_ignite

The report itself for a pull request (the aggregation is used):
https://github.com/apache/ignite/pull/10769

The main branch quality gate profile:
https://sonarcloud.io/summary/overall?id=apache_ignite


In addition to this:

A developer can configure the SonarLint IDE plugin (available for
IntelliJ IDEA, Eclipse) to retrieve Cassandra's quality profiles and
configured rules from the sonarcloud.io resource and highlight any
violated warnings locally, making it easier to develop a new patch.


[1] 
https://cwiki.apache.org/confluence/display/INFRA/SonarCloud+for+ASF+projects


Re: [DISCUSS] Moving system property names to the CassandraRelevantProperties

2023-06-13 Thread Maxim Muzafarov
Hello everyone,

I have created the following JIRA issues to follow up on the
discussion and to improve the user experience with virtual tables.
I'll try to address them before the next release.

CASSANDRA-18586: CQLSH formatting output is incorrect when querying
the system properties virtual table
https://issues.apache.org/jira/browse/CASSANDRA-18586

CASSANDRA-18587: Align all system property names used by the project
to use the 'cassandra' prefix
https://issues.apache.org/jira/browse/CASSANDRA-18587


I also want to clarify and discuss with you some points related to the
SystemPropertiesTable virtual table.

1. Hide non-production environment properties in the SystemProperties
virtual table.

As you may know, the result of the query on the virtual table
currently includes the environment properties related to tests as well
(used internally for our testing purposes) ~ 42 out of 290. This seems
a bit redundant for production use and floods the query output with
the things you don't need to think about. I think we can add a new
property -Dcassandra.vt.show.test.system.properties (false by default)
to hide these test-related properties for production
environments while still using them for our test runs and build
scripts. Hiding test properties is not a regression in this case. Any
thoughts?
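
A sketch of the filtering this would imply. The flag name is the one
proposed above; the helper names, and the assumption that test-only
properties can be recognised by a 'cassandra.test.' prefix, are
hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of hiding test-only properties by default. */
final class HideTestPropertiesSketch
{
    // Flag name taken from the proposal; false when unset.
    static final String SHOW_TEST_PROPERTIES = "cassandra.vt.show.test.system.properties";

    /** Assumption: test-only properties share the 'cassandra.test.' prefix. */
    static boolean isTestOnly(String name)
    {
        return name.startsWith("cassandra.test.");
    }

    /** Returns the rows the virtual table would expose. */
    static Map<String, String> visible(Map<String, String> all)
    {
        boolean showTests = Boolean.getBoolean(SHOW_TEST_PROPERTIES);
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : all.entrySet())
        {
            if (showTests || !isTestOnly(entry.getKey()))
                result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Test runs and build scripts would pass the flag as true to keep the
full output, while production nodes see only the shorter list.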


2. The absence of a description of the system properties.

I have found that there is no good description of the system
properties used to configure production environments. Is there any
documentation for this case? Do we need to keep these pages up to
date?

Here are some links I found:
https://docs.datastax.com/en/dse/6.8/dse-dev/datastax_enterprise/config/cassandraSystemProperties.html
https://cassandra.apache.org/doc/4.1/cassandra/getting_started/configuring.html#environment-variables

On Thu, 18 May 2023 at 16:03, Miklosovic, Stefan
 wrote:
>
> Hi Maxim,
>
> thanks for bringing this up. I am glad you did the heavy-lifting in / around 
> CassandraRelevantProperties and we can build on top of this.
>
> I am fine with @Replaces for Cassandra system properties. After we put 
> everything into CassandraRelevantProperties, one can easily see that there 
> are great inconsistencies in properties' naming. As we still need to support 
> the old names too, using @Replaces, the similar mechanism we used in 
> DatabaseDescriptor, seems like the ideal solution.
>
> By the way, when somebody queries system_views.system_properties, it looks 
> very strange in CQL shell, the formatting is just broken. EXPAND ON; does not 
> help either. It is quite hard to parse this visually if a user wants to see 
> them all. The reason is that there is "java.class.path" property for which 
> the value is so long that it basically breaks the output.
>
> Another solution would be to fix the output but I am not sure how it would 
> look like.
>
> As we are going to rename them to have same prefixes, could not we remodel 
> that table as well? For example:
>
> https://gist.github.com/smiklosovic/de662b7faa25e1fdd56805cdb5ba80a7
>
> Feel free to come up with a different approach.
>
> By doing this, it would be way easier to get just Cassandra properties or 
> just properties for tests or just Java properties and selecting just the 
> first two groups would not break CQLSH. It is nice that it would have same 
> prefix but I am trying to find a way how to utilize the same prefix in CQLSH 
> as well.
>
> 
> From: Maxim Muzafarov 
> Sent: Thursday, May 18, 2023 12:54
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Moving system property names to the 
> CassandraRelevantProperties
>
>
> Hello everyone,
>
>
> Thanks for following this thread and the review, all the system
> properties have been moved to CassandraRelevantProperties.
> So you can find out what it looks like from the following link:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/CassandraRelevantProperties.java#L38
>
>
> I would like to show you a few more steps in this thread so that the
> solution is generally complete. As you may have noticed, we have three
> types of system properties: cassandra properties used in production
> environments, cassandra properties used for testing only, and
> non-cassandra properties. I would like to reuse the @Replaces
> annotation to rename cassandra-related properties according to the
> following pattern: the 'cassandra.' prefix is for production
> properties, and the 'cassandra.test' prefix is for testing properties.
>
> This makes the re

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-06-23 Thread Maxim Muzafarov
Hello everyone,


As there has been little feedback on which option to go with, or on
the pros and cons of each option, I tend to agree with the vision of
this problem proposed by David :-) After a lot of discussion on Slack,
we came to the @ValidatedBy annotation, which points to a validation
method of a property; this will address all our concerns and issues
with validation.

I'd like to raise the visibility of these changes and try to find one
more committer to look at them:
https://issues.apache.org/jira/browse/CASSANDRA-15254
https://github.com/apache/cassandra/pull/2334/files

I'd really appreciate any kind of review in advance.


The patch is +2,043 −302 lines, and most of the additions are
tests. I would like to highlight the crucial design points that are
required to make the SettingsTable virtual table updatable. Some of
these have already been discussed in this thread, and I would like to
provide a brief outline of these points to facilitate the PR review.

So, what are the problems that have been solved to make the
SettingsTable updatable?

1. Input validation.

Currently, the JMX, Yaml and DatabaseDescriptor#apply methods each
validate user input for the same property in their own way, which
usually, but not always, results in a consistent configuration state.
CASSANDRA-17734 is a good example of this.

The @ValidatedBy annotation, which points to a validation method, has
been added to address this particular problem. So, no matter what API
is triggered, the method will be called to validate the input. It will
also work when cassandra.yaml is loaded by the yaml engine in a
pre-parse state, where we currently check input properties for
deprecation and nullability.
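
To illustrate the idea, here is a sketch of an annotation that names a
validation method, plus one setter path that every API could share.
The annotation attributes (owner, method), the Validators class and
set() are hypothetical; the real @ValidatedBy in the PR may be shaped
differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical illustration; attribute names are assumptions. */
final class ValidatedBySketch
{
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ValidatedBy
    {
        Class<?> owner();
        String method();
    }

    static final class Validators
    {
        public static void nonNegative(Integer value)
        {
            if (value == null || value < 0)
                throw new IllegalArgumentException("value must be >= 0");
        }
    }

    /** Stand-in for Config. */
    static final class SketchConfig
    {
        @ValidatedBy(owner = Validators.class, method = "nonNegative")
        public volatile Integer key_cache_keys_to_save = 100;
    }

    /** Single entry point that YAML, JMX or a virtual table could call. */
    static void set(Object config, String name, Object newValue)
    {
        try
        {
            Field field = config.getClass().getField(name);
            ValidatedBy validatedBy = field.getAnnotation(ValidatedBy.class);
            if (validatedBy != null)
            {
                Method validator = validatedBy.owner().getMethod(validatedBy.method(), field.getType());
                validator.invoke(null, newValue);
            }
            field.set(config, newValue);
        }
        catch (InvocationTargetException e)
        {
            // Surface the validator's own exception to the caller.
            if (e.getCause() instanceof RuntimeException)
                throw (RuntimeException) e.getCause();
            throw new IllegalStateException(e.getCause());
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalArgumentException("Cannot set " + name, e);
        }
    }
}
```

This covers only the stateless case: the validator sees the new value
alone and needs no fully-constructed Config instance.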

There are two types of validation worth mentioning:
- stateless - properties that do not depend on any other configuration;
- stateful - properties whose values depend on other properties and
that require a fully-constructed Config instance to be validated;

For the sake of simplicity, the scope of this task will be limited to
dealing with stateless properties only, but stateful validations are
also supported in the initial PR using property change listeners.

2. Property mutability.

There is no way of distinguishing which parts of a property are
mutable and which are not. This meta-information must be available at
runtime and as we discussed earlier the @Mutable annotation is added
to handle this.

3. Listening for property changes.

Some of the internal components e.g. CommitLog, may perform some
operations and/or calculations just before or just after the property
change. As long as JMX is the only API used to update configuration
properties, there is no problem. To address this issue the observer
pattern has been used to maintain the same behaviour.
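
A minimal sketch of the observer pattern applied to a single
property, assuming listeners can hook in before and after a change.
The class and interface below are hypothetical and are not the
listener API from the PR.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch of an observable configuration property. */
final class ObservablePropertySketch
{
    /** Callbacks run around a change; both are no-ops by default. */
    interface Listener
    {
        default void before(String name, Object oldValue, Object newValue) {}
        default void after(String name, Object oldValue, Object newValue) {}
    }

    private final String name;
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();
    private volatile Object value;

    ObservablePropertySketch(String name, Object initialValue)
    {
        this.name = name;
        this.value = initialValue;
    }

    void addListener(Listener listener)
    {
        listeners.add(listener);
    }

    Object get()
    {
        return value;
    }

    /** Same notification path whichever API triggers the update. */
    synchronized void set(Object newValue)
    {
        Object oldValue = value;
        // A listener may veto the change by throwing from before().
        for (Listener listener : listeners)
            listener.before(name, oldValue, newValue);
        value = newValue;
        for (Listener listener : listeners)
            listener.after(name, oldValue, newValue);
    }
}
```

A component would register its listener once at startup, and both a
JMX setter and a virtual table UPDATE would then go through set().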

4. SettingsTable input/output format.

JMX, SettingsTable and Yaml accept values in different formats which
may not be compatible in some of the cases especially when
representing composite objects. The former uses toString() as an
output, and the latter uses a yaml human-readable format.

So, in order to see the same properties in the same format through
different APIs, the Yaml representation is reused to display the
values and to parse a user input in case of update/set operations.

Although the output format between APIs matches in the vast majority
of cases, here is the list of configuration properties that do not
match:
- memtable.configurations
- sstable_formats
- seed_provider.parameters
- data_file_directories

The test illustrates the problem:
https://github.com/apache/cassandra/pull/2334/files#diff-e94bb80f12622412fff9d87b58733e0549ba3024a54714516adc8bc70709933bR319

This could be a regression as the output format is changed, but this
seems to be the correct change to go with. We can keep it as is, or
create SettingsTableV2, which seems to be unlikely.

The examples of the format change:
-
Property 'seed_provider.parameters' expected:
 {seeds=127.0.0.1:7012}
Property 'seed_provider.parameters' actual:
 seeds: 127.0.0.1:7012
-
Property 'data_file_directories' expected:
 [Ljava.lang.String;@436813f3
Property 'data_file_directories' actual:
 [build/test/cassandra/data]
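
For readers wondering where the "expected" strings come from, the sketch
below reproduces them with plain JDK calls: toString() on a map gives the
{key=value} form, and toString() on a String[] gives the type name plus
an identity hash. Arrays.toString() is used here only as a stand-in for
a readable format; it is an assumption of this example, while the PR
itself reuses the Yaml representation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public final class FormatSketch {

    /** What a plain toString()-based output produces for any value. */
    public static String viaToString(Object value) {
        return String.valueOf(value);
    }

    /** A readable rendering for arrays; other values fall back to toString(). */
    public static String readable(Object value) {
        if (value instanceof Object[])
            return Arrays.toString((Object[]) value);
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("seeds", "127.0.0.1:7012");
        String[] directories = { "build/test/cassandra/data" };

        System.out.println(viaToString(parameters));  // map rendered as {key=value}
        System.out.println(viaToString(directories)); // type name and identity hash
        System.out.println(readable(directories));    // element values
    }
}
```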
-----

On Mon, 1 May 2023 at 15:11, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
>
> I want to continue this topic and share another properties validation
> option/solution that emerged from my investigation of Cassandra and
> Accord configuration that could be used to make the virtual table
> SettingTable updatable, as each update must move Config from one
> consistent state to another. The solution is based on a few
> assumptions: we don't frequently update the running configuration, and
> we want to ba

Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-26 Thread Maxim Muzafarov
Hello everyone,

I think we can replace RAT with the appropriate checkstyle rule, the
HeaderCheck. This will reduce the number of tools we use and reduce the
build time, as only modified files will be checked, which in turn will
address some of the concerns mentioned in the first message.
https://checkstyle.org/apidocs/com/puppycrawl/tools/checkstyle/checks/header/HeaderCheck.html



On Mon, 26 Jun 2023 at 13:48, Berenguer Blasi 
wrote:

> Just for awareness if you rebase thanks to CASSANDRA-18588 checkstyle
> shouldn't be a problem anymore. If it is still let me know and I can look
> into it.
> On 26/6/23 13:11, Jacek Lewandowski wrote:
>
> Yes, I've mentioned that there is a property we can set to skip checkstyle.
>
> Currently such a goal is "artifacts" which basically validates everything.
>
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> pon., 26 cze 2023 o 13:09 Mike Adamson  napisał(a):
>
>> While I like the idea of this because of added time these checks take, I
>> was under the impression that checkstyle (at least) can be disabled with a
>> flag.
>>
>> If we did do this, would it make sense to have a "release"  or "commit"
>> target (or some other name) that ran a full build with all checks that can
>> be used prior to pushing changes?
>>
>> On Mon, 26 Jun 2023 at 08:35, Berenguer Blasi 
>> wrote:
>>
>>> I would prefer sthg that is totally transparent to me and not add one
>>> more step I have to remember. Just to push/run CI to find out I missed it
>>> and rinse and repeat... With the recent fix to checkstyle I am happy as
>>> things stand atm. My 2cts
>>> On 26/6/23 8:43, Jacek Lewandowski wrote:
>>>
>>> Hi,
>>>
>>>
>>> The context is that we currently have 3 checks in the build:
>>>
>>> - Checkstyle,
>>>
>>> - Eclipse-Warnings,
>>>
>>> - RAT
>>>
>>>
>>> CheckStyle and RAT are executed with almost every target we run: build,
>>> jar, test, test-some, testclasslist, etc.; on the other hand,
>>> Eclipse-Warnings is executed automatically only with the artifacts target.
>>>
>>>
>>> Checkstyle currently uses some caching, so subsequent reruns without
>>> cleaning the project validate only the modified files.
>>>
>>>
>>> Both CI - Jenkins and Circle forces running all checks.
>>>
>>>
>>> I want to discuss whether you are ok with extracting all checks to their
>>> distinct target and not running it automatically with the targets which
>>> devs usually run locally. In particular:
>>>
>>>
>>>
>>>- "build", "jar", and all "test" targets would not trigger
>>>CheckStyle, RAT or Eclipse-Warnings
>>>- A new target "check" would trigger all CheckStyle, RAT, and
>>>Eclipse-Warnings
>>>- The new "check" target would be run along with the "artifacts"
>>>target on Jenkins-CI, and it as a separate build step in CircleCI
>>>
>>>
>>> The rationale for that change is:
>>>
>>>- Running all the checks together would be more consistent, but
>>>running all of them automatically with build and test targets could waste
>>>time when we develop something locally, frequently rebuilding and running
>>>tests.
>>>- On the other hand, it would be more consistent if the build did
>>>what we want - as a dev, when prototyping, I don't want to be forced to
>>>run analysis (and potentially fix issues) whenever I want to build a
>>>project or just run a single test.
>>>- There are ways to avoid running checks automatically by specifying
>>>some build properties. Though, the discussion is about the default
>>>behavior - on the flip side, if one wants to run the checks along with
>>>the specified target, they could add the "check" target to the command
>>>line.
>>>
>>>
>>> The rationale for keeping the checks running automatically with every
>>> target is to reduce the likelihood of not running the checks locally before
>>> pushing the branch and being surprised by failing CI soon after starting
>>> the build.
>>>
>>>
>>> That could be fixed by running checks in a pre-push Git hook. There are
>>> some benefits of this compared to the current behavior:
>>>
>>>- the checks would be run automatically only once
>>>- they would be triggered even for those devs who do everything in
>>>IDE and do not even touch Ant commands directly
>>>
>>>
>>> Checks can take time; to optimize that, they could be enforced locally
>>> to verify only the modified files in the same way as we currently determine
>>> the tests to be repeated for CircleCI.
>>>
>>> Thanks
>>> - - -- --- -  -
>>> Jacek Lewandowski
>>>
>>>
>>
>> --
>> Mike Adamson
>> Engineering

[DISCUSS] Maintain backwards compatibility after dependency upgrade in the 5.0

2023-06-27 Thread Maxim Muzafarov
Hello everyone,


We use the Dropwizard Metrics 3.1.5 library, which provides a basic
set of classes to easily expose Cassandra internals to a user through
various interfaces (the most common being JMX). We want to upgrade
this library version in the next major release 5.0 up to the latest
stable 4.2.19 for the following reasons:
- the 3.x (and 4.0.x) Dropwizard Metrics library is no longer
supported, which means that if we face a critical CVE, we'll still
need to upgrade, so it's better to do it sooner and more calmly;
- as of 4.2.5 the library supports jdk11 and jdk17, so we will be in-sync
[1] as well as having some of the compatibility fixes mentioned in the
related JIRA [2];
- there have been a few requests from users [3][4] whose
applications collide with the old version of the library, and we want to
help them;


The problem

The problem with simply upgrading is that the JmxReporter class of the
library has moved from the com.codahale.metrics package in the 3.x
release to the com.codahale.metrics.jmx package in the 4.x release.
This is a problem for applications/tools that rely on the cassandra
classpath (lib/jars) as after the upgrade they may be looking for the
JmxReporter class which has changed its location.
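
To illustrate what a tool that relies on the classpath could do to work
with both library versions, here is a minimal sketch that looks the
class up by name and falls back to the old location. The two class names
are the ones mentioned above; the helper itself (JmxReporterLookup and
firstAvailable) is hypothetical and is neither part of Cassandra nor of
the metrics library.

```java
import java.util.Optional;

public final class JmxReporterLookup {

    public static final String NEW_NAME = "com.codahale.metrics.jmx.JmxReporter"; // 4.x
    public static final String OLD_NAME = "com.codahale.metrics.JmxReporter";     // 3.x

    /** Returns the first class from the given names that can be loaded, if any. */
    public static Optional<Class<?>> firstAvailable(String... classNames) {
        ClassLoader loader = JmxReporterLookup.class.getClassLoader();
        for (String name : classNames) {
            try {
                // 'false' avoids running static initializers just to probe for presence.
                Class<?> found = Class.forName(name, false, loader);
                return Optional.of(found);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not on the classpath under this name, so try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Class<?>> reporter = firstAvailable(NEW_NAME, OLD_NAME);
        System.out.println(reporter.isPresent()
                           ? "found " + reporter.get().getName()
                           : "JmxReporter is not on the classpath");
    }
}
```

Code that references the old class directly at compile time has no such
fallback and fails with a NoClassDefFoundError after the upgrade, which
is the case described above.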

A good example of the problem that we (or a user) may face after the
upgrade is our tests and the cassandra-driver-core 3.1.1, which uses
the old 3.x version of the library in tests. Of course, in this case,
we can upgrade the cassandra driver up to 4.x [5][6] to fix the
problem, as the new driver uses a newer version of the library, but
that's another story I won't go into for now. I'm talking more about
visualising the problem a user might face after upgrading to 5.0 if
they rely on the cassandra classpath; on the other hand, they
might not face this problem at all because, as I understand it, they
will provide this library in their applications themselves.


So, since Cassandra has a huge ecosystem and a variety of tools that I
can't even imagine, the main question here is:

Can we move forward with this change without breaking backwards
compatibility with any kind of tools that we have, considering the
example above as the main case? Do you have any thoughts on this?

The changes are here:
https://github.com/apache/cassandra/pull/2238/files



[1] 
https://github.com/dropwizard/metrics/pull/2180/files#diff-5dbf1a803ecc13ff945a08ed3eb09149a83615e83f15320550af8e3a91976446R14
[2] https://issues.apache.org/jira/browse/CASSANDRA-14667
[3] https://github.com/dropwizard/metrics/issues/1581#issuecomment-628430870
[4] https://issues.apache.org/jira/browse/STORM-3204
[5] https://issues.apache.org/jira/browse/CASSANDRA-15750
[6] https://issues.apache.org/jira/browse/CASSANDRA-17231


Re: [DISCUSS] Maintain backwards compatibility after dependency upgrade in the 5.0

2023-07-03 Thread Maxim Muzafarov
I'd like to mention the approach we took here: to untangle the driver
update in tests from the dropwizard library upgrade (cassandra-driver
3.11 requires the "old" JmxReporter classes in the classpath), we have
copied the classes into the tests themselves, as allowed by the
Apache License 2.0. This way we can update the metrics library itself
and then update the driver used in the tests afterwards.

If there are no objections, we need another committer to take a look
at these changes:
https://issues.apache.org/jira/browse/CASSANDRA-14667
https://github.com/apache/cassandra/pull/2238/files

Thanks in advance for your help!

On Wed, 28 Jun 2023 at 16:04, Bowen Song via dev
 wrote:
>
> IMHO, anyone upgrading software between major versions should expect to
> see breaking changes. Introducing breaking or major changes is the whole
> point of bumping major version numbers.
>
> Since the library upgrade need to happen sooner or later, I don't see
> any reason why it should not happen in the 5.0 release.
>
>
> On 27/06/2023 19:21, Maxim Muzafarov wrote:
> > Hello everyone,
> >
> >
> > We use the Dropwizard Metrics 3.1.5 library, which provides a basic
> > set of classes to easily expose Cassandra internals to a user through
> > various interfaces (the most common being JMX). We want to upgrade
> > this library version in the next major release 5.0 up to the latest
> > stable 4.2.19 for the following reasons:
> > - the 3.x (and 4.0.x) Dropwizard Metrics library is no longer
> > supported, which means that if we face a critical CVE, we'll still
> > need to upgrade, so it's better to do it sooner and more calmly;
> > - as of 4.2.5 the library supports jdk11, jdk17, so we will be in-sync
> > [1] as well as having some of the compatibility fixes mentioned in the
> > related JIRA [2];
> > - there have been a few user-related requests [3][4] whose
> > applications collide with the old version of the library, we want to
> > help them;
> >
> >
> > The problem
> >
> > The problem with simply upgrading is that the JmxReporter class of the
> > library has moved from the com.codahale.metrics package in the 3.x
> > release to the com.codahale.metrics.jmx package in the 4.x release.
> > This is a problem for applications/tools that rely on the cassandra
> > classpath (lib/jars) as after the upgrade they may be looking for the
> > JmxReporter class which has changed its location.
> >
> > A good example of the problem that we (or a user) may face after the
> > upgrade is our tests and the cassandra-driver-core 3.1.1, which uses
> > the old 3.x version of the library in tests. Of course, in this case,
> > we can upgrade the cassandra driver up to 4.x [5][6] to fix the
> > problem, as the new driver uses a newer version of the library, but
> > that's another story I won't go into for now. I'm talking more about
> > visualising the problem a user might face after upgrading to 5.0 if
> > he/she rely on the cassandra classpath, but on the other hand, they
> > might not face this problem at all because, as I understand, they will
> > provide this library in their applications by themselves.
> >
> >
> > So, since Cassandra has a huge ecosystem and a variety of tools that I
> > can't even imagine, the main question here is:
> >
> > Can we move forward with this change without breaking backwards
> > compatibility with any kind of tools that we have considering the
> > example above as the main case? Do you have any thoughts on this?
> >
> > The changes are here:
> > https://github.com/apache/cassandra/pull/2238/files
> >
> >
> >
> > [1] 
> > https://github.com/dropwizard/metrics/pull/2180/files#diff-5dbf1a803ecc13ff945a08ed3eb09149a83615e83f15320550af8e3a91976446R14
> > [2] https://issues.apache.org/jira/browse/CASSANDRA-14667
> > [3] https://github.com/dropwizard/metrics/issues/1581#issuecomment-628430870
> > [4] https://issues.apache.org/jira/browse/STORM-3204
> > [5] https://issues.apache.org/jira/browse/CASSANDRA-15750
> > [6] https://issues.apache.org/jira/browse/CASSANDRA-17231


Re: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-03 Thread Maxim Muzafarov
For me, the biggest benefit of keeping the build scripts, and the CI
configurations as well, in the same project is that these files are
versioned in the same way as the main sources are. This ensures that we
can build past releases without having any annoying errors in the
scripts, so I would say that this is a pretty necessary change.

I'd like to mention an approach that could work for projects with
a huge number of tests. Instead of running all the tests through
available CI agents every time, we can have presets of tests:
- base tests (to make sure that your design basically works, the set
will not run longer than 30 min);
- pre-commit tests (a set of tests sufficient to make sure that we can
safely commit new changes, with the run fitting into a 1-2 hour build
timeframe);
- nightly builds (a scheduled task to build everything we have once a
day and notify the ML if that build fails);


My question here is:
Should we mention in this concept how we will build the sub-projects
(e.g. Accord) alongside Cassandra?

On Fri, 30 Jun 2023 at 23:19, Josh McKenzie  wrote:
>
> Not everyone will have access to such resources, if all you have is 1 such 
> pod you'll be waiting a long time (in theory one month, and you actually need 
> a few bigger pods for some of the more extensive tests, e.g. large upgrade 
> tests)….
>
> One thing worth calling out: I believe we have a lot of low hanging fruit in 
> the domain of "find long running tests and speed them up". Early 2022 I was 
> poking around at our unit tests on CASSANDRA-17371 and found that 2.62% of 
> our tests made up 20.4% of our runtime 
> (https://docs.google.com/spreadsheets/d/1-tkH-hWBlEVInzMjLmJz4wABV6_mGs-2-NNM2XoVTcA/edit#gid=1501761592).
>  This kind of finding is pretty consistent; I remember Carl Yeksigian at NGCC 
> back in like 2015 axing an hour plus of aggregate runtime by just devoting an 
> afternoon to looking at a few badly behaving tests.
>
> I'd like to see us move from "1 pod 1 month" down to something a lot more 
> manageable. :)
>
> Shout-out to Berenger's work on CASSANDRA-16951 for dtest cluster reuse (not 
> yet merged), and I have CASSANDRA-15196 to remove the CDC vs. non segment 
> allocator distinction and axe the test-cdc target entirely.
>
> Ok. Enough of that. Don't want to derail us, just wanted to call out that the 
> state of things today isn't the way it has to be.
>
> On Fri, Jun 30, 2023, at 4:41 PM, Mick Semb Wever wrote:
>
> - There are hw constraints, is there any approximation on how long it will 
> take to run all tests? Or is there a stated goal that we will strive to reach 
> as a project?
>
> Have to defer to Mick on this; I don't think the changes outlined here will 
> materially change the runtime on our currently donated nodes in CI.
>
>
>
> A recent comparison between CircleCI and the jenkins code underneath 
> ci-cassandra.a.o was done (not yet shared) to whether a 'repeatable CI' can 
> be both lower cost and same turn around time.  The exercise undercovered that 
> there's a lot of waste in our jenkins builds, and once the jenkinsfile 
> becomes standalone it can stash and unstash the build results.  From this a 
> conservative estimate was even if we only brought the build time to be double 
> that of circleci it will still be significantly lower cost while still using 
> on-demand ec2 instances. (The goal is to use spot instances.)
>
> The real problem here is that our CI pipeline uses ~1000 containers. 
> ci-cassandra.a.o only has 100 executors (and a few of these at any time are 
> often down for disk self-cleaning).   The idea with 'repeatable CI', and to a 
> broader extent Josh's opening email, is that no one will need to use 
> ci-cassandra.a.o for pre-commit work anymore.  For post-commit we don't care 
> if it takes 7 hours (we care about stability of results, which 'repeatable 
> CI' also helps us with).
>
> While pre-commit testing will be more accessible to everyone, it will still 
> depend on the resources you have access to.  For the fastest turn-around 
> times you will need a k8s cluster that can spawn 1000 pods (4cpu, 8GB ram) 
> which will run for up to 1-30 minutes, or the equivalent.  Not everyone will 
> have access to such resources, if all you have is 1 such pod you'll be 
> waiting a long time (in theory one month, and you actually need a few bigger 
> pods for some of the more extensive tests, e.g. large upgrade tests)….
>
>


Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-07-06 Thread Maxim Muzafarov
In my humble opinion, it is better to have only one plain and
straightforward build pipeline for the whole project, with custom
flags used to skip a particular step, than to have multiple pipelines
under the ant tool with multiple endpoints accordingly. I mean, all
the steps need to be lined up, with each step in the pipeline
executing everything that stands before it unless skip flags are
specified. Meanwhile, I like your idea of grouping all the checks
under the dedicated step (and changing the no-checkstyle flag to
no-checks accordingly as Ekaterina mentioned).


Let me share a simple example of what I'm talking about with one
single endpoint.
Let's assume the following step order:

init -> _build_java (compile) -> checks -> build -> jar -> test ->
artifacts -> publish;

So, the usage would be:

ant jar -Dno-checks
ant test -Dno-build
ant publish -Dno-tests -Dno-checks
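
The rule "each step executes everything that stands before it unless
skip flags are specified" can be sketched as a small model. This is only
an illustration of the idea and not an Ant build file: the step names
are taken from the order above, while the plan() helper and the way
skipped steps are passed in are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public final class PipelineSketch {

    /** The single, linear step order from the message above. */
    public static final List<String> STEPS = Arrays.asList(
        "init", "_build_java", "checks", "build", "jar", "test", "artifacts", "publish");

    /** Returns the steps executed for a target: every preceding step, minus the skipped ones. */
    public static List<String> plan(String target, Set<String> skipped) {
        int end = STEPS.indexOf(target);
        if (end < 0)
            throw new IllegalArgumentException("unknown target: " + target);
        List<String> result = new ArrayList<>();
        for (String step : STEPS.subList(0, end + 1))
            if (!skipped.contains(step))
                result.add(step);
        return result;
    }

    public static void main(String[] args) {
        // Models "ant jar -Dno-checks": everything up to "jar" except "checks".
        System.out.println(plan("jar", Collections.singleton("checks")));
    }
}
```

In this model there is only one way to reach any step, and a flag only
removes a step from the line without creating a second entry point.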


I'm not saying what you've proposed is bad (in fact, we're not
currently doing the pipeline I'm talking about), but adding an
additional endpoint is something we should consider very carefully as
it may create some difficulties for Maven/Gradle migration if it ever
happens.

So, if I'm not mistaken, you're trying to add a new endpoint to the
way we build the project:

- "ant [check]" = build + all checks (first endpoint)
- "ant jar" = build + make jars + no checks (second endpoint)

And I would suggest running `ant jar -Dno-checks` instead to achieve
the same result, assuming `jar` is still transitively dependent on
`checks`.

On Thu, 6 Jul 2023 at 14:02, Jacek Lewandowski
 wrote:
>
> Great discussion, but I feel we still have no conclusion.
>
>
> I fully support automatically setting up IDE(A) to run the necessary stuff 
> automatically in a developer-friendly environment, but let it be continued in 
> a separate thread.
>
>
> I wouldn't say I like flags, especially if they have to be used on a daily 
> basis. The build script help message does not list them when "ant -p" is run.
>
>
> I'm going to make these changes unless it is vetoed:
>
> "ant [check]" = build + all checks, build everything, and run all the checks; 
> also, this would become the default target if no target is specified
> "ant jar" = build + make jars: build all the jars and tests, no checks
> All "test" commands = build + make jars + run the tests: build all the jars 
> and tests, run the tests, no checks
>
>
> Therefore, a user who wants to validate their branch before running CI would 
> need to run just "ant" without any args. This way, a newcomer who does not 
> know our build targets will likely run the checks.
>
>
> We still need some flags for skipping specific tasks to optimize for CI, but 
> in general, they would not be required for local development.
>
>
> Flags will also be needed to customize some tasks, but they should be 
> optional for newcomers. In addition, a "help" target could display a list of 
> selected tasks and properties with descriptions.
>
>
> I'd be more than happy if we could conclude the discussion somehow and move 
> forward :)
>
>
> thanks,
>
> Jacek
>
>
>
> czw., 29 cze 2023 o 23:34 Ekaterina Dimitrova  
> napisał(a):
>>
>> There is a separate thread started and respective ticket for 
>> generate-idea-files.
>> https://lists.apache.org/thread/o2fdkyv2skvf9ngy9jhpnhvo92qvr17m
>> CASSANDRA-18467
>>
>>
>> On Thu, 29 Jun 2023 at 16:54, Jeremiah Jordan  
>> wrote:
>>>
>>> +100 I support making generate-idea-files auto setup everything in IntelliJ 
>>> for you.  If you post a diff, I will test it.
>>>
>>> On this proposal, I don’t really have an opinion one way or the other about 
>>> what the default is for local "ant jar”, if its slow I will figure out how 
>>> to turn it off, if its fast I will leave it on.
>>> I do care that CI runs checks, and complains loudly if something is wrong 
>>> such that it is very easy to tell during review.
>>>
>>> -Jeremiah
>>>
>>> On Jun 29, 2023 at 1:44:09 PM, Josh McKenzie  wrote:

>>>> In accord I added an opt-out for each hook, and will require such here as
>>>> well
>>>>
>>>> On for main branches, off for feature branches seems like it might blanket
>>>> satisfy this concern? Doesn't fix the "--atomic across 5 branches means
>>>> style checks and build on hook across those branches" which isn't ideal. I
>>>> don't think style check failures after push upstream are frequent enough
>>>> to make the cost/benefit there make sense overall are they?
>>>>
>>>> Related to this - I have sonarlint, spotbugs, and checkstyle all running
>>>> inside idea; since pulling those in and tuning the configs a bit I haven't
>>>> run into a single issue w/our checkstyle build target (go figure). Having
>>>> the required style checks reflected realtime inside your work environment
>>>> goes a long way towards making it a more intuitive part of your workflow
>>>> rather than being an annoying last minute block of your ability to
>>>> progress that requires circling back into the code.

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-07-11 Thread Maxim Muzafarov
Thank you for your comments and for sharing the ticket targeting
strategy. I'm really happy to see this page, where I have found all the
answers to the questions I had. So, I tend towards your view and will
land this ticket on the 5.0 release only for now, as it makes
sense to me as well.

I didn't add a feature flag for this feature because 99% of the
source code changes touch only Cassandra internals, leaving the
public API unchanged. A few remarks on this are:
- the display format of the vtable property has changed to match the
yaml configuration style. This doesn't mean that we are displaying
property values in a completely different way; in fact, the formats
match with only 4 exceptions mentioned in the message above (this
should be fine for the major release, I hope);
- a new column, which we've agreed to add (I'll fix the PR shortly);


I would also like to mention the follow-up todos required by this
issue to set the right expectations. Currently, we've brought a few
properties under the framework to make them updateable with the
SettingsTable, so that you can keep focusing on the framework itself
rather than on tagging the configuration properties themselves with
the @Mutable annotation. Although the solution is self-sufficient for
the already tagged properties, we still need to bring the rest of them
under the framework afterwards. I'll create an issue and do it right
after we're done with the initial patch.


On Fri, 7 Jul 2023 at 20:37, Josh McKenzie  wrote:
>
> This really is great work Maxim; definitely appreciate all the hard work 
> that's gone into it and I think the users will too.
>
> In terms of where it should land, we discussed this type of question at 
> length on the ML awhile ago and ended up codifying it in the wiki: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/Patching%2C+versioning%2C+and+LTS+releases
>
> When working on a ticket, use the following guideline to determine which 
> branch to apply it to (Note: See How To Commit for details on the commit and 
> merge process)
>
> Bugfix: apply to oldest applicable LTS and merge up through latest GA to trunk
>
> In the event you need to make changes on the merge commit, merge with -s ours 
> and revise the commit via --amend
>
> Improvement: apply to trunk only (next release)
>
> Note: refactoring and removing dead code qualifies as an Improvement; our 
> priority is stability on GA lines
>
> New Feature: apply to trunk only (next release)
>
> Our priority is to keep the 2 LTS releases and latest GA stable while 
> releasing new "latest GA" on a cadence that provides new improvements and 
> functionality to users soon enough to be valuable and relevant.
>
>
> So in this case, target whatever unreleased next feature release (i.e. SEMVER 
> MAJOR || MINOR) we have on deck.
>
> On Thu, Jul 6, 2023, at 1:21 PM, Ekaterina Dimitrova wrote:
>
> Hi,
>
> First of all, thank you for all the work!
> I personally think that it should be ok to add a new column.
>
> I will be very happy to see this landing in 5.0.
> I am personally against porting this patch to 4.1. To be clear, I am sure you 
> did a great job and my response would be the same to every single person - 
> the configuration is quite wide-spread and the devil is in the details. I do 
> not see a good reason for exception here except convenience. There is no 
> feature flag for these changes too, right?
>
> Best regards,
> Ekaterina
>
> На четвъртък, 6 юли 2023 г. Miklosovic, Stefan  
> написа:
>
> Hi Maxim,
>
> I went through the PR and added my comments. I think David also reviewed it. 
> All points you mentioned make sense to me but I humbly think it is necessary 
> to have at least one additional pair of eyes on this as the patch is 
> relatively impactful.
>
> I would like to see additional column in system_views.settings of name 
> "mutable" and of type "boolean" to see what field I am actually allowed to 
> update as an operator.
>
> It seems to me you agree with the introduction of this column (1) but there 
> is no clear agreement where we actually want to put it. You want this whole 
> feature to be committed to 4.1 branch as well which is an interesting 
> proposal. I was thinking that this work will go to 5.0 only. I am not 
> completely sure it is necessary to backport this feature but your 
> argumentation here (2) is worth to discuss further.
>
> If we introduce this change to 4.1, that field would not be there but in 5.0 
> it would. So that way we will not introduce any new column to 
> system_views.settings.
> We could also go with the introduction of this column to 4.1 if people are ok 
> with that.
>
> For the simplicity, I am slightly le

Re: [VOTE] Release Apache Cassandra 4.1.3

2023-07-20 Thread Maxim Muzafarov
+1 (nb)

Checked:

- the rc version
- the branch builds
- the branch version matches the rc version
- downloaded binaries and sources
- checksums and signature verified


I have created the following GitHub Action to automate the process:
https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:ga-release

You can run it locally with the following command:
act --job check --eventpath rc.event -s GITHUB_TOKEN="$(gh auth token)" \
    --container-architecture linux/amd64

On Thu, 20 Jul 2023 at 00:27, Brandon Williams  wrote:
>
> +1
>
> Kind Regards,
> Brandon
>
> On Wed, Jul 19, 2023 at 1:28 AM Miklosovic, Stefan
>  wrote:
> >
> > Proposing the test build of Cassandra 4.1.3 for release.
> >
> > sha1: 2a4cd36475de3eb47207cd88d2d472b876c6816d
> > Git: https://github.com/apache/cassandra/tree/4.1.3-tentative
> > Maven Artifacts: 
> > https://repository.apache.org/content/repositories/orgapachecassandra-1304/org/apache/cassandra/cassandra-all/4.1.3/
> >
> > The Source and Build Artifacts, and the Debian and RPM packages and 
> > repositories, are available here: 
> > https://dist.apache.org/repos/dist/dev/cassandra/4.1.3/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who has 
> > tested the build is invited to vote. Votes by PMC members are considered 
> > binding. A vote passes if there are at least three binding +1s and no -1's.
> >
> > [1]: CHANGES.txt: 
> > https://github.com/apache/cassandra/blob/4.1.3-tentative/CHANGES.txt
> > [2]: NEWS.txt: 
> > https://github.com/apache/cassandra/blob/4.1.3-tentative/NEWS.txt


Re: [DISCUSS] Maintain backwards compatibility after dependency upgrade in the 5.0

2023-07-21 Thread Maxim Muzafarov
Hello everyone,

It still needs a pair of eyes to push it forward.


I came across another good thing that might help us to overcome the
difficulties with the dropwizard metrics dependency upgrade. The
change relates to the driver itself and reuses the same approach that
was used to deal with the driver's netty dependencies. We need to
shade the dropwizard metrics classes in the driver so that it no longer
relies on the cassandra classpath, at least for the 3.x version of the
java driver, and make the next 3.11.4 release of the java driver
accordingly.

The changes for the driver are here:
https://github.com/datastax/java-driver/pull/1685

This will give us (and users as well) the confidence to move forward
with this change in 5.x while still using the 3.11 version of the
driver. Looking forward to your thoughts.

Changes for the Cassandra part are here:
https://github.com/apache/cassandra/pull/2238/files

On Mon, 3 Jul 2023 at 15:15, Maxim Muzafarov  wrote:
>
> I'd like to mention the approach we took here: to untangle the driver
> update in tests with the dropwizard library version (cassandra-driver
> 3.11 requires the "old" JMXReporter classes in the classpath) we have
> copied the classes into the tests themselves, as it is allowed by the
> Apache License 2.0. This way we can update the metrics library itself
> and then update the driver used in the tests afterwards.
>
> If there are no objections, we need another committer to take a look
> at these changes:
> https://issues.apache.org/jira/browse/CASSANDRA-14667
> https://github.com/apache/cassandra/pull/2238/files
>
> Thanks in advance for your help!
>
> On Wed, 28 Jun 2023 at 16:04, Bowen Song via dev
>  wrote:
> >
> > IMHO, anyone upgrading software between major versions should expect to
> > see breaking changes. Introducing breaking or major changes is the whole
> > point of bumping major version numbers.
> >
> > Since the library upgrade need to happen sooner or later, I don't see
> > any reason why it should not happen in the 5.0 release.
> >
> >
> > On 27/06/2023 19:21, Maxim Muzafarov wrote:
> > > Hello everyone,
> > >
> > >
> > > We use the Dropwizard Metrics 3.1.5 library, which provides a basic
> > > set of classes to easily expose Cassandra internals to a user through
> > > various interfaces (the most common being JMX). We want to upgrade
> > > this library version in the next major release 5.0 up to the latest
> > > stable 4.2.19 for the following reasons:
> > > - the 3.x (and 4.0.x) Dropwizard Metrics library is no longer
> > > supported, which means that if we face a critical CVE, we'll still
> > > need to upgrade, so it's better to do it sooner and more calmly;
> > > - as of 4.2.5 the library supports jdk11, jdk17, so we will be in-sync
> > > [1] as well as having some of the compatibility fixes mentioned in the
> > > related JIRA [2];
> > > - there have been a few user-related requests [3][4] whose
> > > applications collide with the old version of the library, we want to
> > > help them;
> > >
> > >
> > > The problem
> > >
> > > The problem with simply upgrading is that the JmxReporter class of the
> > > library has moved from the com.codahale.metrics package in the 3.x
> > > release to the com.codahale.metrics.jmx package in the 4.x release.
> > > This is a problem for applications/tools that rely on the cassandra
> > > classpath (lib/jars) as after the upgrade they may be looking for the
> > > JmxReporter class which has changed its location.
> > >
> > > A good example of the problem that we (or a user) may face after the
> > > upgrade is our tests and the cassandra-driver-core 3.1.1, which uses
> > > the old 3.x version of the library in tests. Of course, in this case,
> > > we can upgrade the cassandra driver up to 4.x [5][6] to fix the
> > > problem, as the new driver uses a newer version of the library, but
> > > that's another story I won't go into for now. I'm talking more about
> > > visualising the problem a user might face after upgrading to 5.0 if
> > > they rely on the cassandra classpath, but on the other hand, they
> > > might not face this problem at all because, as I understand, they will
> > > provide this library in their applications by themselves.
> > >
> > >
> > > So, since Cassandra has a huge ecosystem and a variety of tools that I
> > > can't even imagine, the main question here is:
> > >
> > > Can we move forward with this change without breaking backwards
> 

Re: [DISCUSS] Maintain backwards compatibility after dependency upgrade in the 5.0

2023-07-27 Thread Maxim Muzafarov
Bumping this topic up for visibility, as the code freeze is coming soon.

This seems like a good change to include in 5.0 as this kind of
library upgrade is more natural when the major version changes. It is
still possible to postpone it to 6.0, but the main concern here is
that the current version of dropwizard metrics library is obsolete and
no longer supported and it is better to avoid emergencies that could
arise (like the panic with log4j library upgrade some time ago).

The change itself is straightforward and deserves more eyes on it from
my point of view.
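
For anyone who wants to see what the JmxReporter relocation described in the quoted thread means for a tool that relies on the cassandra classpath, here is a minimal sketch. It only probes which of the two class names from the thread can be resolved; the class `JmxReporterLocator` and its method are hypothetical names made up for this illustration and are not part of Cassandra, the driver, or the metrics library.

```java
import java.util.Optional;

final class JmxReporterLocator
{
    // Class names as given in the thread: the 3.x location and the 4.x location.
    static final String LEGACY_NAME = "com.codahale.metrics.JmxReporter";
    static final String RELOCATED_NAME = "com.codahale.metrics.jmx.JmxReporter";

    /** Returns the first candidate class name that the given loader can resolve, if any. */
    static Optional<String> firstLoadable(ClassLoader loader, String... candidates)
    {
        for (String name : candidates)
        {
            try
            {
                // initialize=false: only check presence, do not run static initializers
                Class.forName(name, false, loader);
                return Optional.of(name);
            }
            catch (ClassNotFoundException | LinkageError e)
            {
                // Not resolvable on this classpath; fall through to the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        ClassLoader loader = JmxReporterLocator.class.getClassLoader();
        Optional<String> found = firstLoadable(loader, LEGACY_NAME, RELOCATED_NAME);
        System.out.println(found.map(name -> "JmxReporter found as " + name)
                                .orElse("JmxReporter is not on the classpath"));
    }
}
```

A tool that hard-codes only the first name is the kind of consumer that would break after the upgrade; shading the classes, as proposed for the driver, removes the need for such a probe altogether.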

On Fri, 21 Jul 2023 at 14:51, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
> It still needs a pair of eyes to push it forward.
>
>
> I came across another good thing that might help us to overcome the
> difficulties with the dropwizard metrics dependency upgrade. The
> change relates to the driver itself and reuses the same approach that
> was used to deal with the driver's netty dependencies. We need to
> shade the dropwizard metrics classes and no longer rely on the
> cassandra classpath at least for the 3.x version of the java driver,
> and make the next 3.11.4 release of the java driver accordingly.
>
> The changes for the driver are here:
> https://github.com/datastax/java-driver/pull/1685
>
> This will give us (and users as well) the confidence to move forward
> with this change to 5.x alongside the 3.11 version of the driver
> usage. Looking forward to your thoughts.
>
> Changes for the Cassandra part are here:
> https://github.com/apache/cassandra/pull/2238/files
>
> On Mon, 3 Jul 2023 at 15:15, Maxim Muzafarov  wrote:
> >
> > I'd like to mention the approach we took here: to untangle the driver
> > update in tests with the dropwizard library version (cassandra-driver
> > 3.11 requires the "old" JMXReporter classes in the classpath) we have
> > copied the classes into the tests themselves, as it is allowed by the
> > Apache License 2.0. This way we can update the metrics library itself
> > and then update the driver used in the tests afterwards.
> >
> > If there are no objections, we need another committer to take a look
> > at these changes:
> > https://issues.apache.org/jira/browse/CASSANDRA-14667
> > https://github.com/apache/cassandra/pull/2238/files
> >
> > Thanks in advance for your help!
> >
> > On Wed, 28 Jun 2023 at 16:04, Bowen Song via dev
> >  wrote:
> > >
> > > IMHO, anyone upgrading software between major versions should expect to
> > > see breaking changes. Introducing breaking or major changes is the whole
> > > point of bumping major version numbers.
> > >
> > > Since the library upgrade needs to happen sooner or later, I don't see
> > > any reason why it should not happen in the 5.0 release.
> > >
> > >
> > > On 27/06/2023 19:21, Maxim Muzafarov wrote:
> > > > Hello everyone,
> > > >
> > > >
> > > > We use the Dropwizard Metrics 3.1.5 library, which provides a basic
> > > > set of classes to easily expose Cassandra internals to a user through
> > > > various interfaces (the most common being JMX). We want to upgrade
> > > > this library version in the next major release 5.0 up to the latest
> > > > stable 4.2.19 for the following reasons:
> > > > - the 3.x (and 4.0.x) Dropwizard Metrics library is no longer
> > > > supported, which means that if we face a critical CVE, we'll still
> > > > need to upgrade, so it's better to do it sooner and more calmly;
> > > > - as of 4.2.5 the library supports jdk11, jdk17, so we will be in-sync
> > > > [1] as well as having some of the compatibility fixes mentioned in the
> > > > related JIRA [2];
> > > > - there have been a few user-related requests [3][4] whose
> > > > applications collide with the old version of the library, we want to
> > > > help them;
> > > >
> > > >
> > > > The problem
> > > >
> > > > The problem with simply upgrading is that the JmxReporter class of the
> > > > library has moved from the com.codahale.metrics package in the 3.x
> > > > release to the com.codahale.metrics.jmx package in the 4.x release.
> > > > This is a problem for applications/tools that rely on the cassandra
> > > > classpath (lib/jars) as after the upgrade they may be looking for the
> > > > JmxReporter class which has changed its location.
> > > >
> > > > A good example of the problem that we (or a user) may face after the
> > > > upgra

Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-07-31 Thread Maxim Muzafarov
Hello everyone,


It's been a long time since the last discussion about the import order
code style, so I want to give these changes a chance as all the major
JIRA issues have already landed on the release branch so we won't
affect anyone. I'd be happy to find any reviewers who are interested
in helping with the next steps :-) I've updated the changes to reflect
the latest checkstyle work, so here they are:

https://issues.apache.org/jira/browse/CASSANDRA-17925
https://github.com/apache/cassandra/pull/2108


The changes look scary at first glance, but they're actually quite
simple and in line with what we've discussed above. We can
divide all the affected files into two parts: the update of the code
style configuration files (checkstyle + IDE configs), and the update
of all the sources to match the code style.

In short:

- "import order" hotkey will work regardless of which IDE you are using;
- the checkstyle configuration and the IDEA, Eclipse and NetBeans
configurations have been updated;
- the AvoidStarImport checkstyle rule is applied as well;

The import order we've agreed upon:

java.*
[blank line]
javax.*
[blank line]
com.*
[blank line]
net.*
[blank line]
org.*
[blank line]
org.apache.cassandra.*
[blank line]
all other imports
[blank line]
static all other imports
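
To make the grouping above concrete, here is a rough sketch that sorts a handful of import lines into those groups. It is not the checkstyle rule or any IDE's implementation; `ImportOrderSketch`, `groupOf` and `organize` are hypothetical names, and alphabetical ordering inside a group is an assumption of this sketch. The one subtle point it shows is that org.apache.cassandra has to be matched before the broader org prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ImportOrderSketch
{
    // Non-static groups in the order listed above.
    private static final String[] GROUPS = { "java.", "javax.", "com.", "net.", "org.", "org.apache.cassandra." };
    private static final int OTHER = GROUPS.length;      // "all other imports"
    private static final int STATIC = GROUPS.length + 1; // static imports come last

    /** Maps one import line (assumed to start with "import ") to its group index. */
    static int groupOf(String importLine)
    {
        String line = importLine.trim();
        if (line.startsWith("import static "))
            return STATIC;
        String name = line.substring("import ".length());
        // Walk from the end so that org.apache.cassandra. wins over the broader org.
        for (int i = GROUPS.length - 1; i >= 0; i--)
            if (name.startsWith(GROUPS[i]))
                return i;
        return OTHER;
    }

    /** Sorts by group, then alphabetically, and puts an empty string between groups. */
    static List<String> organize(List<String> imports)
    {
        List<String> sorted = new ArrayList<>(imports);
        Comparator<String> byGroup = Comparator.comparingInt(ImportOrderSketch::groupOf);
        sorted.sort(byGroup.thenComparing(Comparator.naturalOrder()));

        List<String> result = new ArrayList<>();
        int previous = -1;
        for (String line : sorted)
        {
            int group = groupOf(line);
            if (previous != -1 && group != previous)
                result.add(""); // blank line between groups
            result.add(line);
            previous = group;
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<String> unordered = Arrays.asList("import org.slf4j.Logger;",
                                               "import static org.junit.Assert.assertEquals;",
                                               "import org.apache.cassandra.db.Keyspace;",
                                               "import java.util.List;");
        organize(unordered).forEach(System.out::println);
    }
}
```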

On Mon, 27 Feb 2023 at 13:26, Maxim Muzafarov  wrote:
>
> > I suppose it can be easy for the existing feature branches if they have a 
> > single commit. Don't we need to adjust each commit for multi-commit feature 
> > branches?
>
> It depends on how feature branches are maintained and developed, I
> guess. My thoughts here are that the IDE's hotkeys should just work to
> resolve any code-style issues that arise during rebase/maintenance.
> I'm not talking about enforcing all our code-style rules but giving
> developers good flexibility. The classes import order rule might be a
> good example here.
>
> On Wed, 22 Feb 2023 at 21:27, Jacek Lewandowski
>  wrote:
> >
> > I suppose it can be easy for the existing feature branches if they have a 
> > single commit. Don't we need to adjust each commit for multi-commit feature 
> > branches?
> >
> > On Wed, 22 Feb 2023 at 19:48, Maxim Muzafarov wrote:
> >>
> >> Hello everyone,
> >>
> >> I have created an issue CASSANDRA-18277 that may help us move forward
> >> with code style changes. It only affects the way we store the IntelliJ
> >> code style configuration and has no effect on any current (or any)
> >> releases, so it should be safe to merge. So, once the issue is
> >> resolved, every developer who checks out a release branch will use the
> >> same code style stored in that branch. This in turn makes rebasing a
> >> big change like the import order [1] a really straightforward matter
> >> (by pressing Ctrl + Opt + O in their local branch to organize
> >> imports).
> >>
> >> See:
> >>
> >> Move the IntelliJ Idea code style and inspections configuration to the
> >> project's root .idea directory
> >> https://issues.apache.org/jira/browse/CASSANDRA-18277
> >>
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/CASSANDRA-17925
> >>
> >> On Wed, 25 Jan 2023 at 13:05, Miklosovic, Stefan
> >>  wrote:
> >> >
> >> > Thank you Maxim for doing this.
> >> >
> >> > It is nice to see this effort materialized in a PR.
> >> >
> >> > I would wait until bigger chunks of work are committed to trunk (like 
> >> > CEP-15) to not collide too much. I would say we can postpone doing this 
> >> > until the actual 5.0 release, last weeks before it so we would not clash 
> >> > with any work people would like to include in 5.0. This can go in 
> >> > anytime, basically.
> >> >
> >> > Are people on the same page?
> >> >
> >> > Regards
> >> >
> >> > 
> >> > From: Maxim Muzafarov 
> >> > Sent: Monday, January 23, 2023 19:46
> >> > To: dev@cassandra.apache.org
> >> > Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
> >> >
> >> >
> >> > Hello everyone,
> >> >
> >> > You can find the changes here:

Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-03 Thread Maxim Muzafarov
Personally, I find javadocs quite useful, especially when htmls are
indexed by search engines, which in turn increases the chances of
finding the right answer faster (I have seen a lot of useful javadocs
in the source code).

I have done a quick build of the javadocs:

  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Building index for all classes...
  [javadoc] 100 errors
  [javadoc] 100 warnings

100 errors is no big deal and can be easily fixed. From my point of
view, the problem is that the javadoc task is not given the attention
it deserves. The failonerror is currently 'false' and the task itself
is not a part of any build and/or release processes, correct me if I'm
wrong.

So,
1. Fix warnings/errors;
2. Make the javadoc task part of the build (e.g. put it under
'artifacts'), or make it part of the release process that is regularly
checked on the CI;
3. Publish/deploy the javadoc htmls for release in the special
directory of the cassandra website to give them a chance of being
indexed;
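
On step 1, and on the question quoted below of whether Javadoc comments can be validated without rendering HTML: one possible route is javac's `-Xdoclint` option, which checks doc comments during compilation. The sketch below drives it through the standard `javax.tools` API on an in-memory source. `DoclintSketch`, `StringSource` and `countErrors` are hypothetical names for this illustration only; nothing here is taken from build.xml or from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class DoclintSketch
{
    /** Holds a source file in memory so the sketch needs no input files on disk. */
    static final class StringSource extends SimpleJavaFileObject
    {
        private final String code;

        StringSource(String className, String code)
        {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors)
        {
            return code;
        }
    }

    /** Compiles one class with doclint enabled and counts the reported errors. */
    static long countErrors(String className, String code)
    {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null)
            throw new IllegalStateException("no system compiler available, a JDK is required");

        String outputDir;
        try
        {
            // Class files of clean sources go to a throwaway directory.
            outputDir = Files.createTempDirectory("doclint-sketch").toString();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }

        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        List<String> options = Arrays.asList("-Xdoclint:all", "-proc:none", "-d", outputDir);
        compiler.getTask(null, null, diagnostics, options, null,
                         Collections.singletonList(new StringSource(className, code))).call();
        return diagnostics.getDiagnostics().stream()
                          .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                          .count();
    }

    public static void main(String[] args)
    {
        // The @param tag names a parameter that does not exist, which doclint reports.
        String broken = "/** Sample. */\npublic class Sample {\n"
                      + "    /** Runs.\n     * @param missing there is no such parameter */\n"
                      + "    public void run() {}\n}\n";
        System.out.println("doclint errors: " + countErrors("Sample", broken));
    }
}
```

Whether this catches the same set of problems as the javadoc tool itself is something I have not verified, so treat it as a starting point for a pre-commit check rather than a replacement for the ant task.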

On Thu, 3 Aug 2023 at 17:11, Jeremiah Jordan  wrote:
>
> I don’t think anyone wants to remove the javadocs.  This thread is about 
> removing the broken ant task which generates html files from them.
>
> +1 from me on removing the ant task.  If someone feels the task is useful 
> they can always implement one that does not crash and add it back.
>
> -Jeremiah
>
> On Aug 3, 2023 at 9:59:55 AM, "Claude Warren, Jr via dev" 
>  wrote:
>>
>> I think that we can get more developers interested if there are available 
>> javadocs.  While many of the core classes are not going to be touched by 
>> someone just starting, being able to understand what the external touch 
>> points are and how they interact with other bits of the system can be 
>> invaluable, particularly when you don't have the entire code base in front 
>> of you.
>>
>> For example, I just wrote a tool that explores the distribution of keys 
>> across multiple sstables, I needed some of the tools classes but not much 
>> more.  Javadocs would have made that easy if I did not have the source code 
>> in front of me.
>>
>> I am -1 on removing the javadocs.
>>
>> On Thu, Aug 3, 2023 at 4:35 AM Josh McKenzie  wrote:
>>>
>>> If anything, the codebase could use a little more package/class/method 
>>> markup in some places
>>>
>>> I am impressed with how diplomatic and generous you're being here Derek. :D
>>>
>>> On Wed, Aug 2, 2023, at 5:46 PM, Miklosovic, Stefan wrote:
>>>
>>> That is a good idea. I would like to have Javadocs valid when going through 
>>> them in IDE. To enforce it, we would have to fix it first. If we find a way 
>>> how to validate Javadocs without actually rendering them, that would be 
>>> cool.
>>>
>>> There is a lot of legacy and rewriting of some custom-crafted formatting of 
>>> some comments might be quite a tedious task to do if it is required to have 
>>> them valid. I am in general for valid documentation and even enforcing it 
>>> but what to do with what is already there ...
>>>
>>> 
>>> From: Jacek Lewandowski 
>>> Sent: Wednesday, August 2, 2023 23:38
>>> To: dev@cassandra.apache.org
>>> Subject: Re: [DISCUSSION] Shall we remove ant javadoc task?
>>>
>>>
>>> With or without outputting JavaDoc to HTML, there are some errors which we 
>>> should maybe fix. We want to keep the documentation, but there can be 
>>> syntax errors which may prevent IDE generating a proper preview. So, the 
>>> question is - should we validate the JavaDoc comments as a precommit task? 
>>> Can it be done without actually generating HTML output?
>>>
>>> Thanks,
>>> Jacek
>>>
>>> On Wed, 2 Aug 2023 at 22:24, Derek Chen-Becker wrote:
>>> Oh, whoops, I guess I'm the only one that thinks Javadoc is just the tool 
>>> and/or its output (not the markup itself) :P If anything, the codebase 
>>> could use a little more package/class/method markup in some places, so I'm 
>>> definitely only in favor of getting rid of the ant task. I should amend my 
>>> statement to be "...I suspect most people are not opening their browsers 
>>> and looking at Javadoc..." :)
>>>
>>> Cheers,
>>>
>>> Derek
>>>
>>>
>>>
>>> On Wed, Aug 2, 2023, 1:30 PM Josh McKenzie wrote:
>>> most people are not looking at Javadoc when working on the codebase.
>>> I definitely use it extensively inside the IDE. But never as a compiled set 
>>> of external docs.
>>>
>>> Which is to say, I'm +1 on removing the target and I'd ask everyone to keep 
>>> javadoccing your classes and methods where things are non-obvious or 
>>> there's a logical coupling with something else in the system. :)
>>>
>>> On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker 

Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-03 Thread Maxim Muzafarov
Yes, I agree. The javadoc task should be part of our CI if we decide
to keep it, to keep it buildable at all times.


BTW, I have managed to fix all the javadoc errors.
I have tested the task for both jdk11 and jdk17.

Changes are here:
https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:javadoc_build

On Thu, 3 Aug 2023 at 21:20, Ekaterina Dimitrova  wrote:
>
> Thank you Maxim,
>
> “
>
> From my point of
> view, the problem is that the javadoc task is not given the attention
> it deserves. The failonerror is currently 'false' and the task itself
> is not a part of any build and/or release processes, correct me if I'm
> wrong.
>
> So,
> 1. Fix warnings/errors;
> 2. Make the javadoc task part of the build (e.g. put it under
> 'artifacts'), or make it part of the release process that is regularly
> checked on the CI;
> 3. Publish/deploy the javadoc htmls for release in the special
> directory of the cassandra website to give them a chance of being
> indexed;“
>
> This is aligned with what I saw and the two options mentioned at the 
> beginning - if we decide to keep it we should fix things and add the task to 
> CI, if we don’t because no one wants the html pages - then it is better to
> remove this ant task.
> On your comment about 100 errors - it seems there are more. There is a cap of
> 100 but when you fix them, more errors appear.
> Further discussion can be found at CASSANDRA-17687
>
> On Thu, 3 Aug 2023 at 14:21, Maxim Muzafarov  wrote:
>>
>> Personally, I find javadocs quite useful, especially when htmls are
>> indexed by search engines, which in turn increases the chances of
>> finding the right answer faster (I have seen a lot of useful javadocs
>> in the source code).
>>
>> I have done a quick build of the javadocs:
>>
>>   [javadoc] Building index for all the packages and classes...
>>   [javadoc] Building index for all classes...
>>   [javadoc] Building index for all classes...
>>   [javadoc] 100 errors
>>   [javadoc] 100 warnings
>>
>> 100 errors is no big deal and can be easily fixed. From my point of
>> view, the problem is that the javadoc task is not given the attention
>> it deserves. The failonerror is currently 'false' and the task itself
>> is not a part of any build and/or release processes, correct me if I'm
>> wrong.
>>
>> So,
>> 1. Fix warnings/errors;
>> 2. Make the javadoc task part of the build (e.g. put it under
>> 'artifacts'), or make it part of the release process that is regularly
>> checked on the CI;
>> 3. Publish/deploy the javadoc htmls for release in the special
>> directory of the cassandra website to give them a chance of being
>> indexed;
>>
>> On Thu, 3 Aug 2023 at 17:11, Jeremiah Jordan  
>> wrote:
>> >
>> > I don’t think anyone wants to remove the javadocs.  This thread is about 
>> > removing the broken ant task which generates html files from them.
>> >
>> > +1 from me on removing the ant task.  If someone feels the task is useful 
>> > they can always implement one that does not crash and add it back.
>> >
>> > -Jeremiah
>> >
>> > On Aug 3, 2023 at 9:59:55 AM, "Claude Warren, Jr via dev" 
>> >  wrote:
>> >>
>> >> I think that we can get more developers interested if there are available 
>> >> javadocs.  While many of the core classes are not going to be touched by 
>> >> someone just starting, being able to understand what the external touch 
>> >> points are and how they interact with other bits of the system can be 
>> >> invaluable, particularly when you don't have the entire code base in 
>> >> front of you.
>> >>
>> >> For example, I just wrote a tool that explores the distribution of keys 
>> >> across multiple sstables, I needed some of the tools classes but not much 
>> >> more.  Javadocs would have made that easy if I did not have the source 
>> >> code in front of me.
>> >>
>> >> I am -1 on removing the javadocs.
>> >>
>> >> On Thu, Aug 3, 2023 at 4:35 AM Josh McKenzie  wrote:
>> >>>
>> >>> If anything, the codebase could use a little more package/class/method 
>> >>> markup in some places
>> >>>
>> >>> I am impressed with how diplomatic and generous you're being here Derek. 
>> >>> :D
>> >>>
>> >>> On Wed, Aug 2, 2023, at 5:46 PM, Miklosovic, Stefan wrote:
>> >>>

Re: [DISCUSS] CASSANDRA-18743 Deprecation of metrics-reporter-config

2023-08-11 Thread Maxim Muzafarov
+1

The rationale for deprecating/removing this library is not just that
it is obsolete and doesn't get updates. In fact, when the
metrics-reporter-config [1] was added the dropwizard metrics library
(formerly com.yammer.metrics [2]) didn't support exporting metrics to
files like csv, so it made sense at that time. Now it is fully covered
by the dropwizard reporters [3], so users can achieve the same
behaviour without the need for metrics-reporter-config. And that's why
I have a lot of doubts about it being used by anyone, but deprecation
is friendlier because there's no rush to remove it. :-)


[1] https://issues.apache.org/jira/browse/CASSANDRA-4430
[2] https://issues.apache.org/jira/browse/CASSANDRA-5838
[3] https://metrics.dropwizard.io/4.2.0/getting-started.html#other-reporting

On Fri, 11 Aug 2023 at 16:50, Caleb Rackliffe  wrote:
>
> +1
>
> > On Aug 11, 2023, at 8:10 AM, Brandon Williams  wrote:
> >
> > +1
> >
> > Kind Regards,
> > Brandon
> >
> >> On Fri, Aug 11, 2023 at 8:08 AM Ekaterina Dimitrova
> >>  wrote:
> >>
> >>
> >> “ The rationale for this proposed deprecation is that the upcoming 5.0 
> >> release is a good time to evaluate dependencies that are no longer 
> >> receiving updates and will become risks in the future.”
> >>
> >> Thank you for raising it, I support your proposal for deprecation
> >>
> >>> On Fri, 11 Aug 2023 at 8:55, Abe Ratnofsky  wrote:
> >>>
> >>> Hey folks,
> >>>
> >>> Opening a thread to get input on a proposed dependency deprecation in 
> >>> 5.0: metrics-reporter-config has been archived for 3 years and not 
> >>> updated in nearly 6 years.
> >>>
> >>> This project has a minor security issue with its usage of unsafe YAML 
> >>> loading via snakeyaml’s unprotected Constructor: 
> >>> https://nvd.nist.gov/vuln/detail/CVE-2022-1471
> >>>
> >>> This CVE is reasonable to suppress, since operators should be able to 
> >>> trust their YAML configuration files.
> >>>
> >>> The rationale for this proposed deprecation is that the upcoming 5.0 
> >>> release is a good time to evaluate dependencies that are no longer 
> >>> receiving updates and will become risks in the future.
> >>>
> >>> https://issues.apache.org/jira/browse/CASSANDRA-18743
> >>>
> >>> —
> >>> Abe
> >>>


Re: [VOTE] Release Apache Cassandra 3.11.16 - SECOND ATTEMPT

2023-08-16 Thread Maxim Muzafarov
+1 (nb)

verified checksums, signing, and build from sources.
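
For reference, the checksum part of such a verification can be sketched as below. This is only an illustration of comparing a computed digest with a published one; `ChecksumSketch` and its methods are hypothetical names, the sample bytes stand in for a downloaded artifact, and the choice of SHA-256 here is an assumption (use whichever digest file accompanies the artifact).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumSketch
{
    /** Returns the lowercase hex digest of the given bytes. */
    static String hexDigest(String algorithm, byte[] data)
    {
        MessageDigest digest;
        try
        {
            digest = MessageDigest.getInstance(algorithm);
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalArgumentException("unsupported digest: " + algorithm, e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data))
            hex.append(String.format("%02x", b)); // two hex digits per byte
        return hex.toString();
    }

    /** Compares the computed digest with a published one, ignoring case and surrounding whitespace. */
    static boolean matches(String algorithm, byte[] data, String published)
    {
        return hexDigest(algorithm, data).equalsIgnoreCase(published.trim());
    }

    public static void main(String[] args)
    {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8); // stand-in for the artifact bytes
        System.out.println(hexDigest("SHA-256", sample));
    }
}
```

For a real artifact the bytes would come from the downloaded file, for example via `java.nio.file.Files.readAllBytes`, and the expected value from the digest file published next to it. Signature verification is a separate step and is not covered by this sketch.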

The link is broken :-(
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1306/org/apache/cassandra/cassandra-all/3.11.16/

On Wed, 16 Aug 2023 at 19:18, Brandon Williams  wrote:
>
> +1
>
> Kind Regards,
> Brandon
>
> On Tue, Aug 15, 2023 at 12:53 PM Miklosovic, Stefan
>  wrote:
> >
> > This is the second attempt to pass the vote after [1] is fixed.
> >
> > Proposing the test build of Cassandra 3.11.16 for release.
> >
> > sha1: 681b6ca103d91d940a9fecb8cd812f58dd2490d0
> > Git: https://github.com/apache/cassandra/tree/3.11.16-tentative
> > Maven Artifacts: 
> > https://repository.apache.org/content/repositories/orgapachecassandra-1306/org/apache/cassandra/cassandra-all/3.11.16/
> >
> > The Source and Build Artifacts, and the Debian and RPM packages and 
> > repositories, are available here: 
> > https://dist.apache.org/repos/dist/dev/cassandra/3.11.16/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who has 
> > tested the build is invited to vote. Votes by PMC members are considered 
> > binding. A vote passes if there are at least three binding +1s and no -1's.
> >
> > [1]: https://issues.apache.org/jira/browse/CASSANDRA-18751
> > [2]: CHANGES.txt: 
> > https://github.com/apache/cassandra/blob/3.11.16-tentative/CHANGES.txt
> > [3]: NEWS.txt: 
> > https://github.com/apache/cassandra/blob/3.11.16-tentative/NEWS.txt


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-17 Thread Maxim Muzafarov
We have an "artifacts" ant target that depends on "checks" and "gen-doc".
From my point of view, it would be nice to have "artifacts"
depend on "javadocs" as well. That way we can be sure that
everything related is in good order.
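
To illustrate what the extra dependency would mean in practice, here is a small model of how a target's dependencies run before the target itself. It is not Ant code and does not read build.xml; `TargetOrderSketch` is a hypothetical class, and the dependency list is only the one mentioned in this message plus the proposed "javadocs" entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TargetOrderSketch
{
    private final Map<String, List<String>> depends = new HashMap<>();

    /** Declares a target together with the targets it depends on, in declaration order. */
    TargetOrderSketch target(String name, String... dependsOn)
    {
        depends.put(name, Arrays.asList(dependsOn));
        return this;
    }

    /** Returns the targets in the order they would run: dependencies first, each target once. */
    List<String> executionOrder(String target)
    {
        Set<String> done = new LinkedHashSet<>();
        visit(target, done, new LinkedHashSet<>());
        return new ArrayList<>(done);
    }

    private void visit(String target, Set<String> done, Set<String> inProgress)
    {
        if (done.contains(target))
            return;
        if (!inProgress.add(target))
            throw new IllegalStateException("circular dependency at " + target);
        // Targets that were never declared are treated as having no dependencies.
        for (String dependency : depends.getOrDefault(target, Collections.emptyList()))
            visit(dependency, done, inProgress);
        inProgress.remove(target);
        done.add(target);
    }

    public static void main(String[] args)
    {
        TargetOrderSketch build = new TargetOrderSketch()
            .target("artifacts", "checks", "gen-doc", "javadocs");
        System.out.println(build.executionOrder("artifacts"));
    }
}
```

With "javadocs" in the list, a javadoc failure would stop "artifacts" from completing, which is the regression guard discussed in the quoted replies, provided the task is also configured to fail on error.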

On Thu, 17 Aug 2023 at 18:05, Brandon Williams  wrote:
>
> If everything is good now, I think CI should fail if it regresses so
> we can keep it this way.
>
> Kind Regards,
> Brandon
>
> On Thu, Aug 17, 2023 at 10:49 AM Ekaterina Dimitrova
>  wrote:
> >
> > In CASSANDRA-18717 Maxim posted the javadoc fix. Stefan already made a 
> > first pass of review so it seems we are not removing this ant task as it 
> > was already fixed and there are people who find value in keeping it.
> > My question is: do we want to fail CI if this regresses or not?
> >
> > On Thu, 3 Aug 2023 at 22:44, Josh McKenzie  wrote:
> >>
> >> the problem is that the javadoc task is not given the attention
> >> it deserves. The failonerror is currently 'false' and the task itself
> >> is not a part of any build and/or release processes
> >>
> >>
> >> I just wrote a tool that explores the distribution of keys across multiple 
> >> sstables, I needed some of the tools classes but not much more.  Javadocs 
> >> would have made that easy
> >>
> >> You know what? I agree with all that. If I had to jump into the source for 
> >> the JDK or other libraries every time I needed to work with them that'd be 
> >> annoying.
> >>
> >> BTW, I have managed to fix all the javadoc errors.
> >>
> >> Of course you have. :) Industrious as usual Maxim; thanks for tackling 
> >> that!
> >>
> >> So yeah. Depending on how long javadocs take to generate, I think having 
> >> them as part of our pre-commit rotation makes sense. Could even add them 
> >> to our site with something like an "API" section (gasp) here: 
> >> https://cassandra.apache.org/doc/latest/.
> >>
> >> Would certainly help motivate us to clarify the whole "what is an external 
> >> API we're committing to or not" discussions.
> >>
> >> On Thu, Aug 3, 2023, at 6:09 PM, Ekaterina Dimitrova wrote:
> >>
> >> Thank you Maxim. There is CASSANDRA-18717, I guess that patch should go 
> >> there. Keeping the task or not, the fix of the docs should go in anyway 
> >> IMHO. I will not be available the next few days, but I can help with 
> >> reviews when I am back.
> >>
> >> On Thu, 3 Aug 2023 at 17:44, Maxim Muzafarov  wrote:
> >>
> >> Yes, I agree. The javadoc task should be part of our CI if we decide
> >> to keep it, to keep it buildable at all times.
> >>
> >>
> >> BTW, I have managed to fix all the javadoc errors.
> >> I have tested the task for both jdk11 and jdk17.
> >>
> >> Changes are here:
> >> https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:javadoc_build
> >>
> >> On Thu, 3 Aug 2023 at 21:20, Ekaterina Dimitrova  
> >> wrote:
> >> >
> >> > Thank you Maxim,
> >> >
> >> > “
> >> >
> >> > From my point of
> >> > view, the problem is that the javadoc task is not given the attention
> >> > it deserves. The failonerror is currently 'false' and the task itself
> >> > is not a part of any build and/or release processes, correct me if I'm
> >> > wrong.
> >> >
> >> > So,
> >> > 1. Fix warnings/errors;
> >> > 2. Make the javadoc task part of the build (e.g. put it under
> >> > 'artifacts'), or make it part of the release process that is regularly
> >> > checked on the CI;
> >> > 3. Publish/deploy the javadoc htmls for release in the special
> >> > directory of the cassandra website to give them a chance of being
> >> > indexed;“
> >> >
> >> > This is aligned with what I saw and the two options mentioned at the 
> >> > beginning - if we decide to keep it we should fix things and add the 
> >> > task to CI, if we don’t because no one wants the html pages - then
> >> > it is better to remove this ant task.
> >> > On your comment about 100 errors - it seems there are more. There is a
> >> > cap of 100 but when you fix them, more errors appear.
> >> > Further discussion can be found at CASSANDRA-17687
> >> >

Re: [DISCUSSION] CASSANDRA-18772 - removal of commons-codec on trunk

2023-08-18 Thread Maxim Muzafarov
There are a few other dependencies that are probably no longer used
and can be removed. I'm not talking about the netty-related
dependencies, because they seem to be used transitively and required
to be in the classpath, but the others are good candidates, I think.

For example, org.caffinitas.ohc:ohc-core-j8 seems to be related to
jdk8 only, which we have moved away from. I've removed it locally
and the sources still compile without it.

I've created an issue for that:
https://issues.apache.org/jira/browse/CASSANDRA-18777

On Thu, 17 Aug 2023 at 19:43, Mick Semb Wever  wrote:
>
> >
> > I propose we remove commons-codec on trunk.
> > The only usage I found was from CASSANDRA-12790 - Support InfluxDb metrics 
> > reporter configuration, which relied on commons-codec and 
> > metrics-reporter-config, which will be removed as part of CASSANDRA-18743.
> > The only question is whether we can remove those two dependencies on trunk, 
> > considering it is 5.1, or do we need to wait until 6.0.
>
>
>
> Dependencies are not an API (where they're not exposed/leaked), +1 on 
> removing it in 5.0 (when 18743 lands).  If users/operators need it back on 
> the classpath it is for reasons outside of our API concerns.


Re: [DISCUSSION] Dependency management in our first alpha release

2023-08-23 Thread Maxim Muzafarov
Hello everyone,


Regarding CASSANDRA-14667: once the 3.11.5 driver version with shaded
metrics dependencies is released, it will be fairly easy to handle the
cassandra-related part and get rid of the old metrics version in
Cassandra itself (the number of changes to the Cassandra part is also
minimal, ~10 lines overall). Although I can't do anything to
speed up the driver release process (can I?), I have tested everything
locally, and all the dtests that failed when I upgraded the metrics
version started passing afterwards. The changes for the driver
are fairly straightforward [1]; the main question is how
much effort is needed to prepare and deploy a new driver version, which
I'm not aware of. According to the latest discussion in the
corresponding jira ticket we are in good shape now, but if we still
lack volunteers for this, just give me the permissions and a
link to the release documentation and I'll prepare everything we need
to move it forward :-)

Do we need a separate issue for 'org.caffinitas.ohc:ohc-core-j8' or
should we handle everything under [2]?


[1] https://github.com/datastax/java-driver/pull/1685/files
[2] https://issues.apache.org/jira/browse/CASSANDRA-18777

On Wed, 23 Aug 2023 at 16:51, C. Scott Andreas  wrote:
>
> Given how early we are in the cycle with even the branch only recently cut 
> (and how much is not yet present in the alpha), housecleaning seems like a 
> positive impulse.
>
> On Aug 23, 2023, at 7:28 AM, Ekaterina Dimitrova  
> wrote:
>
>
> Hi everyone,
>
> I wanted to clarify something. I understood dependency updates/cleaning can 
> also be done in an alpha release if they lead to minimal user-facing changes, 
> if any at all. I agree with that in our first 5.0 alpha release because we 
> are not yet feature-complete. It is a good time for people to do a bit of 
> housekeeping and tighten some loose ends.
> Do you think this is a valid statement? Thoughts?
> I wanted to clear this topic as we have a few in-flight tickets/discussions:
>
> - CASSANDRA-14667 - upgrade dropwizard metrics, for which to be accommodated, 
> Bret is creating a new 3.11.4 drivers version. So we should update the 
> driver. I am unsure how much effort and change it will be on our side to 
> update the drivers though. Maxim, did you try it? Any thoughts?
>
>
> - CASSANDRA-18789 - commons-lang3, a pretty non-controversial bump with two 
> versions. The one we are on is tested to Java 11, and the newest one tests up 
> to JDK17 and beyond. This is enough reason for me honestly to update it.
>
>
>
> - In [1], Maxim mentioned that we can clean org.caffinitas.ohc:ohc-core-j8.
>
>
> - In [2], Stefan and Mick made a point that we could even remove in 5.0 
> metrics-reporter-config(CASSANDRA-18743) and commons-codec(CASSANDRA-18772)
>  I think this should be a good idea - let's make some noise in the user group 
> to ensure people are aware and no one raises any significant concerns and 
> then clean those two. I also want to hear if Abe still has concerns about not 
> following deprecation process here.
>
> And if we decide, we can find a few more loose ends to deal with. I am sure.
>
> Looking forward to your feedback and thoughts.
>
> Best regards,
> Ekaterina
>
>
>
> [1] https://lists.apache.org/thread/9m1vz5qyows97wlppkwk1fd8386rj9q1
> [2] https://lists.apache.org/thread/9m1vz5qyows97wlppkwk1fd8386rj9q1
>
>


Re: [DISCUSSION] Dependency management in our first alpha release

2023-08-29 Thread Maxim Muzafarov
A few updates.

We've posted a message to the user-list asking the question about the
use of the metrics-reporter-config library to make sure we are on the
safe side with the removal:
https://lists.apache.org/thread/c4m3tc08zhd4d41zs05jcdkr3gjwlhno

The issue for the `org.caffinitas.ohc:ohc-core-j8` is here, we'll try
to handle it:
https://issues.apache.org/jira/browse/CASSANDRA-18799

On Fri, 25 Aug 2023 at 18:39, Ekaterina Dimitrova  wrote:
>
> Thank you all. We are going to continue with those tickets and related 
> problems then.
>
> On Maxim's question:
> "Do we need a separate issue for 'org.caffinitas.ohc:ohc-core-j8' or we 
> should handle everything under [2]?"
>
> It depends on whether someone has the time to sit and deal with the complete 
> list as soon as possible or we should do divide and conquer. It will also 
> require some archeology and potential discussions with users in some cases, 
> etc.
>
> Best regards,
> Ekaterina
>
>
>
> On Wed, 23 Aug 2023 at 17:29, Abe Ratnofsky  wrote:
>>
>> > I also want to hear if Abe still has concerns about not following 
>> > deprecation process here.
>>
>> I support removing the library on an expedited schedule, rather than waiting 
>> for a full major of deprecation. We still have a large surface for metrics 
>> integrations, and users who depended on metrics-reporter-config will have a 
>> path forward if they need similar functionality.
>>
>> On Aug 23, 2023, at 07:28, Ekaterina Dimitrova  wrote:
>>
>> I also want to hear if Abe still has concerns about not following 
>> deprecation process here.


Re: CASSANDRA-18773 compaction speedup

2023-09-27 Thread Maxim Muzafarov
Hello Stefan,

+1
Do we plan to release these changes? I am mostly interested in using
4.0, 4.1 :-)

On Tue, 26 Sept 2023 at 17:49, Miklosovic, Stefan
 wrote:
>
> Hi list,
>
> there is CASSANDRA-18773 we want to merge to 4.0 up to trunk (hence it will 
> be in 5.0 (alpha2)) and I want to be sure we are all OK with that (especially 
> for that 5.0 alpha release).
>
> The patch is significantly speeding up the compaction throughput for cases 
> when you have a lot of SSTables in a key-value table without secondary index.
>
> My colleague Cameron Zemek has identified and fixed the issue together with 
> help of Branimir Lambov.
>
> It is a little bit hard to believe but for cases when your table contains 
> thousands of SSTables and it does not have any 2i's (tested on around 
> 2500 SSTables), we saw the speedup of 50x (fifty times) on compaction 
> throughput for major compactions. It is also, reportedly, affecting 
> operations when switching from STCS to LCS.
>
> As mentioned, we plan to merge this to 4.0, 4.1, 5.0 and trunk.
>
> Any objections to that?
>
> Regards


[DISCUSSION] Drift backwards compatibility from native protocol version growth to feature flags

2023-09-29 Thread Maxim Muzafarov
Hello everyone,


The problem that I'm struggling with is not directly related to the
topic I'm about to discuss now, but it probably illustrates the
greater complexity of backwards compatibility with the drivers we now
support. For instance, I want to replace the algorithm that is used to
calculate a CRC on the message payload with a new one, and since the
v5 of the native protocol has already been fixed there is no way to do
this without bumping the protocol version up to v6, which, in turn,
seems like too big a leap for such a small change. Right? Correct me
if I'm wrong.

From a broader perspective, we use native protocol versioning to
provide backwards compatibility not only for the protocol changes
themselves, but also for internal features that do not appear to
be directly dependent on the protocol specification. I
would say a good example of this is the dependency of the
MessagingService version on the storage compatibility mode [2], which
makes these two subcomponents tightly coupled.

Another thing worth mentioning here is the number of Cassandra drivers
[1] that we have, which have to implement a monotonically growing
version of the native protocol in order to support new features. The
main problem with a monotonically growing version is that a driver
(and a driver's developer) cannot skip v6 if they are only interested in
a new feature that only appears in v7; v6 has to be fully
implemented first. This is probably not a problem for the Java or Python
drivers which always get a lot of attention, but could be a problem
for others. The next example here is an urgent fix, that might be
blocked by a heavyweight feature which is difficult to implement in a
particular driver.


= Proposal =

I think we could take a step aside and take a slightly different
approach here to addressing the same backward compatibility issues,
rather than bumping up the native protocol version every time. A
driver could send a bitmask to a server on a connection handshake with
"features" that it is interested in, and the server could then respond
to that driver with the features it supports from that list. I have
checked the handshake protocol and it seems that we have some bits in
reserve [3] in the Initiate message to allow this.

I see the following advantages:

- drivers will have enough flexibility to implement the new features they
want in any order, rather than in the order the native protocol
specification grows, which matters most for drivers that lack maintainers;
- it gives security plugins the flexibility to enable/disable features
they want on both the client and server sides;
- we decouple internal components and their internal versions from each other;
- it allows us to push out urgent fixes or tuning of internal components,
e.g. tuning FrameEncoders/FrameDecoders in the way that we need;
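
To make the proposal above more concrete, here is a minimal sketch of how a
feature bitmask could be negotiated. All names in it (FeatureNegotiation,
Feature and its constants) are hypothetical and do not exist in Cassandra or
in any driver; the sketch only assumes that one bit is assigned per feature
and that the server replies with the intersection of what was requested and
what it supports.

```java
import java.util.EnumSet;

/**
 * Hypothetical sketch of bitmask-based feature negotiation during a handshake.
 * None of these names exist in Cassandra; they only illustrate the idea.
 */
public final class FeatureNegotiation
{
    /** Illustrative features; each one owns a single bit, given by its ordinal. */
    public enum Feature
    {
        CRC32C_PAYLOAD, EXAMPLE_FEATURE_B, EXAMPLE_FEATURE_C;

        int bit()
        {
            return 1 << ordinal();
        }
    }

    /** Packs a set of features into the bitmask a driver would send. */
    public static int encode(EnumSet<Feature> features)
    {
        int mask = 0;
        for (Feature feature : features)
            mask |= feature.bit();
        return mask;
    }

    /** Unpacks a bitmask, ignoring any bits this side does not know about. */
    public static EnumSet<Feature> decode(int mask)
    {
        EnumSet<Feature> features = EnumSet.noneOf(Feature.class);
        for (Feature feature : Feature.values())
            if ((mask & feature.bit()) != 0)
                features.add(feature);
        return features;
    }

    /** Server side: reply with the intersection of requested and supported features. */
    public static int negotiate(int requestedMask, EnumSet<Feature> supported)
    {
        return requestedMask & encode(supported);
    }

    public static void main(String[] args)
    {
        int requested = encode(EnumSet.of(Feature.CRC32C_PAYLOAD, Feature.EXAMPLE_FEATURE_C));
        int agreed = negotiate(requested, EnumSet.of(Feature.CRC32C_PAYLOAD, Feature.EXAMPLE_FEATURE_B));
        System.out.println("Agreed features: " + decode(agreed));
    }
}
```

With this shape a driver that only cares about one feature sets one bit, and
unknown bits are simply dropped by the peer, so neither side has to implement
features in a fixed order.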


Any thoughts?


[1] 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation#CEP8:DataStaxDriversDonation-Goals
[2] 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L223
[3] 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/HandshakeProtocol.java#L74


Re: [VOTE] Accept java-driver

2023-10-03 Thread Maxim Muzafarov
+1 (nb)

On Tue, 3 Oct 2023 at 19:48, Vinay Chella  wrote:
>
> +1 (nb)
>
> Thanks,
> Vinay Chella
>
>
> On Tue, Oct 3, 2023 at 10:44 AM Yifan Cai  wrote:
>>
>> +1
>> 
>> From: David Capwell 
>> Sent: Tuesday, October 3, 2023 9:45:02 AM
>> To: dev 
>> Subject: Re: [VOTE] Accept java-driver
>>
>> +1
>>
>> On Oct 3, 2023, at 8:32 AM, Chris Lohfink  wrote:
>>
>> +1
>>
>> On Tue, Oct 3, 2023 at 10:30 AM Jeff Jirsa  wrote:
>>
>> +1
>>
>>
>> On Mon, Oct 2, 2023 at 9:53 PM Mick Semb Wever  wrote:
>>
>> The donation of the java-driver is ready for its IP Clearance vote.
>> https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
>>
>> The SGA has been sent to the ASF.  This does not require acknowledgement 
>> before the vote.
>>
>> Once the vote passes, and the SGA has been filed by the ASF Secretary, we 
>> will request ASF Infra to move the datastax/java-driver as-is to 
>> apache/java-driver
>>
>> This means all branches and tags, with all their history, will be kept.  A 
>> cleaning effort has already cleaned up anything deemed not needed.
>>
>> Background for the donation is found in CEP-8: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>>
>> PMC members, please take note of (and check) the IP Clearance requirements 
>> when voting.
>>
>> The vote will be open for 72 hours (or longer). Votes by PMC members are 
>> considered binding. A vote passes if there are at least three binding +1s 
>> and no -1's.
>>
>> regards,
>> Mick
>>
>>


Re: [DISCUSS] Maintain backwards compatibility after dependency upgrade in the 5.0

2023-10-04 Thread Maxim Muzafarov
Hello everyone,

Posting the thread update. We have merged the issue into trunk and
5.0, so basically there should be no problems and backwards
compatibility issues. I'd like to thank all of you for your
cooperation and I'm happy to see this update will be in use soon.

https://issues.apache.org/jira/browse/CASSANDRA-14667


On Thu, 27 Jul 2023 at 16:41, Josh McKenzie  wrote:
>
> +1 to the change pre 5.0.
>
> Any committers have bandwidth to review 
> https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-14667?
>
> PR can be found here: https://github.com/apache/cassandra/pull/2238/files
>
> On Thu, Jul 27, 2023, at 7:59 AM, Maxim Muzafarov wrote:
>
> Bump this topic up for visibility as the code freeze is coming soon.
>
> This seems like a good change to include in 5.0 as this kind of
> library upgrade is more natural when the major version changes. It is
> still possible to postpone it to 6.0, but the main concern here is
> that the current version of dropwizard metrics library is obsolete and
> no longer supported and it is better to avoid emergencies that could
> arise (like the panic with log4j library upgrade some time ago).
>
> The change itself is straightforward and deserves more eyes on it from
> my point of view.
>
> On Fri, 21 Jul 2023 at 14:51, Maxim Muzafarov  wrote:
> >
> > Hello everyone,
> >
> > It still needs a pair of eyes to push it forward.
> >
> >
> > I came across another good thing that might help us to overcome the
> > difficulties with the dropwizard metrics dependency upgrade. The
> > change relates to the driver itself and reuses the same approach that
> > was used to deal with the driver's netty dependencies. We need to
> > shade the dropwizard metrics classes and no longer rely on the
> > cassandra classpath at least for the 3.x version of the java driver,
> > and make the next 3.11.4 release of the java driver accordingly.
> >
> > The changes for the driver are here:
> > https://github.com/datastax/java-driver/pull/1685
> >
> > This will give us (and users as well) the confidence to move forward
> > with this change to 5.x alongside the 3.11 version of the driver
> > usage. Looking forward to your thoughts.
> >
> > Changes for the Cassandra part are here:
> > https://github.com/apache/cassandra/pull/2238/files
> >
> > On Mon, 3 Jul 2023 at 15:15, Maxim Muzafarov  wrote:
> > >
> > > I'd like to mention the approach we took here: to untangle the driver
> > > update in tests with the dropwizard library version (cassandra-driver
> > > 3.11 requires the "old" JMXReporter classes in the classpath) we have
> > > copied the classes into the tests themselves, as it is allowed by the
> > > Apache License 2.0. This way we can update the metrics library itself
> > > and then update the driver used in the tests afterwards.
> > >
> > > If there are no objections, we need another committer to take a look
> > > at these changes:
> > > https://issues.apache.org/jira/browse/CASSANDRA-14667
> > > https://github.com/apache/cassandra/pull/2238/files
> > >
> > > Thanks in advance for your help!
> > >
> > > On Wed, 28 Jun 2023 at 16:04, Bowen Song via dev
> > >  wrote:
> > > >
> > > > IMHO, anyone upgrading software between major versions should expect to
> > > > see breaking changes. Introducing breaking or major changes is the whole
> > > > point of bumping major version numbers.
> > > >
> > > > Since the library upgrade need to happen sooner or later, I don't see
> > > > any reason why it should not happen in the 5.0 release.
> > > >
> > > >
> > > > On 27/06/2023 19:21, Maxim Muzafarov wrote:
> > > > > Hello everyone,
> > > > >
> > > > >
> > > > > We use the Dropwizard Metrics 3.1.5 library, which provides a basic
> > > > > set of classes to easily expose Cassandra internals to a user through
> > > > > various interfaces (the most common being JMX). We want to upgrade
> > > > > this library version in the next major release 5.0 up to the latest
> > > > > stable 4.2.19 for the following reasons:
> > > > > - the 3.x (and 4.0.x) Dropwizard Metrics library is no longer
> > > > > supported, which means that if we face a critical CVE, we'll still
> > > > > need to upgrade, so it's better to do it sooner and more calmly;
> > > > > - as of 4.2.5 the library supports jdk11, jdk17, so we

[REVIEW REQUEST] Replace CRC32 with more efficient CRC32C for internode messaging

2023-10-05 Thread Maxim Muzafarov
Hello everyone,


This message is both a review request and my attempt to share with you
some of the benchmark results related to replacing the CRC32 algorithm
for the internode messaging protocol.

When a new connection is initiated between nodes, the corresponding
connection channel is configured with one of the frame encoder/decoder
pairs based on how a user has set a cluster YAML configuration. The
frame encoder is configured on the sender side, and the frame decoder
is configured on the receiver side. They use the CRC24 for frame
header protection and the CRC32 for frame payload protection.

Since we've dropped support for the jdk8 [1] in the upcoming release,
it's now possible to use the more efficient CRC32C algorithm, which
ships with jdk11 by default, without any additional overhead for us. I
have prepared a patch [4] to replace the CRC32 algorithm with the more
efficient CRC32C for frames payload. I've prototyped a new solution
and done some research into the reasons for dropping both the CRC24
and the CRC32, but replacing only the CRC32 for payload and keeping
the CRC24 as it is seems to be more efficient for us, and simplifies
the whole solution (see [2] for details).
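
For readers who have not used it: java.util.zip.CRC32C has been part of the
JDK since Java 9 and implements the same java.util.zip.Checksum interface as
CRC32, so swapping one for the other is mostly a matter of which instance
gets created. The sketch below is not the code from the patch; the class name
PayloadChecksum and its helper method are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** Illustrative only: computes a payload checksum with an interchangeable algorithm. */
public final class PayloadChecksum
{
    /** Runs the checksum produced by the given factory over the whole payload. */
    public static long checksum(Supplier<Checksum> factory, byte[] payload)
    {
        Checksum crc = factory.get();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    public static void main(String[] args)
    {
        // "123456789" is the conventional check input for CRC algorithms.
        byte[] payload = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", checksum(CRC32::new, payload));
        System.out.printf("CRC32C = %08x%n", checksum(CRC32C::new, payload));
    }
}
```

The two algorithms use different polynomials, so their values are not
interchangeable on the wire: both ends of a connection have to agree on which
one protects the payload.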

Benchmarks seem to point us in the right direction (see [3]).

Decode

Benchmark  Mode  Cnt    Score     Error  Units
CRC        avgt    4  382.764 ±  75.327  ns/op
CRC32C     avgt    4  272.388 ±  63.618  ns/op

Encode

Benchmark  Mode  Cnt    Score     Error  Units
CRC        avgt    4  382.130 ±  12.318  ns/op
CRC32C     avgt    4  311.646 ±  17.114  ns/op


So, if anyone is interested in these changes and can verify them, I
would be happy to help and answer any questions.


[1] https://issues.apache.org/jira/browse/CASSANDRA-18255
[2] 
https://issues.apache.org/jira/browse/CASSANDRA-16360?focusedCommentId=17771183&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17771183
[3] https://gist.github.com/Mmuzaf/a7ac6f5759c16ee24683856704e7c941
[4] https://github.com/apache/cassandra/pull/2647


Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-10-06 Thread Maxim Muzafarov
Hello everyone,

Some updates.
There are issues that we have put on hold, waiting for the CEPs to be
finalized. The Java imports are one of these issues; let's not
forget them ^^

I've created a label to track it:
https://issues.apache.org/jira/issues/?jql=labels%20%3D%20code-polishing

On Tue, 1 Aug 2023 at 10:46, Miklosovic, Stefan
 wrote:
>
> I think we might wait for Accord and transactional metadata as the last big 
> contributions in 5.0 (if I have not forgotten something) and then we can just 
> polish it all just before the release. There will be still some room to do 
> the housekeeping like this after these patches land. It is not like Accord 
> will be in trunk on Monday and we release Tuesday ...
>
> ________
> From: Maxim Muzafarov 
> Sent: Monday, July 31, 2023 23:05
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>
>
> Hello everyone,
>
>
> It's been a long time since the last discussion about the import order
> code style, so I want to give these changes a chance as all the major
> JIRA issues have already landed on the release branch so we won't
> affect anyone. I'd be happy to find any reviewers who are interested
> in helping with the next steps :-) I've updated the changes to reflect
> the latest checkstyle work, so here they are:
>
> https://issues.apache.org/jira/browse/CASSANDRA-17925
> https://github.com/apache/cassandra/pull/2108
>
>
> The changes look scary at first glance, but they're actually quite
> simple and in line with what we've discussed above. In short, we can
> divide all the affected files into two parts: the update of the code
> style configuration files (checkstyle + IDE configs), and the update
> of all the sources to match the code style.
>
> In short:
>
> - "import order" hotkey will work regardless of which IDE you are using;
> - updated checkstyle configuration, and IDEA, Eclipse, NetBeans
> configurations have been updated;
> - AvoidStarImport checkstyle rule applied as well;
>
> The import order we've agreed upon:
>
> java.*
> [blank line]
> javax.*
> [blank line]
> com.*
> [blank line]
> net.*
> [blank line]
> org.*
> [blank line]
> org.apache.cassandra.*
> [blank line]
> all other imports
> [blank line]
> static all other imports
>
> On Mon, 27 Feb 2023 at 13:26, Maxim Muzafarov  wrote:
> >
> > > I suppose it can be easy for the existing feature branches if they have a 
> > > single commit. Don't we need to adjust each commit for multi-commit 
> > > feature branches?
> >
> > It depends on how feature branches are maintained and developed, I
> > guess. My thoughts here are that the IDE's hotkeys should just work to
> > resolve any code-style issues that arise during rebase/maintenance.
> > I'm not talking about enforcing all our code-style rules but giving
> > developers good flexibility. The classes import order rule might be a
> > good example here.
> >
> > On Wed, 22 Feb 2023 at 21:27, Jacek Lewandowski
> >  wrote:
> > >
> > > I suppose it can be easy for the existing feature branches if they have a 
> > > single commit. Don't we need to adjust each commit for multi-commit 
> > > feature branches?
> > >
> > > śr., 22 lut 2023, 19:48 użytkownik Maxim Muzafarov  
> > > napisał:
> > >>
> > >> Hello everyone,
> > >>
> > >> I have created an issue CASSANDRA-18277 that may help us move forward
> > >> with code style changes. It only affects the way we store the IntelliJ
> > >> code style configuration and has no effect on any current (or any)
> > >> releases, so it should be safe to merge. So, once the issue is
> > >> resolved, every developer that checks out a release branch will use the
> > >> same code style stored in that branch. This in turn makes rebasing a
> > >> big change like the import order [1] a really straightforward matter
> > >> (by pressing Ctrl + Opt + O in their local branch to organize
> > >> imports).
> > >>
> > >> See:
> > >>
> > >> Move the IntelliJ Idea code style and inspections configuration to the
> > >> project's root .idea directory
> > >> https://issues.apache.org/jira/browse/CASSANDRA-18277
> > >>
> > >>

Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-10 Thread Maxim Muzafarov
Hello everyone,


I've discussed with Stefan some steps we can take to improve the final
solution, so the final version might look like this:

/** @deprecated See CASSANDRA-6504 */
@Deprecated(since = "2.1")
public Integer concurrent_replicates = null;

The issue number will be taken from the git blame comment. I doubt I
can generate and/or create a meaningful comment for every deprecation
annotation, but providing a link to the issue that was retrieved from
the git blame is not too big a problem. This also improves the
visibility.

In addition, we can add two checkstyle rules [1] [2] to ensure that
any future deprecations will have a "since" element and a JavaDoc
comment.
WDYT?

[1] 
https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
[2] 
https://checkstyle.org/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/MatchXpathCheck.html
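
As a rough illustration of what the "since" requirement above means, here is
a small reflection-based sketch. It is not the proposed checkstyle
configuration; DeprecatedSinceCheck, ExampleConfig and the legacy_option and
regular_option fields are hypothetical names. The only facts it relies on are
that java.lang.Deprecated is retained at runtime and has had the since() and
forRemoval() elements since Java 9.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: finds deprecated fields whose annotation has no "since" element. */
public final class DeprecatedSinceCheck
{
    /** Hypothetical config holder mirroring the example from this thread. */
    public static class ExampleConfig
    {
        /** @deprecated See CASSANDRA-6504 */
        @Deprecated(since = "2.1")
        public Integer concurrent_replicates = null;

        /** @deprecated made-up field that lacks the "since" element */
        @Deprecated
        public Integer legacy_option = null;

        public Integer regular_option = 1;
    }

    /** Returns the names of deprecated fields declared in the type with an empty "since". */
    public static List<String> fieldsMissingSince(Class<?> type)
    {
        List<String> missing = new ArrayList<>();
        for (Field field : type.getDeclaredFields())
        {
            Deprecated deprecated = field.getAnnotation(Deprecated.class);
            if (deprecated != null && deprecated.since().isEmpty())
                missing.add(field.getName());
        }
        return missing;
    }

    public static void main(String[] args)
    {
        System.out.println("Missing since: " + fieldsMissingSince(ExampleConfig.class));
    }
}
```

A runtime check like this cannot see the JavaDoc comment, which is why a
source-level tool such as checkstyle is the better fit for enforcing that the
annotation and the comment are both present.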

On Tue, 10 Oct 2023 at 14:50, Josh McKenzie  wrote:
>
> Sounds like we're relitigating the basics of how @Deprecated, forRemoval, 
> since, and javadoc @link all intersect to make deprecation less painful ;)
>
> So:
>
> 1. Built-in java.lang.Deprecated: required
> 2. Can use since and forRemoval if you have that info handy and think it'd be 
>    useful (would make it a lot easier to grep for things to pull before a major)
> 3. If it's being replaced by something, you should {@link #} the javadoc for it 
>    so people know where to bounce over to
>
> I've been leaning pretty heavily on the functionality of point 3 for 
> documenting cross-module implicit dependencies as I come across them lately 
> so that one resonates with me.
>
> On Tue, Oct 10, 2023, at 4:38 AM, Miklosovic, Stefan wrote:
>
> OK.
>
> Let's go with in-built java.lang.Deprecated annotation. If somebody wants to 
> document that in more detail, there are Javadocs as mentioned. Let's just 
> stick with the standard stuff.
>
> I will try to implement this for 5.0 (versions since it was deprecated) with 
> my take on what should be removed (forRemoval = true) but that should be 
> definitely cross-checked on review as Mick mentioned.
>
> 
> From: Mick Semb Wever 
> Sent: Monday, October 9, 2023 10:55
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
>
>
> Tangential question to this is if everything we deprecated is eligible for 
> removal? In other words, are there any cases when forRemoval would be false? 
> Could you elaborate on that and give such examples or do you all think that 
> everything which is deprecated will be eventually removed?
>
>
> Removal cannot be default.  This came up in the subtickets of CASSANDRA-18306.
>
> I suggest that adding " forRemoval = true" and the later actual removal of 
> the code both require broader consensus.  I'm open to that being on the 
> ticket or needing a thread on the ML.  Small stuff, common sense says on the 
> ticket is enough, but a few folk have already stated that deprecated code 
> that has minimal maintenance overhead should not be removed.
>
>


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-11 Thread Maxim Muzafarov
Francisco,

I agree with your vision of the deprecation comments and actually, I
think we should recommend doing it that way for the cases where it is
applicable on our code-style page, but when things get to the
implementation phase there are some obstacles that are not easy to
overcome.

So, adding the MissingDeprecated will emphasize to a developer the
need to describe the deprecation reasons in comments, but
unfortunately, there is no general pattern that we can enforce for
every such description message and/or automatically validate its
meaningfulness. There may be no alternative for a deprecated field, or
it may simply be marked for deletion, so the pattern is slightly
different in this case.

Another problem is how to add meaningful comments to the deprecated
annotations that we already have in the code, since we can't enforce
checkstyle rules only on newly added code. This is a very exhausting
process with no 100% guarantee of accuracy - some of the commits don't
have a good commit message and require deep archaeology.

All of the above led me to the following which is pretty easy to
achieve and improves the code quality:

/** @deprecated See CASSANDRA-6504 */
@Deprecated(since = "2.1")
public Integer concurrent_replicates = null;

On Wed, 11 Oct 2023 at 09:51, Miklosovic, Stefan
 wrote:
>
> Here (1) it supports check of both Javadoc and annotation at the same time so 
> what you want is possible. What is not possible is to checkstyle the 
> _content_ of deprecated Javadoc nor any format of it. I think that ensuring 
> the presence of both annotation and Javadoc comment is just enough.
>
> (1) 
> https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
>
> 
> From: Francisco Guerrero 
> Sent: Tuesday, October 10, 2023 23:34
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
>
>
> To me this seems insufficient. As a developer, I'd like to see what the 
> alternative is when reading the javadoc without having to go to Jira.
>
> What I would prefer is to know what the alternative is and how to use it. For 
> example:
>
> /** @deprecated Use {@link #alternative} instead. See CASSANDRA-6504 */
> @Deprecated(since = "2.1")
> public Integer concurrent_replicates = null;
>
> I am not sure if checkstyle can enforce the above, so the mechanisms to 
> enforce it would still need to be laid out, unless we can easily support 
> something like the above with checkstyle rules.
>
> On 2023/10/10 20:34:27 Maxim Muzafarov wrote:
> > Hello everyone,
> >
> >
> > I've discussed with Stefan some steps we can take to improve the final
> > solution, so the final version might look like this:
> >
> > /** @deprecated See CASSANDRA-6504 */
> > @Deprecated(since = "2.1")
> > public Integer concurrent_replicates = null;
> >
> > The issue number will be taken from the git blame comment. I doubt I
> > can generate and/or create a meaningful comment for every deprecation
> > annotation, but providing a link to the issue that was retrieved from
> > the git blame is not too big a problem. This also improves the
> > visibility.
> >
> > In addition, we can add two checkstyle rules [1] [2] to ensure that
> > any future deprecations will have a "since" element and a JavaDoc
> > comment.
> > WDYT?
> >
> > [1] 
> > https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
> > [2] 
> > https://checkstyle.org/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/MatchXpathCheck.html
> >
> > On Tue, 10 Oct 2023 at 14:50, Josh McKenzie  wrote:
> > >
> > > Sounds like we're relitigating the basics of how @Deprecated, forRemoval, 
> > > since, and javadoc @link all intersect to make deprecation less painful ;)
> > >
> > > So:
> > >
> > > 1. Built-in java.lang.Deprecated: required
> > > 2. Can use since and forRemoval if you have that info handy and think it'd 
> > >    be useful (would make it a lot easier to grep for things to pull before a 
> > >    major)
> > > 3. If it's being replaced by something, you should {@link #} the javadoc for 
> > >    it so people know where to bounce over to
> > >
> > > I've been leaning pretty heavily on the functionality of point 3 for 
> > > documenting cross-module implic

Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Maxim Muzafarov
I think the source code can describe the intention better than words :-)

The link to our Code Style with a discussion "summary":
https://github.com/apache/cassandra-website/pull/245/files

The link to the pull request with the proposed changes (only "since"
added and description):
https://github.com/apache/cassandra/pull/2801/files

On Fri, 13 Oct 2023 at 14:45, Benjamin Lerer  wrote:
>
> Ok, thanks Stefan I understand the context better now. Looking at the PR.
> Some make sense also for serialization reasons but some make no sense to me.
>
>
> Le ven. 13 oct. 2023 à 14:26, Benjamin Lerer  a écrit :
>>>
>>> I’ve been told in the past not to remove public methods in a patch release 
>>> though.
>>
>>
>> Then I am curious to get the rationale behind that. If some piece of code is 
>> not used anymore then simplifying the code is the best thing to do. It makes 
>> maintenance easier and avoids mistakes.
>> Le ven. 13 oct. 2023 à 14:11, Miklosovic, Stefan via dev 
>>  a écrit :
>>>
>>> Maybe for better understanding what we talk about, there is the PR which 
>>> implements the changes suggested here (1)
>>>
>>> It is clear that @Deprecated is not used exclusively on JMX / Configuration 
>>> but we use it internally as well. This is a very delicate topic and we need 
>>> to go, basically, one by one.
>>>
>>> I get that there might be some kind of a "nervousness" around this as we 
>>> strive for not breaking it unnecessarily so there might be a lot of 
>>> exceptions etc and I completely understand that but what I lack is clear 
>>> visibility into what we plan to do with it (if anything).
>>>
>>> There is deprecated stuff as old as Cassandra 1.2 / 2.0 (!!!) and it is 
>>> really questionable if we should not just get rid of that once for all. I 
>>> am OK with keeping it there if we decide that, but we should provide some 
>>> additional information like when it was deprecated and why it is necessary 
>>> to keep it around otherwise the code-base will bloat and bloat ...
>>>
>>> (1) https://github.com/apache/cassandra/pull/2801/files
>>>
>>> 
>>> From: Mick Semb Wever 
>>> Sent: Friday, October 13, 2023 13:51
>>> To: dev@cassandra.apache.org
>>> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
>>>
>>>
>>> On Fri, 13 Oct 2023 at 13:07, Benjamin Lerer wrote:
>>> I was asking because outside of configuration parameters and JMX calls, the 
>>> approach as far as I remember was to just change things without using an 
>>> annotation.
>>>
>>>
>>> Yes, it is my understanding that such deprecation is only needed on 
>>> methods/objects that belong to some API/SPI component of ours that requires 
>>> compatibility.  This is much more than configuration and JMX, and can be 
>>> quite subtle in areas.   A failed attempt I started at this is here: 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/%28wip%29+Compatibility+Planning
>>>
>>> But there will also be internal methods/objects marked as deprecated that 
>>> relate back to these compatibility concerns, annotated because their 
>>> connection and removal might not be so obvious when the time comes.


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-10-18 Thread Maxim Muzafarov
Hello everyone,

It has been a long time since the last update on this thread, so I
wanted to share some status updates: The issue is still awaiting
review, but all my hopes are pinned on Benjamin :-)

My question here is, is there anything I can do to facilitate the
review for anyone who wants to delve into the patch?

I have a few thoughts to follow:
- CEPify the changes - this will allow us to see the result of the
discussion on a single page without having to re-read the whole
thread;
- Write a blog post with possible design solutions - this will both
reveal the results of the discussion and potentially draw some
attention from the community;
- Present and discuss slides at one of the Cassandra Town Halls;

I lean towards the first and/or second options. What are the best practices
we have here for such cases though? Any thoughts?

On Tue, 11 Jul 2023 at 15:51, Maxim Muzafarov  wrote:
>
> Thank you for your comments and for sharing the ticket targeting
> strategy, I'm really happy to see this page where I have found all the
> answers to the questions I had. So, I tend towards your view and will
> just land this ticket on the 5.0 release only for now as it makes
> sense for me as well.
>
> I didn't add the feature flag for this feature because for 99% of the
> source code changes it only works with Cassandra internals leaving the
> public API unchanged. A few remarks on this are:
> - the display format of the vtable property has changed to match the
> yaml configuration style, this doesn't mean that we are displaying
> property values in a completely different way in fact the formats
> match with only 4 exceptions mentioned in the message above (this
> should be fine for the major release I hope);
> - a new column, which we've agreed to add (I'll fix the PR shortly);
>
>
> I would also like to mention the follow-up todos required by this
> issue to set the right expectations. Currently, we've brought a few
> properties under the framework to make them updateable with the
> SettingsTable, so that you can keep focusing on the framework itself
> rather than on tagging the configuration properties themselves with
> the @Mutable annotation. Although the solution is self-sufficient for
> the already tagged properties, we still need to bring the rest of them
> under the framework afterwards. I'll create an issue and do it right
> after we're done with the initial patch.
>
>
> On Fri, 7 Jul 2023 at 20:37, Josh McKenzie  wrote:
> >
> > This really is great work Maxim; definitely appreciate all the hard work 
> > that's gone into it and I think the users will too.
> >
> > In terms of where it should land, we discussed this type of question at 
> > length on the ML awhile ago and ended up codifying it in the wiki: 
> > https://cwiki.apache.org/confluence/display/CASSANDRA/Patching%2C+versioning%2C+and+LTS+releases
> >
> > When working on a ticket, use the following guideline to determine which 
> > branch to apply it to (Note: See How To Commit for details on the commit 
> > and merge process)
> >
> > Bugfix: apply to oldest applicable LTS and merge up through latest GA to 
> > trunk
> >
> > In the event you need to make changes on the merge commit, merge with -s 
> > ours and revise the commit via --amend
> >
> > Improvement: apply to trunk only (next release)
> >
> > Note: refactoring and removing dead code qualifies as an Improvement; our 
> > priority is stability on GA lines
> >
> > New Feature: apply to trunk only (next release)
> >
> > Our priority is to keep the 2 LTS releases and latest GA stable while 
> > releasing new "latest GA" on a cadence that provides new improvements and 
> > functionality to users soon enough to be valuable and relevant.
> >
> >
> > So in this case, target whatever unreleased next feature release (i.e. 
> > SEMVER MAJOR || MINOR) we have on deck.
> >
> > On Thu, Jul 6, 2023, at 1:21 PM, Ekaterina Dimitrova wrote:
> >
> > Hi,
> >
> > First of all, thank you for all the work!
> > I personally think that it should be ok to add a new column.
> >
> > I will be very happy to see this landing in 5.0.
> > I am personally against porting this patch to 4.1. To be clear, I am sure 
> > you did a great job and my response would be the same to every single 
> > person - the configuration is quite wide-spread and the devil is in the 
> > details. I do not see a good reason for exception here except convenience. 
> > There is no feature flag for these changes too, right?
> >
> > Best regards,
> > Ekaterina
> >
> > On Thursday, 6 July 2023, Miklos

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-26 Thread Maxim Muzafarov
Personally, I think frequent releases (2-3 per year) are better than
infrequent big releases. I can understand all the concerns from a
marketing perspective, as smaller major releases may not shine as
brightly as a single "game changer" release. However, smaller
releases, especially if they don't have backwards compatibility
issues, are better for the engineering and SRE teams because if a
long-awaited feature is delayed for any reason, there should be no
worry about getting it in right into the next release.

An analogy here might be that if you miss your train (small release)
due to circumstances, you can wait right here for the next one, but if
you miss a flight (big release), you will go back home :-) This is why
I think that the 5.0, 5.1, 5.2, etc. are better and I support Mick's
plan with the caveat that we should release 5.1 when we think we are
ready to do so. Here is an example of the Postgres releases [1].

[1] https://bucardo.org/postgres_all_versions.html


Another little thing that I'd like to mention is a release management
story. In the Apache Ignite project, we've got used to creating a
release thread and posting the release status updates and/or problems,
and/or delays there, and maybe some of the benchmarks at the end. Of
course, this was done by the release manager who volunteered to do
this work. I'm not saying we're doing anything wrong here, no, but the
publicity and openness, coupled with regular updates, could help
create a real sense of the remaining work in progress. These are my
personal feelings, and definitely not actions to be taken. The example
is here: [2].

[2] https://lists.apache.org/thread/m11m0nxq701f2cj8xxdcsc4nnn2sm8ql

On Thu, 26 Oct 2023 at 11:15, Benjamin Lerer  wrote:
>>
>> Regarding the release of 5.1, I understood the proposal to be that we cut an 
>> actual alpha, thereby sealing the 5.1 release from new features. Only 
>> features merged before we cut the alpha would be permitted, and the alpha 
>> should be cut as soon as practicable. What exactly would we be waiting for?
>
>
> The problem I believe is about expectations. It seems that your expectation 
> is that a release with only TCM and Accord will reach GA quickly. Based on 
> the time it took us to release 4.1, I am simply expecting more delays (a GA 
> around end of May, June). In which case it seems to me that we could be 
> interested in shipping more stuff in the meantime (thinking of 
> CASSANDRA-15254 or CEP-29 for example).
> I do not have a strong opinion, I just want to make sure that we all share 
> the same understanding and fully understand what we agree upon.
>
> On Thu, 26 Oct 2023 at 10:59, Benjamin Lerer  wrote:
>>>
>>> I am surprised this needs to be said, but - especially for long-running 
>>> CEPs - you must involve yourself early, and certainly within some 
>>> reasonable time of being notified the work is ready for broader input and 
>>> review. In this case, more than six months ago.
>>
>>
>> It is unfortunately more complicated than that because six months ago 
>> Ekaterina and I were working on supporting Java 17 and dropping Java 8, which 
>> was needed by different ongoing works. We both missed the announcement that 
>> TCM was ready for review and anyway would not have been available at that 
>> time. Maxim asked me for a review of CASSANDRA-15254 more than 
>> 6 months ago and I have not been able to help him so far. We all have 
>> limited bandwidth and can miss some announcements.
>>
>> The project has grown and a lot of things are going on in parallel. There 
>> are also more interdependencies between the different projects. In my 
>> opinion what we are lacking is a global overview of the different things 
>> going on in the project and some rough ideas of the status of the different 
>> significant pieces. It would allow us to better organize ourselves.
>>
>> On Thu, 26 Oct 2023 at 00:26, Benedict  wrote:
>>>
>>> I have spoken privately with Ekaterina, and to clear up some possible 
>>> ambiguity: I realise nobody has demanded a delay to this work to conduct 
>>> additional reviews; a couple of folk have however said they would prefer 
>>> one.
>>>
>>>
>>> My point is that, as a community, we need to work on ensuring folk that 
>>> care about a CEP participate at an appropriate time. If they aren’t able 
>>> to, the consequences of that are for them to bear.
>>>
>>>
>>> We should be working to avoid surprises as CEPs start to land. To this end, 
>>> I think we should work on some additional paragraphs for the governance doc 
>>> covering expectations around the landing of CEPs.
>>>
>>>
>>> On 25 Oct 2023, at 21:55, Benedict  wrote:
>>>
>>> 
>>>
>>> I am surprised this needs to be said, but - especially for long-running 
>>> CEPs - you must involve yourself early, and certainly within some 
>>> reasonable time of being notified the work is ready for broader input and 
>>> review. In this case, more than six months ago.
>>>
>>>
>>> This isn’t the first 

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-10-26 Thread Maxim Muzafarov
Josh,

Technically speaking, it will be the 3rd one, so yeah, the main goal
is to make the review process easier, my hands are trembling when I
look at the long technical discussion in that Jira issue :-) but I
also thought it might be a good idea to share the issue status with
the ML since the thread hasn't been updated for a while, and maybe get
some attention outside of the Community for this improvement by
writing a blog post. Sort of all in one.

On Wed, 25 Oct 2023 at 15:00, Josh McKenzie  wrote:
>
> Is the primary pain point you're trying to solve getting a 2nd committer 
> reviewer Maxim? And / or making the review process simpler / cleaner for 
> someone?
>
> On Wed, Oct 18, 2023, at 5:06 PM, Maxim Muzafarov wrote:
>
> Hello everyone,
>
> It has been a long time since the last update on this thread, so I
> wanted to share some status updates: The issue is still awaiting
> review, but all my hopes are pinned on Benjamin :-)
>
> My question here is, is there anything I can do to facilitate the
> review for anyone who wants to delve into the patch?
>
> I have a few thoughts to follow:
> - CEPify the changes - this will allow us to see the result of the
> discussion on a single page without having to re-read the whole
> thread;
> - Write a blog post with possible design solutions - this will both
> reveal the results of the discussion and potentially will draw some
> attention to the community;
> - Present and discuss slides at one of the Cassandra Town Halls.
>
> I lean towards the 1st and/or 2nd options. What are the best practices we
> have here for such cases, though? Any thoughts?
>
> On Tue, 11 Jul 2023 at 15:51, Maxim Muzafarov  wrote:
> >
> > Thank you for your comments and for sharing the ticket targeting
> > strategy, I'm really happy to see this page where I have found all the
> > answers to the questions I had. So, I tend towards your view and will
> > just land this ticket on the 5.0 release only for now as it makes
> > sense for me as well.
> >
> > I didn't add the feature flag for this feature because for 99% of the
> > source code changes it only works with Cassandra internals leaving the
> > public API unchanged. A few remarks on this are:
> > - the display format of the vtable property has changed to match the
> > yaml configuration style, this doesn't mean that we are displaying
> > property values in a completely different way in fact the formats
> > match with only 4 exceptions mentioned in the message above (this
> > should be fine for the major release I hope);
> > - a new column, which we've agreed to add (I'll fix the PR shortly);
> >
> >
> > I would also like to mention the follow-up todos required by this
> > issue to set the right expectations. Currently, we've brought a few
> > properties under the framework to make them updateable with the
> > SettingsTable, so that you can keep focusing on the framework itself
> > rather than on tagging the configuration properties themselves with
> > the @Mutable annotation. Although the solution is self-sufficient for
> > the already tagged properties, we still need to bring the rest of them
> > under the framework afterwards. I'll create an issue and do it right
> > after we're done with the initial patch.
> >
> >
> > On Fri, 7 Jul 2023 at 20:37, Josh McKenzie  wrote:
> > >
> > > This really is great work Maxim; definitely appreciate all the hard work 
> > > that's gone into it and I think the users will too.
> > >
> > > In terms of where it should land, we discussed this type of question at 
> > > length on the ML awhile ago and ended up codifying it in the wiki: 
> > > https://cwiki.apache.org/confluence/display/CASSANDRA/Patching%2C+versioning%2C+and+LTS+releases
> > >
> > > When working on a ticket, use the following guideline to determine which 
> > > branch to apply it to (Note: See How To Commit for details on the commit 
> > > and merge process)
> > >
> > > Bugfix: apply to oldest applicable LTS and merge up through latest GA to 
> > > trunk
> > >
> > > In the event you need to make changes on the merge commit, merge with -s 
> > > ours and revise the commit via --amend
> > >
> > > Improvement: apply to trunk only (next release)
> > >
> > > Note: refactoring and removing dead code qualifies as an Improvement; our 
> > > priority is stability on GA lines
> > >
> > > New Feature: apply to trunk only (next release)
> > >
> > > Our priority is to keep the 2 LTS releases and latest G

Re: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 5.0-alpha2

2023-11-02 Thread Maxim Muzafarov
+1 (nb)

On Wed, 1 Nov 2023 at 03:26, guo Maxwell  wrote:
>
> +1
>
> On Wed, 1 Nov 2023 at 04:58, German Eichberger via dev  wrote:
>>
>> +1
>>
>> Heck, yeah, we already tested the branch (built it ourselves) and it works 
>> great so far.
>> 
>> From: Mick Semb Wever 
>> Sent: Tuesday, October 31, 2023 1:43 PM
>> Cc: dev 
>> Subject: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 5.0-alpha2
>>
>> > The vote will be open for 72 hours (longer if needed). Everyone who
>> > has tested the build is invited to vote. Votes by PMC members are
>> > considered binding. A vote passes if there are at least three binding
>> > +1s and no -1's.
>>
>>
>> +1
>>
>> Checked
>> - signing correct
>> - checksums are correct
>> - source artefact builds (JDK 11+17)
>> - binary artefact runs (JDK 11+17)
>> - debian package runs (JDK 11+17)
>> - debian repo runs (JDK 11+17)
>> - redhat* package runs (JDK 11+17)
>> - redhat* repo runs (JDK 11+17)


Re: Releasing of Cassandra 3.x / 4.x

2023-11-03 Thread Maxim Muzafarov
+1

You mentioned some important fixes earlier [1], and we are waiting
for them as well :-)

[1] https://issues.apache.org/jira/browse/CASSANDRA-18773

On Fri, 3 Nov 2023 at 22:55, Miklosovic, Stefan via dev
 wrote:
>
> Hi list,
>
> is anybody against cutting some 3.x and 4.x releases? I think it would be nice to 
> do before the summit. The last 4.x releases were in late July, 3.0 in the middle of 
> May. There are quite a lot of changes in these branches.
>
> I can release it all.
>
> What is your opinion?
>
> Regards


[DISCUSSION] CEP-38: CQL Management API

2023-11-13 Thread Maxim Muzafarov
Hello everyone,

While we are still waiting for the review to make the settings virtual
table updatable (CASSANDRA-15254), which will improve the
configuration management experience for users, I'd like to take
another step forward and improve the C* management approach we have as
a whole. This approach aims to make all Cassandra management commands
accessible via CQL, but not only that.

The problem of making commands accessible via CQL presents a complex
challenge, especially if we aim to minimize code duplication across
the implementation of management operations for different APIs and
reduce the overall maintenance burden. The proposal's scope goes
beyond simply introducing a new CQL syntax. It encompasses several key
objectives for C* management operations, beyond their availability
through CQL:
- Ensure consistency across all public APIs we support, including JMX
MBeans and the newly introduced CQL. Users should see consistent
command specifications and arguments, irrespective of whether they're
using an API or a CLI;
- Reduce source code maintenance costs. With this new approach, when a
new command is implemented, it should automatically become available
across JMX MBeans, nodetool, CQL, and Cassandra Sidecar, eliminating
the need for additional coding;
- Maintain backward compatibility, ensuring that existing setups and
workflows continue to work the same way as they do today;
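
To make the "implement once, expose everywhere" objective more
concrete, here is a minimal sketch. It is not the CEP-38 design: the
names CommandRegistry, CliAdapter and CallAdapter, the "compact"
handler and the call-style syntax are all hypothetical and exist only
for illustration. The point is only that every front end delegates to
one shared command implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical registry: each management command is implemented exactly once. */
final class CommandRegistry {
    private final Map<String, Function<List<String>, String>> commands = new LinkedHashMap<>();

    void register(String name, Function<List<String>, String> handler) {
        commands.put(name, handler);
    }

    String execute(String name, List<String> args) {
        Function<List<String>, String> handler = commands.get(name);
        if (handler == null)
            throw new IllegalArgumentException("Unknown command: " + name);
        return handler.apply(args);
    }
}

/** Hypothetical CLI-style front end: "compact ks tbl". */
final class CliAdapter {
    static String run(CommandRegistry registry, String line) {
        String[] parts = line.trim().split("\\s+");
        return registry.execute(parts[0], List.of(parts).subList(1, parts.length));
    }
}

/** Hypothetical call-style front end: "compact(ks, tbl)" (syntax invented here). */
final class CallAdapter {
    static String run(CommandRegistry registry, String call) {
        int open = call.indexOf('(');
        int close = call.lastIndexOf(')');
        if (open < 0 || close < open)
            throw new IllegalArgumentException("Malformed call: " + call);
        String name = call.substring(0, open).trim();
        String inner = call.substring(open + 1, close).trim();
        List<String> args = inner.isEmpty()
                ? List.of()
                : Arrays.stream(inner.split(",")).map(String::trim).collect(Collectors.toList());
        return registry.execute(name, args);
    }
}

final class ManagementCommandsDemo {
    /** Registers a single illustrative command; the handler body is a stand-in. */
    static CommandRegistry newRegistry() {
        CommandRegistry registry = new CommandRegistry();
        registry.register("compact", args -> "compacting " + String.join(".", args));
        return registry;
    }

    public static void main(String[] ignored) {
        CommandRegistry registry = newRegistry();
        // Both front ends reach the same handler, so their behaviour cannot drift apart.
        System.out.println(CliAdapter.run(registry, "compact ks tbl"));
        System.out.println(CallAdapter.run(registry, "compact(ks, tbl)"));
    }
}
```

In a sketch like this, adding a new command means one register(...)
call, and every adapter picks it up without extra code. A real
implementation would also need typed arguments, permissions and error
reporting, which are left out here.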

I would suggest discussing the overall design concept first, and then
diving into the CQL command syntax and other details once we've found
common ground on the community's vision. However, regardless of these
details, I would appreciate any feedback on the design.

I look forward to your comments!

Please, see the design document: CEP-38: CQL Management API
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API


Re: [DISCUSSION] CEP-38: CQL Management API

2023-11-15 Thread Maxim Muzafarov
Hello German,

Thanks for the links, I've seen this project before, but to be honest
I've never delved that deep into it. I'll definitely check it out for
more details, give me a few days to be in context!

As for the admin port, it's already part of the proposal, as discussed
in Slack. This port is needed not only because we are mixing the data
and control planes, but also because the native protocol can be
disabled, e.g. manually via the nodetool disablebinary command, or via
the disk_failure_policy 'stop' policy, which shuts down all
transports, leaving a node only operable via JMX which doesn't match
our goals.



On Wed, 15 Nov 2023 at 18:52, German Eichberger via dev
 wrote:
>
> Hi Maxim,
>
> We have adopted/forked the agent part of the 
> https://github.com/k8ssandra/management-api-for-apache-cassandra project 
> which aims to do similar things. I especially like how they have a local 
> database socket where a sidecar can easily access cassandra and execute cql 
> commands without the need of a service account like your example suggests.
>
> The syntax they adopted (see for instance 
> https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/7cb367eac46a12947bb87486456d3f905f37628b/management-api-server/src/main/java/com/datastax/mgmtapi/resources/NodeOpsResources.java#L115)
>  looks like `CALL NodeOps.decommission(?, ?)", force, false)` which is 
> similar to your execute - just throwing this out as another example.
>
> I definitely like settling on the cql interface since that avoids having to 
> load different jmx bindings for different Cassandra versions making things 
> cleaner and more easily accessible. There is some security concern in mixing 
> the data and control planes, so I would like to see some way to restrict access, 
> as the mgmt api does, where the admin commands are only available on the 
> socket. Maybe have a special admin port or socket?
>
> I prefer making the agent part of the management api become part of Cassandra 
> either through your CEP or other means, but I can also see this as an adjacent 
> sub-project - let's discuss 🙂
>
> German
>
> 
> From: Maxim Muzafarov 
> Sent: Monday, November 13, 2023 10:08 AM
> To: dev@cassandra.apache.org 
> Subject: [EXTERNAL] [DISCUSSION] CEP-38: CQL Management API
>
> Hello everyone,
>
> While we are still waiting for the review to make the settings virtual
> table updatable (CASSANDRA-15254), which will improve the
> configuration management experience for users, I'd like to take
> another step forward and improve the C* management approach we have as
> a whole. This approach aims to make all Cassandra management commands
> accessible via CQL, but not only that.
>
> The problem of making commands accessible via CQL presents a complex
> challenge, especially if we aim to minimize code duplication across
> the implementation of management operations for different APIs and
> reduce the overall maintenance burden. The proposal's scope goes
> beyond simply introducing a new CQL syntax. It encompasses several key
> objectives for C* management operations, beyond their availability
> through CQL:
> - Ensure consistency across all public APIs we support, including JMX
> MBeans and the newly introduced CQL. Users should see consistent
> command specifications and arguments, irrespective of whether they're
> using an API or a CLI;
> - Reduce source code maintenance costs. With this new approach, when a
> new command is implemented, it should automatically become available
> across JMX MBeans, nodetool, CQL, and Cassandra Sidecar, eliminating
> the need for additional coding;
> - Maintain backward compatibility, ensuring that existing setups and
> workflows continue to work the same way as they do today;
>
> I would suggest discussing the overall design concept first, and then
> diving into the CQL command syntax and other details once we've found
> common ground on the community's vision. However, regardless of these
> details, I would appreciate any feedback on the design.
>
> I look forward to your comments!
>
> Please, see the design document: CEP-38: CQL Management API
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API


Re: [EXTERNAL] Re: [DISCUSSION] CEP-38: CQL Management API

2023-11-23 Thread Maxim Muzafarov
ndra exposes several ports - 9042, 9142, 7000 and 7001. The 
> > sidecar runs on port 9043. That's a lot of ports. I would prefer to allow 
> > users to access management functionality over one of the existing ports.
> >
> > I realize that this would mean a subtle change in behavior for 
> > disablebinary when we offer it over port 9042 and not when the operator 
> > decides to use a dedicated port.
> >
> > More importantly, I think having this functionality exposed over the 
> > storage ports may be even better. The storage ports are typically 
> > firewalled off from the end users. Operators and tooling, however, usually 
> > have access to these ports. This especially makes sense from a security 
> > standpoint where we'd like to limit users from accessing management 
> > functionality.
> >
> > What do others think about this approach?
> >
> > thanks,
> >
> > Dinesh
> >
> > > On Nov 13, 2023, at 10:08 AM, Maxim Muzafarov  wrote:
> > >
> > > Hello everyone,
> > >
> > > While we are still waiting for the review to make the settings virtual
> > > table updatable (CASSANDRA-15254), which will improve the
> > > configuration management experience for users, I'd like to take
> > > another step forward and improve the C* management approach we have as
> > > a whole. This approach aims to make all Cassandra management commands
> > > accessible via CQL, but not only that.
> > >
> > > The problem of making commands accessible via CQL presents a complex
> > > challenge, especially if we aim to minimize code duplication across
> > > the implementation of management operations for different APIs and
> > > reduce the overall maintenance burden. The proposal's scope goes
> > > beyond simply introducing a new CQL syntax. It encompasses several key
> > > objectives for C* management operations, beyond their availability
> > > through CQL:
> > > - Ensure consistency across all public APIs we support, including JMX
> > > MBeans and the newly introduced CQL. Users should see consistent
> > > command specifications and arguments, irrespective of whether they're
> > > using an API or a CLI;
> > > - Reduce source code maintenance costs. With this new approach, when a
> > > new command is implemented, it should automatically become available
> > > across JMX MBeans, nodetool, CQL, and Cassandra Sidecar, eliminating
> > > the need for additional coding;
> > > - Maintain backward compatibility, ensuring that existing setups and
> > > workflows continue to work the same way as they do today;
> > >
> > > I would suggest discussing the overall design concept first, and then
> > > diving into the CQL command syntax and other details once we've found
> > > common ground on the community's vision. However, regardless of these
> > > details, I would appreciate any feedback on the design.
> > >
> > > I look forward to your comments!
> > >
> > > Please, see the design document: CEP-38: CQL Management API
> > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API
> >
>
>
> --
> http://twitter.com/tjake


Re: Welcome Francisco Guerrero Hernandez as Cassandra Committer

2023-11-30 Thread Maxim Muzafarov
My congratulations, Francisco! :-)

On Wed, 29 Nov 2023 at 13:30, Andrés de la Peña  wrote:
>
> Congrats Francisco!
>
> On Wed, 29 Nov 2023 at 11:37, Benjamin Lerer  wrote:
>>
>> Congratulations!!! Well deserved!
>>
>> On Wed, 29 Nov 2023 at 07:31, Berenguer Blasi  wrote:
>>>
>>> Welcome!
>>>
>>> On 29/11/23 2:24, guo Maxwell wrote:
>>>
>>> Congrats!
>>>
>>> On Wed, 29 Nov 2023 at 06:16, Jacek Lewandowski  wrote:

 Congrats!!!

 On Tue, 28 Nov 2023 at 23:08, Abe Ratnofsky  wrote:
>
> Congrats Francisco!
>
> > On Nov 28, 2023, at 1:56 PM, C. Scott Andreas  
> > wrote:
> >
> > Congratulations, Francisco!
> >
> > - Scott
> >
> >> On Nov 28, 2023, at 10:53 AM, Dinesh Joshi  wrote:
> >>
> >> The PMC members are pleased to announce that Francisco Guerrero 
> >> Hernandez has accepted
> >> the invitation to become committer today.
> >>
> >> Congratulations and welcome!
> >>
> >> The Apache Cassandra PMC members
>


Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-30 Thread Maxim Muzafarov
I'm gonna take a moment to outline the question. Here we have a point
in time where a time-driven release process clashes with the
alpha/beta release naming convention: we want to have a beta ready
_before_ the Summit.

Here's the Cassandra release lifecycle document [1] that I found
(still under discussion I think) and according to the 'beta'
definition we should have a green CI and no regressions for a beta
release. This means that there may be known bugs in the new features
we are trying to ship. Unless I'm missing something, 5.0 currently
meets the 'beta' criteria and the definition itself sounds clear to
me.

So, the question is - should we find a better place for the [1] page
and move it somewhere under the 'officially accepted'? :-)

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

On Thu, 30 Nov 2023 at 07:39, Jacek Lewandowski
 wrote:
>>
>> If we end up not releasing a final 5.0 artifact by a Cassandra Summit it 
>> will signal to the community that we’re prioritizing stability and it could 
>> be a good opportunity to get people to test the beta or RC before we stamp 
>> it as production ready.
>
>
> I agree with Paulo's comment
>
> On Thu, 30 Nov 2023 at 04:44, Paulo Motta  wrote:
>>
>> > if any contributor has an opinion which is not technically refuted it will 
>> > usually be backed by a PMC via a binding -1
>>
>> clarifying a bit my personal view: if any contributor has an opinion against 
>> a proposal (in this case this release proposal) that is not refuted it will 
>> usually be backed by a PMC via binding -1
>>
>> Opinions supporting the proposal are also valuable, provided there are no 
>> valid claims against a proposal.
>>
>> On Wed, 29 Nov 2023 at 22:27 Paulo Motta  wrote:
>>>
>>> To me, the goal of a beta is to find unknown bugs. If no new bugs are found 
>>> during a beta release, then it can be automatically promoted to RC via 
>>> re-tagging. Likewise, if no new bugs are found during a RC after X time, 
>>> then it can be promoted to final.
>>>
>>> If we end up not releasing a final 5.0 artifact by a Cassandra Summit it 
>>> will signal to the community that we’re prioritizing stability and it could 
>>> be a good opportunity to get people to test the beta or RC before we stamp 
>>> it as production ready.
>>>
>>> WDYT?
>>>
>>> >  Aaron (and anybody who takes the time to follow this list, really), your 
>>> > opinion matters, that's why we discuss it here.
>>>
>>> +1, PMC are just officers who endorse community decisions, so if any 
>>> contributor has an opinion which is not technically refuted it will usually 
>>> be backed by a PMC via a binding -1 (as seen on this thread)
>>>
>>> On Wed, 29 Nov 2023 at 20:04 Nate McCall  wrote:



 On Thu, Nov 30, 2023 at 3:28 AM Aleksey Yeshchenko  
 wrote:
>
> -1 on cutting a beta1 in this state. An alpha2 would be acceptable now, 
> but I’m not sure there is significant value to be had from it. Merge the 
> fixes for outstanding issues listed above, then cut beta1.

 

 Agree with Aleksey. -1 on a beta we know has issues with a top-line new 
 feature.




Re: [VOTE] Release Apache Cassandra 5.0-beta1 (take2)

2023-12-05 Thread Maxim Muzafarov
+1 (nb)

Ran it locally and executed some queries over the virtual tables.

On Mon, 4 Dec 2023 at 15:15, Brandon Williams  wrote:
>
> +1
>
> Kind Regards,
> Brandon
>
> On Fri, Dec 1, 2023 at 7:32 AM Mick Semb Wever  wrote:
> >
> >
> > Proposing the test build of Cassandra 5.0-beta1 for release.
> >
> > sha1: 87fd1fa88a0c859cc32d9f569ad09ad0b345e465
> > Git: https://github.com/apache/cassandra/tree/5.0-beta1-tentative
> > Maven Artifacts: 
> > https://repository.apache.org/content/repositories/orgapachecassandra-1321/org/apache/cassandra/cassandra-all/5.0-beta1/
> >
> > The Source and Build Artifacts, and the Debian and RPM packages and 
> > repositories, are available here: 
> > https://dist.apache.org/repos/dist/dev/cassandra/5.0-beta1/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who has 
> > tested the build is invited to vote. Votes by PMC members are considered 
> > binding. A vote passes if there are at least three binding +1s and no -1's.
> >
> > [1]: CHANGES.txt: 
> > https://github.com/apache/cassandra/blob/5.0-beta1-tentative/CHANGES.txt
> > [2]: NEWS.txt: 
> > https://github.com/apache/cassandra/blob/5.0-beta1-tentative/NEWS.txt


Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

2023-12-12 Thread Maxim Muzafarov
rue. Cassandra is running in a lot of places 
> in the world, and JMX has been in this ecosystem for a long time; we need 
> data that is basically impossible to get to claim "JMX is usually not used in 
> C* environments in prod".
>
> I also wonder whether we should care about JMX?  I know many wish to migrate 
> (it's going to be a very long time) away from JMX, so do we need a wrapper to 
> make JMX and vtables consistent?
>
> If we can move away from a bespoke vtable or JMX based implementation and 
> instead have a templatized solution each of these is generated from, that to 
> me is the superior option. There's little harm in adding new JMX endpoints 
> (or hell, other metrics framework integration?) as a byproduct of adding new 
> vtable exposed metrics; we have the same maintenance obligation to them as we 
> have to the vtables and if it generates from the same base data, we shouldn't 
> have any further maintenance burden due to its presence right?
>
> we wish to move away from JMX
>
> I do, and you do, and many people do, but I don't believe all people on the 
> project do. The last time this came up in slack the conclusion was "Josh 
> should go draft a CEP to chart out a path to moving off JMX while maintaining 
> backwards-compat w/existing JMX metrics for environments that are using them" 
> (so I'm excited to see this CEP pop up before I got to it! ;)). Moving to a 
> system that gives us a 0-cost way to keep JMX and vtable in sync over time on 
> new metrics seems like a nice compromise for folks that have built out 
> JMX-based maintenance infra on top of C*. Plus removing the boilerplate toil 
> on vtables. win-win.
>
> If we add a column to the end of the JMX row did we just break users?
>
> I *think* this is arguably true for a vtable / CQL-based solution as well 
> from the "you don't know how people are using your API" perspective. Unless 
> we have clear guidelines about discretely selecting the columns you want from 
> a vtable and trust users to follow them, if people have brittle greedy 
> parsers pulling in all data from vtables we could very well break them as 
> well by adding a new column right? Could be wrong here; I haven't written 
> anything that consumes vtable metric data and maybe the obvious idiom in the 
> face of that is robust in the presence of column addition. /shrug
>
> It's certainly more flexible and simpler to write to w/out detonating 
> compared to JMX, but it's still an API we'd be revving.
>
> On Sat, Jan 28, 2023, at 4:24 PM, Ekaterina Dimitrova wrote:
>
> Overall I have similar thoughts and questions as David.
>
> I just wanted to add a reminder about this thread from last summer[1]. We 
> already have issues with the alignment of JMX and Settings Virtual Table. I 
> guess this is how Maxim got inspired to suggest this framework proposal which 
> I want to thank him for! (I noticed he assigned CASSANDRA-15254)
>
> Not to open the Pandora box, but to me the most important thing here is to 
> come into agreement about the future of JMX and what we will do or not as a 
> community. Also, how much time people are able to invest. I guess this will 
> influence any directions to be taken here.
>
> [1]
> https://lists.apache.org/thread/8mjcwdyqoobpvw2262bqmskkhs76pp69
>
>
> On Thu, 26 Jan 2023 at 12:41, David Capwell  wrote:
>
> I took a look and I see the result is an interface that looks like the vtable 
> interface, that is then used by vtables and JMX?  My first thought is why not 
> just use the vtable logic?
>
> I also wonder about if we should care about JMX?  I know many wish to migrate 
> (its going to be a very long time) away from JMX, so do we need a wrapper to 
> make JMX and vtables consistent?  I am cool with something like the following
>
> registerWithJMX(jmxName, query("SELECT * FROM system_views.streaming"));
>
>
> So if we want to have a JMX view that matches the table then that’s cool by 
> me, but one thing that has been brought up in reviews is backwards 
> compatibility with regard to adding columns… If we add a column to the end of 
> the JMX row did we just break users?
>
> Considering that JMX is usually not used and disabled in production 
> environments for various performance and security reasons, the operator may 
> not see the same picture from various of Dropwizard's metrics exporters
>
> If this is a real problem people are hitting, we can always add the ability 
> to push metrics to common systems with a pluggable way to add non-standard 
> solutions.  Dropwizard already supports this, so it would be low-hanging fruit to 
> address.
>
> To make the proposed changes backwards compatible 

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-13 Thread Maxim Muzafarov
Hello Benjamin,

Can you share the reasons why Apache Calcite is not suitable for this
case and why it was rejected? It has custom syntax support and a CBO,
so I would be interested to see some technical details in the "Rejected
Alternatives" section. I'm pretty sure the reasons exist, but they
weren't mentioned there. And don't take this as an ad, please :-)

In Apache Ignite, I worked on improving the query execution
engine, and one of the reasons for moving from one query engine to
another (to Calcite, to be precise) was that we had a problem with
calculating memory quotas for queries and aborting a query when those
quotas were exceeded. An engine can load and hold rows in
memory, preventing the GC from collecting them, or hold objects that
are too large, so the JVM can easily run out of memory, and it is
important to have full control over a query execution path.
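
To make the memory quota point concrete, here is a minimal sketch of
the idea. It is illustrative only and is not code from Ignite, Calcite
or Cassandra; the class and method names (QueryMemoryTracker, reserve,
release) are made up for this example. The engine reserves an estimated
number of bytes before it buffers a row and aborts the query as soon as
the per-query quota would be exceeded:

```java
import java.util.concurrent.atomic.AtomicLong;

final class QueryMemoryQuotaSketch {
    /** Hypothetical exception: thrown when a query asks for more memory than its quota. */
    static class QuotaExceededException extends RuntimeException {
        QuotaExceededException(String message) { super(message); }
    }

    /** Hypothetical tracker: an estimate of the bytes a single query currently holds. */
    static class QueryMemoryTracker {
        private final long quotaBytes;
        private final AtomicLong reservedBytes = new AtomicLong();

        QueryMemoryTracker(long quotaBytes) { this.quotaBytes = quotaBytes; }

        /** Reserve bytes before buffering a row; abort the query if over quota. */
        void reserve(long bytes) {
            long total = reservedBytes.addAndGet(bytes);
            if (total > quotaBytes) {
                reservedBytes.addAndGet(-bytes); // roll back the failed reservation
                throw new QuotaExceededException(
                    "query needs " + total + " bytes, quota is " + quotaBytes);
            }
        }

        /** Release bytes once the buffered rows are no longer referenced. */
        void release(long bytes) { reservedBytes.addAndGet(-bytes); }

        long reserved() { return reservedBytes.get(); }
    }

    public static void main(String[] args) {
        QueryMemoryTracker tracker = new QueryMemoryTracker(1024);
        tracker.reserve(600);          // fits into the quota
        try {
            tracker.reserve(600);      // 1200 > 1024, so the query is aborted
        } catch (QuotaExceededException e) {
            System.out.println("aborted: " + e.getMessage());
        }
        System.out.println("reserved=" + tracker.reserved());
    }
}
```

This only works if every operator on the execution path goes through
the tracker, which is what I mean by having full control over the path.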

btw, here is a Calcite adapter for Cassandra:
https://calcite.apache.org/docs/cassandra_adapter.html

On Wed, 13 Dec 2023 at 09:55, Benedict  wrote:
>
> A CBO can only make worse decisions than the status quo for what I presume 
> are the majority of queries - i.e. those that touch only primary indexes. In 
> general, there are plenty of use cases that prefer determinism. So I agree 
> that there should at least be a CBO implementation that makes the same 
> decisions as the status quo, deterministically.
>
>
> I do support the proposal, but would like to see some elements discussed in 
> more detail. The maintenance and distribution of summary statistics in 
> particular is worthy of its own CEP, and it might be preferable to split it 
> out. The proposal also seems to imply we are aiming for coordinators to all 
> make the same decision for a query, which I think is challenging, and it 
> would be worth fleshing out the design here a little (perhaps just in Jira).
>
>
> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this CEP 
> deprecates it. It is a concrete qualitative guard rail, that I expect some 
> users will prefer to a cost-based guard rail. Perhaps this could be left to 
> the CBO to decide how to treat.
>
>
> There’s also not much discussion of the execution model: I think it would 
> make most sense for this to be independent of any cost and optimiser models 
> (though they might want to operate on them), so that EXPLAIN and hints can 
> work across optimisers (a suitable hint might essentially bypass the 
> optimiser, if the optimiser permits it, by providing a standard execution 
> model)
>
>
> I think it would be worth considering providing the execution plan to the 
> client as part of query preparation, as an opaque payload to supply to 
> coordinators on first contact, as this might simplify the problem of ensuring 
> queries behave the same without adopting a lot of complexity for 
> synchronising statistics (which will never provide strong guarantees). Of 
> course, re-preparing a query might lead to a new plan, though any 
> coordinators with the query in their cache should be able to retrieve it 
> cheaply. If the execution model is efficiently serialised this might have the 
> ancillary benefit of improving the occupancy of our prepared query cache.
>
>
> On 13 Dec 2023, at 00:44, Jon Haddad  wrote:
>
> 
> I think it makes sense to see what the actual overhead is of CBO before 
> making the assumption it'll be so high that we need to have two code paths.  
> I'm happy to provide thorough benchmarking and analysis when it reaches a 
> testing phase.
>
> I'm excited to see where this goes.  I think it sounds very forward looking 
> and opens up a lot of possibilities.
>
> Jon
>
> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell  wrote:
>>
>> Nothing expresses my thoughts better than +1.
>> It feels like it means a lot to Cassandra.
>>
>> I have a question. Is it easy to turn off the CBO or bypass it in some 
>> way? Because some simple read and write requests will have better 
>> performance without cbo, which is also the advantage of Cassandra compared 
>> to some rdbms.
>>
>>
>> On Wed, 13 Dec 2023 at 3:37 AM, David Capwell wrote:
>>>
>>> Overall LGTM.
>>>
>>>
>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer  wrote:
>>>
>>> Hi everybody,
>>>
>>> I would like to open the discussion on the introduction of a cost based 
>>> optimizer to allow Cassandra to pick the best execution plan based on the 
>>> data distribution, thereby improving the overall query performance.
>>>
>>> This CEP should also lay the groundwork for the future addition of features 
>>> like joins, subqueries, OR/NOT and index ordering.
>>>
>>> The proposal is here: 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>>
>>> Thank you in advance for your feedback.
>>>
>>>


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Maxim Muzafarov
Ariel,
thank you for bringing this topic to the ML.

I may be missing something, so correct me if I'm wrong somewhere in
the management of the Cassandra ecosystem.  As I see it, the problem
right now is that if we fork the ohc and put it under its own root,
the use of that row cache will still not be well tested (the same as
it is now). I am particularly emphasising the dependency management
side: any version change/upgrade in Cassandra brings a new set of
libraries into the classpath, and that set should be tested against
this integration.

So, unless it is being widely used by someone else outside of the
community (which it doesn't seem to be), from a maintenance and
integration testing perspective I think it would be better to keep the
ohc in-tree, so we will be aware of any issues immediately after the
full CI run.

I'm also +1 for not deprecating it, even if it is used in narrow
cases, as long as the cost of maintaining its source code remains
quite low and it brings some benefits.

On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>
> Hi,
>
> To add some additional context.
>
> The row cache is disabled by default and it is already pluggable, but there 
> isn’t a Caffeine implementation present. I think one used to exist and could 
> be resurrected.
>
> I personally also think that people should be able to scratch their own itch 
> row cache wise so removing it entirely just because it isn’t commonly used 
> isn’t the right move unless the feature is very far out of scope for 
> Cassandra.
>
> Auto enabling/disabling the cache is a can of worms that could result in 
> performance and reliability inconsistency as the DB enables/disables the 
> cache based on heuristics when you don’t want it to. It being off by default 
> seems good enough to me.
>
> RE forking, we could create a GitHub org for OHC and then add people to it. 
> There are some examples of dependencies that haven’t been contributed to the 
> project that live outside like CCM and JAMM.
>
> Ariel
>
> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>
> I would avoid taking away a feature even if it works in a narrow set of 
> use-cases. I would instead suggest -
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
> off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.
>
> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>
>
>
>
> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
> later release
>
>
>
>
> I'm for deprecating and removing it.
> It constantly trips users up and just causes pain.
>
> Yes it works in some very narrow situations, but those situations often 
> change over time and again just bites the user.  Without the row-cache I 
> believe users would quickly find other, more suitable and lasting, solutions.
>
>


Re: Custom FSError and CommitLog Error Handling

2023-12-17 Thread Maxim Muzafarov
Hello Raymond,

Do you have draft changes to look at?

I'd suggest a more general approach, as some interfaces seem to
overlap each other. There are the FSErrorHandler and the
JVMStabilityInspector, neither of which is currently configurable via
user configuration. I think it would be possible to have a public
interface for which users could supply their own handlers via
configuration:

public interface FailureHandler
{
    public boolean onFailure(Component type, FailureHandlerContext context);
}

It seems to me that the JVMStabilityInspector is a good candidate for
the default implementation of the FailureHandler API as it already
handles OOM, CommitLog errors, and disk errors as far as I can see.
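
A rough sketch of how this could fit together, under several
assumptions: Component and FailureHandlerContext are reduced to
placeholders, the boolean result is read as "the failure was handled",
and PoisonPillHandler, load() and dispatch() are hypothetical names
that do not exist in Cassandra. The handler is instantiated from a
class name, the way a configuration option would supply it, and the
default handler is consulted only when the configured one declines:

```java
import java.util.ArrayList;
import java.util.List;

final class FailureHandlerSketch {
    /** Placeholder: the subsystem that reported the failure. */
    enum Component { COMMIT_LOG, SSTABLE, JVM }

    /** Placeholder context: carries only the cause of the failure. */
    static final class FailureHandlerContext {
        final Throwable cause;
        FailureHandlerContext(Throwable cause) { this.cause = cause; }
    }

    /** The proposed interface; "true" is assumed to mean "handled". */
    interface FailureHandler {
        boolean onFailure(Component type, FailureHandlerContext context);
    }

    /** Hypothetical user handler: records a poison pill for storage failures only. */
    static final class PoisonPillHandler implements FailureHandler {
        final List<String> poisonPills = new ArrayList<>();

        public boolean onFailure(Component type, FailureHandlerContext context) {
            if (type == Component.JVM)
                return false; // not ours, leave it to the default handler
            poisonPills.add(type + ": " + context.cause.getMessage());
            return true;
        }
    }

    /** Instantiate a handler from a class name, as a config option would provide it. */
    static FailureHandler load(String className) throws ReflectiveOperationException {
        return (FailureHandler) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    /** Try the configured handler first, then fall back to the default one. */
    static boolean dispatch(FailureHandler configured, FailureHandler fallback,
                            Component type, FailureHandlerContext context) {
        return configured.onFailure(type, context) || fallback.onFailure(type, context);
    }

    public static void main(String[] args) throws Exception {
        FailureHandler configured = load(PoisonPillHandler.class.getName());
        // Stand-in for a JVMStabilityInspector-like default implementation.
        FailureHandler fallback = (t, c) -> {
            System.out.println("default handler saw " + t);
            return true;
        };
        FailureHandlerContext context =
            new FailureHandlerContext(new java.io.IOException("bad checksum"));
        dispatch(configured, fallback, Component.COMMIT_LOG, context); // configured handler takes it
        dispatch(configured, fallback, Component.JVM, context);        // falls through to the default
    }
}
```

Whether the fallback should be chained like this or fully replaced by
the user's handler is a design question for the actual change.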

On Sat, 16 Dec 2023 at 03:43, Josh McKenzie  wrote:
>
> Adding a poison-pill error option on finding of corrupt data makes sense to 
> me. Not sure if there's enough demand / other customization being done in 
> this space to justify the user customizable aspect; any immediate other 
> approaches come to mind? If not, this isn't an area of the code that's 
> changed all that much, so just adding a new option seems surgical and minimal 
> to me.
>
> On Tue, Dec 12, 2023, at 4:21 AM, Claude Warren, Jr via dev wrote:
>
> I can see this as a strong improvement in Cassandra management and support it.
>
> +1 non binding
>
> On Mon, Dec 11, 2023 at 8:28 PM Raymond Huffman  
> wrote:
>
> Hello All,
>
> On our fork of Cassandra, we've implemented some custom behavior for handling 
> CommitLog and SSTable Corruption errors. Specifically, if a node detects one 
> of those errors, we want the node to stop itself, and if the node is 
> restarted, we want initialization to fail. This is useful in Kubernetes when 
> you expect nodes to be restarted frequently and makes our corruption 
> remediation workflows less error-prone. I think we could make this behavior 
> more pluggable by allowing users to provide custom implementations of the 
> FSErrorHandler, and the error handler that's currently implemented at 
> org.apache.cassandra.db.commitlog.CommitLog#handleCommitError via config in 
> the same way one can provide custom Partitioners and 
> Authenticators/Authorizers.
>
> Would you take as a contribution one of the following?
> 1. user provided implementations of FSErrorHandler and CommitLogErrorHandler, 
> set via config; and/or
> 2. new commit failure and disk failure policies that write a poison pill file 
> to disk and fail on startup if that file exists
>
> The poison pill implementation is what we currently use - we call this a "Non 
> Transient Error" and we want these states to always require manual 
> intervention to resolve, including manual action to clear the error. I'd be 
> happy to contribute this if other users would find it beneficial. I had 
> initially shared this question in Slack, but I'm now sharing it here for 
> broader visibility.
>
> -Raymond Huffman
>
>


Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

2023-12-22 Thread Maxim Muzafarov
Hello everyone and happy holidays,

The changes below are ready for review!
Benchmarks are also inside.

Expose all table metrics in virtual tables
https://issues.apache.org/jira/browse/CASSANDRA-14572
https://github.com/apache/cassandra/pull/2958/files

On Tue, 12 Dec 2023 at 22:05, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
>
> I still think Cassandra will benefit from having this idea implemented
> and used through the source code, so I've done another round of
> rethinking this concept and it seems I've found a solution. As a
> result, we can significantly reduce the cost of implementing and
> maintaining both new and existing virtual tables and make our users
> happier by seeing everything they need through virtual tables.
>
> So, I think we should limit the scope of the original proposal to the 
> following:
> ## A framework for exposing any internal data collection to virtual
> tables ONLY. ##
>
> As a proof of concept, I took the CASSANDRA-14572 "Expose all table
> metrics in virtual table" JIRA ticket, which provides a good
> opportunity to demonstrate how to export all metrics to VTs at once
> without having boilerplate implementations. Currently, we already have
> CQLMetricsTable, BatchMetricsTable, etc. that expose metrics to VTs in
> a pretty similar way, while most of the metric groups located under
> the org.apache.cassandra.metrics package still lack a
> representation as VTs. I've used the MetricRegistry collection
> as a view of registered metrics to export them to VTs using the
> prototype accordingly.
>
> The prototype is complete. You can run a node locally and check the
> available virtual tables with cqlsh, or you can check the changes
> using the following link to the PR:
> https://github.com/apache/cassandra/pull/2958/files
>
>
> Below are some key points about the design itself:
>
> 1. All new virtual tables with metrics have "metric" as a prefix so
> that they are fairly easy to find using TAB on the cqlsh command line.
> Metrics are split into virtual tables as they are listed in the
> org.apache.cassandra.metrics e.g. metrics_cql, metrics_tcm etc. In
> addition, they are also grouped by metric type e.g.
> metric_type_histogram, metric_type_counter etc. There is a table
> called "metric_all_metric_groups" with all available metric groups.
>
> 2. To create a new virtual table representation of an internal
> collection a developer needs to do two things: create a virtual table
> row representation, and register it using
> CollectionVirtualTableAdapter, which acts as an adapter between
> internal data and a virtual table. Here's how I did it for the thread
> pools VT, this is a fully backward compatible change:
> https://github.com/apache/cassandra/pull/2958/files#diff-5fda13a633723cdf232bba465e6fb7ab74cdc02f7ba55dae4d1cf494ffb71abaR61
>
> 3. The "metrics_keyspace" virtual table ended up being quite large
> since it contains all the metrics for all available keyspaces on a
> local node, so the default implementation provided by
> AbstractVirtualTable is not suitable for the proposal. The
> AbstractVirtualTable materializes a full data collection on the heap
> using SimpleDataSet, regardless of the portion of data that is being
> queried. In this case, we have to use an iterative approach, as the
> CollectionVirtualTableAdapter does (the problem was discussed in
> CASSANDRA-14629 and is now a part of the solution). This also helps to
> keep the memory footprint low.
>
> 4. Another valuable change is the CassandraMetricsRegistry itself. The
> problem here is that the metrics and their aliases are currently
> exported to JMX, but the implemented virtual tables export the metrics
> in their own way and in most cases don't respect the metric aliases
> that are registered in the MetricsRegistry. This should be fixed as
> part of CASSANDRA-14572 to avoid ambiguity for all known metrics
> once and for all.
>
> Here are the links to the issue and the PR:
> https://issues.apache.org/jira/browse/CASSANDRA-14572
> https://github.com/apache/cassandra/pull/2958/files
>
>
> I'm excited about how these changes look right now, so please share
> your feedback and thoughts.
> The PR lacks good test coverage, I'll fix it as soon as we have a
> clear vision of the design (or much sooner) :-)
>
> On Mon, 30 Jan 2023 at 17:43, David Capwell  wrote:
> >
> > I *think* this is arguably true for a vtable / CQL-based solution as well 
> > from the "you don't know how people are using your API" perspective.
> >
> >
> > Very fair point and think that justifies a different thread to talk about 
> > backwa

Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-03 Thread Maxim Muzafarov
Happy New Year to everyone! I'd like to thank everyone for their
questions, because answering them forces us to move towards the right
solution, and I also like the ML discussions for the time they give to
investigate the code :-)

I'm deliberately trying to limit the scope of the initial solution
(e.g. exclude the agent part) to keep the discussion short and clear,
but it's also important to have a glimpse of what we can do next once
we've finished with the topic.

My view of the Command<> is that it is an abstraction in the broader
sense of an operation that can be performed on the local node,
involving one of a few internal components. This means that updating a
property in the settings virtual table via an update statement, or
executing e.g. the setconcurrentcompactors command are just aliases of
the same internal command via different APIs. Another example is the
netstats command, which simply aggregates the MessageService metrics
and returns them in a human-readable format (just another way of
looking at key-value metric pairs). More broadly, a command takes a
Map as input and returns a String (or List) as the result.

As Abe mentioned, Command and CommandRegistry should be largely based
on the nodetool command set at the beginning. We have a few options
for how we can initially construct command metadata during the
registry implementation (when moving command metadata from the
nodetool to the core part), so I'm planning to consult the
command representations of the k8ssandra project so that any
further registry adoptions have zero problems (by writing a test
openapi registry exporter and comparing the representation results).

So, the MVP is the following:
- Command
- CommandRegistry
- CQLCommandExporter
- JMXCommandExporter
- the nodetool uses the JMXCommandExporter
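
To illustrate the first two items, here is a minimal sketch. It is not
the proposed API: the Map<String, String> argument type, the
registry methods and the exec() helper (a stand-in for what a CQL or
JMX exporter would do) are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class CommandRegistrySketch {
    /** A node-local operation: named arguments in, human-readable text out. */
    interface Command {
        String name();
        String execute(Map<String, String> arguments);
    }

    /** Holds every known command so that each exporter sees the same set. */
    static final class CommandRegistry {
        private final Map<String, Command> commands = new LinkedHashMap<>();

        void register(Command command) {
            if (commands.putIfAbsent(command.name(), command) != null)
                throw new IllegalArgumentException("duplicate command: " + command.name());
        }

        Command get(String name) {
            Command command = commands.get(name);
            if (command == null)
                throw new IllegalArgumentException("unknown command: " + name);
            return command;
        }

        Set<String> names() { return commands.keySet(); }
    }

    /** Stand-in for an exporter: parses "name key=value ..." and dispatches. */
    static String exec(CommandRegistry registry, String line) {
        String[] parts = line.trim().split("\\s+");
        Map<String, String> arguments = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            arguments.put(kv[0], kv.length == 2 ? kv[1] : "");
        }
        return registry.get(parts[0]).execute(arguments);
    }

    public static void main(String[] args) {
        int[] concurrentCompactors = {2}; // pretend node setting
        CommandRegistry registry = new CommandRegistry();
        registry.register(new Command() {
            public String name() { return "setconcurrentcompactors"; }
            public String execute(Map<String, String> a) {
                concurrentCompactors[0] = Integer.parseInt(a.get("value"));
                return "concurrent_compactors=" + concurrentCompactors[0];
            }
        });
        System.out.println(exec(registry, "setconcurrentcompactors value=4"));
    }
}
```

The point of the single registry is that the CQL and JMX exporters are
thin adapters over the same Command instances, so a command is
implemented once.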


= Answers =

> What do you have in mind specifically there? Do you plan on rewriting a brand 
> new implementation which would be partially inspired by our agent? Or would 
> the project integrate our agent code in-tree or as a dependency?

Personally, I like the state of the k8ssandra project as it is now. My
understanding is that the server part of a database always lags behind
the client and sidecar parts in terms of the jdk version and the
features it provides. In contrast, sidecars should always stay on the
cutting edge, so if we want to bring an agent part in-tree, we should
carefully consider the flexibility we may lose, as we
will not be able to change the agent part within the sidecar. The
nearest change I can see is that we can remove the interceptor part
once the CQL command interface is available. I suggest we move the
agent part to phase 2 and research it. wdyt?


> How are the results of the commands expressed to the CQL client? Since the 
> command is being treated as CQL, I guess it will be rows, right? If yes, some 
> of the nodetool commands output are a bit hierarchical in nature (e.g. 
> cfstats, netstats etc...). How are these cases handled?

I think the result of the execution should be a simple string (or set
of strings), which by its nature matches the nodetool output. I would
avoid building complex output or output schemas for now to simplify
the initial changes.


> Any changes expected at client/driver side?

I'd like to keep the initial changes to the server part only, to avoid
scope inflation. For the driver part, I have checked the ExecutionInfo
interface provided by the java-driver, which should probably be used
as a command execution status holder. We'd like to have a unique
command execution id for each command that is executed on the node, so
the ExecutionInfo should probably hold such an id. Currently it has
the UUID getTracingId(), which is not well suited for our case and I
think further changes and follow-ups will be required here (including
the binary protocol, I think).


> The term COMMAND is a bit abstract I feel (subjective)... And I also feel the 
> settings part is overlapping with virtual tables.

I think we should keep the term Command as broad as possible. As
long as we have a single implementation of a command, and the cost of
maintaining that piece of the source code is low, it's even better if
we have a few ways to achieve the same result using different APIs.
Personally, the only thing I would vote for is the separation of
command and metric terms (they shouldn't be mixed up).


> How are the responses of different operations expressed through the Command 
> API? If the Command Registry Adapters depend upon the command metadata for 
> invoking/validating the command, then I think there has to be a way for them 
> to interpret the response format also, right?

I'm not sure I've understood the question correctly. Are you talking
about the command execution result schema and the validation of that
schema?

For now, I see the interface as follows, the result of the execution
is a type that can be converted to the same string as the nodetool has
for the corresponding comm

Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-08 Thread Maxim Muzafarov
 then need to implement a totally different cql protocol interface just 
>>> for administration, which nobody has committed to building yet.
>>>
>>>
>>> I think this is a solvable problem, and I think the benefits of having a 
>>> single, elegant way of interacting with a cluster and configuring it 
>>> justifies the investment for us as a project. Assuming someone has the 
>>> cycles to, you know, actually do the work. :D
>>>
>>> On Sun, Jan 7, 2024, at 10:41 PM, Jon Haddad wrote:
>>>
>>> I like the idea of the ability to execute certain commands via CQL, but I 
>>> think it only makes sense for the nodetool commands that cause an action to 
>>> take place, such as compact or repair.  We already have virtual tables, I 
>>> don't think we need another layer to run informational queries.  I see 
>>> little value in having the following (I'm using exec here for simplicity):
>>>
>>> cqlsh> exec tpstats
>>>
>>> which returns a string in addition to:
>>>
>>> cqlsh> select * from system_views.thread_pools
>>>
>>> which returns structured data.
>>>
>>> I'd also rather see updatable configuration virtual tables instead of
>>>
>>> cqlsh> exec setcompactionthroughput 128
>>>
>>> Fundamentally, I think it's better for the project if administration is 
>>> fully done over CQL and we have a consistent, single way of doing things.  
>>> I'm not dead set on it, I just think less is more in a lot of situations, 
>>> this being one of them.
>>>
>>> Jon
>>>
>>>
>>> On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov  wrote:
>>>
>>> Happy New Year to everyone! I'd like to thank everyone for their
>>> questions, because answering them forces us to move towards the right
>>> solution, and I also like the ML discussions for the time they give to
>>> investigate the code :-)
>>>
>>> I'm deliberately trying to limit the scope of the initial solution
>>> (e.g. exclude the agent part) to keep the discussion short and clear,
>>> but it's also important to have a glimpse of what we can do next once
>>> we've finished with the topic.
>>>
>>> My view of the Command<> is that it is an abstraction in the broader
>>> sense of an operation that can be performed on the local node,
>>> involving one of a few internal components. This means that updating a
>>> property in the settings virtual table via an update statement, or
>>> executing e.g. the setconcurrentcompactors command are just aliases of
>>> the same internal command via different APIs. Another example is the
>>> netstats command, which simply aggregates the MessageService metrics
>>> and returns them in a human-readable format (just another way of
>>> looking at key-value metric pairs). More broadly, a command takes a
>>> Map as input and returns a String (or List) as the result.
>>>
>>> As Abe mentioned, Command and CommandRegistry should be largely based
>>> on the nodetool command set at the beginning. We have a few options
>>> for how we can initially construct command metadata during the
>>> registry implementation (when moving command metadata from the
>>> nodetool to the core part), so I'm planning to consult the
>>> command representations of the k8ssandra project so that any
>>> further registry adoptions have zero problems (by writing a test
>>> openapi registry exporter and comparing the representation results).
>>>
>>> So, the MVP is the following:
>>> - Command
>>> - CommandRegistry
>>> - CQLCommandExporter
>>> - JMXCommandExporter
>>> - the nodetool uses the JMXCommandExporter
>>>
>>>
>>> = Answers =
>>>
>>> > What do you have in mind specifically there? Do you plan on rewriting a 
>>> > brand new implementation which would be partially inspired by our agent? 
>>> > Or would the project integrate our agent code in-tree or as a dependency?
>>>
>>> Personally, I like the state of the k8ssandra project as it is now. My
>>> understanding is that the server part of a database always lags behind
>>> the client and sidecar parts in terms of the jdk version and the
>>> features it provides. In contrast, sidecars should always be on top of
>>> the market, so if we want to make an agent part in-tree, this should
>>> be carefully considere

Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-09 Thread Maxim Muzafarov
Jon,

That sounds good.  Let's make these commands rely on the settings
virtual table and keep the initial changes as minimal as possible.

We've also scheduled a Cassandra Contributor Meeting on January 30th
2024, so I'll prepare some slides with everything we've got so far and
try to prepare some drafts to demonstrate the design.
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting

On Tue, 9 Jan 2024 at 00:55, Jon Haddad  wrote:
>
> It's great to see where this is going and thanks for the discussion on the ML.
>
> Personally, I think adding two new ways of accomplishing the same thing is a 
> net negative.  It means we need more documentation and creates 
> inconsistencies across tools and users.  The tradeoffs you've listed are 
> worth considering, but in my opinion adding 2 new ways to accomplish the same 
> thing hurts the project more than it helps.
>
> > - I'd like to see a symmetry between the JMX and CQL APIs, so that users 
> > will have a sense of the commands they are using and are less
> > likely to check the documentation;
>
> I've worked with a couple hundred teams and I can only think of a few who use 
> JMX directly.  It's done very rarely.  After 10 years, I still have to look 
> up the JMX syntax to do anything useful, especially if there's any quoting 
> involved.  Power users might know a handful of JMX commands by heart, but I 
> suspect most have a handful of bash scripts they use instead, or have a 
> sidecar.  I also think very few users will migrate their management code from 
> JMX to CQL, nor do I imagine we'll move our own tools until the 
> `disablebinary` problem is solved.
>
> > - It will be easier for us to move the nodetool from the jmx client that is 
> > used under the hood to an implementation based on a java-driver and use the 
> > CQL for the same;
>
> I can't imagine this would make a material difference.  If someone's 
> rewriting a nodetool command, how much time will be spent replacing the JMX 
> call with a CQL one?  Looking up a virtual table isn't going to be what 
> consumes someone's time in this process.  Again, this won't be done without 
> solving `nodetool disablebinary`.
>
> > if we have cassandra-15254 merged, it will cost almost nothing to support 
> > the exec syntax for setting properties;
>
> My concern is more about the weird user experience of having two ways of 
> doing the same thing, less about the technical overhead of adding a second 
> implementation.  I propose we start simple, see if any of the reasons you've 
> listed are actually a real problem, then if they are, address the issue in a 
> follow up.
>
> If I'm wrong, it sounds like it's fairly easy to add `exec` for changing 
> configs.  If I'm right, we'll have two confusing syntaxes forever.  It's a 
> lot easier to add something later than take it away.
>
> How does that sound?
>
> Jon
>
>
>
>
> On Mon, Jan 8, 2024 at 7:55 PM Maxim Muzafarov  wrote:
>>
>> > Some operations will no doubt require a stored procedure syntax, but 
>> > perhaps it would be a good idea to split the work into two:
>>
>> These are exactly the first steps I have in mind:
>>
>> [Ready for review]
>> Allow UPDATE on settings virtual table to change running configurations
>> https://issues.apache.org/jira/browse/CASSANDRA-15254
>>
>> This issue is specifically aimed at changing the configuration
>> properties we are talking about (value is in yaml format):
>> e.g. UPDATE system_views.settings SET compaction_throughput = 128Mb/s;
>>
>> [Ready for review]
>> Expose all table metrics in virtual table
>> https://issues.apache.org/jira/browse/CASSANDRA-14572
>>
>> This is to observe the running configuration and all available metrics:
>> e.g. select * from system_views.thread_pools;
>>
>>
>> I hope both of the issues above will become part of the trunk branch
>> before we move on to the CQL management commands. In this topic, I'd
>> like to discuss the design of the CQL API, and gather feedback, so
>> that I can prepare a draft of changes to look at without any
>> surprises, and that's exactly what this discussion is about.
>>
>>
>> cqlsh> UPDATE system_views.settings SET compaction_throughput = 128;
>> cqlsh> exec setcompactionthroughput 128
>>
>> I don't mind removing the exec command from the CQL command API which
>> is intended to change settings. Personally, I see the second option as
>> just an alias for the first command, and in fact, they will have the
>> same

Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-09 Thread Maxim Muzafarov
Thank you all so much, I'm happy to be part of such an active
community and to be able to contribute to a product that is used all
over the world!

On Tue, 9 Jan 2024 at 12:33, Mike Adamson  wrote:
>
> Congrats Maxim!!
>
> On Tue, 9 Jan 2024, 10:41 Andrés de la Peña,  wrote:
>>
>> Congrats, Maxim!
>>
>> On Tue, 9 Jan 2024 at 03:45, guo Maxwell  wrote:
>>>
>>> Congratulations, Maxim!
>>>
>>> On Tue, 9 Jan 2024 at 09:00, Francisco Guerrero wrote:
>>>>
>>>> Congratulations, Maxim! Well deserved!
>>>>
>>>> On 2024/01/08 18:19:04 Josh McKenzie wrote:
>>>> > The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has 
>>>> > accepted
>>>> > the invitation to become a committer.
>>>> >
>>>> > Thanks for all the hard work and collaboration on the project thus far, 
>>>> > and we're all looking forward to working more with you in the future. 
>>>> > Congratulations and welcome!
>>>> >
>>>> > The Apache Cassandra PMC members
>>>> >
>>>> >


Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

2024-02-12 Thread Maxim Muzafarov
Hello everyone,


We still need a few eyes to help push the changes forward.
https://issues.apache.org/jira/browse/CASSANDRA-14572

Here's the post I prepared as a result of working on this issue (it
might help to review it):
https://dzone.com/articles/making-dropwizard-metrics-accessible-via-cql-in-ap

On Fri, 22 Dec 2023 at 12:38, Maxim Muzafarov  wrote:
>
> Hello everyone and happy holidays,
>
> The changes below are ready for review!
> Benchmarks are also inside.
>
> Expose all table metrics in virtual tables
> https://issues.apache.org/jira/browse/CASSANDRA-14572
> https://github.com/apache/cassandra/pull/2958/files
>
> On Tue, 12 Dec 2023 at 22:05, Maxim Muzafarov  wrote:
> >
> > Hello everyone,
> >
> >
> > I still think Cassandra will benefit from having this idea implemented
> > and used through the source code, so I've done another round of
> > rethinking this concept and it seems I've found a solution. As a
> > result, we can significantly reduce the cost of implementing and
> > maintaining both new and existing virtual tables and make our users
> > happier by seeing everything they need through virtual tables.
> >
> > So, I think we should limit the scope of the original proposal to the 
> > following:
> > ## A framework for exposing any internal data collection to virtual
> > tables ONLY. ##
> >
> > As a proof of concept, I took the CASSANDRA-14572 "Expose all table
> > metrics in virtual table" JIRA ticket, which provides a good
> > opportunity to demonstrate how to export all metrics to VTs at once
> > without having boilerplate implementations. Currently, we already have
> > CQLMetricsTable, BatchMetricsTable, etc. that expose metrics to VTs in
> > a pretty similar way, while most of the metric groups located under
> > the org.apache.cassandra.metrics package still lack a VT
> > representation. I've used the MetricRegistry collection
> > as a view of registered metrics to export them to VT using the
> > prototype accordingly.
> >
> > The prototype is complete. You can run a node locally and check the
> > available virtual tables with cqlsh, or you can check the changes
> > using the following link to the PR:
> > https://github.com/apache/cassandra/pull/2958/files
> >
> >
> > Below are some key points about the design itself:
> >
> > 1. All new virtual tables with metrics have "metric" as a prefix so
> > that they are fairly easy to find using TAB on the cqlsh command line.
> > Metrics are split into virtual tables as they are listed in the
> > org.apache.cassandra.metrics e.g. metrics_cql, metrics_tcm etc. In
> > addition, they are also grouped by metric type e.g.
> > metric_type_histogram, metric_type_counter etc. There is a table
> > called "metric_all_metric_groups" with all available metric groups.
> >
> > 2. To create a new virtual table representation of an internal
> > collection a developer needs to do two things: create a virtual table
> > row representation, and register it using
> > CollectionVirtualTableAdapter, which acts as an adapter between
> > internal data and a virtual table. Here's how I did it for the thread
> > pools VT, this is a fully backward compatible change:
> > https://github.com/apache/cassandra/pull/2958/files#diff-5fda13a633723cdf232bba465e6fb7ab74cdc02f7ba55dae4d1cf494ffb71abaR61
> >
> > 3. The "metrics_keyspace" virtual table ended up being quite large
> > since it contains all the metrics for all available keyspaces on a
> > local node, so the default implementation provided by
> > AbstractVirtualTable is not suitable for the proposal. The
> > AbstractVirtualTable materializes a full data collection on the heap
> > using SimpleDataSet, regardless of the portion of data that is being
> > queried. In this case, we have to use an iterative approach, as the
> > CollectionVirtualTableAdapter does (the problem was discussed in
> > CASSANDRA-14629 and is now a part of the solution). This also helps to
> > keep the memory footprint low.
> >
> > 4. Another valuable change is the CassandraMetricsRegistry itself. The
> > problem here is that the metrics and their aliases are currently
> > exported to JMX, but the implemented virtual tables export the metrics
> > in their own way and in most cases don't respect the metric aliases
> > which are registered in the MetricsRegistry. This should be fixed as a
> > part of the CASSANDRA-14572 to avoid ambiguity for all known metrics
> > once and for all.
> >
> > Here are the links to th

Re: Welcome Brad Schoening as Cassandra Committer

2024-02-22 Thread Maxim Muzafarov
Congratulations!

On Thu, 22 Feb 2024 at 10:23, Berenguer Blasi  wrote:
>
> Congrats!
>
> On 22/2/24 9:57, Jacek Lewandowski wrote:
>
> Congrats Brad!
>
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> On Thu, 22 Feb 2024 at 01:29, Štefan Miklošovič wrote:
>>
>> Congrats Brad, great work in the Python department :)
>>
>> On Wed, Feb 21, 2024 at 9:46 PM Josh McKenzie  wrote:
>>>
>>> The Apache Cassandra PMC is pleased to announce that Brad Schoening has 
>>> accepted
>>> the invitation to become a committer.
>>>
>>> Your work on the integrated python driver, launch script environment, and 
>>> tests
>>> has been a big help to many. Congratulations and welcome!
>>>
>>> The Apache Cassandra PMC members


[DISCUSSION] Replace the Config class instance with the tree-based framework

2024-03-13 Thread Maxim Muzafarov
Hello everyone,

During the implementation, many valid design comments were made about
making the virtual table SettingTable [1] updatable. So, I've
rethought the whole concept once again, and I want to take another
step forward to make this problem feasible with less effort on our
part.

I want to replace the way we use and store node configuration values
internally, which means I want to replace the Config class instance,
where we store the values, with a tree-based framework.
>> I propose to use the Lightbend API to do this. <<

The changes themselves are quite limited, they don't require rewriting
the whole codebase. All the DatabaseDescriptor methods will be
retained, and the only thing that would change is the way we store the
values (in the in-memory tree, not in the Config class instance
itself). So I don't expect that it will be a huge change.
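
To illustrate the idea only, here is a minimal sketch under my own
assumptions. It is not the Lightbend API and not the code from the
design document; ConfigTree, DescriptorFacade and the dotted path
"compaction.throughput_mib_per_sec" are hypothetical names. The point
is that the typed accessors stay the same while the values move into
an in-memory tree:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory tree of configuration values (not a Cassandra or Lightbend class).
final class ConfigTree
{
    // Nested maps model the tree; leaves hold the raw configuration values.
    private final Map<String, Object> root = new LinkedHashMap<>();

    // Stores a value under a dotted path, creating intermediate nodes as needed.
    // Assumes no intermediate segment is already used as a leaf.
    @SuppressWarnings("unchecked")
    void set(String path, Object value)
    {
        String[] parts = path.split("\\.");
        Map<String, Object> node = root;
        for (int i = 0; i < parts.length - 1; i++)
            node = (Map<String, Object>) node.computeIfAbsent(parts[i], k -> new LinkedHashMap<String, Object>());
        node.put(parts[parts.length - 1], value);
    }

    // Returns the value at the path, or null when any segment is missing.
    Object get(String path)
    {
        Object current = root;
        for (String part : path.split("\\."))
        {
            if (!(current instanceof Map))
                return null;
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}

// Hypothetical facade: the accessor signatures stay stable, only the storage changes.
final class DescriptorFacade
{
    private static final String THROUGHPUT = "compaction.throughput_mib_per_sec";
    private final ConfigTree tree;

    DescriptorFacade(ConfigTree tree)
    {
        this.tree = tree;
    }

    int getCompactionThroughputMebibytesPerSec()
    {
        Object value = tree.get(THROUGHPUT);
        return value == null ? 0 : (Integer) value;
    }

    void setCompactionThroughputMebibytesPerSec(int value)
    {
        tree.set(THROUGHPUT, value);
    }
}
```

Callers would keep using the facade methods as before; a real
framework would add typing, validation and a schema on top of this.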


All the design details are in the document below, including the
framework comparison, the API, and the way we will manage the
configuration schema.

Please take a look, I want to move things forward as every important
change pulls on a bigger problem that has been neglected for years :-)
Let's agree on the framework/API we want to use so that I can invest
more time in the implementation.

https://docs.google.com/document/d/11W1Qj-6d9ZqHv86iEKgFutcxY2DTMIofEbr-zQiw930/edit#heading=h.2051pbez4rce

Looking forward to your comments.

[1] https://issues.apache.org/jira/browse/CASSANDRA-15254


Re: [VOTE] Release Apache Cassandra 4.0.13

2024-05-18 Thread Maxim Muzafarov
+1 (nb)

Built from the release branch and ran a few tests locally.

On Fri, 17 May 2024 at 23:06, Mick Semb Wever  wrote:
>>
>> > The vote will be open for 72 hours (longer if needed). Everyone who
>> > has tested the build is invited to vote. Votes by PMC members are
>> > considered binding. A vote passes if there are at least three binding
>> > +1s and no -1's.
>
>
>
> +1
>
> Checked
> - signing correct
> - checksums are correct
> - source artefact builds (JDK 8+11)
> - binary artefact runs (JDK 8+11)
> - debian package runs (JDK 8+11)
> - debian repo runs (JDK 8+11)
> - redhat* package runs (JDK 8+11)
> - redhat* repo runs (JDK 8+11)
>
>


Re: [DISCUSS] Adding experimental vtables and rules around them

2024-05-29 Thread Maxim Muzafarov
Hello everyone,

I like the idea of highlighting some of the experimental virtual
tables whose model might be changed in future releases.

As another option, we could add an @Experimental annotation (or another
name) and a configuration parameter
experimental_virtual_tables_enabled (default is false). This, in turn,
means that if a virtual table is experimental, it won't be registered in a
virtual keyspace unless the corresponding configuration parameter is
enabled. This also means that a user must explicitly enable an
experimental API, which saves us from spamming the log with warnings.
All of this does not preclude us from specifying the experimental
state of some virtual tables in the documentation.
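
A minimal sketch of how this could look, assuming a runtime-retained
marker annotation and a registry that consults a single boolean flag.
All the names below (Experimental, VirtualTableSketch,
VirtualKeyspaceSketch and the two table classes) are hypothetical and
are not existing Cassandra classes:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; RUNTIME retention lets the registry see it via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Experimental {}

// Hypothetical stand-in for a virtual table.
interface VirtualTableSketch
{
    String name();
}

final class ThreadPoolsTableSketch implements VirtualTableSketch
{
    public String name() { return "thread_pools"; }
}

@Experimental
final class AccordEpochsTableSketch implements VirtualTableSketch
{
    public String name() { return "epochs"; }
}

// Hypothetical registry: skips experimental tables unless the flag is enabled.
final class VirtualKeyspaceSketch
{
    private final boolean experimentalEnabled;
    private final List<String> registered = new ArrayList<>();

    VirtualKeyspaceSketch(boolean experimentalEnabled)
    {
        this.experimentalEnabled = experimentalEnabled;
    }

    // Returns true when the table was registered, false when it was skipped.
    boolean register(VirtualTableSketch table)
    {
        boolean experimental = table.getClass().isAnnotationPresent(Experimental.class);
        if (experimental && !experimentalEnabled)
            return false;
        registered.add(table.name());
        return true;
    }

    List<String> tables()
    {
        return registered;
    }
}
```

With the flag left at its default (false) the experimental table is
simply absent, so no per-query warning is needed.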

On Wed, 29 May 2024 at 21:18, Abe Ratnofsky  wrote:
>
> I agree that ClientWarning is the best way to indicate the risk of using an 
> experimental feature directly to the user. Presenting information in the 
> client application's logs directly means that the person who wrote the query 
> is most likely to see the warning, rather than an operator who sees cluster 
> logs.
>
> I don't think it's necessary to attach a ClientWarning to every single client 
> response; a ClientWarning analog to NoSpamLogger would be useful for this 
> ("warn a client at most once per day").
>
> This would also be useful for warning on usage of deprecated features.
>
> > On May 29, 2024, at 3:01 PM, David Capwell  wrote:
> >
> > We agreed a long time ago that all new features are disabled by default, 
> > but I wanted to try to flesh out what we “should” do with something that 
> > might still be experimental and subject to breaking changes; I would prefer 
> > we keep this thread specific to vtables as the UX is different for 
> > different types of things…
> >
> > So, lets say we are adding a set of vtables but we are not 100% sure what 
> > the schema should be and we learn after the release that changes should be 
> > made, but that would end up breaking the table… we currently define 
> > everything as “don’t break this” so if we publish a table that isn’t 100% 
> > baked we are kinda stuck with it for a very very long time… I would like to 
> > define a way to expose vtables that are subject to change (breaking schema 
> > changes) across different release and rules around them (only in minor?  
> > Maybe even in patch?).
> >
> > Lets try to use a concrete example so everyone is on the same page.
> >
> > Accord is disabled by default (it is a new feature), so the vtables to 
> > expose internals would be expected to be undefined and not present on the 
> > instance.
> >
> > When accord is enabled (accord.enabled = true) we add a set of vtables:
> >
> > Epochs - shows what epochs are known to accord
> > Cache - shows how the internal caches are performing
> > Etc.
> >
> > Using epochs as an example it currently only shows a single column: the 
> > long epoch
> >
> > CREATE VIRTUAL TABLE system_accord.epochs (epoch bigint PRIMARY KEY);
> >
> > Lets say we find that this table isn’t enough and we really need to scope 
> > it to each of the “stores” (threads for processing accord tasks)
> >
> > CREATE VIRTUAL TABLE system_accord.epochs (epoch bigint, store_id int, 
> > PRIMARY KEY (epoch, store_id));
> >
> > In this example the table changed the schema in a way that could break 
> > users, so this normally is not allowed.
> >
> > Since we don’t really have a way to define something experimental other 
> > than NEWS.txt, we kinda get stuck with this table and are forced to make 
> > new versions and maintain them for a long time (in this example we would 
> > have epochs and epochs_v2)… it would be nice if we could define a way to 
> > express that tables are free to be changed (modified or even deleted) and 
> > the life cycle for them….
> >
> > I propose that we allow such a case and make changes to the UX (as best as 
> > we can) to warn about this:
> >
> > 1) update NEWS.txt to denote that the feature is experimental
> > 2) when you access an experimental table you get a ClientWarning stating 
> > that this is free to change
> > 3) the tables comments starts with “[EXPERIMENTAL]”
> >
> > What do others think?
> >
> >
>


Re: Cassandra PMC Chair Rotation, 2024 Edition

2024-06-20 Thread Maxim Muzafarov
Congratulations Dinesh!

On Fri, 21 Jun 2024 at 05:12, Abhijeet Dubey  wrote:
>
> Thank you Josh for the amazing work.
>
> Congrats, Dinesh. Welcome to the new role :)
>
> Regards,
> Abhijeet
>
> On Fri, Jun 21, 2024 at 4:09 AM Dinesh Joshi  wrote:
>>
>> Thank you everybody. I hope to do my best in this role. A big thanks to Josh 
>> who has been a great PMC Chair!
>>
>> On Thu, Jun 20, 2024 at 11:40 AM Yifan Cai  wrote:
>>>
>>> Thank you for the service, Josh!
>>> Congrats, Dinesh!
>>>
>>> On Thu, Jun 20, 2024 at 11:32 AM Jean-Armel Luce  wrote:

>>>> Josh, thanks for the job
>>>> Dinesh, congrats!!
>>>>
>>>> On Thu, 20 Jun 2024 at 19:42, David Capwell wrote:
>
> Congrats!
>
> On Jun 20, 2024, at 9:10 AM, Melissa Logan  wrote:
>
> Josh, thank you for your time as chair + congrats Dinesh!
>
> On Thu, Jun 20, 2024 at 9:08 AM Abe Ratnofsky  wrote:
>>
>> Congrats Dinesh! Thank you Josh!
>>
>> On Jun 20, 2024, at 11:53 AM, Jeremiah Jordan 
>>  wrote:
>>
>> Welcome to the Chair role Dinesh!  Congrats!
>>
>> On Jun 20, 2024 at 10:50:37 AM, Josh McKenzie  
>> wrote:
>>>
>>> Another PMC Chair baton pass incoming! On behalf of the Apache 
>>> Cassandra Project Management Committee (PMC) I would like to welcome 
>>> and congratulate our next PMC Chair Dinesh Joshi (djoshi).
>>>
>>> Dinesh has been a member of the PMC for a few years now and many of you 
>>> likely know him from his thoughtful, measured presence on many of our 
>>> collective discussions as we've grown and evolved over the past few 
>>> years.
>>>
>>> I appreciate the project trusting me as liaison with the board over the 
>>> past year and look forward to supporting Dinesh in the role in the 
>>> future.
>>>
>>> Repeating Mick (repeating Paulo's) words from last year: The chair is 
>>> an administrative position that interfaces with the Apache Software 
>>> Foundation Board, by submitting regular reports about project status 
>>> and health. Read more about the PMC chair role on Apache projects:
>>> - https://www.apache.org/foundation/how-it-works.html#pmc
>>> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
>>> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>>>
>>> The PMC as a whole is the entity that oversees and leads the project 
>>> and any PMC member can be approached as a representative of the 
>>> committee. A list of Apache Cassandra PMC members can be found on: 
>>> https://cassandra.apache.org/_/community.html
>>
>>
>
>
>
> --
> Abhijeet Dubey
> Software Engineer @ Apple Inc.
> IIT Bombay Computer Science & Engineering Class of 2019
> Apple Inc. | IIT Bombay


[DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-06-28 Thread Maxim Muzafarov
Hello everyone,


The nodetool relies on the airlift/airline library to mark up the CLI
commands used to manage Cassandra, which are part of our public API.
This library is no longer maintained, so we need to update it anyway,
and the good news is that we already have several good alternatives:
airline-2 [3] or picocli [2].

In this message, I'm mainly talking about CASSANDRA-17445 [4], which
refers to the problem and is a prerequisite for a larger CEP-38 CQL
Management API [5]. It doesn't make sense to use annotations from the
deprecated library to build a new API, so this is another reason to
update the library as soon as possible and do some inherently small
code refactoring required for the CEP-38.

In addition to being widely used and well supported, the Picocli
library offers the following advantages for us:
- We can detach the jmx-specific parameters from the commands so that
they can be reused in other APIs (e.g. without host, port) while
remaining backwards compatible;
- We can set up nodetool's autocompletion after the migration with
minimal effort;
- There is a good Picocli ecosystem of tools that we can use to
simplify our codebase, e.g. the man page generator to make our CLIs
more Unix friendly [7];


= Prototype =

I have a working prototype [8] that shows what the result will look
like. The prototype includes:
- Tests comparing the execution of commands via the nodetool and nodetoolv2;
- 5 out of 164 nodetool commands have been moved so far, to show the
refactoring we need to do to the command's body;
- The command help output for the nodetoolv2 is, by default, the same
as it currently is for the nodetool; a
"cassandra.cli.picocli.layout" setting is added to switch to the Picocli
defaults;
- You can also see that the colour scheme is applied by the Picocli
out of the box, and this is how it looks [9];
- The nodetoolv2 is called first when the shell is triggered, and if
the nodetoolv2 doesn't contain the command it needs yet, it falls back
to the nodetool and the old argument parser;
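
The fallback in the last point can be sketched roughly as follows.
This is an illustration under my own assumptions, not the code from
the prototype [8]: FallbackDispatcherSketch and its methods are
hypothetical names, and commands are modelled as plain functions
returning their output.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: tries the migrated commands first, then falls back to the old parser.
final class FallbackDispatcherSketch
{
    // Commands already moved to the new library, keyed by command name.
    private final Map<String, Function<List<String>, String>> migrated = new HashMap<>();
    // The old entry point, which receives the full argument list unchanged.
    private final Function<List<String>, String> legacy;

    FallbackDispatcherSketch(Function<List<String>, String> legacy)
    {
        this.legacy = legacy;
    }

    void migrate(String command, Function<List<String>, String> handler)
    {
        migrated.put(command, handler);
    }

    String run(String... args)
    {
        List<String> all = Arrays.asList(args);
        if (all.isEmpty())
            return legacy.apply(all);
        Function<List<String>, String> handler = migrated.get(all.get(0));
        // A migrated command gets only its own arguments; anything else goes to the old parser.
        return handler != null ? handler.apply(all.subList(1, all.size())) : legacy.apply(all);
    }
}
```

Moving one more command is then a single migrate(...) call, and the
commands that have not been moved yet keep their current behaviour.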


Since the number of commands is quite large (164), I'd like to create
a feature branch and move all the commands one at a time, while
keeping the output backwards compatible by adding tests at the same
time and checking that the CI is always green. I think the "feature
branch" approach will be less stressful for us since it only
requires reviewing small, tedious changes to the feature branch,
rather than reviewing a single 15k-line patch.


Anyway, I am open to any suggestions and advice based on your
experience and best practices for this case. Looking forward to your
thoughts and suggestions.



[1] https://github.com/airlift/airline
[2] https://picocli.info/
[3] https://github.com/rvesse/airline
[4] https://issues.apache.org/jira/browse/CASSANDRA-17445
[5] 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API
[6] 
https://github.com/apache/cassandra/pull/2497/files#diff-acdd5f29d28df5c02f4bfc933528f084508b4923112e312e68a4aff7df973bce
[7] https://picocli.info/man/gen-manpage.html
[8] https://github.com/apache/cassandra/pull/2497/files
[9] 
https://github.com/apache/cassandra/assets/3415046/57b14ae0-ff59-43d2-b542-10d3218ae075


Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-03 Thread Maxim Muzafarov
Thank you all for your comments,

I want to stress that these changes won't affect the input/output
formatting of commands; everything stays the same.

We are changing the command markup library, so there are two extra
things to be checked:
- We parse CLI arguments in the same way (as the parser is different
in a new library);
- The command help output is the same so that the user won't see any difference;

Additional tests cover both cases.
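
Roughly, such a check compares the two implementations over the same
inputs. The sketch below is only an assumption about its shape, not the
actual test code: OutputParitySketch is a hypothetical name, and each
tool is modelled as a function from an argument line to the text it
prints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical parity check between the old and the new command line front ends.
final class OutputParitySketch
{
    // Returns the argument lines for which the two implementations print different text.
    static List<String> mismatches(List<String> argLines,
                                   Function<String, String> oldTool,
                                   Function<String, String> newTool)
    {
        List<String> differing = new ArrayList<>();
        for (String line : argLines)
        {
            if (!oldTool.apply(line).equals(newTool.apply(line)))
                differing.add(line);
        }
        return differing;
    }
}
```

An empty result over both the "help <command>" lines and the regular
invocations would cover the two points above.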

On Mon, 1 Jul 2024 at 20:08, Dinesh Joshi  wrote:
>
> I don't personally think there is a strong need for a feature branch. If it 
> makes it easy for you, please go ahead with a feature branch.
>
> One thing I had raised in the past was the desire to have a flag that would 
> generate machine readable output for nodetool commands. If this can be done 
> with a minor incremental effort, it would definitely reduce the burden on 
> operators / integrations that rely on the nodetool output. As I have earlier 
> indicated in the past, relying on human readable output for CLI tools like 
> nodetool is fragile and providing a JSON output as an alternative is a great 
> first step in eliminating that dependency. I'm just curious about the level 
> of effort. If it is too much or too invasive, we can consider producing JSON 
> output for inclusion in the next major release.
>
> On Fri, Jun 28, 2024 at 6:47 AM Maxim Muzafarov  wrote:
>>
>> Hello everyone,
>>
>>
>> The nodetool relies on the airlift/airline library to mark up the CLI
>> commands used to manage Cassandra, which are part of our public API.
>> This library is no longer maintained, so we need to update it anyway,
>> and the good news is that we already have several good alternatives:
>> airline-2 [3] or picocli [2].
>>
>> In this message, I'm mainly talking about CASSANDRA-17445 [4], which
>> refers to the problem and is a prerequisite for a larger CEP-38 CQL
>> Management API [5]. It doesn't make sense to use annotations from the
>> deprecated library to build a new API, so this is another reason to
>> update the library as soon as possible and do some inherently small
>> code refactoring required for the CEP-38.
>>
>> In addition to being widely used and well supported, the Picocli
>> library offers the following advantages for us:
>> - We can detach the jmx-specific parameters from the commands so that
>> they can be reused in other APIs (e.g. without host, port) while
>> remaining backwards compatible;
>> - We can set up nodetool's autocompletion after the migration with
>> minimal effort;
>> - There is a good Picocli ecosystem of tools that we can use to
>> simplify our codebase, e.g. generate man pages tool to make our CLIs
>> more Unix friendly [7];
>>
>>
>> = Prototype =
>>
>> I have a working prototype [8] that shows what the result will look
>> like. The prototype includes:
>> - Tests comparing the execution of commands via the nodetool and nodetoolv2;
>> - 5 out of 164 nodetool commands have been moved so far, to show the
>> refactoring we need to do to the command's body;
>> - The command help output for the nodetoolv2 is, by default, the same
>> as it currently is for the nodetool; a
>> "cassandra.cli.picocli.layout" setting is added to switch to the Picocli
>> defaults;
>> - You can also see that the colour scheme is applied by the Picocli
>> out of the box, and this is how it looks [9];
>> - The nodetoolv2 is called first when the shell is triggered, and if
>> the nodetoolv2 doesn't contain the command it needs yet, it falls back
>> to the nodetool and the old argument parser;
>>
>>
>> Since the number of commands is quite large (164), I'd like to create
>> a feature branch and move all the commands one at a time, while
>> keeping the output backwards compatible by adding tests at the same
>> time and checking that the CI is always green. I think the "feature
>> branch" approach will be less stressful for us since it only
>> requires reviewing small, tedious changes to the feature branch,
>> rather than reviewing a single 15k-line patch.
>>
>>
>> Anyway, I am open to any suggestions and advice based on your
>> experience and best practices for this case. Looking forward to your
>> thoughts and suggestions.
>>
>>
>>
>> [1] https://github.com/airlift/airline
>> [2] https://picocli.info/
>> [3] https://github.com/rvesse/airline
>> [4] https://issues.apache.org/jira/browse/CASSANDRA-17445
>> [5] 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API
>> [6] 
>> https://github.com/apache/cassandra/pull/2497/files#diff-acdd5f29d28df5c02f4bfc933528f084508b4923112e312e68a4aff7df973bce
>> [7] https://picocli.info/man/gen-manpage.html
>> [8] https://github.com/apache/cassandra/pull/2497/files
>> [9] 
>> https://github.com/apache/cassandra/assets/3415046/57b14ae0-ff59-43d2-b542-10d3218ae075


Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-05 Thread Maxim Muzafarov
> Once you are happy with your chosen library, we need a DISCUSS thread to add 
> this new library (current protocol).

Thanks, David. This is a good point, do we need a separate DISCUSS
thread or can we just use this one? I'm in favour of keeping the
discussion in one place, especially when topics are closely related. I
don't think that a separate thread would add extra visibility, but if
that is the way the community has adopted - no problem at all, I'll
repost.


The reasons for replacing the Airlift/Airline [1] with the PicoCli [2]
are as follows (in order of priority):

1. The library is under the Apache-2.0 License
https://github.com/remkop/picocli?tab=Apache-2.0-1-ov-file#readme

2. The project is active and well-maintained (last release on 8 May 2024)
https://github.com/remkop/picocli/releases

3. The library has ZERO dependencies, in some of the cases a single
file can just be dropped into the sources (it's even pointed out in
the documentation)
https://picocli.info/#_add_as_source

4. Compared to the Airlift library, the PicoCLI uses the same markup
design concepts, so we don't have to rewrite our commands or make
complex changes, which in turn minimizes the migration effort.


Are there any objections from the community to adding the PicoCLI
library as a project dependency? Please share your thoughts.

There are a few other alternatives (commons-cli, airline2, jcommander)
but they are not as well known and/or not as elegantly suited to our
needs based on what we have now.


[1] https://github.com/airlift/airline
[2] https://github.com/remkop/picocli


On Wed, 3 Jul 2024 at 22:27, David Capwell  wrote:
>
> I don't personally think there is a strong need for a feature branch. If it 
> makes it easy for you, please go ahead with a feature branch.
>
>
> Agree, I don’t see the reason for a feature branch… feature branch just means 
> the branch lives in apache domain rather than your own fork.  You won’t be 
> able to merge until you are done and you will need to keep rebasing over and 
> over again. Even if multiple people are working on this you can work in your 
> fork just fine (assuming you grant permissions).
>
> Another issue is that feature branches require the same level of commit 
> process as every other main branch, where as personal branches don’t.  This 
> actually will slow you down as each commit now must be a JIRA, you go through 
> review of each, must show a success CI, etc.
>
> Now, if you wish to split this into multiple steps that is fine, but the list 
> of places is basically node tool (kinda has to go in at once) and small CLIs. 
>  If you wish to migrate the small ones in isolation first, I am cool with 
> that merging to w/e branch the logic is targeting, but you won’t be able to 
> break up node tool without breaking everything… but if you did this in your 
> own fork then no one cares.
>
> I won’t block a feature branch, but just don’t see a clear “why” and only see 
> cons.
>
> We are changing the command markup library, so there are two extra
> things to be checked:
> - We parse CLI arguments in the same way (as the parser is different
> in a new library);
> - The command help output is the same so that the user won't see any 
> difference;
>
>
> Personally I would POC a limited node tool change with JVM dtest as we 
> require passing the output to the test (the prototypes you listed doesn’t 
> include JVM Dtest integration).  If one library makes this more annoying, 
> then do we care about fancy new features we don’t use when it makes the 
> features we do use harder?  If you start with the smaller tools first then 
> spend a ton of time migrating node tool then find JVM dtest is broken, then 
> you will spend so much more time fixing this, I would strongly recommend 
> doing some throw away POC to make sure w/e way you go won’t break JVM Dtest’s 
> node tool support.
>
> Once you are fine with your selected library, we will need a DISCUSS thread 
> to add that new library (current protocol).  This mostly just makes the pick 
> more visible, and normally we only check simple things like “are we legally 
> allowed to use” and “is this project dead?”.
>
>
> On Jul 3, 2024, at 6:06 AM, Maxim Muzafarov  wrote:
>
> Thank you all for your comments,
>
> I want to stress, that these changes won't affect the input/output
> formatting of commands, ensuring everything is the same.
>
> We are changing the command markup library, so there are two extra
> things to be checked:
> - We parse CLI arguments in the same way (as the parser is different
> in a new library);
> - The command help output is the same so that the user won't see any 
> difference;
>
> Additional tests cover both cases.
>
> On Mon, 1 Jul 2024 at 20:08, Dinesh Joshi  w

[DISCUSS] Replace airlift/airline library with Picocli

2024-07-15 Thread Maxim Muzafarov
Hello everyone,


I want to continue the discussion that was originally started here
[2], however, it's better to move it to a new thread with an
appropriate title, so that everyone is aware of the replacement
library we're trying to agree on.

The question is:
Does everyone agree with using Picocli as an airlift/airline
replacement for our cli tools?
The prototype to look at is here [1].


The reasons are as follows:

Why replace?

There are several cli tools that rely on the airlift/airline library
to mark up the commands: NodeTool, JMXTool, FullQueryLogTool,
CompactionStress (with the size of the NodeTool dominating the rest of
the tools). The airline is no longer maintained, so we will have to
update it sooner or later anyway.


What criteria?

Before we dive into the pros and cons of each candidate, I think we
have to formulate criteria for the libraries we are considering, based
on what we already have in the source code (from Cassandra's
perspective). This in itself limits the libraries we can consider.

Criteria can be as follows:
- Library licensing, including risks that it may change in the future
(the asf libs are the safest for us from this perspective);
- Similarity of library design (to the airline). This means that the
closer the libraries are, the easier it is to migrate to them, and the
easier it is to get guarantees that we haven't broken anything. The
further away the libraries are, the more extra code and testing we
need;
- Backward compatibility. The ideal case is where the user doesn't
even notice that a different library is being used under the hood.
This includes both the help output and command output.

Of course, all libraries need to be known and well-maintained.

What candidates?


Picocli
https://picocli.info/

This is the well-known cli library under the Apache 2.0 license, which
is similar to what we have in source code right now. This also means
that the amount of changes (despite the number of the commands)
required to migrate what we have is quite small.
In particular, I would like to point out that:
- It allows us to unbind the jmx-specific command options from the
commands themselves, so that they can be reused in other APIs (my
goal);
- We can customize the help output so that the user doesn't notice
anything while using the nodetool;
- The cli parser handles cli arguments the same way we do now.

This makes the library a good candidate, but leaves open the question
of the lib's license changing in the future. However, I believe these
risks are relatively small because a CLI library is not a monetizable
thing. We can also mitigate the risks by copying the lib into our
sources, as mentioned here:
https://picocli.info/#_getting_started


commons-cli
https://commons.apache.org/proper/commons-cli/

In terms of licenses, it is the easiest candidate for us to use as
it's under the asf, and in fact the library is already used in e.g.
BulkLoader, SSTableExport.
However, I'd like to point out the following disadvantages the library
has for our case:
- This is not a drop-in replacement for the airline commands, as the
lib does not have annotations for marking up commands. We have to flesh
out all the options we have as java classes, or create our own annotations;
- Subcommands have to be supported manually, which requires extra
effort to adapt the cli parser (correct me if I'm wrong here). We have
at least several subcommands in the NodeTool e.g. cms describe, cms
snapshot;
- Apart from parsing the cli arguments, we need to manually initialize
the command class and set the input arguments we have.
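
As a rough illustration of the second point, this is the kind of
hand-written dispatch we would need for subcommands. It is a plain
Java sketch that does not use the commons-cli API; the
SubcommandDispatcher class and its handler signature are hypothetical:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hand-written dispatch for "<group> <subcommand> [args]" style invocations. */
final class SubcommandDispatcher {
    // group name -> (subcommand name -> handler that receives the remaining args)
    private final Map<String, Map<String, Function<String[], String>>> groups = new LinkedHashMap<>();

    void register(String group, String subcommand, Function<String[], String> handler) {
        groups.computeIfAbsent(group, g -> new LinkedHashMap<>()).put(subcommand, handler);
    }

    String dispatch(String... args) {
        if (args.length < 2)
            return "usage: <group> <subcommand> [args]";
        Map<String, Function<String[], String>> subcommands = groups.get(args[0]);
        if (subcommands == null)
            return "unknown command: " + args[0];
        Function<String[], String> handler = subcommands.get(args[1]);
        if (handler == null)
            return "unknown subcommand: " + args[0] + " " + args[1];
        // Only the arguments after "<group> <subcommand>" reach the handler;
        // parsing them into options is still a separate step.
        return handler.apply(Arrays.copyOfRange(args, 2, args.length));
    }
}
```

Registering "cms describe" and "cms snapshot" would then be two
register(...) calls, and each handler would still have to parse its
own options and initialize its command class by hand.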


JCommander
https://jcommander.org/

The library is licensed under the Apache 2.0 license, so the situation
is the same as for Picocli. Here I'd like to point out a few things I
encountered while prototyping:
- Unbinding the jmx-specific options from commands is quite tricky and
requires touching an internal API (which I won't do). Option
inheritance is not the way to go if we want to have a clear command
hierarchy regardless of the API used.
- We won't be able to inject a Logger (the Output class in terms of
NodeTool) or other abstractions (e.g. MBeans) directly into the
commands, because it doesn't support dependency injection. This is
also part of the activity of reusing the commands in other APIs, for
instance to execute them via CQL;

It is more basic than Picocli, focusing primarily on simple
annotation-based parsing and subcommands, and won't allow us to reuse
the commands outside of the cli.


airline2
https://github.com/rvesse/airline

The library is licensed under the Apache 2.0 license, and this is an
attempt to rebuild the original airline library. Currently, this is
not a drop-in replacement, as it has some minor API differences from
the original library. It is also not a better choice for us, as it has
the same drawbacks I mentioned for the previous alternatives, e.g. not
possible to unbind the specific options from the command and use them
only when commands ar

[REVIEW REQUEST] Exposing the status of a cleanup command on a virtual table

2024-07-15 Thread Maxim Muzafarov
Hello everyone,

I would like to gently ask for help in reviewing the following issue
that we've been facing for a while:
https://issues.apache.org/jira/browse/CASSANDRA-19760

When a cleanup command is called, the compaction process under the
hood is triggered accordingly. However, if there is nothing to compact
or the cleanup command returns with a status other than SUCCESSFUL,
there is no way to get the execution results of the command that was
run. This is especially true when using any kind of
automation/scripting on top of JMX or as a nodetool wrapper.

I propose to keep these history results in memory for some time and
expose them via a virtual table so that a user can query it to check
the status.
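
A minimal sketch of the in-memory part of this idea is below. It is
not the patch attached to the JIRA issue; the CommandResult and
CommandHistory classes and the retention parameter are hypothetical,
and the sketch assumes that results are recorded in the order the
commands finish:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical record of one finished command run. */
final class CommandResult {
    final String command;
    final String status;
    final long finishedAtMillis;

    CommandResult(String command, String status, long finishedAtMillis) {
        this.command = command;
        this.status = status;
        this.finishedAtMillis = finishedAtMillis;
    }
}

/** Keeps recent results in memory and drops entries older than the retention period. */
final class CommandHistory {
    private final long retentionMillis;
    // Oldest result first; assumes record() is called in finish-time order.
    private final Deque<CommandResult> results = new ArrayDeque<>();

    CommandHistory(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    synchronized void record(CommandResult result) {
        results.addLast(result);
    }

    /** Returns the results still within the retention period, oldest first. */
    synchronized List<CommandResult> snapshot(long nowMillis) {
        while (!results.isEmpty()
               && nowMillis - results.peekFirst().finishedAtMillis > retentionMillis)
            results.removeFirst();
        return new ArrayList<>(results);
    }
}
```

A virtual table implementation could then build its rows from
snapshot(...) each time the table is queried.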

Any suggestions are welcome. I believe other commands like verify,
scrub, etc. can be exposed in the same way.


Re: [DISCUSS] Replace airlift/airline library with Picocli

2024-07-16 Thread Maxim Muzafarov
nk I’m not fully sold on the need to do anything at all here. The 
>>> library may no longer be maintained, but so what if it isn’t, really?
>>>
>>> Parsing command line arguments is a pretty well defined problem, it’s not 
>>> the kind of code that rots and needs to be updated to stay operational. If 
>>> it works now it will keep working.
>>>
>>> Why would we have to update it sooner or later?
>>>
>>> I might be missing something, of course, but what are our pain points with 
>>> airlift/airline in its current state?
>>>
>>> —
>>> AY
>>>
>>> > On 16 Jul 2024, at 02:07, Remko Popma  wrote:
>>> >
>>> > Hi Maxim, thank you for letting me know of this discussion.
>>> >
>>> > Hello everyone,
>>> >
>>> > I developed and maintain picocli; let me try to address the concerns 
>>> > raised below.
>>> >
>>> > For background, I am on the PMC for Apache Logging Services (mostly 
>>> > involved with Log4j), and on the PMC for Apache Groovy.
>>> > My involvement in these projects is why I chose the Apache 2.0 license. 
>>> > Apache is close to my heart and I have no intention to switch to another 
>>> > license.
>>> >
>>> > The picocli documentation mentions it is possible to incorporate picocli 
>>> > in one’s project by copying a single source file. This is not meant as a 
>>> > recommendation (I should probably clarify this in the docs). Some 
>>> > people/projects have resistance to using an external dependency for 
>>> > command line parsing and I thought this would alleviate that concern and 
>>> > make it easier for picocli to gain more adoption.
>>> > If you were to select picocli for Cassandra, I would recommend adding it 
>>> > as an external dependency via Maven or Gradle.
>>> >
>>> > I hope this is useful.
>>> >
>>> > Warmly,
>>> > Remko Popma
>>> >
>>> >
>>> >
>>> > On 2024/07/15 18:53:47 Maxim Muzafarov wrote:
>>> >> Hello everyone,
>>> >>
>>> >>
>>> >> I want to continue the discussion that was originally started here
>>> >> [2], however, it's better to move it to a new thread with an
>>> >> appropriate title, so that everyone is aware of the replacement
>>> >> library we're trying to agree on.
>>> >>
>>> >> The question is:
>>> >> Does everyone agree with using Picocli as an airlift/airline
>>> >> replacement for our cli tools?
>>> >> The prototype to look at is here [1].
>>> >>
>>> >>
>>> >> The reasons are as follows:
>>> >>
>>> >> Why to replace?
>>> >>
>>> >> There are several cli tools that rely on the airlift/airline library
>>> >> to mark up the commands: NodeTool, JMXTool, FullQueryLogTool,
>>> >> CompactionStress (with the size of the NodeTool dominating the rest of
>>> >> the tools). The airline is no longer maintained, so we will have to
>>> >> update it sooner or later anyway.
>>> >>
>>> >>
>>> >> What criteria?
>>> >>
>>> >> Before we dive into the pros and cons of each candidate, I think we
>>> >> have to formulate criteria for the libraries we are considering, based
>>> >> on what we already have in the source code (from Cassandra's
>>> >> perspective). This in itself limits the libraries we can consider.
>>> >>
>>> >> Criteria can be as follows:
>>> >> - Library licensing, including risks that it may change in the future
>>> >> (the asf libs are the safest for us from this perspective);
>>> >> - Similarity of library design (to the airline). This means that the
>>> >> closer the libraries are, the easier it is to migrate to them, and the
>>> >> easier it is to get guarantees that we haven't broken anything. The
>>> >> further away the libraries are, the more extra code and testing we
>>> >> need;
>>> >> - Backward compatibility. The ideal case is where the user doesn't
>>> >> even notice that a different library is being used under the hood.
>>> >> This includes both the help output and command output.
>>> >>
>>> 

Re: Welcome Joey Lynch as Cassandra PMC member

2024-07-26 Thread Maxim Muzafarov
My congratulations Joseph Lynch!

On Thu, 25 Jul 2024 at 18:15, Paulo Motta  wrote:
>
> Congratulations Joey!
>
> On Thu, 25 Jul 2024 at 00:55 Venkata Hari Krishna Nukala 
>  wrote:
>>
>> Congratulations Joey!!
>>
>> On Thu, 25 Jul 2024 at 7:20 AM, Joseph Lynch  wrote:
>>>
>>> Thank you all for the warm wishes and I greatly appreciate this opportunity!
>>>
>>> This is such a great community and I am proud to be part of it.
>>>
>>> Cheers!
>>> -Joey
>>>
>>> On Wed, Jul 24, 2024 at 10:12 AM Benjamin Lerer  wrote:

 The PMC's members are pleased to announce that Joey Lynch has accepted the 
 invitation to become a PMC member.

 Thanks a lot, Joey, for everything you have done for the project all these 
 years.

 Congratulations and welcome

 The Apache Cassandra PMC members


Re: Welcome Jordan West and Stefan Miklosovic as Cassandra PMC members!

2024-09-01 Thread Maxim Muzafarov
Congrats Jordan and Stefan.
Great work!

On Sun, 1 Sept 2024 at 12:46, guo Maxwell  wrote:
>
> Congrats Stefan and Jordan!!!
>
> Jacek Lewandowski 于2024年9月1日 周日下午4:39写道:
>>
>> Congrats Stefan and Jordan!!! This is great!
>>
>>
>> sob., 31 sie 2024, 22:21 użytkownik Jordan West  napisał:
>>>
>>> Thanks all!!!
>>>
>>> On Sat, Aug 31, 2024 at 07:55 J. D. Jordan  
>>> wrote:

 Two great additions to the PMC. Congratulations to you both!

 -Jeremiah Jordan

 > On Aug 30, 2024, at 3:21 PM, Jon Haddad  wrote:
 >
 > 
 > The PMC's members are pleased to announce that Jordan West and Stefan 
 > Miklosovic have accepted invitations to become PMC members.
 >
 > Thanks a lot, Jordan and Stefan, for everything you have done for the 
 > project all these years.
 >
 > Congratulations and welcome!!
 >
 > The Apache Cassandra PMC


[DISCUSSION] Cassandra's code style and source code analysis

2022-11-24 Thread Maxim Muzafarov
Hello everyone,


First of all, thank you all for this awesome project which I have
often been inspired by. My name is Maxim Muzafarov. I'm a committer and
PMC member of Apache Ignite, so you most likely don't know me as I come
from another part of the ASF. Perhaps, I did almost the same things
with B-Trees, off-heap memory management, rebalancing, checkpointing,
snapshotting, and IEPs (you are calling it CEPs) but on a slightly
different distributed database architecture.

Nevertheless,

I was looking for a first issue to gain experience with Cassandra and
found a number of open JIRAs related to source code analysis (static
analysis as well as code style). These issues still appear in JIRA
from time to time [1][2][3][4]. It seems to me that not enough
attention has been paid to this topic, and the possible options for
this analysis and code style haven't been widely discussed before.
I'd like to summarize everything that I have found and offer my skills
and experience to solve those of the issues we agree on.


= Motivation =

The goal is to make small contributions via GitHub PRs easier and
safer to apply, for both the contributor and the committer, by adding
automated code checks for each new Cassandra contribution. This will
also help to reduce the time an experienced developer needs to review
and apply such PRs.

As you may know, the infrastructure team has disabled public sign-up
to ASF JIRA (the GitHub issues are recommended instead). Thus the
following things become more important if we are still interested in
attracting new contributions as it was discussed earlier [6].

I do not want to add extra steps to the existing workflow with code
review or make GitHub pull requests as default for patches as it also
was discussed already [7], just to improve the automation checks in it
and make checks more convenient.


= Proposed Solution =

== 1. Make the checkstyle config a single point of truth for the
source code style. ==

The checkstyle is already used and included in the Cassandra project
build lifecycle (ant command line, Jenkins, CircleCI). There is no
need to maintain code style configurations for different types of IDEs
(e.g. IntelliJ inspections configuration) since the checkstyle.xml
file can be imported directly into the IDE a developer uses. This
holds for IntelliJ IDEA, NetBeans, and Eclipse.

So, I propose to focus on the checks themselves and checking pull
requests with automation scripts, rather than maintaining these
integrations. The benefits here are avoiding all issues with
maintaining configurations for different IDEs. Another good advantage
of this approach would be the ability to add new checkstyle rules
without touching IDE configuration - and such tickets will be LFH and
easy to commit.

The action points here are:

- create an umbrella JIRA ticket for all checkstyle issues e.g. [8]
(or label checkstyle);
- add checkstyle to GitHub pull requests using GitHub actions (execute
ant command);
- include checkstyle in the build (already done);
- remove redundant code style configurations related to IDEs from the
source code e.g. [9];


== 2. Add additional tooling for code analysis to the build and GitHub
pull requests. ==

The source code static analysis and automated checks have been
discussed recently in the "SpotBugs to the build" topic [10]. I'd like
to add my 50 cents here.

Before discussing the pros and cons of each solution, let's focus on
the criteria that such solutions must meet. You can find the most
complete list of such tooling here [11].

From my point of view, the crucial criteria are:
- free for open-source (at least licenses should allow such usages);
- popularity in the ASF ecosystems;
- convenient integration and/or plugins for IDEs and GitHub;
- we must be able to integrate with CircleCI and Jenkins, as well as
launch from a command line;


=== Sonar ===

pros
+ this tool is free for open-source and recommended by the ASF
infrastructure team [12];
+ was already used for the Cassandra project some time ago at
sonarcloud.io [13];
+ it has GitHub pull requests analysis [14];

cons
- running locally requires additional configuration and a TOKEN_ID
because the analysis results are stored in an external database (infra
will not provide it for local usage);

=== SpotBugs (FindBugs) ===

pros
+ its license allows using it and running it as a library (should be legal for us);
+ it analyses the bytecode, which differs from how checkstyle works;
+ can be executed from the command line as well as integrated into the build;

cons
- requires compiled source code;

=== PMD ===

pros
+ BSD license, which is more permissive than LGPL (SpotBugs);
+ analyses the source code like checkstyle does;
+ has extended rule sets for source code analysis;

cons
- the checkstyle is already used in the project, and should be enough for now;

=== IntelliJ IDEA ===

pros
+ free for open-source and can be used from the command line [15];

Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-11-28 Thread Maxim Muzafarov
Thank you all for the feedback and productive discussion.


I couldn't have expressed my thoughts on the build tools for the
project better, or provided such good examples, as Scott did. To
rephrase what I wrote in the first email: with Maven/Gradle
underpinning the project, a modern IDE will take care of all the
necessary files and configuration by itself much better than we do
with scripts. I
fully agree that there is no rush with such migration, and the
databases in such cases must be more conservative than progressive,
and not change anything without strong benefits and a broad consensus
on it. I still believe this consensus can be reached in future and
when (and if) the consensus will be reached, a clear migration plan
should be developed for several releases ahead as well. There's still
a lot of work to be done here that's why I mentioned it at the end of
my proposal, so as not to pay too much attention to this question at
this moment.
I've added a link to this thread to the JIRA issue [1], so we don't
lose the insights mentioned by members above.


I want to take away your concerns about lints expansion for now. I
thought first of all about making all the source code-checking tools
more convenient for use with a minimal set of already existing lints
rather than adding or forcing new rules. I really want to avoid here
cases with storing multiple configurations for a single tool e.g.
having different configurations for 'optional' or 'mandatory' checks
as well as different configurations for 'production' or 'tests'.

Thus, the ideal picture in my mind, given everything discussed above, is:

We have:
- checkstyle
- SpotBugs
- Sonar

They work the same way for:
- Jenkins builds
- CircleCI builds
- GitHub pull requests
- build on the local machine

Given that the code style is already pretty well described on the
webpage [2] (and wiki [3]), there is no need to expand the checking
tools with new rules until we get these tools working on a minimal set
of rules. For instance, we can pick 'Unused imports' and 'Import
order' for checkstyle, 'AutoCloseable' and 'Number.valueOf' for
SpotBugs, and for Sonar only reports to monitor the source code
trends.

I agree that adding new lints requires a broad consensus, so I'd like
to avoid such debatable questions for now. Moreover, even with the
lints already agreed upon, it is still risky to implement some of them
because they can contain a lot of boilerplate changes and may affect
more important fixes ready for merge.


So, as a first step, I can invest my time into the checkstyle tool and
make it work everywhere with the same configuration.
WDYT?


P.S.

For IntelliJ with the Checkstyle Plugin it's easy to import the
checkstyle.xml the following way:
Preferences -> Code Style -> Show Scheme Actions (wheel) -> Import
scheme -> Checkstyle configuration.


[1] https://issues.apache.org/jira/browse/CASSANDRA-17015
[2] https://cassandra.apache.org/_/development/code_style.html
[3] https://cwiki.apache.org/confluence/display/CASSANDRA2/CodeStyle

On Sun, 27 Nov 2022 at 13:17, Josh McKenzie  wrote:
>
> My .02 on the build discussion is we should try and keep the guts of that in 
> one place, be it the other email thread or on JIRA. Some insightful points 
> made on this thread but would hate to see this thread derailed on a complex 
> independent topic as well as see some of these points lost on the other 
> discussion.
>
> I think there needs to be a lot of community consensus on the broad expansion 
> of lints that can reject patches.
>
> +1. It may be worthwhile to configure 2 tiers of lints, optional and 
> required, so we can move to a more gradual process of cleaning up lint 
> violations for those that are interested in that type of work. I know in the 
> past we've seen value in looking at the diff in linting violations even w/a 
> 1k+ noisy violation environment.
>
>
> On Fri, Nov 25, 2022, at 12:41 PM, sc...@paradoxica.net wrote:
>
> For me, the strongest arguments in favor of adopting a modern build tool like 
> Maven or Gradle are their ecosystems - both explicit (in terms of plugins), 
> and implicit (in terms of nearly all build tooling supporting both of them, 
> but not ant).
>
> Investment in Ant - and in tooling that integrates with Ant - fell off years 
> ago. This makes integrating build-phase aspects of Cassandra with other 
> tooling a very frustrating task that users of most build tools get for free. 
> Many tools built in the last several years don’t support it, or do so only as 
> an afterthought.
>
> Two recent examples that have caused pain for me, which I suspect are felt by 
> many:
>
> – Integration with internal build systems at many companies that develop 
> Cassandra. Because ant has fallen into disuse, this integration is heavily 
> manual instead of automatic and free. It usually requires forking the 
> project’s build.xml, developing custom tooling around it, or creating a mock 
> Gradle build that wraps ant lifecycle tasks (which also requir

Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-12-07 Thread Maxim Muzafarov
Dear community,


I have created the epic with code-style activities to track the progress:
https://issues.apache.org/jira/browse/CASSANDRA-18090

In my understanding, there is no need to reformat the whole code base
at once according to the code style described on the page [1], and the
best strategy here is to go forward with small evolutionary changes.
Thus eventually we will come up with a set of rules convenient for all
members of the community. In my mind, having one commit per added code
style rule should be easy on a reviewer and on the git commit history,
and should simplify rebasing/merging other pull requests that may be
affected by the new rules.


I want to raise one more question, related to class imports and the
class import order, for a wider discussion. The import order is well
described on the code style page [1], but using wildcard imports is
not mentioned at all. Wildcard imports and their drawbacks have
already been raised in the JIRA issue [2], but this didn't get enough
attention.

The checkstyle has the rules we are interested in for import control
and they must be considered together. We can implement them in a
single pull request or one by one, or use only the last one:
- AvoidStarImport
- CustomImportOrder

But still, I think that wildcard imports have more disadvantages
(class name conflicts, e.g. java.util.* and java.sql.*, or name
clashes introduced by a new version of a library) than advantages, and
such problems will be found in later CI cycles.
Currently, I've implemented the AvoidStarImport checkstyle rule in a
dedicated pull request [3][4], so you can see the full extent of the
changes needed to remove wildcard imports. The changes are made to the
checkstyle configuration as well as to the code style configurations
for the different IDEs we support.
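
The java.util.* / java.sql.* conflict mentioned above can be shown
with a small, self-contained sketch (the WildcardClash class is
hypothetical and exists only for this illustration):

```java
import java.sql.*;
import java.util.*;

final class WildcardClash {
    // With both wildcard imports above, the simple name "Date" is ambiguous
    // (java.util.Date vs java.sql.Date), so this line would not compile:
    //     Date d = new Date(0L);

    // Fully qualified names (or explicit single-type imports) remove the ambiguity.
    static java.util.Date utilDate() {
        return new java.util.Date(0L);
    }

    static java.sql.Date sqlDate() {
        return new java.sql.Date(0L);
    }
}
```

Note that the two wildcard imports compile fine on their own; the
error only appears once an ambiguous simple name is actually used,
which is why such clashes can show up late.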

So, the open questions here are:

- Should the source code obey the AvoidStarImport rule [3]? (I think yes);
- Should we implement AvoidStarImport and CustomImportOrder in a
single pull request or do it one by one?


Anyway, I will record the outcome of the agreement on the
AvoidStarImport rule on the documentation page [1].



[1] https://cassandra.apache.org/_/development/code_style.html
[2] https://issues.apache.org/jira/browse/CASSANDRA-17925
[3] https://issues.apache.org/jira/browse/CASSANDRA-18089
[4] https://github.com/apache/cassandra/pull/2041

On Thu, 1 Dec 2022 at 11:55, Claude Warren, Jr via dev
 wrote:
>
> The last time I worked on a project that tried to implement a coding style 
> across the project it was "an education".  The short story is that trying to 
> "mitigate" the code base, with respect to style, is either a massive change 
> or a long slow process.
>
> Arguments here have stated that earlier attempts to have the tooling reformat 
> the code did not go well.  What we ended up doing was turned on the style 
> checker and looked at the number of issues across the project.  When new code 
> was accepted the number of issues could not rise.  Eventually most of the 
> code was clean, with a few well coded legacy bits still not up to standard.  
> We could do something similar here.  Much like code coverage, you can't 
> perform a merge unless the number of style errors remains the same or 
> decreases.
>
> As with all software rules, this is a strong recommendation as I am certain 
> that there are edge/corner case exceptions to be found.
>
>
>
>
> On Wed, Nov 30, 2022 at 3:30 PM Patrick McFadin  wrote:
>>
>> Why are we still debating build tooling? I think you’re wrong, but I’ve 
>> conceded - on the assumption that we can get enough volunteers willing to 
>> adopt responsibility for the new world order.
>>
>> Not debating. I am just throwing in my support since I have been in the Camp 
>> of Ant.
>>
>> On Wed, Nov 30, 2022 at 1:29 AM Benedict  wrote:
>>>
>>> Why are we still debating build tooling? I think you’re wrong, but I’ve 
>>> conceded - on the assumption that we can get enough volunteers willing to 
>>> adopt responsibility for the new world order.
>>>
>>> I suggest five long term contributors nominate themselves as the build file 
>>> maintainers, and collectively manage a safe and painless migration for the 
>>> rest of us - and agree to maintain and develop the new build file going 
>>> forwards, and support the community as they adopt it.
>>>
>>> On the topic of over-exuberant linting I will continue to push back. I 
>>> think linting our brace rules could make sense since they are atypical, but 
>>> more formatting rules than this likely just leads to atrophying style. 
>>> Authorship involves thinking about how to present your code; I don’t want 
>>> to either encourage lazy authorship or prevent experimentation with 
>>> presentation. Both would be bad, and I expect we would struggle to evolve 
>>> our style guide again in future as the language evolves. Our brace rules 
>>> are a good example everyone unilaterally ignored when lambdas arrived, as 
>>> we all recognised they materially harmed the brevity 

Re: [DISCUSS] Usage of "var" instead of types in the code

2024-11-05 Thread Maxim Muzafarov
To me, this is a question of style consistency throughout the project:
if we simply allowed the "var" keyword, we would have a mix of new and
old styles without any distant prospect of a unified style.

We should evolve the code style from one unified form to another, thus
either we use it everywhere and fix all the places where it's
applicable, or we forbid it and avoid having "mixed" styles. If
everyone coded the way they liked, it would be a mess.

I would vote -0.5 to allow it, and +1 to forbid it everywhere.

On Tue, 5 Nov 2024 at 00:02, Štefan Miklošovič  wrote:
>
> People who are OK with vars in tests - are you also the ones who are going to 
> write vars from now on yourself or you just do not mind if you encounter it?
>
> There is a difference between
>
> "keep it in tests, I am going to use this, this is actually a good idea"
>
> and
>
> "keep it in tests if people are going to use it, I do not mind but I am not 
> going to change my style".
>
> If the latter is the case, then who is actually going to write tests on a 
> daily basis with vars? If one or two people then I guess it does not make a 
> lot of sense to keep it around.
>
> On Mon, Nov 4, 2024 at 11:10 PM Ariel Weisberg  wrote:
>>
>> Hi,
>>
>> I don’t like `var` anywhere. Even if IntelliJ could automatically insert the 
>> concrete type it would still be a problem in the GH compare view. GH compare 
>> view is a real problem, because any time something is sufficiently 
>> obfuscated I have to bounce back and forth with an IDE, check out the code 
>> etc or just proceed with a weaker mental model of what is going on.
>>
>> I have finite mental energy to expend every day and I don’t want to spend it 
>> hunting down and then recalling what each instance of var means repeatedly. 
>> It uses almost no energy to read past extra type information (formatting 
>> means I don’t even need to parse it) or do a little extra typing/autocomplete
>>
>> Ariel
>>
>> On Tue, Oct 29, 2024, at 1:13 PM, Štefan Miklošovič wrote:
>>
>> Hello,
>>
>> this should give you an idea
>>
>>  grep --include '*.java' -r 'var ' src/ test/
>>
>> I think this is a new concept here which was introduced recently with 
>> support of Java 11 / Java 17 after we dropped 8.
>>
>> What is your opinion? Are we free to use it wherever we want? I am quite 
>> conservative in this area and I will most probably still use types as we 
>> know them but maybe in tests we might relax it a little bit? Or production 
>> code with "var" is totally fine too without any concerns? I think this 
>> should be covered by the code style.
>>
>> Regards
>>
>>


Re: CEP-32: Open-Telemetry integration

2024-10-23 Thread Maxim Muzafarov
Hello,


I wanted to share some ideas and a vision in terms of metrics,
tracing and the adoption of new integrations, particularly
OpenTelemetry. I personally feel that the more integrations we have,
the better the adoption of Cassandra as a database will be. With
OpenTelemetry, users could have a better "first experience", so I'm +1
here.

I have two concerns with the way we currently handle such integrations:

1. The first is how do we manage all these integrations, because
according to the CEP we are adding new dependencies and interfaces [1]
to the project and adding new configuration values, this is not bad in
itself. However, it also means that as the number of integrations
increases, so does the maintenance of the project and config (the
vision - is to have minimal extra deps in the core and the smallest
config).

2. Exporting metrics/logs should not affect the node itself (adjusting
the JVM params [2] of the node to make the integration work tells us
that we are doing something wrong) and the JVM process that does the
main work with the data by handling user requests. The priority of
serving metrics/logs is lower than a user request. The current
approach of adding new metric exporters and/or instrumenting JVM
agents could affect the stability and performance of the node, the
bugs could prevent the node from serving user requests as well (e.g.
calculating instead of exporting raw histograms [3], causing GCs and
impacting the node).



With all that, the alternate solution and the vision I'm trying to
highlight here is that we should just rely on the native protocol and
"incorporate" these things into the native protocol itself and CQL as
its part.
That way, Cassandra Sidecar and other sub-projects interested in the
internal state of the node can rely only on the protocol specification
and the query syntax.

Specifically, querying the node's internal state (basically metrics
and logs) is being done using two paradigms: "poll" and "push".

1. The "poll" is the simplest part, we already have all we need - lots
of virtual tables. A new virtual keyspace "system_metrics" [4]
contains all the internal metrics in the Prometheus format that
Cassandra exposes in JMX, and can be queried by any other system (e.g.
the Cassandra Sidecar that has established a local connection via Unix
Domain Socket to query the metrics) to expose them via the REST API or
other interfaces they need. The efficiency of exposing these metrics
is the best we can offer in terms of performance (I hope).

2. The "push" part is unfortunately not implemented yet, but it is
normally used to export logs and internal events. The
vision is - to register a continuous query to listen for the log
updates on the node, which is also a part of the Sidecar. Such a
feature would be useful in itself, regardless of the fact that in our
case we are going to use it to listen to internal events and log
updates. From my point of view, other database vendors offer something
similar that Cassandra lacks:

https://cloud.google.com/bigquery/docs/continuous-queries-introduction
https://docs.influxdata.com/influxdb/v1/query_language/continuous_queries/

The syntax could be:

CREATE CONTINUOUS QUERY cq_log_updates ON system_queries
BEGIN
  SELECT timestamp, text FROM system_logs
END

These two paradigms can be collected under the umbrella of the
Cassandra Sidecar to support the OpenTelemetry without having extra
config and extra dependency and leaving the Cassandra core with no
additional dependencies and free to focus on the native protocol and
CQL.



So far, it was all about metrics and logs. I also took a look at the
opentelemetry-demo project to see what a trace might look like. The
example [5] shows a good picture of a tracing request up to the point
where the request reaches the core, and from that point on the trace
picture becomes foggy, showing almost nothing about Cassandra
internals. I think this part should be improved by adding traceable
types for each of the internal components we want to trace (messaging
between nodes, the waiting time in a pool, the time to parse a request
etc.). For instance, here is an example [6] of what the granularity
could look like.

With that, and since we would somehow have specific trace types for
Cassandra to view on internal components, I think the point where we
can bind Cassandra traces to the OpenTelemetry trace context is the
Sidecar (the best candidate from my point of view). Theoretically, we
could use the opentelemetry-java-instrumentation [7] to get an
overview of the internals, but without custom Cassandra tracing types
it won't give us good granularity and a clear picture of the node
internals.


Please don't take this as a criticism or -1 on my part, I just wanted
to share an alternative way of looking at it. The amount of extra
dependencies scares me so much :-)


[1] 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-32%3A+%28DRAFT%29+OpenTelemetry+integration#CEP32:(DRAFT)Ope

Re: [VOTE] Release Apache Cassandra 5.0.1

2024-09-24 Thread Maxim Muzafarov
+1

On Fri, 20 Sept 2024 at 16:36, Mick Semb Wever  wrote:
>
>
> Proposing the test build of Cassandra 5.0.1 for release.
>
> sha1: c206e4509003ac4cd99147d821bd4b5d23bdf5e8
> Git: https://github.com/apache/cassandra/tree/5.0.1-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1348/org/apache/cassandra/cassandra-all/5.0.1/
>
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/5.0.1/
>
> ==
> This release introduces safeguards and observability into possible data loss 
> scenarios when nodes have a  divergent view of the cluster. This happens 
> around edge-cases on unsafe bootstrapping, decommissions, or when a node has 
> a corrupted topology. Two configuration options have been added: 
> `log_out_of_token_range_requests` and  `reject_out_of_token_range_requests`, 
> both enabled by default. The former will make nodes log requests they receive 
> that don't belong in their current or pending token ranges. The latter will 
> reject those requests, which prevents any eventual data loss that can occur 
> but may also incur small windows of degraded availability during range 
> movements. See CASSANDRA-13704 for further details.
> ==
>
> The vote will be open for 96 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/5.0.1-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/5.0.1-tentative/NEWS.txt
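
For readers skimming the release notes quoted above, the interplay of
the two options can be pictured with the sketch below. It is only an
illustration under stated assumptions: the class and method names are
hypothetical, a single non-wrapping (start, end] range of long tokens
stands in for the node's current and pending ranges, and the real
implementation in CASSANDRA-13704 is certainly more involved.

```java
// Hypothetical sketch: names and structure are assumptions, not Cassandra code.
final class TokenRangeGuardSketch
{
    enum Outcome { ACCEPTED, LOGGED_ONLY, REJECTED }

    private final long startExclusive;
    private final long endInclusive;
    private final boolean logOutOfRange;    // stands in for log_out_of_token_range_requests
    private final boolean rejectOutOfRange; // stands in for reject_out_of_token_range_requests

    TokenRangeGuardSketch(long startExclusive, long endInclusive,
                          boolean logOutOfRange, boolean rejectOutOfRange)
    {
        this.startExclusive = startExclusive;
        this.endInclusive = endInclusive;
        this.logOutOfRange = logOutOfRange;
        this.rejectOutOfRange = rejectOutOfRange;
    }

    // Decides what happens to a request for the given token.
    Outcome check(long token)
    {
        boolean owned = token > startExclusive && token <= endInclusive;
        if (owned)
            return Outcome.ACCEPTED;
        if (logOutOfRange)
            System.err.println("Out-of-range request for token " + token);
        if (rejectOutOfRange)
            return Outcome.REJECTED;
        return logOutOfRange ? Outcome.LOGGED_ONLY : Outcome.ACCEPTED;
    }
}
```

With both flags on (the stated default), an out-of-range request is
logged and rejected; with only logging on, it is logged and still
served.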


Re: [VOTE] Release Apache Cassandra 4.1.7

2024-09-24 Thread Maxim Muzafarov
+1

On Mon, 23 Sept 2024 at 01:59, Jordan West  wrote:
>
> +1. Validated by starting and creating a 3 node cluster using easy-cass-lab.
>
> Jordan
>
> On Fri, Sep 20, 2024 at 7:36 AM Mick Semb Wever  wrote:
>>
>>
>> Proposing the test build of Cassandra 4.1.7 for release.
>>
>> sha1: ca494526025a480bc8530ed3ae472ce8c9cbaf7a
>> Git: https://github.com/apache/cassandra/tree/4.1.7-tentative
>> Maven Artifacts: 
>> https://repository.apache.org/content/repositories/orgapachecassandra-1347/org/apache/cassandra/cassandra-all/4.1.7/
>>
>> The Source and Build Artifacts, and the Debian and RPM packages and 
>> repositories, are available here: 
>> https://dist.apache.org/repos/dist/dev/cassandra/4.1.7/
>>
>> ==
>> This release introduces safeguards and observability into possible data loss 
>> scenarios when nodes have a  divergent view of the cluster. This happens 
>> around edge-cases on unsafe bootstrapping, decommissions, or when a node has 
>> a corrupted topology. Two configuration options have been added: 
>> `log_out_of_token_range_requests` and  `reject_out_of_token_range_requests`, 
>> both enabled by default. The former will make nodes log requests they 
>> receive that don't belong in their current or pending token ranges. The 
>> latter will reject those requests, which prevents any eventual data loss 
>> that can occur but may also incur small windows of degraded availability 
>> during range movements. See CASSANDRA-13704 for further details.
>> ==
>>
>> The vote will be open for 96 hours (longer if needed). Everyone who has 
>> tested the build is invited to vote. Votes by PMC members are considered 
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>
>> [1]: CHANGES.txt: 
>> https://github.com/apache/cassandra/blob/4.1.7-tentative/CHANGES.txt
>> [2]: NEWS.txt: 
>> https://github.com/apache/cassandra/blob/4.1.7-tentative/NEWS.txt


Re: [REVIEW REQUEST] Exposing the status of a cleanup command on a virtual table

2024-09-27 Thread Maxim Muzafarov
Hello everyone,

I still need a few more eyes on [1][2], but this time I'm going to try
and do some marketing for the feature I'm talking about, so...


We are trying to bridge the gap between the API that is called and the
compaction process that MAY or MAY NOT be called as a result, and make
users aware of what is happening inside the cluster with their running
commands. Currently, this can only be viewed by reading logs, which is
inconvenient both for operators and for subsystems that audit the
node internals.

What we want to do is store the history of running operations for the
compaction manager in a small collection in the java heap and fill
this gap with virtual tables on top of this data collection, namely:

- compaction_operations_status - has (operation_type, operation_id)
primary key and exposes the status of the cleanup command as a whole.
It may or may not trigger the compaction process and the compaction
may or may not appear in the sstable_tasks virtual table (active
compactions);
- compaction_operations_linked_tasks - has (operation_type,
operation_id, compaction_id) as its primary key and shows the
relationship between the user-triggered operation and the compaction
process invoked as a result;

The CASSANDRA-19760 [1] issue covers only the cleanup command and
demonstrates the approach; all other commands, which can be identified
by the OperationType class, could be implemented in follow-up issues.
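
As a rough illustration of the "small collection in the java heap"
idea, here is a minimal sketch. It is not the code from the pull
request: the class name, the Status values, the method names and the
eviction policy (drop the oldest entry once a fixed size is exceeded)
are all assumptions made for this example. The key mirrors the
(operation_type, operation_id) primary key, and the per-operation list
of compaction ids corresponds to the rows of the linked-tasks table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical sketch: names and eviction policy are assumptions, not the actual patch.
final class OperationHistorySketch
{
    enum Status { RUNNING, SUCCESSFUL, FAILED }

    // Mirrors the (operation_type, operation_id) primary key.
    private static final class Key
    {
        final String operationType;
        final UUID operationId;

        Key(String operationType, UUID operationId)
        {
            this.operationType = operationType;
            this.operationId = operationId;
        }

        @Override
        public boolean equals(Object o)
        {
            if (!(o instanceof Key))
                return false;
            Key other = (Key) o;
            return operationType.equals(other.operationType) && operationId.equals(other.operationId);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(operationType, operationId);
        }
    }

    private static final class State
    {
        Status status = Status.RUNNING;
        // Compactions started by this operation; stays empty if none was triggered.
        final List<UUID> compactionIds = new ArrayList<>();
    }

    private final Map<Key, State> history;

    OperationHistorySketch(final int maxEntries)
    {
        // Insertion-ordered map that drops the oldest operation once the bound is exceeded.
        this.history = new LinkedHashMap<Key, State>()
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, State> eldest)
            {
                return size() > maxEntries;
            }
        };
    }

    synchronized void start(String type, UUID id)
    {
        history.put(new Key(type, id), new State());
    }

    synchronized void linkCompaction(String type, UUID id, UUID compactionId)
    {
        State state = history.get(new Key(type, id));
        if (state != null)
            state.compactionIds.add(compactionId);
    }

    synchronized void finish(String type, UUID id, Status status)
    {
        State state = history.get(new Key(type, id));
        if (state != null)
            state.status = status;
    }

    // Returns null when the operation is unknown or has already been evicted.
    synchronized Status status(String type, UUID id)
    {
        State state = history.get(new Key(type, id));
        return state == null ? null : state.status;
    }

    synchronized List<UUID> linkedCompactions(String type, UUID id)
    {
        State state = history.get(new Key(type, id));
        return state == null ? new ArrayList<UUID>() : new ArrayList<>(state.compactionIds);
    }
}
```

A cleanup that finds nothing to compact would show up here as a
finished operation with an empty list of linked compactions, which is
exactly the case that is invisible today without reading logs.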


Examples:

- The definition of these new virtual tables looks like:
https://gist.github.com/Mmuzaf/2d3006f5b654d54e7cabc343cd73a2a3

- The output when we run the cleanup command, but it doesn't trigger
the compaction:
https://gist.github.com/user-attachments/assets/9089d5c1-70d4-475f-9cf7-cc16dff48699


[1] https://issues.apache.org/jira/browse/CASSANDRA-19760
[2] https://github.com/apache/cassandra/pull/3412/files

On Mon, 15 Jul 2024 at 21:06, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
> I would like to gently ask for help in reviewing the following issue
> that we've been facing for a while:
> https://issues.apache.org/jira/browse/CASSANDRA-19760
>
> When a cleanup command is called, the compaction process under the
> hood is triggered accordingly. However, if there is nothing to compact
> or the cleanup command returns with a status other than SUCCESSFUL,
> there is no way to get the execution results of the command that was
> run. This is especially true when using any kind of
> automation/scripting on top of JMX or as a nodetool wrapper.
>
> I propose to keep these history results in memory for some time and
> expose them via a virtual table so that a user can query it to check
> the status.
>
> Any suggestions are welcome. I believe other commands like verify,
> scrub, etc. can be exposed in the same way.


Re: [VOTE] CEP-45: Mutation Tracking

2025-02-06 Thread Maxim Muzafarov
+1 (nb)

On Thu, 6 Feb 2025 at 05:34, Patrick McFadin  wrote:
>
> +1
>
> On Wed, Feb 5, 2025 at 8:15 PM C. Scott Andreas  wrote:
> >
> > +1
> >
> > On Feb 5, 2025, at 2:50 PM, Alex Petrov  wrote:
> >
> >
> > +1
> >
> > On Wed, Feb 5, 2025, at 11:03 PM, Blake Eggleston wrote:
> >
> > Ok ok, I've jumped the gun here, sorry, small off-by-24 error. Please continue 
> > voting, and I'll be back tomorrow :D
> >
> > On Wed, Feb 5, 2025, at 1:49 PM, Blake Eggleston wrote:
> >
> > The vote passes with 10 +1s (4 nb) and no -1.
> >
> > Thanks everyone!
> >
> > Blake
> >
> > On Wed, Feb 5, 2025, at 1:07 PM, Jon Meredith wrote:
> >
> > +1 (nb)
> >
> > On Tue, Feb 4, 2025 at 5:07 PM guo Maxwell  wrote:
> >
> > +1
> > On Wed, Feb 5, 2025 at 6:04 AM Dmitry Konstantinov wrote:
> >
> > +1 (nb)
> >
> > On Tue, 4 Feb 2025 at 22:00, Abe Ratnofsky  wrote:
> >
> > +1 (nb)
> >
> >
> >
> >
> > --
> > Dmitry Konstantinov
> >
> >
> >
> >


Re: Looking for Cassandra Forward topics and speakers

2025-01-30 Thread Maxim Muzafarov
Hello community, Patrick,

I can also prepare some PP slides (status, design, and progress) and a
short talk for:
 - CQL Management API

If you can help, option 2 sounds good to me.

On Thu, 30 Jan 2025 at 11:30, Rolo, Carlos via dev
 wrote:
>
> Hello Patrick,
>
> Count me in!
>
> I would like to pick either
>
>  - CQL Management API: Can we just celebrate the end of JMX hell
> potentially? Who can talk about this?
>
> Or
>
>  - SAI enhancements were recently highlighted by Caleb on the ML. This
> isn't just a one-and-done feature, and its future is really cool.
>
> What would be your timeline for this? Option 2 sounds great to me!
>
> Cheers,
>
> Carlos
>
>
> 
> From: Patrick McFadin 
> Sent: 29 January 2025 19:48
> To: dev 
> Subject: Looking for Cassandra Forward topics and speakers
>
> Hi everyone,
>
> A couple of years ago, I organized a Cassandra Forward event to get
> people excited about the next version of Cassandra. It's time to ramp
> up the excitement about one of the more consequential releases of
> Cassandra: 5.1 or 6. Whatever we land on as a version will impact the
> community with the addition of ACID transactions.
>
> There is a lot more to talk about, so my plan is to create a nicely
> rounded menu of topics that showcase the velocity of our new features.
>
> Format: Online and prerecorded.
> Date: First week of March
>
> My main issue is needing speakers. Here's the topic list (please feel
> free to comment on this)
>
>  - Accord: I can cover this like I have been. No need for a speaker there.
>
>  - TCM: I think the most important thing nobody knows about with
> future Cassandra.
>
>  - Sidecar (Spark jobs, Live migration) #2 on my list of "things you
> should know about Cassandra but don't."  This project has been in the
> shadows too long and I think users will love it.
>
>  - Cassandra and Kubernetes: Not especially new, but certainly ramping
> up. This would be a great topic for an end user to discuss. Share the
> good and bad.
>
>  - CQL Management API: Can we just celebrate the end of JMX hell
> potentially? Who can talk about this?
>
>  - SAI enhancements were recently highlighted by Caleb on the ML. This
> isn't just a one-and-done feature, and its future is really cool.
>
> This you?
>
> I DON'T HAVE TIME FOR ANY OF THIS!  Here are some options:
>
>   - If you want to give a talk but don't want to deal with the
> logistics of recording, I can get you on Zoom and record it.
>  - Don't have time to create the content for a talk? I can get you in
> a Zoom and do an interview style recording. 30-60 minutes of your
> time.
>  - Can't get permission to talk? Let's find somebody who can give the
> talk, and then we can work together to ensure the content is right.
>  - I will feed java files into ChatGPT about the feature you love and
> present it like a boss.
>
> I recommend choices 1-3.
>
> Thanks, everyone. I appreciate your time if you got this far.
>
> Patrick


Re: Patrick McFadin joins the PMC

2025-01-22 Thread Maxim Muzafarov
Congratulations, Patrick!
I’m surprised because I thought you were already a member! :-)

On Wed, 22 Jan 2025 at 18:36, Francisco Guerrero  wrote:
>
> Congrats, Patrick! It is well deserved.
>
> On 2025/01/22 16:05:09 Jordan West wrote:
> > The PMC's members are pleased to announce that Patrick McFadin has accepted
> > an invitation to become a PMC member.
> >
> > Thanks a lot, Patrick, for everything you have done for the project all
> > these years.
> >
> > Congratulations and welcome!!
> >
> > The Apache Cassandra PMC
> >


Re: [VOTE] Release Apache Cassandra Java Driver 3.12.1

2025-01-23 Thread Maxim Muzafarov
+1 (nb)

On Thu, 23 Jan 2025 at 16:35, Josh McKenzie  wrote:
>
> +1
>
> On Thu, Jan 23, 2025, at 9:58 AM, Štefan Miklošovič wrote:
>
> +1
>
> On Sat, Jan 18, 2025 at 10:54 PM Bret McGuire  wrote:
>
> Greetings all!
>
>
>I’m proposing the Cassandra Java Driver 3.12.1 for release.
>
>
> sha1: 873e6f764a499bd9c5a42cafa53dc77184711eea
>
> git: https://github.com/apache/cassandra-java-driver/tree/3.12.1
>
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1355
>
>
>The Source release is available here:
>
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/3.12.1/
>
>
>This is the first release of new functionality for the 3.x Java driver 
> since its donation.  Our recent 3.12.0 release was intended to provide an 
> ASF-branded baseline for the 3.x Java driver while this release is intended 
> to get any changes that might have been waiting in the 3.x branch out into 
> the wild.  The full changelog can be found at 
> https://github.com/apache/cassandra-java-driver/tree/3.12.1/changelog#3121
>
>
>The vote will be open for 120 hours (longer if needed) due to the upcoming 
> holiday weekend. Everyone who has tested the build is invited to vote. Votes 
> by PMC members are considered binding. A vote passes if there are at least 
> three binding +1s and no -1's.
>
>
>Thanks!


Re: [DISCUSS] Bracing style on trunk

2025-01-18 Thread Maxim Muzafarov
I'm leaning toward not changing the bracing style we already have
unless there's a compelling reason (hard to imagine what it could be).
So currently -1. I would rather focus on enabling lints we all agree
on, or on which consensus is easy to reach. There are many such
lints, and much work to be done.


What I don't agree with is merging such things only "post-accord". If
that is an exception, it's not a problem, I believe in flexibility. If
this is a rule of thumb, then we will never make progress on such lints
and code cleanup in an active community, because important features
will always be in development.

The patches (the import order and the javadocs) have been waiting for
an important merge to happen for YEARS (no progress since then):
https://issues.apache.org/jira/issues/?jql=labels%20%3D%20code-polishing

On Sat, 18 Jan 2025 at 22:19, Dinesh Joshi  wrote:
>
> I honestly did not want to debate stylistic topics on a beautiful Saturday 
> afternoon. As a project we have other, more pressing, topics to discuss. 
> However, I just wanted to chime in and say that we have 3 options –
>
> 1. Continue placing the brace on a new line.
> 2. Move to accepting both new line braces and braces on the same line.
> 3. Adopt braces on the same line and not reformat the whole codebase.
>
> Irrespective of the choice of option, reformatting our codebase is a 
> non-starter as it will kill productivity.
>
> If there are exceptions to a formatting rule we should try and enumerate a 
> couple examples for clarity.
>
> thanks,
>
> Dinesh
>
>
> On Sat, Jan 18, 2025 at 1:07 PM Ekaterina Dimitrova  
> wrote:
>>
>> That is how I see it and how I personally understood you, Blake! Thanks!
>>
>> Also, Jon, appreciate your point of view too. I would support it for a new 
>> project though, Cassandra codebase is too big, too old, and too active IMHO 
>> for such a lift. Also, from my experience being around for about 5 years 
>> already - easy wide-spread changes normally bring the most friction. I burnt 
>> myself with that in the past. Thus my position, being extra cautious.
>>
>> Though if the majority of the people see it in a different way - I am open 
>> to changing my position given the right arguments, and I am not going to stand 
>> in the way. Thank you all. Curious to hear more points of view.
>>
>> On Sat, 18 Jan 2025 at 15:56, Blake Eggleston  wrote:
>>>
>>> Just to be clear, I do think it would be good for the project to conform to 
>>> a more standard java style. I just think that the contributor friction from 
>>> this issue is pretty small, and the impact to work in progress would be 
>>> pretty severe. If the goal is to reduce contributor friction, there's 
>>> probably lower hanging fruit that's less disruptive.
>>>
>>>
>>> On Jan 18, 2025, at 12:43 PM, Jon Haddad  wrote:
>>>
>>> + .9 for me.
>>>
>>> I think it would be a good thing for the project in the long run to conform 
>>> to a more standard Java style, but I'm unable to provide any time to make 
>>> it happen.  I don't think it's reasonable to impose this burden on anyone 
>>> if I'm not willing to take it on myself, so this is why I'm not a +1.
>>>
>>> https://www.apache.org/foundation/voting.html#expressing-votes-1-0-1-and-fractions
>>>
>>>
>>>
>>>
>>> On Sat, Jan 18, 2025 at 12:32 PM Ekaterina Dimitrova 
>>>  wrote:

>>>> I also do not see huge benefit in switching the style, honestly. And I see 
>>>> risks more than benefits.
>>>>
>>>> I also share Blake’s opinion that this would be more suitable for a new 
>>>> project.
>>>>
>>>> On Sat, 18 Jan 2025 at 15:27, Blake Eggleston  wrote:
>>>>>
>>>>> I lean pretty strongly towards -1 on this. If we were starting a new 
>>>>> project, then yeah it would make sense. As an older project though, I 
>>>>> don’t see any clear benefits for switching the style at this point, and 
>>>>> can foresee it causing a lot of pain. Even if we were to wait for accord 
>>>>> before going forward, and address any issues with git blame, there are a 
>>>>> lot of other in-flight projects that would have to deal with this, there 
>>>>> are a lot of tickets waiting for review that would be affected.
>>>>>
>>>>>
>>>>> On Jan 18, 2025, at 9:40 AM, Štefan Miklošovič  
>>>>> wrote:
>>>>>
>>>>> I agree with Benedict wholeheartedly.
>>>>>
>>>>> Yes, I am happy where the braces currently are and I don't understand 
>>>>> that placing braces and the whole "problematic" is such a big topic for 
>>>>> other people.
>>>>>
>>>>> 99% of problems with braces are caused by people not having their 
>>>>> formatter in IDEA (or any IDE of their choosing) set up correctly. 
>>>>> Setting up a formatter in your IDE is a perfectly normal thing to do.
>>>>>
>>>>> I am trying to figure out how to set this up automatically so when people 
>>>>> hit formatting shortcuts, the braces would be put where they should be.
>>>>>
>>>>> I do not think that "well but I need to think about formatting and 
>>>>> hitting that shortcut!" is a valid point. Come o
