Re: [DISSCUSS] Access to JDK internals only after dev mailing list consensus?

2022-11-07 Thread Claude Warren, Jr via dev
This change looks good to me.  It is clear and concise.

On Fri, Nov 4, 2022 at 9:50 PM Ekaterina Dimitrova 
wrote:

> šŸ‘‹
>
> I finally got the chance to put down a proposal for a section at the end
> of the Cassandra Code Style document.
> Please help a fellow non-native speaker and definitely not a writer with
> some constructive criticism. :-)
> My proposal is in this commit -
>
> https://github.com/ekaterinadimitrova2/cassandra-website/commit/4a9edc7e88fd9fc2c6aa8a84290b75b02cac03bf
>
> I noticed the Dependencies section suggested in the past by Benedict was
> missing, even if we had a consensus around that. I added it back from the
> original doc.
>
> Best regards ,
> Ekaterina
>
> On Fri, 9 Sep 2022 at 13:34, Ekaterina Dimitrova 
> wrote:
>
>> Hi everyone,
>> Seems to me that people will be fine with heads up on the mailing list
>> and details on tickets?
>> Plus update of the Code Style to add a point around that, as suggested.
>>
>> I will leave this thread open a few more days and if there are no
>> objections I will continue with documenting it.
>>
>> Have a great weekend everyone!
>>
>> On Fri, 2 Sep 2022 at 14:01, Ekaterina Dimitrova 
>> wrote:
>>
>>> Git and jira , nothing specific
>>>
>>> On Fri, 2 Sep 2022 at 12:51, Derek Chen-Becker 
>>> wrote:
>>>
 I think it's fine to state it explicitly rather than making it an
 assumption. Are we tracking any usage of internals in the codebase
 currently?

 Cheers,

 Derek

 On Fri, Sep 2, 2022 at 6:30 AM Ekaterina Dimitrova <
 e.dimitr...@gmail.com> wrote:

>
>
> ā€œ A quick heads up to the dev list with the jira would be sufficient
> for anybody interested in discussing it further to comment on the jira.ā€
>
> Agreed, I did’t mean voting but more or less we have the lazy
> consensus or sharing concerns. Discussing them on a ticket should be 
> enough
> but it needs to happen. Also, it shouldn’t be  more often than adding
> dependencies I guess.
>
> JDK team is only closing more and more internals and warning us about
> potential breakages. I want to prevent us from urgent fixing in patch
> releases and to ease the maintenance.
>
> I think ensuring that it is clearly documented why an exception is
> acceptable and what options were considered will be of benefit for
> maintenance. We can revise in time what has changed.
>
> ā€œ . Unless absolutely needed we should avoid accessing the internals.
> Folks on this project should understand why. We can make the dangers of
> this explicit in our contributor documentation. ā€
> +1
>
> On Fri, 2 Sep 2022 at 1:26, Dinesh Joshi  wrote:
>
>> Personally not opposed to this. However, this is something that
>> should be vetted closely by the reviewers. Unless absolutely needed we
>> should avoid accessing the internals. Folks on this project should
>> understand why. We can make the dangers of this explicit in our 
>> contributor
>> documentation. However, requiring an entire dev list discussion around it
>> seems unnecessary. A quick heads up to the dev list with the jira would 
>> be
>> sufficient for anybody interested in discussing it further to comment on
>> the jira. WDYT?
>>
>> Dinesh
>>
>> On Sep 1, 2022, at 8:31 AM, Ekaterina Dimitrova <
>> e.dimitr...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>>
>> Some time ago we added a note to the project Cassandra Code Style:
>> ā€œNew dependencies should not be included without community consensus
>> first being obtained via a [DISCUSS] thread on the
>> dev@cassandra.apache.org mailing listā€
>>
>> I would like to suggest also to add a point around accessing JDK
>> internals. Any  patch that suggests accessing internals and/or adding 
>> even
>> more add-opens/add-exports to be approved prior commit on the mailing 
>> list.
>>
>> It seems to me the project can only benefit of this visibility. If
>> something is accepted as an exception, we need to have the right
>> understanding and visibility of why; in some cases maybe to see for
>> alternatives, to have follow up tickets opened, ownership taken. In my
>> opinion this will be very helpful for maintaining the codebase.
>>
>> If others agree with that I can add a sentence to the Code Style.
>> Please let me know what you think.
>>
>> Best regards,
>> Ekaterina
>>
>>
>>

 --
 +---+
 | Derek Chen-Becker |
 | GPG Key available at https://keybase.io/dchenbecker and   |
 | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
 | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
 +---+




Re: [DISSCUSS] Access to JDK internals only after dev mailing list consensus?

2022-11-07 Thread Chen-Becker, Derek
There was one very minor grammatical nit that I commented on, but I think that 
otherwise this is clear and well-written 😊

Cheers,

Derek


From: Ekaterina Dimitrova 
Reply-To: "dev@cassandra.apache.org" 
Date: Friday, November 4, 2022 at 3:51 PM
To: "dev@cassandra.apache.org" 
Subject: RE: [EXTERNAL][DISSCUSS] Access to JDK internals only after dev 
mailing list consensus?


CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.


šŸ‘‹

I finally got the chance to put down a proposal for a section at the end of the 
Cassandra Code Style document.
Please help a fellow non-native speaker and definitely not a writer with some 
constructive criticism. :-)
My proposal is in this commit -
https://github.com/ekaterinadimitrova2/cassandra-website/commit/4a9edc7e88fd9fc2c6aa8a84290b75b02cac03bf

I noticed the Dependencies section suggested in the past by Benedict was 
missing, even if we had a consensus around that. I added it back from the 
original doc.

Best regards ,
Ekaterina

On Fri, 9 Sep 2022 at 13:34, Ekaterina Dimitrova 
mailto:e.dimitr...@gmail.com>> wrote:
Hi everyone,
Seems to me that people will be fine with heads up on the mailing list and 
details on tickets?
Plus update of the Code Style to add a point around that, as suggested.

I will leave this thread open a few more days and if there are no objections I 
will continue with documenting it.

Have a great weekend everyone!

On Fri, 2 Sep 2022 at 14:01, Ekaterina Dimitrova 
mailto:e.dimitr...@gmail.com>> wrote:
Git and jira , nothing specific

On Fri, 2 Sep 2022 at 12:51, Derek Chen-Becker 
mailto:de...@chen-becker.org>> wrote:
I think it's fine to state it explicitly rather than making it an assumption. 
Are we tracking any usage of internals in the codebase currently?

Cheers,

Derek

On Fri, Sep 2, 2022 at 6:30 AM Ekaterina Dimitrova 
mailto:e.dimitr...@gmail.com>> wrote:


ā€œ A quick heads up to the dev list with the jira would be sufficient for 
anybody interested in discussing it further to comment on the jira.ā€

Agreed, I did’t mean voting but more or less we have the lazy consensus or 
sharing concerns. Discussing them on a ticket should be enough but it needs to 
happen. Also, it shouldn’t be  more often than adding dependencies I guess.

JDK team is only closing more and more internals and warning us about potential 
breakages. I want to prevent us from urgent fixing in patch releases and to 
ease the maintenance.

I think ensuring that it is clearly documented why an exception is acceptable 
and what options were considered will be of benefit for maintenance. We can 
revise in time what has changed.

ā€œ . Unless absolutely needed we should avoid accessing the internals. Folks on 
this project should understand why. We can make the dangers of this explicit in 
our contributor documentation. ā€
+1

On Fri, 2 Sep 2022 at 1:26, Dinesh Joshi 
mailto:djo...@apache.org>> wrote:
Personally not opposed to this. However, this is something that should be 
vetted closely by the reviewers. Unless absolutely needed we should avoid 
accessing the internals. Folks on this project should understand why. We can 
make the dangers of this explicit in our contributor documentation. However, 
requiring an entire dev list discussion around it seems unnecessary. A quick 
heads up to the dev list with the jira would be sufficient for anybody 
interested in discussing it further to comment on the jira. WDYT?
Dinesh


On Sep 1, 2022, at 8:31 AM, Ekaterina Dimitrova 
mailto:e.dimitr...@gmail.com>> wrote:
Hi everyone,

Some time ago we added a note to the project Cassandra Code Style:
ā€œNew dependencies should not be included without community consensus first 
being obtained via a [DISCUSS] thread on the 
dev@cassandra.apache.org mailing listā€

I would like to suggest also to add a point around accessing JDK internals. Any 
 patch that suggests accessing internals and/or adding even more 
add-opens/add-exports to be approved prior commit on the mailing list.

It seems to me the project can only benefit of this visibility. If something is 
accepted as an exception, we need to have the right understanding and 
visibility of why; in some cases maybe to see for alternatives, to have follow 
up tickets opened, ownership taken. In my opinion this will be very helpful for 
maintaining the codebase.

If others agree with that I can add a sentence to the Code Style. Please let me 
know what you think.

Best regards,
Ekaterina




--
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+



Cassandra project status update 2022-11-07

2022-11-07 Thread Josh McKenzie
Oh good grief, it's been 26 days since I wrote one of these. My apologies! 
(Life happens - I can confirm that the terribly named "triple-demic" is real 
folks)

We've had a number of releases since the last status email. The current and 
latest supported GA cassandra releases across all branches are:

- cassandra 4: 4.0.7
- cassandra 3.11: 3.11.14
- cassandra 3.0: 3.0.28


[Needs Committers]
I'd like to first focus our attention on tickets that are flagged as "Needs 
Committer". Our project rules for Cassandra are that 2 committers need to sign 
off on a commit, so many times if an author or reviewer isn't yet a committer, 
these tickets can need external input to get into the codebase. The following 
URL is for a query to pull the Needs Committer tickets: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20%3D%20unresolved%20and%20status%20%3D%20%22Needs%20Committer%22

CASSANDRA-17861, Update Python test framework from nose to pytest in CCM could 
use another committer on it: 
https://issues.apache.org/jira/browse/CASSANDRA-17861

CASSANDRA-17870, nodetool/rebuild: Add flag to exclude nodes from local 
datacenter could also use another committer on review: 
https://issues.apache.org/jira/browse/CASSANDRA-17870

CASSANDRA-15402, Make incremental backup configurable per keyspace and table 
looks like it has committer attention as per a recent comment so we're good 
there.

CASSANDRA-14930, decommission may cause timeout because messaging backlog is 
cleared: not sure why this one is marked as Needs Committer actually as it has 
2 as reviewer. Might just need a status update.

Before we get to 4.1 status, I'd like to call out that Trie memtables were 
merged in CASSANDRA-17240. This is a large body of novel work (that Branimir 
presented on at ApacheCon for those of you lucky enough to attend) and it's 
great to see this land in the project; it's worth your time to pop open that 
diff and take a look around and see some of the new things being added to 
Cassandra. Notably, there's some great discussion about property-based testing 
going on in the comments which has sparked some offline discussion about how we 
can integrate exploratory fuzz testing in our primary CI pipeline; more to come 
on that front as discussions evolve.


[4.1 status]
Let's move on to 4.1 status. We're down to 2 tickets blocking rc, and I'm given 
to understand that the one in progress is close to having something to review, 
so on the "outstanding work" side we're in great shape: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484

That leaves us with the question: what do we do about CI? We've recently 
expanded our governance options as to what we consider validated and cleared 
for release: 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle. 
Specifically:

"Three consecutive green runs of circleci OR of ASF CI are required to cut RC"

Our most recent run of 4.1 on ASF infra had 9 failures - 
https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1.
 This has been trending up a bit very recently from a low of 1 a bit over a 
week ago; the lion's share of the failures look to be environmental with 
timeouts.

With ASF CI having stragglers that are flaking lately, option 2 would be three 
consecutive green runs on circleci, however in order for this to be 
representative we need some improvements to the test configuration in circle to 
get it into parity with the ASF env, as tracked in CASSANDRA-17930 here: 
https://issues.apache.org/jira/browse/CASSANDRA-17930. As of a recent comment 
Ekaterina's taking point on this and tracking that addition in CASSANDRA-18001: 
https://issues.apache.org/jira/browse/CASSANDRA-18001. Ekaterina - if there's 
anything other folks on the project can do to assist (including reviewing) 
please let us know.

So we do have a 3rd option we discussed in slack: running tests on the ASF 
infra and then selectively multiplexing failures on circle. If a test fails on 
ASF CI but passes 500 times on circle, the general consensus was that was 
sufficient for us to have confidence in the test. With the recent changes 
Andres introduced in CASSANDRA-17939, multiplexing multiple tests in circle has 
become very simple and you can see instructions on generating the correct 
circle config using .circleci/generate.sh --help (look for the REPEATED_UTESTS= 
, REPEATED_JVM_DTESTS=, etc options). This hybrid third approach (canonical run 
on ASF infra + multiplex failures on circle) gives us another outlet to get a 
validated release if necessary, albeit at the cost of more effort.

I'm working with some of the other contributors on ways we can evolve our 
canonical CI infrastructure as well as making that environment reproducible in 
order to get us a more stable environment in the ASF while also allowing 
contributors with access to private cloud hardware to run testing at higher 
parallelization levels; stay tun

Re: Some tests are never executed in CI due to their name

2022-11-07 Thread Miklosovic, Stefan
Hi,

A while ago we came up with a very nice solution (thank you David Capwell for 
suggesting this). Sorry for no updating this thread sooner, a lot of discussion 
was done in the ticket. (CASSANDRA-17964).

The solution we think is the best is to have an Ant task which would scan all 
classes via reflection on build-test target (separate / dedicated target will 
be available too) and it will scan files on containing Test annotation. If that 
annotation is present, it means that class is a test and it has to have so and 
so naming convention (currently it has to end on Test). There are details 
covered as well (for example, cases when there is an abstract class containing 
all @Test methods and a concrete test class is just extending that, all is 
covered too).

The reason for this approach is that:

1) it is way more robust that trying to come up with some super regexp
2) We can put arbitrary logic into it, for example, we may ignore the invalid 
name of a test class if we "know what we are doing" but putting 
@IgnoreInvalidTestName annotation on top of that class and Ant task would just 
skip that.
3) We can introduce this task into all branches we support. Checkstyle is 
present only in 4.1 and trunk so we would have kind of half-baked solution here 
and older branches would not be covered.

As we enabled more tests because they were ignored / effectively skipped 
previously, some enabled tests started to fail. There are two (currently know):

(1) This one started to fail on 4.0, not sure whats the deal for older 
branches, it probably needs deeper look
(2) PaxorRepairTest2 - this one fails to start because of configuration in that 
test but even it is fixed it fails anyway on something else. I will create a 
ticket for that soon.

I will reference this email to Cassandra update status to raise awareness that 
some tests we enabled started to fail in 4.0 -> trunk.

1) https://issues.apache.org/jira/browse/CASSANDRA-18021
2) ticket to be created

Thanks


From: Berenguer Blasi 
Sent: Thursday, November 3, 2022 9:47
To: dev@cassandra.apache.org
Subject: Re: Some tests are never executed in CI due to their name

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.




Re '@Test' without 'abstract': There's a corner case of a base abstract class 
with all test cases. Then 2 or 3 inheriting classes with no '@Test' annotations 
bc they just run the base class with different configs, setups, envs, 
parametrized runs, etc I've seen some of those.

On 1/11/22 22:54, Mick Semb Wever wrote:


Aha wait a minute I got it :D Yeah. This is not what you invented, it is 
already in, sorry.

So, you are saying we should just modify this to cover @Test / abstract.

I dont know ... interesting.

Lets wait for others to answer.


This thread is rather painful to digest because of the top-posting :'(

I agree that all non-abstract classes that contain a @test annotation must have 
a filename ending in "Test.java"

That leaves CI scripts untouched.

How we enforce that I have no opinion about. If it's not easy in checkstyle, 
just do it with some bash in the cassandra-test.sh script. It'll get caught 
pre-commit either way, and once the rename of existing files is done the 
precedence is clearly in place.


Re: Cassandra project status update 2022-11-07

2022-11-07 Thread Miklosovic, Stefan
Hi Josh,

thanks for the status.

I would like to raise awareness that as we fix CASSANDRA-17964, it will 
introduce two tests which will start to fail (because they were not executed as 
part of CI until now because how they are named (not ending on *Test)).

I believe that these tests will need to be addressed and fixed before 4.1 is 
out.

My email describing that in more detail is here (1).

(1) https://lists.apache.org/thread/pl0q1krhgv0rvybp5jmdy3411hchy28l

Regards,

Stefan

(1) https://lists.apache.org/thread/pl0q1krhgv0rvybp5jmdy3411hchy28l


From: Josh McKenzie 
Sent: Monday, November 7, 2022 22:59
To: dev
Subject: Cassandra project status update 2022-11-07

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.



Oh good grief, it's been 26 days since I wrote one of these. My apologies! 
(Life happens - I can confirm that the terribly named "triple-demic" is real 
folks)

We've had a number of releases since the last status email. The current and 
latest supported GA cassandra releases across all branches are:

- cassandra 4: 4.0.7
- cassandra 3.11: 3.11.14
- cassandra 3.0: 3.0.28


[Needs Committers]
I'd like to first focus our attention on tickets that are flagged as "Needs 
Committer". Our project rules for Cassandra are that 2 committers need to sign 
off on a commit, so many times if an author or reviewer isn't yet a committer, 
these tickets can need external input to get into the codebase. The following 
URL is for a query to pull the Needs Committer tickets: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20and%20resolution%20%3D%20unresolved%20and%20status%20%3D%20%22Needs%20Committer%22

CASSANDRA-17861, Update Python test framework from nose to pytest in CCM could 
use another committer on it: 
https://issues.apache.org/jira/browse/CASSANDRA-17861

CASSANDRA-17870, nodetool/rebuild: Add flag to exclude nodes from local 
datacenter could also use another committer on review: 
https://issues.apache.org/jira/browse/CASSANDRA-17870

CASSANDRA-15402, Make incremental backup configurable per keyspace and table 
looks like it has committer attention as per a recent comment so we're good 
there.

CASSANDRA-14930, decommission may cause timeout because messaging backlog is 
cleared: not sure why this one is marked as Needs Committer actually as it has 
2 as reviewer. Might just need a status update.

Before we get to 4.1 status, I'd like to call out that Trie memtables were 
merged in CASSANDRA-17240. This is a large body of novel work (that Branimir 
presented on at ApacheCon for those of you lucky enough to attend) and it's 
great to see this land in the project; it's worth your time to pop open that 
diff and take a look around and see some of the new things being added to 
Cassandra. Notably, there's some great discussion about property-based testing 
going on in the comments which has sparked some offline discussion about how we 
can integrate exploratory fuzz testing in our primary CI pipeline; more to come 
on that front as discussions evolve.


[4.1 status]
Let's move on to 4.1 status. We're down to 2 tickets blocking rc, and I'm given 
to understand that the one in progress is close to having something to review, 
so on the "outstanding work" side we're in great shape: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484

That leaves us with the question: what do we do about CI? We've recently 
expanded our governance options as to what we consider validated and cleared 
for release: 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle. 
Specifically:

"Three consecutive green runs of circleci OR of ASF CI are required to cut RC"

Our most recent run of 4.1 on ASF infra had 9 failures - 
https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1.
 This has been trending up a bit very recently from a low of 1 a bit over a 
week ago; the lion's share of the failures look to be environmental with 
timeouts.

With ASF CI having stragglers that are flaking lately, option 2 would be three 
consecutive green runs on circleci, however in order for this to be 
representative we need some improvements to the test configuration in circle to 
get it into parity with the ASF env, as tracked in CASSANDRA-17930 here: 
https://issues.apache.org/jira/browse/CASSANDRA-17930. As of a recent comment 
Ekaterina's taking point on this and tracking that addition in CASSANDRA-18001: 
https://issues.apache.org/jira/browse/CASSANDRA-18001. Ekaterina - if there's 
anything other folks on the project can do to assist (including reviewing) 
please let us know.

So we do have a 3rd option we discussed in slack: running tests on the ASF 
infra and then selectively multiplexing failures on circle. If a test fails on 
ASF CI but passes 500 times on circle, the general consensus was that was 
suff

[DISCUSSION] Add SpotBugs to the build

2022-11-07 Thread David Capwell
I was thinking that it would be good to add SpotBugs 
(https://spotbugs.github.io) into our build to help find bugs earlier in the 
life cycle.  SpotBugs is LGPL but as it is used only in the build and not to 
within this project, then this should be fine with Apache.

The motivation for adding this was from CASSANDRA-17178; the Simulator has 
issues with Serializable classes missing serialVersionUID (as we deal with 
ClassLoaders; this field is strongly recommend in general for all Serializable 
classes), but this project can add more value as there are a large collection 
of potential bugs to look out for; below are a few examples found.

* Number.valueOf vs Number.parse.  In many parts of the code we do 
valueOf which returns a boxed value; we then unbox for the usage; this adds 
more garbage that isn’t needed
* Using Number.compareTo rather than primitive compare functions (causing 
boxing)
* Ignoring return value for functions that don’t have a side effect.  This 
happens in a few cases where we are building a StringBuilder where we call 
.toString but ignore the string… then call it later on
* use of putIfAbsent without looking at the return.  This was found in 
CacheService where we add the SSTable reader to the cache and assume we win the 
race and start using it… rather than using the object that won the race

Re: [DISCUSSION] Add SpotBugs to the build

2022-11-07 Thread Dinesh Joshi
This is a good idea. Could you please run it by ASF legal just so we have all 
our bases covered?

> 
> On Nov 7, 2022, at 3:10 PM, David Capwell  wrote:
> 
> I was thinking that it would be good to add SpotBugs 
> (https://spotbugs.github.io) into our build to help find bugs earlier in the 
> life cycle.  SpotBugs is LGPL but as it is used only in the build and not to 
> within this project, then this should be fine with Apache.
> 
> The motivation for adding this was from CASSANDRA-17178; the Simulator has 
> issues with Serializable classes missing serialVersionUID (as we deal with 
> ClassLoaders; this field is strongly recommend in general for all 
> Serializable classes), but this project can add more value as there are a 
> large collection of potential bugs to look out for; below are a few examples 
> found.
> 
> * Number.valueOf vs Number.parse.  In many parts of the code we do 
> valueOf which returns a boxed value; we then unbox for the usage; this adds 
> more garbage that isn’t needed
> * Using Number.compareTo rather than primitive compare functions (causing 
> boxing)
> * Ignoring return value for functions that don’t have a side effect.  This 
> happens in a few cases where we are building a StringBuilder where we call 
> .toString but ignore the string… then call it later on
> * use of putIfAbsent without looking at the return.  This was found in 
> CacheService where we add the SSTable reader to the cache and assume we win 
> the race and start using it… rather than using the object that won the race



Re: [DISCUSSION] Add SpotBugs to the build

2022-11-07 Thread Derek Chen-Becker
I'm always in favor of having the compiler/runtime do more work for
us, but I guess in the interest of gauging impact to dev productivity,
does this add much overhead? I guess we'll need to discuss what it
finds after it runs, as well.

Cheers,

Derek


On Mon, Nov 7, 2022 at 4:10 PM David Capwell  wrote:
>
> I was thinking that it would be good to add SpotBugs 
> (https://spotbugs.github.io) into our build to help find bugs earlier in the 
> life cycle.  SpotBugs is LGPL but as it is used only in the build and not to 
> within this project, then this should be fine with Apache.
>
> The motivation for adding this was from CASSANDRA-17178; the Simulator has 
> issues with Serializable classes missing serialVersionUID (as we deal with 
> ClassLoaders; this field is strongly recommend in general for all 
> Serializable classes), but this project can add more value as there are a 
> large collection of potential bugs to look out for; below are a few examples 
> found.
>
> * Number.valueOf vs Number.parse.  In many parts of the code we do 
> valueOf which returns a boxed value; we then unbox for the usage; this adds 
> more garbage that isn’t needed
> * Using Number.compareTo rather than primitive compare functions (causing 
> boxing)
> * Ignoring return value for functions that don’t have a side effect.  This 
> happens in a few cases where we are building a StringBuilder where we call 
> .toString but ignore the string… then call it later on
> * use of putIfAbsent without looking at the return.  This was found in 
> CacheService where we add the SSTable reader to the cache and assume we win 
> the race and start using it… rather than using the object that won the race



--
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: [DISCUSSION] Add SpotBugs to the build

2022-11-07 Thread Dinesh Joshi
I would expect the role of SpotBugs to be advisory at first. After running it 
on the code and evaluating the number and type of violations we can decide 
which make sense to address and whether we would like it break the builds if it 
detects violations.

> 
> On Nov 7, 2022, at 4:45 PM, Derek Chen-Becker  wrote:
> 
> I'm always in favor of having the compiler/runtime do more work for
> us, but I guess in the interest of gauging impact to dev productivity,
> does this add much overhead? I guess we'll need to discuss what it
> finds after it runs, as well.
> 
> Cheers,
> 
> Derek
> 
> 
>> On Mon, Nov 7, 2022 at 4:10 PM David Capwell  wrote:
>> 
>> I was thinking that it would be good to add SpotBugs 
>> (https://spotbugs.github.io) into our build to help find bugs earlier in the 
>> life cycle.  SpotBugs is LGPL but as it is used only in the build and not to 
>> within this project, then this should be fine with Apache.
>> 
>> The motivation for adding this was from CASSANDRA-17178; the Simulator has 
>> issues with Serializable classes missing serialVersionUID (as we deal with 
>> ClassLoaders; this field is strongly recommend in general for all 
>> Serializable classes), but this project can add more value as there are a 
>> large collection of potential bugs to look out for; below are a few examples 
>> found.
>> 
>> * Number.valueOf vs Number.parse.  In many parts of the code we do 
>> valueOf which returns a boxed value; we then unbox for the usage; this adds 
>> more garbage that isn’t needed
>> * Using Number.compareTo rather than primitive compare functions (causing 
>> boxing)
>> * Ignoring return value for functions that don’t have a side effect.  This 
>> happens in a few cases where we are building a StringBuilder where we call 
>> .toString but ignore the string… then call it later on
>> * use of putIfAbsent without looking at the return.  This was found in 
>> CacheService where we add the SSTable reader to the cache and assume we win 
>> the race and start using it… rather than using the object that won the race
> 
> 
> 
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+


Re: [DISCUSSION] Add SpotBugs to the build

2022-11-07 Thread Brandon Williams
I suspect we'll have a -Dno-spotbugs flag as a counterpart to the
-Dno-checkstyle flag we have now.

Kind Regards,
Brandon

On Mon, Nov 7, 2022 at 6:45 PM Derek Chen-Becker  wrote:
>
> I'm always in favor of having the compiler/runtime do more work for
> us, but I guess in the interest of gauging impact to dev productivity,
> does this add much overhead? I guess we'll need to discuss what it
> finds after it runs, as well.
>
> Cheers,
>
> Derek
>
>
> On Mon, Nov 7, 2022 at 4:10 PM David Capwell  wrote:
> >
> > I was thinking that it would be good to add SpotBugs 
> > (https://spotbugs.github.io) into our build to help find bugs earlier in 
> > the life cycle.  SpotBugs is LGPL but as it is used only in the build and 
> > not to within this project, then this should be fine with Apache.
> >
> > The motivation for adding this was from CASSANDRA-17178; the Simulator has 
> > issues with Serializable classes missing serialVersionUID (as we deal with 
> > ClassLoaders; this field is strongly recommend in general for all 
> > Serializable classes), but this project can add more value as there are a 
> > large collection of potential bugs to look out for; below are a few 
> > examples found.
> >
> > * Number.valueOf vs Number.parse.  In many parts of the code we do 
> > valueOf which returns a boxed value; we then unbox for the usage; this adds 
> > more garbage that isn’t needed
> > * Using Number.compareTo rather than primitive compare functions (causing 
> > boxing)
> > * Ignoring return value for functions that don’t have a side effect.  This 
> > happens in a few cases where we are building a StringBuilder where we call 
> > .toString but ignore the string… then call it later on
> > * use of putIfAbsent without looking at the return.  This was found in 
> > CacheService where we add the SSTable reader to the cache and assume we win 
> > the race and start using it… rather than using the object that won the race
>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+


Re: [DISCUSSION] Add SpotBugs to the build

2022-11-07 Thread Berenguer Blasi
+1 to static code analysis and hoping it's smart enough to check only 
changed files instead of adding 30s to each build by checking everything 
:fingerscrossed:


On 8/11/22 2:03, Brandon Williams wrote:

I suspect we'll have a -Dno-spotbugs flag as a counterpart to the
-Dno-checkstyle flag we have now.

Kind Regards,
Brandon

On Mon, Nov 7, 2022 at 6:45 PM Derek Chen-Becker  wrote:

I'm always in favor of having the compiler/runtime do more work for
us, but I guess in the interest of gauging impact to dev productivity,
does this add much overhead? I guess we'll need to discuss what it
finds after it runs, as well.

Cheers,

Derek


On Mon, Nov 7, 2022 at 4:10 PM David Capwell  wrote:

I was thinking that it would be good to add SpotBugs 
(https://spotbugs.github.io) into our build to help find bugs earlier in the 
life cycle.  SpotBugs is LGPL but as it is used only in the build and not to 
within this project, then this should be fine with Apache.

The motivation for adding this was from CASSANDRA-17178; the Simulator has 
issues with Serializable classes missing serialVersionUID (as we deal with 
ClassLoaders; this field is strongly recommend in general for all Serializable 
classes), but this project can add more value as there are a large collection 
of potential bugs to look out for; below are a few examples found.

* Number.valueOf vs Number.parse.  In many parts of the code we do 
valueOf which returns a boxed value; we then unbox for the usage; this adds more 
garbage that isn’t needed
* Using Number.compareTo rather than primitive compare functions (causing 
boxing)
* Ignoring return value for functions that don’t have a side effect.  This 
happens in a few cases where we are building a StringBuilder where we call 
.toString but ignore the string… then call it later on
* use of putIfAbsent without looking at the return.  This was found in 
CacheService where we add the SSTable reader to the cache and assume we win the 
race and start using it… rather than using the object that won the race



--
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: [DISCUSSION] Add SpotBugs to the build

2022-11-07 Thread Mick Semb Wever
Positive David!

Can I suggest building on top of a change like this:
https://github.com/apache/cassandra/compare/cassandra-4.1...thelastpickle:cassandra:mck/buildfiles-linter-docs/4.1

i.e.
- moving all the linter stuff out to .build/build-linter.xml
- standardising on the -Dno-xxx approach (as we got -Dxxx.skip in a few
places, but permitting both)
- adding broader skip flags like. -Dno-linter=