Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-01-03 Thread Maxim Muzafarov
Folks,

Let me update the voting status and put together everything we have so
far. We definitely need more votes to have a solid foundation for this
change, so I encourage everyone to consider the options above and
share your preference in this thread.


Total for each applicable option:

Option 4 -- 4 votes
Option 3 -- 3 votes
Option 5 -- 1 vote
Option 1 -- 0 votes
Option 2 -- 0 votes

On Thu, 22 Dec 2022 at 22:06, Mick Semb Wever  wrote:
>>
>>
>> 3. Total 5 groups, 2968 files to change
>>
>> ```
>> org.apache.cassandra.*
>> [blank line]
>> java.*
>> [blank line]
>> javax.*
>> [blank line]
>> all other imports
>> [blank line]
>> static all other imports
>> ```
>
>
>
> 3, then 5.
> There's lots under com.*, net.*, org.* that is essentially the same as "all 
> other imports"; what's the reason to separate those?
>
> My preference for 3 is simply that imports are by default collapsed, and if I 
> expand them it's the dependencies on other cassandra stuff I'm first 
> grokking. They're also our only imports that lead to cyclic dependencies 
> (something we're not good at avoiding).
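For concreteness, option 3's grouping could be enforced with a Checkstyle rule
along these lines. This is a sketch only: it assumes Checkstyle is wired into
the build, and CustomImportOrder cannot give java.* and javax.* their own
separate groups, so the two are approximated as a single group here.

```
<!-- Sketch: approximates option 3's import grouping with Checkstyle's
     CustomImportOrder. java.* and javax.* share one group because the
     module offers only a single STANDARD_JAVA_PACKAGE group. -->
<module name="CustomImportOrder">
  <property name="customImportOrderRules"
            value="SPECIAL_IMPORTS###STANDARD_JAVA_PACKAGE###THIRD_PARTY_PACKAGE###STATIC"/>
  <!-- Group 1: org.apache.cassandra.* -->
  <property name="specialImportsRegExp" value="^org\.apache\.cassandra\."/>
  <!-- Blank line between each group, as in the option 3 layout. -->
  <property name="separateLineBetweenGroups" value="true"/>
  <property name="sortImportsInGroupAlphabetically" value="true"/>
</module>
```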


Re: [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at performance testing

2023-01-03 Thread German Eichberger via dev
All,

This is a great idea and I am looking forward to it.

Having dedicated, consistent hardware is a good way to find regressions in the 
code, but orthogonal to that is "certifying" new hardware to run with Cassandra, 
e.g. is there a performance regression when running on AMD? On ARM64? What about 
more RAM? A faster SSD?

What has limited us in perf testing in the past was the lack of a 
"representative" benchmark with clear recommendations, so I am hoping that this 
work will produce a reference test suite with at least some hardware 
recommendations for the machine running the tests, to make things more 
comparable. Additionally, some perf tests keep increasing the load until 
latency hits a certain threshold, while others perform a set of operations and 
measure how long it took. Which types of tests were you aiming for?
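The two styles of test mentioned above can be sketched as follows. All names
here are hypothetical, and the latency model is simulated rather than measured
against a real cluster; it is only meant to illustrate the difference between
the two measurement approaches.

```java
import java.util.function.LongUnaryOperator;

// Sketch of the two benchmark styles described above. Names are
// hypothetical; the latency function stands in for a real cluster.
public class BenchStyles
{
    // Style A: perform a fixed number of operations, measure elapsed time.
    static long timeFixedOps(Runnable op, int n)
    {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++)
            op.run();
        return System.nanoTime() - start;
    }

    // Style B: keep increasing the offered load until observed latency
    // crosses a threshold; report the last sustainable load level.
    static long maxLoadUnderLatency(LongUnaryOperator latencyNanosAtLoad, long thresholdNanos)
    {
        long load = 1;
        while (latencyNanosAtLoad.applyAsLong(load) <= thresholdNanos)
            load++;
        return load - 1;
    }

    public static void main(String[] args)
    {
        long elapsed = timeFixedOps(() -> Math.sqrt(42.0), 1_000);
        // Simulated latency that grows quadratically with load.
        long maxLoad = maxLoadUnderLatency(load -> 1_000L * load * load, 1_000_000L);
        System.out.println("fixed-ops elapsed (ns): " + elapsed);
        System.out.println("max load under 1ms simulated latency: " + maxLoad);
    }
}
```

Results from style A are comparable only across runs of identical work on
identical hardware, while style B yields a capacity number that is sensitive
to the chosen latency threshold; that difference is why the proposal needs to
pin down which style the reference suite uses.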

The proposal also doesn't talk much about the test matrix. Will all supported 
Cassandra versions be tested with the same tests or will there be version 
specific tests?

I understand that we need to account for variances in configuration and 
hardware, but I am wondering if we can record more than just the sha. For 
example, the complete cassandra.yaml for a test should be checked in as well - 
also we should encourage people not to change too much from the reference test. 
Different hardware, different cassandra.yaml, and different tests will just 
create numbers that are hard to make sense of.

Really excited about this - thanks for the great work,
German


From: Josh McKenzie 
Sent: Friday, December 30, 2022 7:41 AM
To: dev 
Subject: [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at performance 
testing

There was a really interesting presentation from the Lucene folks at ApacheCon 
about how they're doing perf regression testing. That combined with some recent 
contributors wanting to get involved on some performance work and not having 
much direction or clarity on how to get involved led some of us to come 
together and riff on what we might be able to take away from that presentation 
and context.

Lucene presentation: "Learning from 11+ years of Apache Lucene benchmarks": 
https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p

Their nightly indexing benchmark site: 
https://home.apache.org/~mikemccand/lucenebench/indexing.html

I've checked in with a handful of performance-minded contributors in early 
December and we came up with a first draft, then some others of us met on an 
ad hoc call on 12/9 (which was recorded; ping on this thread if you'd like 
that linked - I believe Joey Lynch has that).

Here's where we landed after the discussions earlier this month (1st page, 
estimated reading time 5 minutes): 
https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#

Curious to hear what other perspectives there are out there on the topic.

Early Happy New Years everyone!

~Josh




Re: [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at performance testing

2023-01-03 Thread Josh McKenzie
> more things in reference test suite... increasing the load until latency 
> hits... operations and measures... test matrix... checking in complete 
> cassandra.yaml... different hardware... different tests...
All great things. For v2+. :)

Perf testing is a deep, deep rabbit hole. What's tripped us up in the past has 
(IMO) predominantly been us biting off more than we could chew to get to 
consensus. I immediately agree at face value with most of the things you've 
asked about in your reply, but I think we'll need to build up to that and/or 
include some of it in the "community benchmarks" rather than the "reference 
benchmarks" as outlined in the doc.

~Josh

On Tue, Jan 3, 2023, at 12:57 PM, German Eichberger via dev wrote:
> All,
> 
> This is a great idea and I am looking forward to it.
> 
> Having dedicated, consistent hardware is a good way to find regressions in 
> the code, but orthogonal to that is "certifying" new hardware to run with 
> Cassandra, e.g. is there a performance regression when running on AMD? On 
> ARM64? What about more RAM? A faster SSD?
> 
> What has limited us in perf testing in the past was the lack of a 
> "representative" benchmark with clear recommendations, so I am hoping that 
> this work will produce a reference test suite with at least some hardware 
> recommendations for the machine running the tests, to make things more 
> comparable. Additionally, some perf tests keep increasing the load until 
> latency hits a certain threshold, while others perform a set of operations 
> and measure how long it took. Which types of tests were you aiming for?
> 
> The proposal also doesn't talk much about the test matrix. Will all supported 
> Cassandra versions be tested with the same tests or will there be version 
> specific tests? 
> 
> I understand that we need to account for variances in configuration and 
> hardware, but I am wondering if we can record more than just the sha. For 
> example, the complete cassandra.yaml for a test should be checked in as 
> well - also we should encourage people not to change too much from the 
> reference test. Different hardware, different cassandra.yaml, and different 
> tests will just create numbers that are hard to make sense of.
> 
> Really excited about this - thanks for the great work,
> German
> 
> 
> 
> *From:* Josh McKenzie 
> *Sent:* Friday, December 30, 2022 7:41 AM
> *To:* dev 
> *Subject:* [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at 
> performance testing 
>  
> There was a really interesting presentation from the Lucene folks at 
> ApacheCon about how they're doing perf regression testing. That combined with 
> some recent contributors wanting to get involved on some performance work and 
> not having much direction or clarity on how to get involved led some of us to 
> come together and riff on what we might be able to take away from that 
> presentation and context.
> 
> Lucene presentation: "Learning from 11+ years of Apache Lucene benchmarks": 
> https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p
>  
> 
> 
> Their nightly indexing benchmark site: 
> https://home.apache.org/~mikemccand/lucenebench/indexing.html 
> 
> 
> I've checked in with a handful of performance-minded contributors in early 
> December and we came up with a first draft, then some others of us met on an 
> ad hoc call on 12/9 (which was recorded; ping on this thread if you'd like 
> that linked - I believe Joey Lynch has that).
> 
> Here's where we landed after the discussions earlier this month (1st page, 
> estimated reading time 5 minutes): 
> https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#
>  
>