Re: CASSANDRA-14227 removing the 2038 limit

2022-10-18 Thread Berenguer Blasi

Hi,

Apologies for the late reply, as I have been OOO. I have done some 
profiling and the results look virtually identical on trunk and 14227. I 
have attached some screenshots to the ticket 
https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes 
are fooling me, everything in the JFRs looks the same.


Regards

On 30/9/22 9:44, Berenguer Blasi wrote:


Hi Benedict,

Thanks for the reply! Yes, some profiling is probably needed; then we 
can see whether going down the rabbit hole of a big delta-encoding 
refactor is worth it.


Let's see what other concerns people bring up.

Thx.

On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional memory 
pressure. Since 64yrs should be plenty at any moment in time, I 
wonder if it wouldn’t be better to represent these times as deltas 
from the nowInSec being used to process the query. So, long math 
would only be used to normalise the times to this nowInSec (from 
whatever is stored in the sstable) within a method, and ints would be 
stored in memtables and any objects used for processing.


This might admittedly be more work, but I don’t believe it should be 
too challenging - we can introduce a method deletionTime(int 
nowInSec) that returns a long value by adding nowInSec to the 
deletionTime, and make the underlying value private, refactoring call 
sites?
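
A minimal sketch of the delta idea described above, written in Python purely for illustration (Cassandra itself is Java, and this is not code from the PR). The class name `DeltaDeletionTime`, the helper `from_absolute`, and the example values are all hypothetical; only the notion of a `deletionTime(nowInSec)` accessor that adds `nowInSec` to a stored 32-bit delta comes from the message.

```python
from dataclasses import dataclass

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


@dataclass(frozen=True)
class DeltaDeletionTime:
    """Hypothetical holder that keeps only a 32-bit delta, not the absolute time."""

    _delta_sec: int  # seconds relative to the query's nowInSec; must fit in int32

    @classmethod
    def from_absolute(cls, absolute_sec, now_in_sec):
        # Wide ("long") math happens only here, when normalising a stored
        # absolute time against the nowInSec used to process the query.
        delta = absolute_sec - now_in_sec
        if not INT32_MIN <= delta <= INT32_MAX:
            raise OverflowError("deletion time is more than ~68 years from nowInSec")
        return cls(delta)

    def deletion_time(self, now_in_sec):
        # Mirrors the proposed deletionTime(nowInSec): delta + nowInSec.
        return self._delta_sec + now_in_sec


now_in_sec = 1_664_442_720   # arbitrary example "now" in epoch seconds
absolute = 2_200_000_000     # an absolute time beyond 2**31 - 1, i.e. past 2038
dt = DeltaDeletionTime.from_absolute(absolute, now_in_sec)
print(dt.deletion_time(now_in_sec))
```

The stored value stays within 32 bits as long as the deletion time is within roughly 68 years of `nowInSec`; the sketch raises instead of wrapping when it is not, which is one possible policy rather than anything the thread decided.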


On 29 Sep 2022, at 09:37, Berenguer Blasi wrote:


Hi all,

I have taken a stab at this in a PR, which you can find attached to the 
ticket. Mainly:

- I have moved deletion times, gc and nowInSec timestamps to long. 
That should get us past the 2038 limit.

- TTL is now capped at 68y. Think CQL API compatibility and a sort of 
'free' guardrail.

- A new NONE overflow policy is the default, but everything is 
backwards compatible since the previous policies are kept in place. Think 
upgrade scenarios or apps relying on the previous behavior.

- The new limit is around year 292,471,208,677, which sounds ok given 
the Sun will start collapsing in 3 to 5 billion years :-)

- Please feel free to drop by the ticket and take a look at the PR, 
even if it's cursory.
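
The figures in the list above can be checked with a little arithmetic. The sketch below is illustrative Python, not code from the PR, and it assumes 365-day years with no leap days, which is the assumption under which the 292,471,208,677 figure comes out exactly.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: 365-day years, no leap days
INT32_MAX = 2**31 - 1                   # largest value of a signed 32-bit int
INT64_MAX = 2**63 - 1                   # largest value of a signed 64-bit long

# Last instant representable as signed 32-bit epoch seconds (the "2038 limit").
int_limit = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)

# Whole years that fit in each range when counting seconds.
int_years = INT32_MAX // SECONDS_PER_YEAR
long_years = INT64_MAX // SECONDS_PER_YEAR

print(int_limit.isoformat())   # 2038-01-19T03:14:07+00:00
print(int_years)               # 68
print(long_years)              # 292471208677
```

So a 32-bit count of seconds covers about 68 years, which gives both the 2038 limit (68 years after 1970) and the 68y TTL cap, while a 64-bit count covers roughly 292 billion years.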


Thx in advance.



Looking for documentation tasks to work on

2022-10-18 Thread sharanf

Hi All

It was really good to be at ApacheCon in New Orleans and finally put 
some real faces to the names I've seen on the mailing list. And sitting 
in the Cassandra BOF session has made me even more motivated to help 
out. I've been thinking of getting involved and contributing to the docs 
so please can someone point me in the direction of something that needs 
doing around documentation :-)


Thanks
Sharan


New CircleCI test multiplexer

2022-10-18 Thread Andrés de la Peña
Just to let you know that CASSANDRA-17939 has just been committed.

It changes the way the CircleCI multiplexer works, in line with the recent
changes in our release criteria:

* The default number of repeated test iterations is 500, except for long
and upgrade tests.
* It is possible to specify multiple test classes and methods to be
repeated in the same config push. So patches altering dozens of tests
won't require dozens of config pushes anymore.
* Running the .circleci/generate.sh script with -l/-m/-h flags will use git
diff to automatically detect the new or modified tests and will add them to
the lists of tests to be repeated. The pre-commit workflow will
automatically start repeated runs for these tests. The only exception is
Python dtests, which must be specified manually.
* The CircleCI jobs are rearranged so that for every regular job there is a
companion job to run the repeated tests associated with that job. Those
companion jobs will only be visible if there are repeated tests to run. Here
is an example run with repeated tests for all the test suites, and here
is the same workflow without any repeated tests.

Some documentation on how to use it can be found here:
https://github.com/apache/cassandra/blob/trunk/.circleci/readme.md#running-tests-in-a-loop


Re: New CircleCI test multiplexer

2022-10-18 Thread Josh McKenzie
> * Running the .circleci/generate.sh script with -l/-m/-h flags will use git 
> diff to automatically detect the new or modified tests and will add them to 
> the lists of tests to be repeated. The pre-commit workflow will automatically 
> start repeated runs for these tests. The only exception to this are Python 
> dtests, that should be specified manually.
Of note: the -h profile should not be used (correct me if I'm wrong here 
Andres). Use -l for the free tier on circle or -m for paid.

Will have some follow up tickets regarding job naming, default config type, and 
updating documentation shortly.

On Tue, Oct 18, 2022, at 12:33 PM, Andrés de la Peña wrote:
> Just to let you know that CASSANDRA-17939 has just been committed. 
> 
> It changes the way the CircleCI multiplexer works, in line with the recent 
> changes in our release criteria:
> 
> * The default number of repeated tests iterations is 500, except for long and 
> upgrade tests.
> * It is possible to specify multiple test classes and methods to be repeated 
> into the same config push. So patches altering dozens of tests won't require 
> dozens of config pushes anymore.
> * Running the .circleci/generate.sh script with -l/-m/-h flags will use git 
> diff to automatically detect the new or modified tests and will add them to 
> the lists of tests to be repeated. The pre-commit workflow will automatically 
> start repeated runs for these tests. The only exception to this are Python 
> dtests, that should be specified manually.
> * The CircleCI jobs are rearranged so for every regular job there is a 
> companion job to run the repeated tests associated to that job. Those 
> companion jobs will only be visible if there are repeated tests to run. Here 
> 
>  is an example run with repeated tests for all the test suites, and here 
> 
>  is the same workflow without any repeated tests.
> 
> Some documentation on how to use it can be found here: 
> https://github.com/apache/cassandra/blob/trunk/.circleci/readme.md#running-tests-in-a-loop


Re: New CircleCI test multiplexer

2022-10-18 Thread Andrés de la Peña
The -h profile works but it spends a lot of resources for slightly faster
results. The -m profile is better value in terms of speed per resources. I
guess -h can be used if one wants to get results as soon as possible, no
matter the cost. Ekaterina might be better informed than me, given her work
on CASSANDRA-15712.

In any case, the new multiplexer doesn't change the resources configuration
at all. We might want to reevaluate that config in the future, and probably
follow David's suggestion of deciding parallelism as a function of the
number of tests to be run.
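
As a rough illustration of what "parallelism as a function of the number of tests" could look like, here is a hypothetical Python sketch. The function name, the tests-per-container target, and the cap are invented for the example; they are not taken from David's script or from the CircleCI config.

```python
import math


def parallelism_for(num_tests, tests_per_container=50, max_parallelism=25):
    """Pick a container count that grows with the number of tests, within a cap.

    Both default values are made-up placeholders, not project settings.
    """
    if num_tests <= 0:
        return 1
    wanted = math.ceil(num_tests / tests_per_container)
    return max(1, min(max_parallelism, wanted))


for n in (1, 120, 10_000):
    print(n, parallelism_for(n))
```

A small change that touches a single test would then use one container, while a full suite would be capped at the configured maximum.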

On Tue, 18 Oct 2022 at 18:13, Josh McKenzie  wrote:

> * Running the .circleci/generate.sh script with -l/-m/-h flags will use
> git diff to automatically detect the new or modified tests and will add
> them to the lists of tests to be repeated. The pre-commit workflow will
> automatically start repeated runs for these tests. The only exception to
> this are Python dtests, that should be specified manually.
>
> Of note: the -h profile should not be used (correct me if I'm wrong here
> Andres). Use -l for the free tier on circle or -m for paid.
>
> Will have some follow up tickets regarding job naming, default config
> type, and updating documentation shortly.
>
> On Tue, Oct 18, 2022, at 12:33 PM, Andrés de la Peña wrote:
>
> Just to let you know that CASSANDRA-17939 has just been committed.
>
> It changes the way the CircleCI multiplexer works, in line with the recent
> changes in our release criteria:
>
> * The default number of repeated tests iterations is 500, except for long
> and upgrade tests.
> * It is possible to specify multiple test classes and methods to be
> repeated into the same config push. So patches altering dozens of tests
> won't require dozens of config pushes anymore.
> * Running the .circleci/generate.sh script with -l/-m/-h flags will use
> git diff to automatically detect the new or modified tests and will add
> them to the lists of tests to be repeated. The pre-commit workflow will
> automatically start repeated runs for these tests. The only exception to
> this are Python dtests, that should be specified manually.
> * The CircleCI jobs are rearranged so for every regular job there is a
> companion job to run the repeated tests associated to that job. Those
> companion jobs will only be visible if there are repeated tests to run.
> Here
> 
> is an example run with repeated tests for all the test suites, and here
> 
> is the same workflow without any repeated tests.
>
> Some documentation on how to use it can be found here:
> https://github.com/apache/cassandra/blob/trunk/.circleci/readme.md#running-tests-in-a-loop
>
>
>


Re: New CircleCI test multiplexer

2022-10-18 Thread Ekaterina Dimitrova
First, I want to take a moment to thank Andres and Josh for this new
improvement. I truly believe it will save us a lot of time and effort in
the fight against flaky tests.

Pre-CASSANDRA-15712 there was a default config for the free tier and a
HIGHRES option (bumping all resources to max).
We added MIDRES as a balanced version which lets you run all tests but
with fewer resources. What do I mean? It is still meant for people with
paid accounts, but who want to keep an eye on the cost/resources spent. It
was based on experimental runs which showed that, for some suites, half the
resources (compared to maxing out) give similar and reasonable run times.
One confusion I want to clear up here - MIDRES doesn’t mean we will use
medium containers, but mid/fewer resources than HIGHRES. Some Python
tests cannot be run with medium containers, for example.
Whether some people still use HIGHRES or not, I can neither confirm nor
deny. I guess we can run a survey on that.
I think choosing parallelism dynamically would probably be a great
improvement too. We need to test. But if I recall correctly, the moment we
bump container size is when we start using way more credits. To be
double-checked though, it’s been a while. Hope this info helps.
On the topic of docs - I have a draft of CircleCI docs/instructions, but it
has to be updated with all the latest developments. I created it last year
but never published it, as there was always something about to happen that
I would have had to wait for and then update it accordingly. I can share it
with anyone willing to finish it so they don’t have to start from scratch.
I don’t think I will have the time to do it in the next few weeks at
least.


On Tue, 18 Oct 2022 at 13:30, Andrés de la Peña wrote:

> The -h profile works but it spends a lot of resources for slightly faster
> results. The -m profile is better value in terms of speed per resources. I
> guess -h can be used if one wants to get results as soon as possible, no
> matter the cost. Ekaterina might be better informed than me, given her work
> on CASSANDRA-15712.
>
> In any case, the new multiplexer doesn't change the resources
> configuration at all. We might want to reevaluate that config in the
> future, and probably follow David's suggestion of deciding parallelism as a
> function of the number of tests to be run.
>
> On Tue, 18 Oct 2022 at 18:13, Josh McKenzie  wrote:
>
>> * Running the .circleci/generate.sh script with -l/-m/-h flags will use
>> git diff to automatically detect the new or modified tests and will add
>> them to the lists of tests to be repeated. The pre-commit workflow will
>> automatically start repeated runs for these tests. The only exception to
>> this are Python dtests, that should be specified manually.
>>
>> Of note: the -h profile should not be used (correct me if I'm wrong here
>> Andres). Use -l for the free tier on circle or -m for paid.
>>
>> Will have some follow up tickets regarding job naming, default config
>> type, and updating documentation shortly.
>>
>> On Tue, Oct 18, 2022, at 12:33 PM, Andrés de la Peña wrote:
>>
>> Just to let you know that CASSANDRA-17939 has just been committed.
>>
>> It changes the way the CircleCI multiplexer works, in line with the
>> recent changes in our release criteria:
>>
>> * The default number of repeated tests iterations is 500, except for long
>> and upgrade tests.
>> * It is possible to specify multiple test classes and methods to be
>> repeated into the same config push. So patches altering dozens of tests
>> won't require dozens of config pushes anymore.
>> * Running the .circleci/generate.sh script with -l/-m/-h flags will use
>> git diff to automatically detect the new or modified tests and will add
>> them to the lists of tests to be repeated. The pre-commit workflow will
>> automatically start repeated runs for these tests. The only exception to
>> this are Python dtests, that should be specified manually.
>> * The CircleCI jobs are rearranged so for every regular job there is a
>> companion job to run the repeated tests associated to that job. Those
>> companion jobs will only be visible if there are repeated tests to run.
>> Here
>> 
>> is an example run with repeated tests for all the test suites, and here
>> 
>> is the same workflow without any repeated tests.
>>
>> Some documentation on how to use it can be found here:
>> https://github.com/apache/cassandra/blob/trunk/.circleci/readme.md#running-tests-in-a-loop
>>
>>
>>


Re: Looking for documentation tasks to work on

2022-10-18 Thread Josh McKenzie
Hey Sharan! Documentation is near and dear to my heart; the longer I spend on 
this project the more I think it's one of the higher leverage things we can 
invest our energy into. A general high level view of open documentation work in 
the project in reverse key order can be found here: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20and%20resolution%20%3D%20unresolved%20and%20summary%20~%20%22documentation%22%20order%20by%20issuekey%20desc

Specifically I have my eye on our CI and contribution process and should have a 
variety of documentation tickets opening up based on some of the changes Andres 
made in CASSANDRA-17939 and some follow up work (see comment here if curious: 
https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17617880&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17617880)

If I'm not mistaken, Derek said he ran into some dead / not working links in the 
how to contribute structure (link: 
https://cassandra.apache.org/_/development/index.html) - @Derek, were you taking 
that, or is that part of the above workload?

And at the risk of flooding you w/too many options, last but not least is a 
pretty straightforward resurrection of some documentation that got dropped for 
the Denylisting functionality 
(https://issues.apache.org/jira/browse/CASSANDRA-17547) - you can ping Jordan 
West on ASF slack to see if he's active on that ticket. Of all the issues that 
may be one of the more straightforward.

Hit me up on slack with any questions and glad to have you here!

~Josh

On Tue, Oct 18, 2022, at 6:28 AM, sharanf wrote:
> Hi All
> 
> It was really good to be at ApacheCon in New Orleans and finally put 
> some real faces to the names I've seen on the mailing list. And sitting 
> in the Cassandra BOF session has made me even more motivated to help 
> out. I've been thinking of getting involved and contributing to the docs 
> so please can someone point me in the direction of something that needs 
> doing around documentation :-)
> 
> Thanks
> Sharan
> 


Re: New CircleCI test multiplexer

2022-10-18 Thread Josh McKenzie
Thanks for the context, Ekaterina. My Grand Plan is to get whatever wisdom David 
encoded into his script in terms of job bucketing and parallelism into the 
mainline circle config so that all of us benefit from it.

On Tue, Oct 18, 2022, at 2:17 PM, Ekaterina Dimitrova wrote:
> First I want to take a moment to say thanks to Andres  and Josh for this new 
> improvement. I truly believe it will save us a lot of time and efforts in the 
> flaky tests fight
> 
> Pre-CASSANDRA-15712 there was default config for free tier and HIGHRES 
> option(bumping all resources to max). 
> We added MIDRES as a balanced  version which gives you the chance to be able 
> to run all tests but with less resources. What do I mean? It is still to be 
> used by people with paid accounts but who want to consider the cost/resources 
> spent. It was based on experimental runs which proved that with half the 
> resources(compared to max out) for some suites we get similar and reasonable 
> time to run for those test suites.
>  One confusion I want to clear here - MIDRES doesn’t mean we will use medium 
> containers but mid/less than the HIGHRES resources. Some Python tests cannot 
> be run with medium containers, for example. 
> Now whether some people still use HIGHRES or not I can neither confirm nor 
> deny. I guess we can make a survey on that. 
> I think improvement in choosing parallelism dynamically will be probably a 
> great improvement too. We need to test. But  if I recall correctly the moment 
> we bump container size is when we bump to use way more credits. To be 
> double-checked though, it’s been a while. Hope this info helps.
> On the topic of docs - I have one draft version for CircleCI doc/instructions 
> but it has to be updated with all latest developments. I created it last year 
> but never published as there was always something about to happen and for me 
> to have to wait and update it accordingly. I can share with anyone willing to 
> finish it so they don’t have to start from scratch. I don’t think I will have 
> the time to do it in the next few weeks at least. 
> 
> 
> On Tue, 18 Oct 2022 at 13:30, Andrés de la Peña  wrote:
>> The -h profile works but it spends a lot of resources for slightly faster 
>> results. The -m profile is better value in terms of speed per resources. I 
>> guess -h can be used if one wants to get results as soon as possible, no 
>> matter the cost. Ekaterina might be better informed than me, given her work 
>> on CASSANDRA-15712.
>> 
>> In any case, the new multiplexer doesn't change the resources configuration 
>> at all. We might want to reevaluate that config in the future, and probably 
>> follow David's suggestion of deciding parallelism as a function of the 
>> number of tests to be run.
>> 
>> On Tue, 18 Oct 2022 at 18:13, Josh McKenzie  wrote:
>>>> * Running the .circleci/generate.sh script with -l/-m/-h flags will use 
>>>> git diff to automatically detect the new or modified tests and will add 
>>>> them to the lists of tests to be repeated. The pre-commit workflow will 
>>>> automatically start repeated runs for these tests. The only exception to 
>>>> this are Python dtests, that should be specified manually.
>>> Of note: the -h profile should not be used (correct me if I'm wrong here 
>>> Andres). Use -l for the free tier on circle or -m for paid.
>>> 
>>> Will have some follow up tickets regarding job naming, default config type, 
>>> and updating documentation shortly.
>>> 
>>> On Tue, Oct 18, 2022, at 12:33 PM, Andrés de la Peña wrote:
>>>> Just to let you know that CASSANDRA-17939 has just been committed. 
>>>> 
>>>> It changes the way the CircleCI multiplexer works, in line with the recent 
>>>> changes in our release criteria:
>>>> 
>>>> * The default number of repeated tests iterations is 500, except for long 
>>>> and upgrade tests.
>>>> * It is possible to specify multiple test classes and methods to be 
>>>> repeated into the same config push. So patches altering dozens of tests 
>>>> won't require dozens of config pushes anymore.
>>>> * Running the .circleci/generate.sh script with -l/-m/-h flags will use 
>>>> git diff to automatically detect the new or modified tests and will add 
>>>> them to the lists of tests to be repeated. The pre-commit workflow will 
>>>> automatically start repeated runs for these tests. The only exception to 
>>>> this are Python dtests, that should be specified manually.
>>>> * The CircleCI jobs are rearranged so for every regular job there is a 
>>>> companion job to run the repeated tests associated to that job. Those 
>>>> companion jobs will only be visible if there are repeated tests to run. 
>>>> Here 
>>>> 
>>>>  is an example run with repeated tests for all the test suites, and here 
>>>> 
>>>

Re: New CircleCI test multiplexer

2022-10-18 Thread Ekaterina Dimitrova
I am not familiar with the content of his scripts but if they are helping
to balance even more our resource usage without breaking the bank and
failing the environment, all is more than welcome. Thank you!

On Tue, 18 Oct 2022 at 16:40, Josh McKenzie  wrote:

> Thanks for the context Ekaterina. My Grand Plan is to get whatever wisdom
> David encoded into his script in terms of job bucketing and parallelism
> into the mainline circle config and all of us benefit from that.
>
> On Tue, Oct 18, 2022, at 2:17 PM, Ekaterina Dimitrova wrote:
>
> First I want to take a moment to say thanks to Andres  and Josh for this
> new improvement. I truly believe it will save us a lot of time and efforts
> in the flaky tests fight
>
> Pre-CASSANDRA-15712 there was default config for free tier and HIGHRES
> option(bumping all resources to max).
> We added MIDRES as a balanced  version which gives you the chance to be
> able to run all tests but with less resources. What do I mean? It is still
> to be used by people with paid accounts but who want to consider the
> cost/resources spent. It was based on experimental runs which proved that
> with half the resources(compared to max out) for some suites we get similar
> and reasonable time to run for those test suites.
>  One confusion I want to clear here - MIDRES doesn’t mean we will use
> medium containers but mid/less than the HIGHRES resources. Some Python
> tests cannot be run with medium containers, for example.
> Now whether some people still use HIGHRES or not I can neither confirm nor
> deny. I guess we can make a survey on that.
> I think improvement in choosing parallelism dynamically will be probably a
> great improvement too. We need to test. But  if I recall correctly the
> moment we bump container size is when we bump to use way more credits. To
> be double-checked though, it’s been a while. Hope this info helps.
> On the topic of docs - I have one draft version for CircleCI
> doc/instructions but it has to be updated with all latest developments. I
> created it last year but never published as there was always something
> about to happen and for me to have to wait and update it accordingly. I can
> share with anyone willing to finish it so they don’t have to start from
> scratch. I don’t think I will have the time to do it in the next few weeks
> at least.
>
>
> On Tue, 18 Oct 2022 at 13:30, Andrés de la Peña 
> wrote:
>
> The -h profile works but it spends a lot of resources for slightly faster
> results. The -m profile is better value in terms of speed per resources. I
> guess -h can be used if one wants to get results as soon as possible, no
> matter the cost. Ekaterina might be better informed than me, given her work
> on CASSANDRA-15712.
>
> In any case, the new multiplexer doesn't change the resources
> configuration at all. We might want to reevaluate that config in the
> future, and probably follow David's suggestion of deciding parallelism as a
> function of the number of tests to be run.
>
> On Tue, 18 Oct 2022 at 18:13, Josh McKenzie  wrote:
>
>
> * Running the .circleci/generate.sh script with -l/-m/-h flags will use
> git diff to automatically detect the new or modified tests and will add
> them to the lists of tests to be repeated. The pre-commit workflow will
> automatically start repeated runs for these tests. The only exception to
> this are Python dtests, that should be specified manually.
>
> Of note: the -h profile should not be used (correct me if I'm wrong here
> Andres). Use -l for the free tier on circle or -m for paid.
>
> Will have some follow up tickets regarding job naming, default config
> type, and updating documentation shortly.
>
> On Tue, Oct 18, 2022, at 12:33 PM, Andrés de la Peña wrote:
>
> Just to let you know that CASSANDRA-17939 has just been committed.
>
> It changes the way the CircleCI multiplexer works, in line with the recent
> changes in our release criteria:
>
> * The default number of repeated tests iterations is 500, except for long
> and upgrade tests.
> * It is possible to specify multiple test classes and methods to be
> repeated into the same config push. So patches altering dozens of tests
> won't require dozens of config pushes anymore.
> * Running the .circleci/generate.sh script with -l/-m/-h flags will use
> git diff to automatically detect the new or modified tests and will add
> them to the lists of tests to be repeated. The pre-commit workflow will
> automatically start repeated runs for these tests. The only exception to
> this are Python dtests, that should be specified manually.
> * The CircleCI jobs are rearranged so for every regular job there is a
> companion job to run the repeated tests associated to that job. Those
> companion jobs will only be visible if there are repeated tests to run.
> Here
> 
> is an example run with repeated tests for all the test suites, and here
> 

New episode of The Apache Cassandra(R) Corner

2022-10-18 Thread Aaron Ploetz
Link to next episode:

Ep11 - Mick Semb Wever (Apache Cassandra PMC Chair)

https://drive.google.com/file/d/16bSw-hJVVOkIzTEg86KYUMT9g9Nm-X5B/view?usp=sharing

(You may have to download it to listen)

It will remain in staging for 72 hours, going live (assuming no objections)
by Friday (night), October 14th.

If anyone should have any questions, comments, or if you want to be a
guest, please reach out to me.

Thanks, everyone!

Aaron


Re: Shall 4.2 become 5.0 ?

2022-10-18 Thread Mick Semb Wever
>
>
> So the TL;DR here is that you (Mick) now also agree we should move the
>> version to 5.0?  I haven’t seen any other arguments for staying on 4.2, so
>> should we just move the version number to 5.0 now?  Do we want to have a
>> VOTE thread for it?  Or should we just do it?
>>
>
> I was never against 5.0, and I see no need for a vote –  no objection in
> this thread for 5.0.
>


Moved to CASSANDRA-17973