Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-12 Thread Dinesh Joshi
I can certainly start a VOTE thread for the CQL syntax addition. There
hasn't been any feedback suggesting there is an unaddressed concern with
the changes we are making.

That said, I'm not sure there was an explicit decision that resulted in
an update to the project's governance to reflect this requirement; if
there was, I seem to have missed it. There was a discussion in the past
about notifying the dev list to ensure there is visibility into changes,
but I don't recall whether there was an explicit voting requirement.

On 7/11/23 19:17, Yuki Morishita wrote:
>> folks - I think we’ve achieved lazy consensus here. Please continue
>> with feedback on the jira.
> 
> Hi Dinesh,
> 
> As Jeremiah commented on JIRA, shouldn't we have a vote in the ML?
> 
> For future reference, in my opinion, adding new CQL syntax should
> have a CEP, as it is not something we can easily change once defined.


Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-12 Thread Miklosovic, Stefan
CEP is a great idea. The devil is in the details, and while this looks cool, it 
will definitely not hurt to have the nuances ironed out.


From: Patrick McFadin 
Sent: Tuesday, July 11, 2023 2:24
To: dev@cassandra.apache.org; German Eichberger
Subject: Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the 
release process

I would say it helps a lot of people. 45k downloads in just the last month: 
https://pypistats.org/packages/cqlsh

I feel like a CEP would be in order, along the lines of CEP-8: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation

Unless anyone objects, I can help you get the CEP together and we can get a 
vote, then a JIRA in place for any changes in trunk.

Patrick

On Mon, Jul 10, 2023 at 4:58 PM German Eichberger via dev 
<dev@cassandra.apache.org> wrote:
Same - really appreciate those efforts and also welcome the upstreaming and 
release automation...

German

From: Jeff Widman <j...@jeffwidman.com>
Sent: Sunday, July 9, 2023 1:44 PM
To: Max C. <mc_cassand...@core43.com>
Cc: dev@cassandra.apache.org <dev@cassandra.apache.org>; Brad Schoening 
<bscho...@gmail.com>
Subject: [EXTERNAL] Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as 
part of the release process

Thanks Max, always encouraging to hear that the time I spend on open source is 
helping others.

Your use case is very similar to what drove my original desire to get involved 
with the project. Being able to `pip install cqlsh` from a dev machine was so 
much lighter weight than the alternatives.

Anyone else care to weigh in on this?

What are the next steps to move to a decision?

Cheers,
Jeff

On Sat, Jul 8, 2023, 7:23 PM Max C. <mc_cassand...@core43.com> wrote:

As a user, I really appreciate your efforts Jeff & Brad.  I would *love* for 
the C* project to officially support this.

In our environment we have a lot of client machines that all share common NFS 
mounted directories.  It's much easier for us to create a Python virtual 
environment on a file server with the cqlsh PyPI package installed than it is 
to install the Cassandra RPMs on every single machine.  Before I discovered 
your PyPI package, our developers would need to log in to a Cassandra node in 
order to run cqlsh.  The cqlsh PyPI package, however, is in our standard 
"python dev tools" virtual environment -- along with Ansible, black, isort, and 
various other Python packages -- which means it's accessible to everyone, 
everywhere.

I agree that this should not replace packaging cqlsh in the Cassandra RPM, so 
much as provide an additional option for installing cqlsh without the baggage 
of installing the full Cassandra package.

Thanks again for your work Jeff & Brad.

- Max

On 7/6/2023 5:55 PM, Jeff Widman wrote:
Brad Schoening and I currently maintain 
https://pypi.org/project/cqlsh/, which 
repackages the cqlsh that ships with every Cassandra release.

This way:

  *   anyone who wants a lightweight client to talk to a remote cassandra can 
simply `pip install cqlsh` without having to download the full cassandra 
source, unzip it, etc.
  *   it's very easy for folks to use it as scaffolding in their python 
scripts/tooling since they can simply include it in the list of their required 
dependencies.

We currently handle the packaging by waiting for a release, then manually 
copying the code out of the Cassandra source tree into 
https://github.com/jeffwidman/cqlsh, which 
has some additional build/Python packaging configuration files, then using 
standard Python tooling to publish to PyPI.
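
For illustration, a minimal sketch of that flow in Python (the source-tree
paths and the staging layout are assumptions for the sake of the example; this
is not the actual jeffwidman/cqlsh build script):

    # Minimal sketch: stage cqlsh out of a Cassandra source checkout and publish to PyPI.
    # Assumes a "cassandra" checkout of the release tag and a "cqlsh-package" staging dir
    # that already contains the packaging metadata (pyproject.toml, README, etc.).
    import pathlib
    import shutil
    import subprocess

    CASSANDRA_SRC = pathlib.Path("cassandra")    # assumed: checkout of the release tag
    PKG_ROOT = pathlib.Path("cqlsh-package")     # assumed: staging dir with build metadata

    def stage_package() -> None:
        """Copy the cqlsh entry point and the cqlshlib library out of the source tree."""
        if (PKG_ROOT / "cqlshlib").exists():
            shutil.rmtree(PKG_ROOT / "cqlshlib")
        PKG_ROOT.mkdir(parents=True, exist_ok=True)
        # cqlsh itself lives in bin/cqlsh.py, its library in pylib/cqlshlib.
        shutil.copy(CASSANDRA_SRC / "bin" / "cqlsh.py", PKG_ROOT / "cqlsh.py")
        shutil.copytree(CASSANDRA_SRC / "pylib" / "cqlshlib", PKG_ROOT / "cqlshlib")

    def build_and_upload() -> None:
        """Build sdist/wheel with the standard 'build' tool and upload with twine."""
        subprocess.run(["python", "-m", "build"], cwd=PKG_ROOT, check=True)
        dist_files = [str(p) for p in (PKG_ROOT / "dist").iterdir()]
        subprocess.run(["twine", "upload", *dist_files], check=True)

    if __name__ == "__main__":
        stage_package()
        build_and_upload()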

Given that our project is simply a build/packaging project, I wanted to start a 
conversation about upstreaming this into core Cassandra. I realize that 
Cassandra has no interest in maintaining lots of build targets... but given 
that cqlsh is written in Python and publishing to PyPI enables DBAs to share 
more complicated tooling built on top of it, this seems like a natural fit for 
core Cassandra rather than a standalone project.

Goal:
When a Cassandra release happens, the build/release process automatically 
publishes cqlsh to 
https://pypi.org/project/cqlsh/.

Non-Goal: This is _not_ about having cassandra itself rely on PyPI. There was 
some initial chatter about that in 
https://issues.apache.org/jira/browse/CASSANDRA-18654

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Jacek Lewandowski
Isn't novnodes a special case of vnodes with n=1 ?

We should rather select a subset of tests for which it makes sense to run
with different configurations.

The set of configurations against which we run the tests is currently still
only a subset of all possible cases.
I could ask - why don't we run dtests w/wo sstable compression x w/wo
internode encryption x w/wo vnodes,
w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I
think this is a matter of cost vs result.
This equation contains the likelihood of failure in configuration X given
there was no failure in the default configuration, the cost of running those
tests, the time we delay merging, the likelihood that we wait for the test
results so long that our branch diverges and we have to rerun them or accept
the fact that we merge code which was tested on an outdated base. And
eventually, the overall new contributor experience - whether they want to
participate in the future.
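
To make that equation concrete, a minimal sketch (all numbers below are
illustrative assumptions, not measured project data):

    # Expected benefit of running an extra configuration pre-commit: the post-commit cost
    # it avoids, weighted by how likely that configuration is to fail when the default
    # configuration passed, minus what the extra run costs us before merge.
    def expected_benefit(p_fail_given_default_green: float,
                         post_commit_fix_cost_hours: float,
                         pre_commit_run_cost_hours: float,
                         merge_delay_cost_hours: float) -> float:
        """Positive result => running the extra configuration pre-commit pays for itself."""
        avoided = p_fail_given_default_green * post_commit_fix_cost_hours
        paid = pre_commit_run_cost_hours + merge_delay_cost_hours
        return avoided - paid

    # e.g. a config that fails 2% of the time when the default was green, costs ~8 hours
    # to untangle post-commit, but adds 1 hour of machine time and 0.5 hours of merge delay:
    print(expected_benefit(0.02, 8.0, 1.0, 0.5))  # about -1.34: not worth it on every patch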



Wed, 12 Jul 2023 at 07:24 Berenguer Blasi 
wrote:

> On our 4.0 release I remember a number of such failures but not recently.
> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests,
> being less responsive post-commit as you already moved on,... Either the
> smoke pre-commit has approval steps for everything or we should give imo a
> devBranch alike job to the dev pre-commit. I find it terribly useful. My
> 2cts.
> On 11/7/23 18:26, Josh McKenzie wrote:
>
> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at
> reviewer's discretion
>
> In general, maybe offering a dev the option of choosing either "pre-commit
> smoke" or "post-commit full" at their discretion for any work would be the
> right play.
>
> A follow-on thought: even with something as significant as Accord, TCM,
> Trie data structures, etc - I'd be a bit surprised to see tests fail on
> JDK17 that didn't on 11, or with vs. without vnodes, in ways that weren't
> immediately clear the patch stumbled across something surprising and was
> immediately trivially attributable if not fixable. *In theory* the things
> we're talking about excluding from the pre-commit smoke test suite are all
> things that are supposed to be identical across environments and thus
> opaque / interchangeable by default (JDK version outside checking build
> which we will, vnodes vs. non, etc).
>
> Has that not proven to be the case in your experience?
>
> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>
> A strong +1 to getting to a single CI system. CircleCI definitely has some
> niceties and I understand why it's currently used, but right now we get 2
> CI systems for twice the price. +1 on the proposed subsets.
>
> Derek
>
> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie 
> wrote:
>
>
> I'm personally not thinking about CircleCI at all; I'm envisioning a world
> where all of us have 1 CI *software* system (i.e. reproducible on any
> env) that we use for pre-commit validation, and then post-commit happens on
> reference ASF hardware.
>
> So:
> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green,
> merge.
> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link
> back to the JIRA where the commit took place
>
> Circle would need to remain in lockstep with the requirements for point 1
> here.
>
> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>
> +1 to Josh which is exactly my line of thought as well. But that is only
> valid if we have a solid Jenkins that will eventually run all test configs.
> So I think I lost track a bit here. Are you proposing:
>
> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD)
> config of tests
>
> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you in
> case of problems?
>
> Or sthg different like having 1 also in Jenkins?
> On 7/7/23 17:55, Andrés de la Peña wrote:
>
> I think 500 runs combining all configs could be reasonable, since it's
> unlikely to have config-specific flaky tests. As in five configs with 100
> repetitions each.
>
> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie  wrote:
>
> Maybe. Kind of depends on how long we write our tests to run doesn't it? :)
>
> But point taken. Any non-trivial test would start to be something of a
> beast under this approach.
>
> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>
> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie 
> wrote:
> > 3. Multiplexed tests (changed, added) run against all JDK's and a
> broader range of configs (no-vnode, vnode default, compression, etc)
>
> I think this is going to be too heavy...we're taking 500 iterations
> and multiplying that by like 4 or 5?
>
>
>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Ekaterina Dimitrova
“On our 4.0 release I remember a number of such failures but not recently.”

Based on all the 5.0 work I’d say we need, at a minimum, to build and start a
node with all JDK versions pre-commit.

On Wed, 12 Jul 2023 at 7:29, Jacek Lewandowski 
wrote:

> Isn't novnodes a special case of vnodes with n=1 ?
>
> We should rather select a subset of tests for which it makes sense to run
> with different configurations.
>
> The set of configurations against which we run the tests currently is
> still only the subset of all possible cases.
> I could ask - why don't run dtests w/wo sstable compression x w/wo
> internode encryption x w/wo vnodes,
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I
> think this is a matter of cost vs result.
> This equation contains the likelihood of failure in configuration X, given
> there was no failure in the default
> configuration, the cost of running those tests, the time we delay merging,
> the likelihood that we wait for
> the test results so long that our branch diverge and we will have to rerun
> them or accept the fact that we merge
> a code which was tested on outdated base. Eventually, the overall new
> contributors experience - whether they
> want to participate in the future.
>
>
>
> śr., 12 lip 2023 o 07:24 Berenguer Blasi 
> napisał(a):
>
>> On our 4.0 release I remember a number of such failures but not recently.
>> What is more common though is packaging errors,
>> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests,
>> being less responsive post-commit as you already moved on,... Either the
>> smoke pre-commit has approval steps for everything or we should give imo a
>> devBranch alike job to the dev pre-commit. I find it terribly useful. My
>> 2cts.
>> On 11/7/23 18:26, Josh McKenzie wrote:
>>
>> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at
>> reviewer's discretion
>>
>> In general, maybe offering a dev the option of choosing either
>> "pre-commit smoke" or "post-commit full" at their discretion for any work
>> would be the right play.
>>
>> A follow-on thought: even with something as significant as Accord, TCM,
>> Trie data structures, etc - I'd be a bit surprised to see tests fail on
>> JDK17 that didn't on 11, or with vs. without vnodes, in ways that weren't
>> immediately clear the patch stumbled across something surprising and was
>> immediately trivially attributable if not fixable. *In theory* the
>> things we're talking about excluding from the pre-commit smoke test suite
>> are all things that are supposed to be identical across environments and
>> thus opaque / interchangeable by default (JDK version outside checking
>> build which we will, vnodes vs. non, etc).
>>
>> Has that not proven to be the case in your experience?
>>
>> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>>
>> A strong +1 to getting to a single CI system. CircleCI definitely has
>> some niceties and I understand why it's currently used, but right now we
>> get 2 CI systems for twice the price. +1 on the proposed subsets.
>>
>> Derek
>>
>> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie 
>> wrote:
>>
>>
>> I'm personally not thinking about CircleCI at all; I'm envisioning a
>> world where all of us have 1 CI *software* system (i.e. reproducible on
>> any env) that we use for pre-commit validation, and then post-commit
>> happens on reference ASF hardware.
>>
>> So:
>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green,
>> merge.
>> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link
>> back to the JIRA where the commit took place
>>
>> Circle would need to remain in lockstep with the requirements for point 1
>> here.
>>
>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>
>> +1 to Josh which is exactly my line of thought as well. But that is only
>> valid if we have a solid Jenkins that will eventually run all test configs.
>> So I think I lost track a bit here. Are you proposing:
>>
>> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD)
>> config of tests
>>
>> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you
>> in case of problems?
>>
>> Or sthg different like having 1 also in Jenkins?
>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>
>> I think 500 runs combining all configs could be reasonable, since it's
>> unlikely to have config-specific flaky tests. As in five configs with 100
>> repetitions each.
>>
>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie  wrote:
>>
>> Maybe. Kind of depends on how long we write our tests to run doesn't it?
>> :)
>>
>> But point taken. Any non-trivial test would start to be something of a
>> beast under this approach.
>>
>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>
>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie 
>> wrote:
>> > 3. Multiplexed tests (changed, added) run against all JDK's and a
>> broader range of configs (no-vnode, vnode default, compression, et

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Josh McKenzie
(This response ended up being a bit longer than intended; sorry about that)

> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo
> upgrade tests, being less responsive post-commit as you already
> moved on
*Two that **should** be resolved in the new regime:*
* Packaging errors should be caught pre as we're making the artifact builds 
part of pre-commit.
* I'm hoping to merge the commit log segment allocation so CDC allocator is the 
only one for 5.0 (and just bypasses the cdc-related work on allocation if it's 
disabled thus not impacting perf); the existing targeted testing of cdc 
specific functionality should be sufficient to confirm its correctness as it 
doesn't vary from the primary allocation path when it comes to mutation space 
in the buffer
* Upgrade tests are going to be part of the pre-commit suite

*Outstanding issues:*
* compression. If we just run with defaults we won't test all cases so errors 
could pop up here
* system_ks_directory related things: is this still ongoing or did we have a 
transient burst of these types of issues? And would we expect these to vary 
based on different JDK's, non-default configurations, etc?
* Being less responsive post-commit: My only ideas here are a combination of 
the jenkins_jira_integration script updating the JIRA ticket with test results 
if you cause a regression + us building a muscle around reverting your commit 
if they break tests.

To quote Jacek:
> why don't run dtests w/wo sstable compression x w/wo internode encryption x 
> w/wo vnodes, 
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
> think this is a matter of cost vs result. 

I think we've organically made these decisions and tradeoffs in the past 
without being methodical about it. If we can:
1. Multiplex changed or new tests
2. Tighten the feedback loop of "tests were green, now they're *consistently* 
not, you're the only one who changed something", and
3. Instill a culture of "if you can't fix it immediately revert your commit"

Then I think we'll only be vulnerable to flaky failures introduced across 
different non-default configurations as side effects in tests that aren't 
touched, which *intuitively* feels like a lot less than we're facing today. We 
could even get clever as a day 2 effort and define packages in the primary 
codebase where changes take place and multiplex (on a smaller scale) their 
respective packages of unit tests in the future if we see problems in this area.
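
For illustration, a rough sketch of the "multiplex changed or new tests" step
(the diff range and the test/unit path layout are assumptions; the real
multiplexer jobs may select tests differently):

    # Derive the unit test classes touched by a patch from git, so CI can re-run just
    # those classes many times. Assumes the in-tree layout test/unit/**/FooTest.java and
    # a diff against origin/trunk.
    import subprocess

    def changed_test_classes(base_ref: str = "origin/trunk") -> list[str]:
        """Fully qualified names of unit test classes added or modified since base_ref."""
        diff = subprocess.run(
            ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        classes = []
        for path in diff:
            if path.startswith("test/unit/") and path.endswith("Test.java"):
                # test/unit/org/apache/cassandra/FooTest.java -> org.apache.cassandra.FooTest
                classes.append(path[len("test/unit/"):-len(".java")].replace("/", "."))
        return classes

    if __name__ == "__main__":
        for cls in changed_test_classes():
            print(f"multiplex candidate: {cls}")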

Flakey tests are a giant pain in the ass and a huge drain on productivity, 
don't get me wrong. *And* we have to balance how much cost we're paying before 
each commit with the benefit we expect to gain from that. I don't take the past 
as strongly indicative of the future here since we've been allowing circle to 
validate pre-commit and haven't been multiplexing.

Does the above make sense? Are there things you've seen in the trenches that 
challenge or invalidate any of those perspectives?

On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
> Isn't novnodes a special case of vnodes with n=1 ?
> 
> We should rather select a subset of tests for which it makes sense to run 
> with different configurations. 
> 
> The set of configurations against which we run the tests currently is still 
> only the subset of all possible cases. 
> I could ask - why don't run dtests w/wo sstable compression x w/wo internode 
> encryption x w/wo vnodes, 
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
> think this is a matter of cost vs result. 
> This equation contains the likelihood of failure in configuration X, given 
> there was no failure in the default 
> configuration, the cost of running those tests, the time we delay merging, 
> the likelihood that we wait for 
> the test results so long that our branch diverge and we will have to rerun 
> them or accept the fact that we merge 
> a code which was tested on outdated base. Eventually, the overall new 
> contributors experience - whether they 
> want to participate in the future.
> 
> 
> 
> śr., 12 lip 2023 o 07:24 Berenguer Blasi  
> napisał(a):
>> On our 4.0 release I remember a number of such failures but not recently. 
>> What is more common though is packaging errors, 
>> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, 
>> being less responsive post-commit as you already moved on,... Either the 
>> smoke pre-commit has approval steps for everything or we should give imo a 
>> devBranch alike job to the dev pre-commit. I find it terribly useful. My 
>> 2cts.
>> 
>> On 11/7/23 18:26, Josh McKenzie wrote:
 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at 
 reviewer's discretion
>>> In general, maybe offering a dev the option of choosing either "pre-commit 
>>> smoke" 

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Ekaterina Dimitrova
“jenkins_jira_integration script updating the JIRA ticket with test results
if you cause a regression + us building a muscle around reverting your commit
if they break tests.”

I am not sure this will solve people finding the time to fix their breakages,
but at least they will be pinged automatically. Hopefully many follow Jira
updates.

“I don't take the past as strongly indicative of the future here since
we've been allowing circle to validate pre-commit and haven't been
multiplexing.”
I am interested in comparing how many tickets for flaky tests we will have
pre-5.0 now compared to pre-4.1.


On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:

> (This response ended up being a bit longer than intended; sorry about that)
>
> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo
> upgrade tests, being less responsive post-commit as you already
> moved on
>
> *Two that **should **be resolved in the new regime:*
> * Packaging errors should be caught pre as we're making the artifact
> builds part of pre-commit.
> * I'm hoping to merge the commit log segment allocation so CDC allocator
> is the only one for 5.0 (and just bypasses the cdc-related work on
> allocation if it's disabled thus not impacting perf); the existing targeted
> testing of cdc specific functionality should be sufficient to confirm its
> correctness as it doesn't vary from the primary allocation path when it
> comes to mutation space in the buffer
> * Upgrade tests are going to be part of the pre-commit suite
>
> *Outstanding issues:*
> * compression. If we just run with defaults we won't test all cases so
> errors could pop up here
> * system_ks_directory related things: is this still ongoing or did we have
> a transient burst of these types of issues? And would we expect these to
> vary based on different JDK's, non-default configurations, etc?
> * Being less responsive post-commit: My only ideas here are a combination
> of the jenkins_jira_integration
> 
> script updating the JIRA ticket with test results if you cause a regression
> + us building a muscle around reverting your commit if they break tests.
>
> To quote Jacek:
>
> why don't run dtests w/wo sstable compression x w/wo internode encryption
> x w/wo vnodes,
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I
> think this is a matter of cost vs result.
>
>
> I think we've organically made these decisions and tradeoffs in the past
> without being methodical about it. If we can:
> 1. Multiplex changed or new tests
> 2. Tighten the feedback loop of "tests were green, now they're
> *consistently* not, you're the only one who changed something", and
> 3. Instill a culture of "if you can't fix it immediately revert your
> commit"
>
> Then I think we'll only be vulnerable to flaky failures introduced across
> different non-default configurations as side effects in tests that aren't
> touched, which *intuitively* feels like a lot less than we're facing
> today. We could even get clever as a day 2 effort and define packages in
> the primary codebase where changes take place and multiplex (on a smaller
> scale) their respective packages of unit tests in the future if we see
> problems in this area.
>
> Flakey tests are a giant pain in the ass and a huge drain on productivity,
> don't get me wrong. *And* we have to balance how much cost we're paying
> before each commit with the benefit we expect to gain from that.
>
> Does the above make sense? Are there things you've seen in the trenches
> that challenge or invalidate any of those perspectives?
>
> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>
> Isn't novnodes a special case of vnodes with n=1 ?
>
> We should rather select a subset of tests for which it makes sense to run
> with different configurations.
>
> The set of configurations against which we run the tests currently is
> still only the subset of all possible cases.
> I could ask - why don't run dtests w/wo sstable compression x w/wo
> internode encryption x w/wo vnodes,
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I
> think this is a matter of cost vs result.
> This equation contains the likelihood of failure in configuration X, given
> there was no failure in the default
> configuration, the cost of running those tests, the time we delay merging,
> the likelihood that we wait for
> the test results so long that our branch diverge and we will have to rerun
> them or accept the fact that we merge
> a code which was tested on outdated base. Eventually, the overall new
> contributors experience - whether they
> want to participate in the future.
>
>
>
> śr., 12 lip 2023 o 07:24 Berenguer Blasi 
> napisał(a):
>
> On our 4.0 release I remember a numbe

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Jacek Lewandowski
Would it be re-opening the ticket or creating a new ticket with "revert of
fix" ?



Wed, 12 Jul 2023 at 14:51 Ekaterina Dimitrova 
wrote:

> jenkins_jira_integration
> 
>  script
> updating the JIRA ticket with test results if you cause a regression + us
> building a muscle around reverting your commit if they break tests.“
>
> I am not sure people finding the time to fix their breakages will be
> solved but at least they will be pinged automatically. Hopefully many
> follow Jira updates.
>
> “  I don't take the past as strongly indicative of the future here since
> we've been allowing circle to validate pre-commit and haven't been
> multiplexing.”
> I am interested to compare how many tickets for flaky tests we will have
> pre-5.0 now compared to pre-4.1.
>
>
> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
>
>> (This response ended up being a bit longer than intended; sorry about
>> that)
>>
>> What is more common though is packaging errors,
>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>> upgrade tests, being less responsive post-commit as you already
>> moved on
>>
>> *Two that **should **be resolved in the new regime:*
>> * Packaging errors should be caught pre as we're making the artifact
>> builds part of pre-commit.
>> * I'm hoping to merge the commit log segment allocation so CDC allocator
>> is the only one for 5.0 (and just bypasses the cdc-related work on
>> allocation if it's disabled thus not impacting perf); the existing targeted
>> testing of cdc specific functionality should be sufficient to confirm its
>> correctness as it doesn't vary from the primary allocation path when it
>> comes to mutation space in the buffer
>> * Upgrade tests are going to be part of the pre-commit suite
>>
>> *Outstanding issues:*
>> * compression. If we just run with defaults we won't test all cases so
>> errors could pop up here
>> * system_ks_directory related things: is this still ongoing or did we
>> have a transient burst of these types of issues? And would we expect these
>> to vary based on different JDK's, non-default configurations, etc?
>> * Being less responsive post-commit: My only ideas here are a combination
>> of the jenkins_jira_integration
>> 
>> script updating the JIRA ticket with test results if you cause a regression
>> + us building a muscle around reverting your commit if they break tests.
>>
>> To quote Jacek:
>>
>> why don't run dtests w/wo sstable compression x w/wo internode encryption
>> x w/wo vnodes,
>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>> I think this is a matter of cost vs result.
>>
>>
>> I think we've organically made these decisions and tradeoffs in the past
>> without being methodical about it. If we can:
>> 1. Multiplex changed or new tests
>> 2. Tighten the feedback loop of "tests were green, now they're
>> *consistently* not, you're the only one who changed something", and
>> 3. Instill a culture of "if you can't fix it immediately revert your
>> commit"
>>
>> Then I think we'll only be vulnerable to flaky failures introduced across
>> different non-default configurations as side effects in tests that aren't
>> touched, which *intuitively* feels like a lot less than we're facing
>> today. We could even get clever as a day 2 effort and define packages in
>> the primary codebase where changes take place and multiplex (on a smaller
>> scale) their respective packages of unit tests in the future if we see
>> problems in this area.
>>
>> Flakey tests are a giant pain in the ass and a huge drain on
>> productivity, don't get me wrong. *And* we have to balance how much cost
>> we're paying before each commit with the benefit we expect to gain from
>> that.
>>
>> Does the above make sense? Are there things you've seen in the trenches
>> that challenge or invalidate any of those perspectives?
>>
>> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>>
>> Isn't novnodes a special case of vnodes with n=1 ?
>>
>> We should rather select a subset of tests for which it makes sense to run
>> with different configurations.
>>
>> The set of configurations against which we run the tests currently is
>> still only the subset of all possible cases.
>> I could ask - why don't run dtests w/wo sstable compression x w/wo
>> internode encryption x w/wo vnodes,
>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>> I think this is a matter of cost vs result.
>> This equation contains the likelihood of failure in configuration X,
>> given there was no failure in the default
>> configuration, the cost of running those tests, the time we delay
>> merging, the likelihood that we wait for
>> the test results so long that our branch diverge and we will have to
>> rerun them or accept the fact that 

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Jacek Lewandowski
I believe some tools can determine, using code coverage analysis, which tests
make sense to multiplex given the exact lines of code that were changed. After
the initial run, we should have data from the coverage analysis which tells us
which test classes are tainted - that is, which ones cover the modified code
fragments.

Using a similar approach, we could detect the coverage differences when
running, say w/wo compression, and discover the tests which cover those
parts of the code.

That way, we can be smart and save time by pointing precisely at what it makes
sense to test more thoroughly.
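
A minimal sketch of that selection step (the coverage-map format and the
example class/file names are assumptions for illustration; real tooling such
as JaCoCo reports would feed this data):

    # Given a map from test class to the source files it covers (collected once from an
    # instrumented run), pick the tests "tainted" by the files a patch touches.
    from typing import Dict, List, Set

    def tainted_tests(coverage: Dict[str, Set[str]], changed_files: Set[str]) -> List[str]:
        """Test classes whose covered source files intersect the files touched by a patch."""
        return sorted(test for test, covered in coverage.items() if covered & changed_files)

    # Example data only; real class and file names would come from the coverage report.
    coverage_map = {
        "org.apache.cassandra.db.commitlog.CommitLogSegmentManagerCDCTest": {
            "src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java",
        },
        "org.apache.cassandra.io.sstable.SSTableReaderTest": {
            "src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java",
        },
    }
    patch = {"src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java"}
    print(tainted_tests(coverage_map, patch))  # only the CDC commit log test is selected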


Wed, 12 Jul 2023 at 14:52 Jacek Lewandowski 
wrote:

> Would it be re-opening the ticket or creating a new ticket with "revert of
> fix" ?
>
>
>
> śr., 12 lip 2023 o 14:51 Ekaterina Dimitrova 
> napisał(a):
>
>> jenkins_jira_integration
>> 
>>  script
>> updating the JIRA ticket with test results if you cause a regression + us
>> building a muscle around reverting your commit if they break tests.“
>>
>> I am not sure people finding the time to fix their breakages will be
>> solved but at least they will be pinged automatically. Hopefully many
>> follow Jira updates.
>>
>> “  I don't take the past as strongly indicative of the future here since
>> we've been allowing circle to validate pre-commit and haven't been
>> multiplexing.”
>> I am interested to compare how many tickets for flaky tests we will have
>> pre-5.0 now compared to pre-4.1.
>>
>>
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
>>
>>> (This response ended up being a bit longer than intended; sorry about
>>> that)
>>>
>>> What is more common though is packaging errors,
>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>> upgrade tests, being less responsive post-commit as you already
>>> moved on
>>>
>>> *Two that **should **be resolved in the new regime:*
>>> * Packaging errors should be caught pre as we're making the artifact
>>> builds part of pre-commit.
>>> * I'm hoping to merge the commit log segment allocation so CDC allocator
>>> is the only one for 5.0 (and just bypasses the cdc-related work on
>>> allocation if it's disabled thus not impacting perf); the existing targeted
>>> testing of cdc specific functionality should be sufficient to confirm its
>>> correctness as it doesn't vary from the primary allocation path when it
>>> comes to mutation space in the buffer
>>> * Upgrade tests are going to be part of the pre-commit suite
>>>
>>> *Outstanding issues:*
>>> * compression. If we just run with defaults we won't test all cases so
>>> errors could pop up here
>>> * system_ks_directory related things: is this still ongoing or did we
>>> have a transient burst of these types of issues? And would we expect these
>>> to vary based on different JDK's, non-default configurations, etc?
>>> * Being less responsive post-commit: My only ideas here are a
>>> combination of the jenkins_jira_integration
>>> 
>>> script updating the JIRA ticket with test results if you cause a regression
>>> + us building a muscle around reverting your commit if they break tests.
>>>
>>> To quote Jacek:
>>>
>>> why don't run dtests w/wo sstable compression x w/wo internode
>>> encryption x w/wo vnodes,
>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>> I think this is a matter of cost vs result.
>>>
>>>
>>> I think we've organically made these decisions and tradeoffs in the past
>>> without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests
>>> 2. Tighten the feedback loop of "tests were green, now they're
>>> *consistently* not, you're the only one who changed something", and
>>> 3. Instill a culture of "if you can't fix it immediately revert your
>>> commit"
>>>
>>> Then I think we'll only be vulnerable to flaky failures introduced
>>> across different non-default configurations as side effects in tests that
>>> aren't touched, which *intuitively* feels like a lot less than we're
>>> facing today. We could even get clever as a day 2 effort and define
>>> packages in the primary codebase where changes take place and multiplex (on
>>> a smaller scale) their respective packages of unit tests in the future if
>>> we see problems in this area.
>>>
>>> Flakey tests are a giant pain in the ass and a huge drain on
>>> productivity, don't get me wrong. *And* we have to balance how much
>>> cost we're paying before each commit with the benefit we expect to gain
>>> from that.
>>>
>>> Does the above make sense? Are there things you've seen in the trenches
>>> that challenge or invalidate any of those perspectives?
>>>
>>> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>>>
>>> Isn't novnodes a special case of vnodes with n=1 ?
>>>
>>> We should rather select a subset of tests for which 

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Josh McKenzie
> Would it be re-opening the ticket or creating a new ticket with "revert of 
> fix" ?
I have a weak preference for re-opening the original ticket and tracking the 
revert + fix there. Keeps the workflow in one place. "Downside" is having 
multiple commits with "CASSANDRA-XX" in the message, but that might actually 
be a nice thing when grepping through to see what changes were made for a 
specific effort.

> I am not sure people finding the time to fix their breakages will be solved 
> but at least they will be pinged automatically.
That's where the "muscle around git revert" comes in. If we all agree to revert 
patches that break tests, fix them, and then re-merge them, I think that both 
keeps that work in the "original mental bucket required to be done", and also 
pressures all of us to take our pre-commit CI seriously and continue to refine 
it until such breakages don't occur, or occur so rarely they reach an 
acceptable level.

We also will offer the ability to run the pre-commit suite pre-merge or the 
post-commit suite pre-merge for folks who would prefer that approach to 
investment (machine time vs. risk of human time).

On Wed, Jul 12, 2023, at 8:52 AM, Jacek Lewandowski wrote:
> Would it be re-opening the ticket or creating a new ticket with "revert of 
> fix" ?
> 
> 
> 
> śr., 12 lip 2023 o 14:51 Ekaterina Dimitrova  
> napisał(a):
>> jenkins_jira_integration 
>> 
>>  script updating the JIRA ticket with test results if you cause a regression 
>> + us building a muscle around reverting your commit if they break tests.“
>> 
>> I am not sure people finding the time to fix their breakages will be solved 
>> but at least they will be pinged automatically. Hopefully many follow Jira 
>> updates.
>> 
>> “  I don't take the past as strongly indicative of the future here since 
>> we've been allowing circle to validate pre-commit and haven't been 
>> multiplexing.”
>> I am interested to compare how many tickets for flaky tests we will have 
>> pre-5.0 now compared to pre-4.1.
>> 
>> 
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
>>> __
>>> (This response ended up being a bit longer than intended; sorry about that)
>>> 
 What is more common though is packaging errors,
 cdc/compression/system_ks_directory targeted fixes, CI w/wo
 upgrade tests, being less responsive post-commit as you already
 moved on
>>> *Two that ***should ***be resolved in the new regime:***
>>> * Packaging errors should be caught pre as we're making the artifact builds 
>>> part of pre-commit.
>>> * I'm hoping to merge the commit log segment allocation so CDC allocator is 
>>> the only one for 5.0 (and just bypasses the cdc-related work on allocation 
>>> if it's disabled thus not impacting perf); the existing targeted testing of 
>>> cdc specific functionality should be sufficient to confirm its correctness 
>>> as it doesn't vary from the primary allocation path when it comes to 
>>> mutation space in the buffer
>>> * Upgrade tests are going to be part of the pre-commit suite
>>> 
>>> *Outstanding issues:***
>>> * compression. If we just run with defaults we won't test all cases so 
>>> errors could pop up here
>>> * system_ks_directory related things: is this still ongoing or did we have 
>>> a transient burst of these types of issues? And would we expect these to 
>>> vary based on different JDK's, non-default configurations, etc?
>>> * Being less responsive post-commit: My only ideas here are a combination 
>>> of the jenkins_jira_integration 
>>> 
>>>  script updating the JIRA ticket with test results if you cause a 
>>> regression + us building a muscle around reverting your commit if they 
>>> break tests.
>>> 
>>> To quote Jacek:
 why don't run dtests w/wo sstable compression x w/wo internode encryption 
 x w/wo vnodes, 
 w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
 think this is a matter of cost vs result. 
>>> 
>>> I think we've organically made these decisions and tradeoffs in the past 
>>> without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests
>>> 2. Tighten the feedback loop of "tests were green, now they're 
>>> *consistently* not, you're the only one who changed something", and
>>> 3. Instill a culture of "if you can't fix it immediately revert your commit"
>>> 
>>> Then I think we'll only be vulnerable to flaky failures introduced across 
>>> different non-default configurations as side effects in tests that aren't 
>>> touched, which *intuitively* feels like a lot less than we're facing today. 
>>> We could even get clever as a day 2 effort and define packages in the 
>>> primary codebase where changes take place and multiplex (on a smaller 
>>> scale) their respective packages of unit tests in the futur

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Ekaterina Dimitrova
Revert for only trunk patches, right?
I’d say we need to completely stabilize the environment, with no noise, before
we go in that direction.

On Wed, 12 Jul 2023 at 8:55, Jacek Lewandowski 
wrote:

> Would it be re-opening the ticket or creating a new ticket with "revert of
> fix" ?
>
>
>
> śr., 12 lip 2023 o 14:51 Ekaterina Dimitrova 
> napisał(a):
>
>> jenkins_jira_integration
>> 
>>  script
>> updating the JIRA ticket with test results if you cause a regression + us
>> building a muscle around reverting your commit if they break tests.“
>>
>> I am not sure people finding the time to fix their breakages will be
>> solved but at least they will be pinged automatically. Hopefully many
>> follow Jira updates.
>>
>> “  I don't take the past as strongly indicative of the future here since
>> we've been allowing circle to validate pre-commit and haven't been
>> multiplexing.”
>> I am interested to compare how many tickets for flaky tests we will have
>> pre-5.0 now compared to pre-4.1.
>>
>>
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
>>
>>> (This response ended up being a bit longer than intended; sorry about
>>> that)
>>>
>>> What is more common though is packaging errors,
>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>> upgrade tests, being less responsive post-commit as you already
>>> moved on
>>>
>>> *Two that **should **be resolved in the new regime:*
>>> * Packaging errors should be caught pre as we're making the artifact
>>> builds part of pre-commit.
>>> * I'm hoping to merge the commit log segment allocation so CDC allocator
>>> is the only one for 5.0 (and just bypasses the cdc-related work on
>>> allocation if it's disabled thus not impacting perf); the existing targeted
>>> testing of cdc specific functionality should be sufficient to confirm its
>>> correctness as it doesn't vary from the primary allocation path when it
>>> comes to mutation space in the buffer
>>> * Upgrade tests are going to be part of the pre-commit suite
>>>
>>> *Outstanding issues:*
>>> * compression. If we just run with defaults we won't test all cases so
>>> errors could pop up here
>>> * system_ks_directory related things: is this still ongoing or did we
>>> have a transient burst of these types of issues? And would we expect these
>>> to vary based on different JDK's, non-default configurations, etc?
>>> * Being less responsive post-commit: My only ideas here are a
>>> combination of the jenkins_jira_integration
>>> 
>>> script updating the JIRA ticket with test results if you cause a regression
>>> + us building a muscle around reverting your commit if they break tests.
>>>
>>> To quote Jacek:
>>>
>>> why don't run dtests w/wo sstable compression x w/wo internode
>>> encryption x w/wo vnodes,
>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>> I think this is a matter of cost vs result.
>>>
>>>
>>> I think we've organically made these decisions and tradeoffs in the past
>>> without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests
>>> 2. Tighten the feedback loop of "tests were green, now they're
>>> *consistently* not, you're the only one who changed something", and
>>> 3. Instill a culture of "if you can't fix it immediately revert your
>>> commit"
>>>
>>> Then I think we'll only be vulnerable to flaky failures introduced
>>> across different non-default configurations as side effects in tests that
>>> aren't touched, which *intuitively* feels like a lot less than we're
>>> facing today. We could even get clever as a day 2 effort and define
>>> packages in the primary codebase where changes take place and multiplex (on
>>> a smaller scale) their respective packages of unit tests in the future if
>>> we see problems in this area.
>>>
>>> Flakey tests are a giant pain in the ass and a huge drain on
>>> productivity, don't get me wrong. *And* we have to balance how much
>>> cost we're paying before each commit with the benefit we expect to gain
>>> from that.
>>>
>>> Does the above make sense? Are there things you've seen in the trenches
>>> that challenge or invalidate any of those perspectives?
>>>
>>> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>>>
>>> Isn't novnodes a special case of vnodes with n=1 ?
>>>
>>> We should rather select a subset of tests for which it makes sense to
>>> run with different configurations.
>>>
>>> The set of configurations against which we run the tests currently is
>>> still only the subset of all possible cases.
>>> I could ask - why don't run dtests w/wo sstable compression x w/wo
>>> internode encryption x w/wo vnodes,
>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>> I think this is a matter of cost vs result.
>>> This equation contains the likeliho

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Josh McKenzie
> Revert for only trunk patches right? 
> I’d say we need to completely stabilize the environment, no noise before we 
> go into that direction.
Hm. Is the concern multi-branch reverts w/merge commits being awful? Because I 
hear that. Starting trunk-only would be reasonable enough I think, especially 
since we'd be bugfix only on other branches anyway and expect less test 
destabilization. This is tickling my memory a bit; I think we talked about... 
something different in how we handle CI and vetting on trunk compared to other 
branches. I'll have to dig around later and see if I can surface that.

I think completely stabilizing the environment is going to be something of a 
chicken / egg problem. Until we move away from our heterogenous execution 
environment w/constant degraded and failing agents and/or get more automated 
robustness (re-run stage w/just timed out tests for example), I don't think 
we'll be able to get to a completely stabilized environment. And IMO the "if 
you break it you buy it (revert)" approach would strictly serve to help us in 
our move in that direction.

As I type this out, it strikes me that this feels similar to being on-call for 
the code you write. When there's real-world stakes / pain / discomfort that 
*will be applied* to you if you're not thorough in your consideration, you 
think about things differently and it improves the quality of your work as a 
result.

I suspect the risk of having personal delivery timelines slip because your code 
introduced test failures would be a pretty strong incentive to both be more 
careful about how you work on what you're doing plus incentive to chip in and 
work on the CI environment as well to prevent any CI-stack specific errors in 
the future.

I think about this in terms of where the tax is being paid. If the pressure is 
applied to the person who contributed the code, they have to pay the tax. If we 
allow these kinds of failures to rest in the system, the entire rest of the dev 
community pays the tax. The former seems like less aggregate cost to us as a 
project than the latter to me?

On Wed, Jul 12, 2023, at 9:10 AM, Ekaterina Dimitrova wrote:
> Revert for only trunk patches right? 
> I’d say we need to completely stabilize the environment, no noise before we 
> go into that direction.
> 
> On Wed, 12 Jul 2023 at 8:55, Jacek Lewandowski  
> wrote:
>> Would it be re-opening the ticket or creating a new ticket with "revert of 
>> fix" ?
>> 
>> 
>> 
>> śr., 12 lip 2023 o 14:51 Ekaterina Dimitrova  
>> napisał(a):
>>> jenkins_jira_integration 
>>> 
>>>  script updating the JIRA ticket with test results if you cause a 
>>> regression + us building a muscle around reverting your commit if they 
>>> break tests.“
>>> 
>>> I am not sure people finding the time to fix their breakages will be solved 
>>> but at least they will be pinged automatically. Hopefully many follow Jira 
>>> updates.
>>> 
>>> “  I don't take the past as strongly indicative of the future here since 
>>> we've been allowing circle to validate pre-commit and haven't been 
>>> multiplexing.”
>>> I am interested to compare how many tickets for flaky tests we will have 
>>> pre-5.0 now compared to pre-4.1.
>>> 
>>> 
>>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
 __
 (This response ended up being a bit longer than intended; sorry about that)
 
> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo
> upgrade tests, being less responsive post-commit as you already
> moved on
 *Two that ***should ***be resolved in the new regime:***
 * Packaging errors should be caught pre as we're making the artifact 
 builds part of pre-commit.
 * I'm hoping to merge the commit log segment allocation so CDC allocator 
 is the only one for 5.0 (and just bypasses the cdc-related work on 
 allocation if it's disabled thus not impacting perf); the existing 
 targeted testing of cdc specific functionality should be sufficient to 
 confirm its correctness as it doesn't vary from the primary allocation 
 path when it comes to mutation space in the buffer
 * Upgrade tests are going to be part of the pre-commit suite
 
 *Outstanding issues:***
 * compression. If we just run with defaults we won't test all cases so 
 errors could pop up here
 * system_ks_directory related things: is this still ongoing or did we have 
 a transient burst of these types of issues? And would we expect these to 
 vary based on different JDK's, non-default configurations, etc?
 * Being less responsive post-commit: My only ideas here are a combination 
 of the jenkins_jira_integration 
 
  script updating the JIRA ticket with test resu

Re: Changing the output of tooling between majors

2023-07-12 Thread Eric Evans
On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> I agree with Jackson that having a different output format (JSON/YAML) in
> order to be able to change the default output resolves nothing in practice.
>
> As Jackson said, "operators who maintain these scripts aren’t going to
> re-write them just because a better way of doing them is newly available,
> usually they’re too busy with other work and will keep using those old
> scripts until they stop working".
>
> This is true. If this approach is adopted, what will happen in practice is
> that we change the output and we provide a different format and then a user
> detects this change because his scripts changed. As he has existing
> solution in place which parses the text from human-readable output, he will
> try to fix that, he will not suddenly convert all scripting he has to
> parsing JSON just because we added it. Starting with JSON parsing might be
> done if he has no scripting in place yet but then we would not cover
> already existing deployments.
>

I think this is quite an extreme conclusion to draw.  If tooling had
stable, structured output formats, and if we documented an expectation that
human-readable console output was unstable, then presumably it would be
safe to assume that any new scripters would avail themselves of the stable
formats, or expect breakage later.  I think it's also fair to assume that
at least some people would spend the time to convert their scripts,
particularly if forced to revisit them (for example, after a breaking
change to console output).  As someone who manages several large-scale
mission-critical Cassandra clusters under constrained resources, this is
how I would approach it.

TL;DR Don't let perfect be the enemy of good
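
For illustration, a minimal sketch of scripting against a structured format
with a fallback to scraping console output (nodetool tablestats offers a
-F/--format json option; the exact JSON shape below is an assumption to verify
per version):

    # Prefer the structured output format when the tool offers one; fall back to scraping
    # the human-readable console output otherwise.
    import json
    import subprocess

    def table_count(keyspace: str) -> int:
        """Count tables in a keyspace, preferring nodetool's JSON output."""
        try:
            out = subprocess.run(
                ["nodetool", "tablestats", "-F", "json", keyspace],
                capture_output=True, text=True, check=True,
            ).stdout
            data = json.loads(out)
            # assumed shape: {"<keyspace>": {"tables": {"<table>": {...}, ...}, ...}}
            return len(data.get(keyspace, {}).get("tables", {}))
        except (subprocess.CalledProcessError, json.JSONDecodeError):
            # The fragile path: exactly the kind of text parsing that breaks when the
            # human-readable output changes between majors.
            out = subprocess.run(
                ["nodetool", "tablestats", keyspace],
                capture_output=True, text=True, check=True,
            ).stdout
            return sum(1 for line in out.splitlines() if line.strip().startswith("Table:"))

    if __name__ == "__main__":
        print(table_count("system"))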


[ ... ]

> For that reason, what we could agree on is that we would never change the
> output for "tier 1" commands and if we ever changed something, it would be
> STRICT ADDITIONS only. In other words, everything it printed, it would
> continue to print that for ever. Only new lines could be introduced. We
> need to do this because Cassandra is evolving over time and we need to keep
> the output aligned as new functionality appears. But the output would be
> backward compatible. Plus, we are talking about majors only.
>
> The only reason we would ever changed the output on "tier 1" commands, if
> is not an addition, is the fix of the typo in the existing output. This
> would again happened only in majors.
>
> All other output for all other commands might be changed but their output
> will not need to be strictly additive. This would again happen only between
> majors.
>
> What is you opinion about this?
>

To be clear about where I'm coming from: I'm not arguing against you or
anyone else making changes like these (in major versions, or otherwise).
If —for example— we had console output that was incorrect, incomplete, or
obviously misleading, I'd absolutely want to see that fixed, script
breakage be damned.  All I want is for folks to recognize the problems this
sort of thing can create, and show a bit of empathy before submitting a
change.  For operators on the receiving end, it can be really frustrating,
especially when there is no normative change (i.e. it's in service of
aesthetics).

-- 
Eric Evans 
Staff SRE, Data Persistence
Wikimedia Foundation


Re: Changing the output of tooling between majors

2023-07-12 Thread C. Scott Andreas
Agreed with Eric’s point here, yes.

- Scott

On Jul 12, 2023, at 10:48 AM, Eric Evans wrote:

> On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan
>  wrote:
>
> [ ... ]
>
> TL;DR Don't let perfect be the enemy of good
>
> [ ... ]


Re: Changing the output of tooling between majors

2023-07-12 Thread Miklosovic, Stefan
Eric,

I appreciate your feedback on this, especially the additional background about 
where you are coming from in the second paragraph.

I think we are on the same page after all. I definitely understand that people 
are depending on this output and that we need to be careful. That is why I 
propose to change it only in each major. What I feel is that everybody's usage 
and expectations are a little bit different, the outputs of the commands are 
very diverse, and it is hard to balance this so everybody is happy.

I am trying to come up with a solution which would not change the most 
important commands unnecessarily while also leaving some room to tweak the 
existing commands where we see it as appropriate. I just find it ridiculous 
that we cannot change "someProperty: 10" to "Some Property: 10" and that there 
is so much red tape around that.

If I had to summarize this whole discussion, the best conclusion I can think 
of is to not change what is used the most (this would probably need to be 
defined more explicitly), and if we have to change something else, we had 
better document that extensively and provide JSON/YAML so people can divorce 
themselves from parsing the human-readable format (which, we probably all 
agree, should not happen in the first place).

What I am afraid of is that in order to satisfy these conditions, if, for 
example, we just want to fix a typo or the format of a key of some value, then 
we would need to deliver a JSON/YAML format as well if there is not any yet, 
and such a trivial change would then require way more work in terms of 
implementing the JSON/YAML output. Some commands are quite sophisticated, and 
I do not want to be blocked from changing a field in the human-readable output 
because providing the corresponding JSON/YAML format would be a gigantic 
portion of the work itself.

From what I see, you guys want to condition any change on offering JSON/YAML 
as well, and I don't know whether that is just too much.



From: Eric Evans 
Sent: Wednesday, July 12, 2023 19:48
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors


On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan 
<stefan.mikloso...@netapp.com> wrote:
I agree with Jackson that having a different output format (JSON/YAML) in order 
to be able to change the default output resolves nothing in practice.

As Jackson said, "operators who maintain these scripts aren’t going to re-write 
them just because a better way of doing them is newly available, usually 
they’re too busy with other work and will keep using those old scripts until 
they stop working".

This is true. If this approach is adopted, what will happen in practice is that 
we change the output and we provide a different format and then a user detects 
this change because his scripts changed. As he has existing solution in place 
which parses the text from human-readable output, he will try to fix that, he 
will not suddenly convert all scripting he has to parsing JSON just because we 
added it. Starting with JSON parsing might be done if he has no scripting in 
place yet but then we would not cover already existing deployments.

I think this is quite an extreme conclusion to draw.  If tooling had stable, 
structured output formats, and if we documented an expectation that 
human-readable console output was unstable, then presumably it would be safe to 
assume that any new scripters would avail themselves of the stable formats, or 
expect breakage later.  I think it's also fair to assume that at least some 
people would spend the time to convert their scripts, particularly if forced to 
revisit them (for example, after a breaking change to console output).  As 
someone who manages several large-scale mission-critical Cassandra clusters 
under constrained resources, this is how I would approach it.

TL;DR Don't let perfect be the enemy of good

[ ... ]

For that reason, what we could agree on is that we would never change the 
output for "tier 1" commands and if we ever changed something, it would be 
STRICT ADDITIONS only. In other words, everything it printed, it would continue 
to print that for ever. Only new lines could be introduced. We need to do this 
because Cassandra is evolving over time and we need to keep the output aligned 
as new functionality appears. But the output would be backward compatible. 
Plus, we are talking about majors only.

The only reason we would ever changed the output on "tier 1" commands, if is 
not an addition, is the fix of the typo in the existing output. This would 
again happened only in majors.

All other output for a

[ANNOUNCE] Apache Cassandra 4.0.11 test artifact available

2023-07-12 Thread Miklosovic, Stefan
The test build of Cassandra 4.0.11 is available.

sha1: f8584b943e7cd62ed4cb66ead2c9b4a8f1c7f8b5
Git: https://github.com/apache/cassandra/tree/4.0.11-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1303/org/apache/cassandra/cassandra-all/4.0.11/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/4.0.11/

A vote of this test build will be initiated within the next couple of days.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/4.0.11-tentative/CHANGES.txt
[2]: NEWS.txt: 
https://github.com/apache/cassandra/blob/4.0.11-tentative/NEWS.txt

Re: [ANNOUNCE] Apache Cassandra 4.0.11 test artifact available

2023-07-12 Thread Brandon Williams
+1

Checked debian and redhat installs

Kind Regards,
Brandon

On Wed, Jul 12, 2023 at 3:08 PM Miklosovic, Stefan
 wrote:
>
> The test build of Cassandra 4.0.11 is available.
>
> sha1: f8584b943e7cd62ed4cb66ead2c9b4a8f1c7f8b5
> Git: https://github.com/apache/cassandra/tree/4.0.11-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1303/org/apache/cassandra/cassandra-all/4.0.11/
>
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/4.0.11/
>
> A vote of this test build will be initiated within the next couple of days.
>
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/4.0.11-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/4.0.11-tentative/NEWS.txt