Re: Defining which code changes target which release types

2021-09-02 Thread Benjamin Lerer
>
> - New features, always with feature flag (added; happy to drop if
> controversial)


I believe that always having a feature flag for every new feature might be
too complicated in practice for different reasons.
Some new features might be low impact like new nodetool commands or new
virtual tables and adding flags for those might simply be extra
complication for the developers and users.
For some other features it might be simply too hard to hide them behind
feature flags.

Feature flag basically means "experimental" so it would be good when a
feature flag is introduced to also have a clear plan on when and how the
flag will be removed.

I would personally limit feature flags to significant new features. As
those types of features now require a CEP, we could make the feature flag
discussion part of the CEP discussion.

What do you think?



On Thu, 2 Sept 2021 at 08:41, Mick Semb Wever  wrote:

> >
> >
> > There's certainly a lot of complexity in a lot of the systems here, no
> > denying that, so maybe we treat the topic of API changes as "here's loose
> > guidelines (destructive vs. additive w/sane defaults, etc) but plan to
> > take it case-by-case" and be a bit more prescriptive on the "where do bug
> > fixes vs. improvements vs. new features go and why"?
> >
>
>
> Agree.
>


[VOTE] Release dtest-api 0.0.9

2021-09-02 Thread Mick Semb Wever
Proposing the test build of in-jvm dtest API 0.0.9 for release.

Repository: 
https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git;a=shortlog;h=refs/tags/0.0.9

Candidate SHA: 
https://github.com/apache/cassandra-in-jvm-dtest-api/commit/aa25319c3e0f506d19383db31d2974a7f5c58ab8
tagged with 0.0.9

Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1248/org/apache/cassandra/dtest-api/0.0.9/

Key signature: A4C465FEA0C552561A392A61E91335D77E3E87CB


Changes since last release:
  * CASSANDRA-16803
jvm-dtest-upgrade failing
MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck,
ClassNotFoundException: com.vdurmont.semver4j.Semver


The vote will be open for 24 hours. Everyone who has tested the build
is invited to vote. Votes by PMC members are considered binding. A
vote passes if there are at least three binding +1s.

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release dtest-api 0.0.9

2021-09-02 Thread Brandon Williams
+1

On Thu, Sep 2, 2021 at 6:20 AM Mick Semb Wever  wrote:
>
> Proposing the test build of in-jvm dtest API 0.0.9 for release.
>
> Repository: 
> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git;a=shortlog;h=refs/tags/0.0.9
>
> Candidate SHA: 
> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/aa25319c3e0f506d19383db31d2974a7f5c58ab8
> tagged with 0.0.9
>
> Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1248/org/apache/cassandra/dtest-api/0.0.9/
>
> Key signature: A4C465FEA0C552561A392A61E91335D77E3E87CB
>
>
> Changes since last release:
>   * CASSANDRA-16803
> jvm-dtest-upgrade failing
> MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck,
> ClassNotFoundException: com.vdurmont.semver4j.Semver
>
>
> The vote will be open for 24 hours. Everyone who has tested the build
> is invited to vote. Votes by PMC members are considered binding. A
> vote passes if there are at least three binding +1s.
>
>




Re: [VOTE] Release dtest-api 0.0.9

2021-09-02 Thread Stefan Miklosovic
+1

On Thu, 2 Sept 2021 at 13:20, Mick Semb Wever  wrote:
>
> Proposing the test build of in-jvm dtest API 0.0.9 for release.
>
> Repository: 
> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git;a=shortlog;h=refs/tags/0.0.9
>
> Candidate SHA: 
> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/aa25319c3e0f506d19383db31d2974a7f5c58ab8
> tagged with 0.0.9
>
> Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1248/org/apache/cassandra/dtest-api/0.0.9/
>
> Key signature: A4C465FEA0C552561A392A61E91335D77E3E87CB
>
>
> Changes since last release:
>   * CASSANDRA-16803
> jvm-dtest-upgrade failing
> MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck,
> ClassNotFoundException: com.vdurmont.semver4j.Semver
>
>
> The vote will be open for 24 hours. Everyone who has tested the build
> is invited to vote. Votes by PMC members are considered binding. A
> vote passes if there are at least three binding +1s.
>
>




Re: [DISCUSS] CEP-13: Denylisting partitions

2021-09-02 Thread Joshua McKenzie
I'm +1 on where it currently stands after the revisions. Consider resolving
the comment threads on the design doc that are closed, so we can see at a
high level whether any discussions are still outstanding?

~Josh
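
The reads|writes|both choice per partition that the quoted discussion below
converges on can be sketched roughly as follows. This is only an
illustration under stated assumptions: the class, enum, and method names are
hypothetical, the partition is identified by a plain string, and nothing here
is taken from the CEP-13 design doc or the Cassandra code base.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a per-partition denylist with a reads|writes|both mode. */
final class PartitionDenylistSketch {
    /** Which operations are rejected for a denylisted partition. */
    enum Mode { READS, WRITES, BOTH }

    // Assumption: a partition is identified by a single string key here;
    // a real implementation would also need keyspace and table.
    private final Map<String, Mode> denied = new ConcurrentHashMap<>();

    /** Adds or updates a denylist entry for the given partition. */
    void deny(String partitionKey, Mode mode) {
        denied.put(partitionKey, mode);
    }

    /** Removes the partition from the denylist (operator intervention done). */
    void allow(String partitionKey) {
        denied.remove(partitionKey);
    }

    boolean isReadDenied(String partitionKey) {
        Mode mode = denied.get(partitionKey);
        return mode == Mode.READS || mode == Mode.BOTH;
    }

    boolean isWriteDenied(String partitionKey) {
        Mode mode = denied.get(partitionKey);
        return mode == Mode.WRITES || mode == Mode.BOTH;
    }
}
```

With this shape, "deny reads but allow writes to rectify a partition" is
deny(key, Mode.READS), and a default of Mode.BOTH keeps the common case simple.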

On Mon, Aug 30, 2021 at 1:14 AM Sumanth Pasupuleti <
sumanth.pasupuleti...@gmail.com> wrote:

> +1. Made changes to the design document linked against the CEP to reflect
> this feedback. Specifically, the following sections have been updated
> * Operations to blacklist
> * Blacklist information store
>
> Thanks,
> Sumanth
>
>
> On Fri, Aug 27, 2021 at 7:57 AM Joshua McKenzie 
> wrote:
>
> > I can see the case for all three:
> > * Deny both reads and writes to a partition (wide, heavily tombstoned,
> > too many sstables, etc.) causing disruption to a replica set; we don't
> > want further growth or reads until operator intervention
> > * Deny reads but allow writes to rectify problems on a partition
> > (intervention window; see above)
> > * Deny writes to a partition but allow reads (prevent partitions growing
> > unbounded, or potentially evolving into a future feature creating a
> > ceiling on partition sizes that kicks in and demands application
> > intervention to reduce partition size at a guardrail limit)
> >
> > So yeah, at least to me at face value it seems like it'd be worth it not
> > only to allow denylisting both reads and writes, but to be able to choose
> > from the set of reads|writes|both on a per-partition basis.
> >
> > ~Josh
> >
> >
> > On Thu, Aug 26, 2021 at 2:16 PM Sumanth Pasupuleti <
> > sumanth.pasupuleti...@gmail.com> wrote:
> >
> > > Thank you, Josh, for the elaborate explanation of a potential scenario
> > > where denylisting writes would make sense.
> > > I 100% agree that this could help in a situation where we want to deny
> > > writes to a partition that we do not have much control over (which is
> > > true in most situations), since such behavior can eventually lead to
> > > unavailability of other partitions too, as you indicate.
> > >
> > > Do you think it makes sense to make it configurable per partition
> > > though? As in, maybe by default, we would want to deny both reads and
> > > writes to a partition, but for certain partitions, we may still want to
> > > allow writes just so we can issue a delete against that partition, as an
> > > example.
> > > Of course this would make the feature and the interface heavier, and we
> > > need to think through whether it's worth it. I personally feel it could
> > > be worth it, especially if we agree on a default behavior that keeps
> > > the interface simple in most cases. Thoughts?
> > >
> > > And yes, so good to see CEP process reaping benefits in multiple ways -
> > > especially around collaboration and documentation.
> > >
> > >
> > > On Thu, Aug 26, 2021 at 8:31 AM Joshua McKenzie 
> > > wrote:
> > >
> > > > The design doc and CEP currently pass on blocklisting / denylisting
> > > > writes at this time. In the proposed new patch it states:
> > > > "Note: We do not want to blacklist writes since it is the reads that
> > > > primarily impact the performance when reading a bad partition, and we
> > > > may want writes to be allowed to “fix” a bad partition. We could
> > > > revisit this in the future"
> > > >
> > > > In situations where you have an air gap between database ops and
> > > > application access (ops <> application teams, or more autonomous
> > > > application access patterns, self-service, etc), you can easily get
> > > > into a situation where you have either a pathological client hammering
> > > > writes to a specific partition causing impact to other clients or in
> > > > the worst case, the replica set, or unbounded partition growth that
> > > > again leads to performance degradation or replica set unavailability.
> > > > The tradeoff there becomes "do we interrupt the application's ability
> > > > to write to this partition now, or do we instead defer and risk losing
> > > > access to *all* partitions on this replica set and still interrupt
> > > > their access eventually anyway?"
> > > >
> > > > Given this, I strongly advocate for support of denylisting both reads
> > > > *and* writes on these grounds; operators need another tool in their
> > > > toolbox to deal with situations where specific partition writing has
> > > > wider negative impacts on the replicas.
> > > >
> > > > Acknowledging of course that there was extensive discussion on this
> > > > back in 2018, and that would have been a *great* time to engage in the
> > > > discussion. =/ Good thing we have this new CEP process! :)
> > > >
> > > > Curious what you think about this perspective Sumanth.
> > > > ~Josh
> > > >
> > > >
> > > > On Tue, Aug 17, 2021 at 2:04 PM Joshua McKenzie <
> jmcken...@apache.org>
> > > > wrote:
> > > >
> > > > > Certainly. I'll take on distilling a high level view of the feature
> > > from
> > > > > what 

Re: Defining which code changes target which release types

2021-09-02 Thread Joshua McKenzie
>
> Feature flag basically means "experimental"

I'm thinking of feature flags more as giving the power to operators to
decide what they do and don't allow users of the database access to. Even
if a feature is very stable and non-experimental, it can have negative
effects on other use-cases on a shared cluster, be incompatible with the
underlying execution environment, be outside compliance policies of an
organization, require greater expertise to use correctly, etc.

That said, I 100% agree w/you on the "limit it to significant new
features". I don't think feature flagging nodetool commands makes a lot of
sense. :)

Adding it to the CEP template as something to yes/no on would be a simple
clarification for this. +1
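
To make the "operators decide what users may access" reading of a feature
flag concrete, here is a minimal sketch. It assumes a simple in-memory set of
enabled features; the class, the enum values, and the exception choice are
all hypothetical and are not existing Cassandra APIs or settings.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: feature flags as an operator-controlled allow list. */
final class FeatureFlags {
    /** Illustrative feature names only; not real Cassandra settings. */
    enum Feature { PARTITION_DENYLIST, SOME_OTHER_FEATURE }

    private final Set<Feature> enabled;

    FeatureFlags(Set<Feature> enabledByOperator) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle it explicitly.
        this.enabled = enabledByOperator.isEmpty()
                ? EnumSet.noneOf(Feature.class)
                : EnumSet.copyOf(enabledByOperator);
    }

    boolean isEnabled(Feature feature) {
        return enabled.contains(feature);
    }

    /** Rejects a request that needs a feature the operator has not allowed. */
    void require(Feature feature) {
        if (!isEnabled(feature))
            throw new IllegalStateException(feature + " is disabled by the operator");
    }
}
```

A flag in this sense says nothing about stability: a mature feature can still
be switched off on a shared cluster for compliance or capacity reasons.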

~Josh


On Thu, Sep 2, 2021 at 3:14 AM Benjamin Lerer  wrote:

> >
> > - New features, always with feature flag (added; happy to drop if
> > controversial)
>
>
> I believe that always having a feature flag for every new feature might be
> too complicated in practice for different reasons.
> Some new features might be low impact like new nodetool commands or new
> virtual tables and adding flags for those might simply be extra
> complication for the developers and users.
> For some other features it might be simply too hard to hide them behind
> feature flags.
>
> Feature flag basically means "experimental" so it would be good when a
> feature flag is introduced to also have a clear plan on when and how the
> flag will be removed.
>
> I would personally limit feature flags to significant new features. As
> those types of features now require a CEP, we could make the feature flag
> discussion part of the CEP discussion.
>
> What do you think?
>
>
>
> On Thu, 2 Sept 2021 at 08:41, Mick Semb Wever  wrote:
>
> > >
> > >
> > > There's certainly a lot of complexity in a lot of the systems here, no
> > > denying that, so maybe we treat the topic of API changes as "here's
> > > loose guidelines (destructive vs. additive w/sane defaults, etc) but
> > > plan to take it case-by-case" and be a bit more prescriptive on the
> > > "where do bug fixes vs. improvements vs. new features go and why"?
> > >
> >
> >
> > Agree.
> >
>


[DISCUSS] CASSANDRA-15234

2021-09-02 Thread Ekaterina Dimitrova
Hi team,

I would like to bring to the attention of the community CASSANDRA-15234,
standardise config and JVM parameters.

This is work we discussed back in Summer 2020 just before our first 4.0
Beta release. During the discussion we figured out that there is more than
one option to do the job and not enough time to get user feedback and
finish it, so this was delayed until after 4.0. And here I am, bringing it back to
the table.

The goals of this work are:

   - To standardize naming, which we did by agreeing on the form noun_verb
   - To provide values with units while maintaining backward compatibility


Those two parts are more or less already done.

More interesting is the third part - reorganizing the cassandra.yaml file.

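My personal approach was to split it into sections, done here:
<https://github.com/ekaterinadimitrova2/cassandra/blob/b4eebe080835da79d032f9314262c268b71172a8/conf/cassandra.yaml>

Another proposal, from Benedict, groups the config parameters.

To make it clearer, he created a yaml with comments mostly stripped:
<https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml>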

In his version, there are basic settings for network, disk etc all grouped
together, followed by operator tuneables mostly under limits within which
we now have throughput, concurrency, capacity. This leads to settings for
some features being kept separate (most notably for caching), but helps the
operator understand what they have to play with for controlling resource
consumption.

I am interested to hear what people think about the two options or if
anyone has another idea to share, open discussion.

Thank you,

Ekaterina


Re: [DISCUSS] CASSANDRA-15234

2021-09-02 Thread bened...@apache.org
Thanks for bringing this to the list Ekaterina!

It’s worth noting that the two don’t have to be in conflict: we could offer two 
template yaml files with the parameters grouped differently, for users to decide 
for themselves.

The proposals primarily define parameter names differently, with my proposal 
going by kind->place, and the other proposal maintaining (mostly) the existing 
name form (which is a bit more like place->kind). While the example yaml groups 
by kind, you can convert nested definitions into a ‘dot’ form (e.g. 
limits.concurrency.reads) for use in a different grouping.
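
As a rough illustration of the nested-to-'dot' conversion mentioned above,
the sketch below flattens a nested map (such as a parsed yaml document) into
dotted keys. It assumes the config has already been loaded into plain
java.util.Map values; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: flatten nested config groups into dotted parameter names. */
final class DotKeys {
    /** Returns a flat view such as {"limits.concurrency.reads": 32}. */
    static Map<String, Object> flatten(Map<String, Object> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", nested, flat);
        return flat;
    }

    private static void flatten(String prefix, Map<?, ?> node, Map<String, Object> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map)
                flatten(key, (Map<?, ?>) entry.getValue(), out); // descend into a group
            else
                out.put(key, entry.getValue());                   // leaf value
        }
    }
}
```

Because the dotted names are independent of where a parameter sits in the
file, the same names could back either grouping of the template yaml.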

One advantage of grouping parameters together is that it aids maintaining 
coherency of naming between systems, and also potentially permits a more 
succinct config file and better discovery. But it’s far from a silver bullet, 
as value judgements have to be made about where the grouping lines are. I’m 
sure anything we settle on will be a huge improvement over the status quo, 
however.




From: Ekaterina Dimitrova 
Date: Thursday, 2 September 2021 at 16:32
To: dev@cassandra.apache.org 
Subject: [DISCUSS] CASSANDRA-15234
Hi team,

I would like to bring to the attention of the community CASSANDRA-15234,
standardise config and JVM parameters.

This is work we discussed back in Summer 2020 just before our first 4.0
Beta release. During the discussion we figured out that there is more than
one option to do the job and not enough time to get user feedback and
finish it, so this was delayed until after 4.0. And here I am, bringing it back to
the table.

The goals of this work are:

   - To standardize naming, which we did by agreeing on the form noun_verb
   - To provide values with units while maintaining backward compatibility

Those two parts are more or less already done.

More interesting is the third part - reorganizing the cassandra.yaml file.

My personal approach was to split it into sections, done here:
<https://github.com/ekaterinadimitrova2/cassandra/blob/b4eebe080835da79d032f9314262c268b71172a8/conf/cassandra.yaml>

Another proposal, from Benedict, groups the config parameters.

To make it clearer, he created a yaml with comments mostly stripped:
<https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml>

In his version, there are basic settings for network, disk etc all grouped
together, followed by operator tuneables mostly under limits within which
we now have throughput, concurrency, capacity. This leads to settings for
some features being kept separate (most notably for caching), but helps the
operator understand what they have to play with for controlling resource
consumption.

I am interested to hear what people think about the two options or if
anyone has another idea to share, open discussion.

Thank you,

Ekaterina


Re: [DISCUSS] CASSANDRA-15234

2021-09-02 Thread David Capwell
Thanks for bringing this back up; Caleb and I were talking about the lack of 
clarity with regard to CASSANDRA-16896, and fleshing this out would make those 
configs nicer!

>   To standardize naming - that we did by agreeing to the form noun_verb

If we can document this, it would be great, as stuff like “enabled” is 
inconsistent, so I'm not sure if I did it properly =D

> 
>   Provision of values with units while maintaining backward compatibility.

+1

I really hate local_read_size_threshold_kb; I would love 
local_read_size_threshold: 10kb.  Once we have the infrastructure in place 
(believe your patch before had these tools) I would love to switch!
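
A minimal sketch of what "values with units, while staying backward
compatible" could look like: a value such as "10kb" carries its own unit,
while a bare number falls back to the unit the old *_kb name implied. This is
an illustration only, not the parser from the patch; the class name, the
accepted unit spellings, and the choice of 1 kb = 1024 bytes are assumptions.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parse sizes like "10kb", with a default unit for bare numbers. */
final class DataSizeSketch {
    private static final Pattern SIZE = Pattern.compile("(\\d+)\\s*(b|kb|mb|gb)?");

    /**
     * @param value       e.g. "10kb", "2 MB", or a legacy bare number such as "10"
     * @param defaultUnit unit assumed for bare numbers (what the old name suffix implied)
     */
    static long toBytes(String value, String defaultUnit) {
        Matcher m = SIZE.matcher(value.trim().toLowerCase(Locale.ROOT));
        if (!m.matches())
            throw new IllegalArgumentException("Invalid size: " + value);
        long amount = Long.parseLong(m.group(1));
        String unit = m.group(2) != null ? m.group(2) : defaultUnit.toLowerCase(Locale.ROOT);
        switch (unit) {
            // Assumption: binary multiples (1 kb = 1024 bytes).
            case "b":  return amount;
            case "kb": return Math.multiplyExact(amount, 1024L);
            case "mb": return Math.multiplyExact(amount, 1024L * 1024);
            case "gb": return Math.multiplyExact(amount, 1024L * 1024 * 1024);
            default:   throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }
}
```

Under these assumptions, an old-style "local_read_size_threshold_kb: 10" and a
new-style "local_read_size_threshold: 10kb" would resolve to the same value.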


> Another proposal is done by Benedict; grouping the config parameters.

Yep, this is what triggered Caleb and me to talk about this thread!  To group or 
not to group; that is the question.

Personally I like grouping from an organization point of view, so I am in favor of 
that. I agree that it can be hard for some tools (such as bash templating), but I 
feel we can always find common ground.


> On Sep 2, 2021, at 8:44 AM, bened...@apache.org wrote:
> 
> Thanks for bringing this to the list Ekaterina!
> 
> It’s worth noting that the two don’t have to be in conflict: we could offer 
> two template yaml with the parameters grouped differently, for users to 
> decide for themselves.
> 
> The proposals primarily define parameter names differently, with my proposal 
> going by kind->place, and the other proposal maintaining (mostly) the 
> existing name form (which is a bit more like place->kind). While the example 
> yaml groups by kind, you can convert nested definitions into a ‘dot’ form 
> (e.g. limits.concurrency.reads) for use in a different grouping.
> 
> One advantage of grouping parameters together is that it aids maintaining 
> coherency of naming between systems, and also potentially permits a more 
> succinct config file and better discovery. But it’s far from a silver bullet, 
> as value judgements have to be made about where the grouping lines are. I’m 
> sure anything we settle on will be a huge improvement over the status quo, 
> however.
> 
> 
> 
> 
> From: Ekaterina Dimitrova 
> Date: Thursday, 2 September 2021 at 16:32
> To: dev@cassandra.apache.org 
> Subject: [DISCUSS] CASSANDRA-15234
> Hi team,
> 
> I would like to bring to the attention of the community CASSANDRA-15234,
> standardise config and JVM parameters.
> 
> This is work we discussed back in Summer 2020 just before our first 4.0
> Beta release. During the discussion we figured out that there is more than
> one option to do the job and not enough time to get user feedback and
> finish it, so this was delayed until after 4.0. And here I am, bringing it back to
> the table.
> 
> The goals of this work are:
> 
>   - To standardize naming, which we did by agreeing on the form noun_verb
>   - To provide values with units while maintaining backward compatibility
> 
> Those two parts are more or less already done.
> 
> More interesting is the third part - reorganizing the cassandra.yaml file.
> 
> My personal approach was to split it into sections, done here:
> <https://github.com/ekaterinadimitrova2/cassandra/blob/b4eebe080835da79d032f9314262c268b71172a8/conf/cassandra.yaml>
> 
> Another proposal, from Benedict, groups the config parameters.
> 
> To make it clearer, he created a yaml with comments mostly stripped:
> <https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml>
> 
> In his version, there are basic settings for network, disk etc all grouped
> together, followed by operator tuneables mostly under limits within which
> we now have throughput, concurrency, capacity. This leads to settings for
> some features being kept separate (most notably for caching), but helps the
> operator understand what they have to play with for controlling resource
> consumption.
> 
> I am interested to hear what people think about the two options or if
> anyone has another idea to share, open discussion.
> 
> Thank you,
> 
> Ekaterina





Re: [DISCUSS] CASSANDRA-15234

2021-09-02 Thread Joshua McKenzie
Reading through the two, the grouping approach seems like it's a lot more
friendly to newcomers as well as providing context specific cues for
relationships between params you're editing. Showing and not telling, if
you will.

Opening up a 1500+ line .yaml file is very daunting, even if most of it is
comments. Can't blame folks for being overwhelmed at the prospect of tuning
Cassandra w/that as our operator config API. :)

~Josh

On Thu, Sep 2, 2021 at 1:48 PM David Capwell 
wrote:

> Thanks for bringing this back up; Caleb and I were talking about the lack
> of clarity with regard to CASSANDRA-16896, fleshing this out would make
> those configs nicer!
>
> >   To standardize naming - that we did by agreeing to the form noun_verb
>
> If we can document this, it would be great as stuff like “enabled” are
> inconsistent so not sure if I did it properly =D
>
> >
> >   Provision of values with units while maintaining backward
> compatibility.
>
> +1
>
> I really hate local_read_size_threshold_kb; I would love
> local_read_size_threshold: 10kb.  Once we have the infrastructure in place
> (believe your patch before had these tools) I would love to switch!
>
>
> > Another proposal is done by Benedict; grouping the config parameters.
>
> Yep, this is what triggered Caleb and I to talk about this thread!  To
> group or not to group; that is the question
>
> Personally I like grouping from an organization point of view so am in
> favor of that; though I will agree that it can be hard for some tools (such
> as bash templating), but feel we can always find a common ground
>
>
> > On Sep 2, 2021, at 8:44 AM, bened...@apache.org wrote:
> >
> > Thanks for bringing this to the list Ekaterina!
> >
> > It’s worth noting that the two don’t have to be in conflict: we could
> offer two template yaml with the parameters grouped differently, for users
> to decide for themselves.
> >
> > The proposals primarily define parameter names differently, with my
> proposal going by kind->place, and the other proposal maintaining (mostly)
> the existing name form (which is a bit more like place->kind). While the
> example yaml groups by kind, you can convert nested definitions into a
> ‘dot’ form (e.g. limits.concurrency.reads) for use in a different grouping.
> >
> > One advantage of grouping parameters together is that it aids
> maintaining coherency of naming between systems, and also potentially
> permits a more succinct config file and better discovery. But it’s far from
> a silver bullet, as value judgements have to be made about where the
> grouping lines are. I’m sure anything we settle on will be a huge
> improvement over the status quo, however.
> >
> >
> >
> >
> > From: Ekaterina Dimitrova 
> > Date: Thursday, 2 September 2021 at 16:32
> > To: dev@cassandra.apache.org 
> > Subject: [DISCUSS] CASSANDRA-15234
> > Hi team,
> >
> > I would like to bring to the attention of the community CASSANDRA-15234,
> > standardise config and JVM parameters.
> >
> > This is work we discussed back in Summer 2020 just before our first 4.0
> > Beta release. During the discussion we figured out that there is more
> than
> > one option to do the job and not enough time to get user feedback and
> > finish it, so this was delayed until after 4.0. And here I am, bringing it back to
> > the table.
> >
> > The goals of this work are:
> >
> >   - To standardize naming, which we did by agreeing on the form noun_verb
> >   - To provide values with units while maintaining backward compatibility
> >
> > Those two parts are more or less already done.
> >
> > More interesting is the third part - reorganizing the cassandra.yaml file.
> >
> > My personal approach was to split it into sections, done here:
> > <https://github.com/ekaterinadimitrova2/cassandra/blob/b4eebe080835da79d032f9314262c268b71172a8/conf/cassandra.yaml>
> >
> > Another proposal, from Benedict, groups the config parameters.
> >
> > To make it clearer, he created a yaml with comments mostly stripped:
> > <https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml>
> >
> > In his version, there are basic settings for network, disk etc all
> grouped
> > together, followed by operator tuneables mostly under limits within which
> > we now have throughput, concurrency, capacity. This leads to settings for
> > some features being kept separate (most notably for caching), but helps
> the
> > operator understand what they have to play with for controlling resource
> > consumption.
> >
> > I am interested to hear what people think about the two options or if
> > anyone has another idea to share, open discussion.
> >
> > Thank you,
> >
> > Ekaterina
>
>
>
>