Kubernetes Operator: Can We Preserve CassKop's Flexibility?

2020-10-07 Thread Tom Offermann
I've been following the discussion about Kubernetes operators with a great
deal of interest. At New Relic, we're about to move our Cassandra Clusters
from bare-metal hosts in our datacenters to Kubernetes clusters in AWS, so
we've been looking closely at the current operators.

Our goals:

* Don't write our own operator.

* Choose the community standard, if possible. If not possible, choose an
operator with active development, usage, and community.

* Choose an operator that can work with our existing way of managing
clusters. Most significantly, at New Relic we do not use virtual nodes in
our Cassandra clusters. Instead, we continue to assign initial_tokens to
individual nodes. While we certainly don't expect an operator to support
this use case by default, we do hope that an operator will make it
possible.

* Don't run a forked version of the operator.

Both [cass-operator][1] and [CassKop][2] worked very well and we were
really impressed with both of them. Heading into the evaluation, we
expected to choose Datastax's cass-operator. Given Datastax's position in
the Cassandra community, and given that they wrote the most widely-used
Cassandra clients, they seemed like they would be in the best position to
provide the community standard.

We ended up choosing CassKop.

However, I don't want this email to be viewed as lobbying for choosing
one operator over another. I'm excited about the possibility that's
currently being discussed of merging development efforts and incorporating
CassKop features into cass-operator.

I do want to highlight some of the advantages that CassKop currently offers
for our use case, in the hope that we can preserve those advantages going
forward. (Or, even improve them!)

1. CassKop offers a huge amount of flexibility for modifying Cassandra
configuration files. If needed, you can swap in your own [bootstrap][3]
docker image to manipulate the Cassandra configuration files, but
oftentimes you don't even need to do that. Since CassKop offers the ability
to define a pre_run.sh script that will run in the bootstrap container, you
can get pretty far with some shell scripting. In our pre_run.sh, we do
per-pod configuration to assign initial token values.

We didn't see an easy way to perform per-pod configuration with
cass-operator. There is no equivalent pre_run.sh hook in
[cass-config-builder][4], which is the init container in cass-operator
that's comparable to CassKop's bootstrap container.
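
As an illustration of the per-pod configuration described in point 1, here is
a minimal sketch of how an initial_token could be derived from a pod's
ordinal. It is written in Python rather than shell and is not New Relic's
actual pre_run.sh. It assumes Murmur3Partitioner, one token per node, evenly
spaced tokens, and a single StatefulSet whose pod hostnames end in -<ordinal>;
the function names and the example hostname are hypothetical.

```python
import re

# Murmur3Partitioner tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -(2**63)
RING_SIZE = 2**64


def pod_ordinal(hostname: str) -> int:
    """Extract the trailing ordinal, e.g. 'cassandra-dc1-rack1-2' -> 2."""
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no ordinal suffix in hostname: {hostname!r}")
    return int(match.group(1))


def initial_token(ordinal: int, node_count: int) -> int:
    """Evenly spaced token for node `ordinal` out of `node_count` nodes."""
    if not 0 <= ordinal < node_count:
        raise ValueError("ordinal must be in [0, node_count)")
    return MIN_TOKEN + ordinal * (RING_SIZE // node_count)


if __name__ == "__main__":
    host = "cassandra-dc1-rack1-2"  # hypothetical pod name
    token = initial_token(pod_ordinal(host), node_count=4)
    print(f"initial_token: {token}")
```

A hook script would then write the computed value into cassandra.yaml before
Cassandra starts. With several racks, each backed by its own StatefulSet, the
ordinal alone is not enough and the rack would have to be factored in.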

2. CassKop is less opinionated about which Cassandra version you want to
run. My understanding is that cass-config-builder adds a layer of
abstraction so that it will produce configuration that is tailored to
certain versions of open-source and DSE Cassandra. Which works great,
unless you want to run a version of Cassandra that isn't supported. We were
surprised to see that cass-operator only works with a [handful of Cassandra
versions][5].

There didn't seem to be an easy way to use cass-operator with an earlier
version of Cassandra than those that are officially supported.

3. CassKop requires adoption of fewer, less-complex components. CassKop's
bootstrap container was easier for us to wrap our heads around than
cass-config-builder. In addition, using cass-operator also required the
usage of the [management-api][6] sidecar. This means that adopting a new
operator also required adopting a new sidecar. Perhaps
this is overstated, but it felt like choosing cass-operator required
embracing a whole ecosystem, rather than simply an operator.

Now, if the management-api sidecar was widely used throughout the
community, then I wouldn't feel the same reluctance to use it. Knowing that
it was going to be the community standard moving forward would be a big
help. But, until it achieves that role as the standard, then choosing
cass-operator means choosing both an operator and a sidecar, when there's
no guarantee that either of them will become the standard. It's a bigger
commitment.

I realize that the concerns we have when choosing an operator may not be
shared by all. I raise these points with the hope that we can keep them in
mind. It's possible to build flexibility into a Cassandra operator, so that
it can be used in ways that deviate from the default, or even used in ways
that the original authors didn't anticipate.

I do want to thank both Orange and Datastax for all of the work they've put
into their operators, as well as everyone here discussing the best way to
move forward. We are super appreciative and I'm optimistic that some of us
at New Relic will be in a position soon to be able to contribute to these
efforts.

Thanks,
Tom

[1]: https://github.com/datastax/cass-operator
[2]: https://github.com/Orange-OpenSource/casskop
[3]: https://github.com/Orange-OpenSource/casskop/tree/master/docker/bootstrap
[4]: https://github.com/datastax/cass-config-builder
[5]: https://github.com/datastax/cass-operator/blob/master/operator/deploy/crds/cassandra.datastax.com_cassandradatace

Re: 4.0 GA scope: the opt-in approach (CALL TO ACTION)

2020-10-07 Thread Joshua McKenzie
Thanks for taking action on that, Scott.

Just want to ping the list here as a reminder for everyone: 48 hours to go!
Reminder: *for anything you think is crucial for us to get in before 4.0 GA,
please remove the 4.0-triage FixVersion from the ticket by Friday*.

Thanks.



On Tue, Oct 06, 2020 at 11:57 PM, Scott Andreas wrote:

> Thank you, Josh! Just took a pass and opted in 22 of the 55 tickets with
> the triage keyword as of this evening, most of which are active this month
> or are for flaky/failing tests.
>
> – Scott
>
> From: Joshua McKenzie
> Sent: Monday, October 5, 2020 11:01 AM
> To: dev@cassandra.apache.org
> Subject: Re: 4.0 GA scope: the opt-in approach (CALL TO ACTION)
>
> Friendly reminder: please check the link in the previous email and remove
> the 4.0-triage version from any tickets you want to keep included in 4.0
> GA.
>
> Thanks.
>
> ~Josh
>
> On Fri, Oct 02, 2020 at 5:58 PM, Joshua McKenzie wrote:
>
> As discussed on the contributor call, we collectively agreed to try
> something new to determine scope for 4.0. Rather than going ticket by
> ticket or "asking for forgiveness" and having people move things out
> individually, we've flagged all tickets in the 4.0 scope that are still
> open with the fixversion '4.0-triage' with the intent to "opt things in".
>
> Link: https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20and%20fixversion%20%3D%204.0-triage
>
> If there's a ticket you want to keep in the 4.0 release, please edit the
> ticket and remove the '4.0-triage' fixversion. Let's target having this
> done by End of Day Friday, October 9th (one week from now).
>
> If you don't have access to remove that fixver from a ticket, please reach
> out to me (jmckenzie), Jordan West, or Jon Meredith on the-asf slack in
> #cassandra-dev or via DM and we'll help you out.
>
> At the end of day on Oct 9th, we'll go through and move every ticket that
> still has 4.0-triage into 4.0.x and have our scope for 4.0 GA.
>
> Sound good?
>
> ~Josh
>
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>


Re: 4.0 GA scope: the opt-in approach (CALL TO ACTION)

2020-10-07 Thread David Capwell
Updated the link to exclude resolved; down to 27 remaining (was 32)

https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20and%20fixversion%20%3D%204.0-triage%20and%20status%20!%3D%20Resolved
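
For anyone who wants to adjust the filter, the link above is just a
percent-encoded JQL query appended to the Jira search page. A minimal sketch
of building it with Python's standard library follows; the helper name
jira_search_url is made up for this example.

```python
from urllib.parse import quote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/"


def jira_search_url(jql: str) -> str:
    """Return a Jira issue-search URL with the JQL percent-encoded."""
    # safe="!" leaves '!' literal, as in the link above; spaces become %20
    # and '=' becomes %3D.
    return f"{JIRA_SEARCH}?jql={quote(jql, safe='!')}"


jql = "project = cassandra and fixversion = 4.0-triage and status != Resolved"
print(jira_search_url(jql))
```

Changing the JQL string (for example, dropping the status clause) yields the
original, unfiltered link.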

> On Oct 7, 2020, at 12:16 PM, Joshua McKenzie wrote:
>
> Thanks for taking action on that, Scott.
>
> Just want to ping the list here as a reminder for everyone: 48 hours to go!
> Reminder: *for anything you think is crucial for us to get in before 4.0 GA,
> please remove the 4.0-triage FixVersion from the ticket by Friday*.
> 
> Thanks.





Re: Kubernetes Operator: Can We Preserve CassKop's Flexibility?

2020-10-07 Thread Cyril Scetbon
Thank you, Tom, for your support. As one of the main contributors to CassKop,
I’m happy to see that the effort we put into supporting as many configurations
as possible is appreciated.

When we first started to talk about creating a Kubernetes operator, we always
mentioned the features that we added and the importance of trying to fulfill
the needs of every user. Each of those choices has a reason: a situation that
happened in production, a configuration that we used to apply to some of our
clusters, or a situation that could potentially happen and that we needed to
overcome. One example is that IPs can change when a Kubernetes node restarts,
and IPs can even be exchanged between two nodes of the same cluster. We
implemented a detection algorithm that, when it sees this happening, tries to
restart the pods, which should then get new IPs and solve the problem:
https://orange-opensource.github.io/casskop/docs/3_configuration_deployment/9_advanced_configuration#cross-ip-management
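
To make the cross-IP situation concrete, here is a minimal Python sketch of
one way such a swap could be detected. It is an illustration only, not
CassKop's actual algorithm; the pod names, IPs, and the find_swapped_ips
helper are all hypothetical. It compares a previously recorded pod-to-IP
mapping with the current one and reports pairs of pods that now hold each
other's old address.

```python
from typing import Dict, List, Tuple


def find_swapped_ips(
    previous: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return pairs of pods where each now holds the IP the other had before."""
    swapped = []
    # Only pods present in both snapshots can be compared.
    pods = sorted(set(previous) & set(current))
    for i, a in enumerate(pods):
        for b in pods[i + 1:]:
            if (
                previous[a] != previous[b]
                and current[a] == previous[b]
                and current[b] == previous[a]
            ):
                swapped.append((a, b))
    return swapped


previous = {"cass-0": "10.0.0.1", "cass-1": "10.0.0.2", "cass-2": "10.0.0.3"}
current = {"cass-0": "10.0.0.2", "cass-1": "10.0.0.1", "cass-2": "10.0.0.3"}
for pod_a, pod_b in find_swapped_ips(previous, current):
    print(f"restart candidates: {pod_a}, {pod_b}")
```

In this sketch the reported pods are the ones that would be restarted so that
they come back with fresh IPs.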

The features we added solve use cases that happened in production or that
could happen due to the environment, and we tried to make them as simple and
intuitive as possible. We also put a lot of effort into the documentation,
which is not perfect but serves the purpose of explaining and detailing how to
use CassKop.

Soon, when we start talking about porting our features, we’ll of course stress
the importance of keeping the operator as open as possible (to be honest, we
had in mind supporting any recent Cassandra version, and even ScyllaDB), as
well as simple, configurable, and adaptable where possible. Of course, not all
versions are supported even by CassKop, because we make some Jolokia calls, and
if the JMX beans change, some important operations could stop working (for
instance, we check that a datacenter has no data replicated to it before
decommissioning it).

I had a few discussions with some of the cass-operator developers, and I think
we understood each other and know that, for it to be adopted and the work to be
fruitful, no feature should be lost along the way, and if there is a better way
to do things, we’ll find it together. Orange also uses CassKop and will keep
using it as long as the crucial features are not available. We’ll also have to
find a way to migrate from CassKop to cass-operator without breaking
everything. But let’s walk before we run 😉

—
Cyril Scetbon

> On Oct 7, 2020, at 2:23 PM, Tom Offermann wrote:
> 
> I've been following the discussion about Kubernetes operators with a great
> deal of interest. At New Relic, we're about to move our Cassandra Clusters
> from bare-metal hosts in our datacenters to Kubernetes clusters in AWS, so
> we've been looking closely at the current operators.
> 
> Our goals:
> 
> * Don't write our own operator.
> 
> * Choose the community standard, if possible. If not possible, choose an
> operator with active development, usage, and community.
> 
> * Choose an operator that can work with our existing way of managing
> clusters. Most significantly, at New Relic we do not use virtual nodes in
> our Cassandra clusters. Instead, we continue to assign initial_tokens to
> individual nodes. While we certainly don't expect an operator to support
> this use case by default, we do hope that an operator will make it
> possible.
> 
> * Don't run a forked version of the operator.
> 
> Both [cass-operator][1] and [CassKop][2] worked very well and we were
> really impressed with both of them. Heading into the evaluation, we
> expected to choose Datastax's cass-operator. Given Datastax's position in
> the Cassandra community, and given that they wrote the most widely-used
> Cassandra clients, they seemed like they would be in the best position to
> provide the community standard.
> 
> We ended up choosing CassKop.
> 
> However, I don't want this email to be viewed as lobbying for choosing
> one operator over another. I'm excited about the possibility that's
> currently being discussed of merging development efforts and incorporating
> CassKop features into cass-operator.
> 
> I do want to highlight some of the advantages that CassKop currently offers
> for our use case, in the hope that we can preserve those advantages going
> forward. (Or, even improve them!)
> 
> 1. CassKop offers a huge amount of flexibility for modifying Cassandra
> configuration files. If needed, you can swap in your own [bootstrap][3]
> docker image to manipulate the Cassandra configuration files, but
> oftentimes you don't even need to do that. Since CassKop offers the ability
> to define a pre_run.sh script that will run in the bootstrap container, you
> can get pretty far with some shell scripting. In our pre_run.sh, we do
> per-pod configuration to assign initial token values.
> 
> We didn't see an easy way to perform per-pod configuration with
> cass-operator. There is no eq

Re: cassandra.logdir

2020-10-07 Thread Cyril Scetbon
Done https://issues.apache.org/jira/browse/CASSANDRA-16199

Best
—
Cyril Scetbon

> On Sep 1, 2020, at 12:22 PM, Joshua McKenzie wrote:
> 
> Go for it!
> 
> On Mon, Aug 31, 2020 at 10:23 PM Cyril Scetbon wrote:
> 
>> Hey guys,
>> 
>> Experimenting with Cassandra 4.0, I’m seeing that when CASSANDRA_LOG_DIR is
>> set and ${cassandra.logdir} is used in logback.xml, nodetool doesn’t use the
>> env variable. The cassandra script behaves differently, for instance:
>> https://github.com/apache/cassandra/blob/324267b3c0676ad31bd4f2fac0e2e673a9257a37/bin/cassandra#L186
>> I feel like it should be added to
>> https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/bin/nodetool
>> as well. Any objection to creating a ticket to do it?
>> 
>> Thanks
>> —
>> Cyril
>> 
>>