Re: Proposing an Apache Cassandra Management process

2018-09-09 Thread Jonathan Haddad
It's interesting to me that someone would consider features of Reaper as
"technical debt"... an odd way to word it.  When we proposed donating
Reaper to the project, I assumed people would realize the
benefit of folding in a successful project.  A working codebase with a
large user base gives us an immediate tool, and it would let us migrate
everyone from using our tool to using an official tool.

From the discussions we've had so far, nobody except Blake seems to care
about those users.  To me, the highest priority of this process is
considering what's best for the Cassandra community.  Providing an easy,
clear migration path forward for the thousands of Reaper users, supporting
all 4 backends and Cassandra versions, is the logical way of
transitioning everyone to using the official tool.

If supporting everyone using 2.x with a postgres backend for storing
schedules isn't something anyone cares about, there's not much benefit from
using Reaper.  Gutting a bunch of the project won't help people migrate
over, and if we're not migrating everyone over, then we (TLP) will still
have to maintain Reaper.  That means we'll be indefinitely maintaining a
fork of the official admin, and probably not contributing to the main one,
since our time is limited.  That's exactly what's going to happen if we start
with anything other than Reaper.  We've got a lot of teams asking for
features, and if people pay us to keep working on Reaper, we'll be doing
that.

So, with that in mind, if we're putting this to a vote, I'd like to be
clear - taking Reaper as-is is about community - not its technical prowess.
If we're not enabling users to move off our version and directly to the
official tool, I don't see a point in using Reaper at all except as a
reference to correctly run subrange repair and have a nice UI.  There are
plenty of things I'd do differently if I built it from scratch; I'm not
putting the codebase up on a pedestal.

This entire discussion has been incredibly frustrating for me, considering
we're talking about building an alternative to a well-tested and widely
deployed codebase, an alternative that likely won't produce anything of
value for at least a year (Cassandra 5?).  I'm also pretty sure most of the
folks that have cried out "tech debt" have spent less than 5 minutes looking
at the codebase.  I hope your enthusiasm at the moment would carry over if
you build a tech-debt-free admin tool from the ground up.

TL;DR: If nobody cares about the Reaper community (which is a very large
subset of the Cassandra community), there's no reason to start with Reaper.
It's not about the tech, it's about the people.

Jon


On Sat, Sep 8, 2018 at 4:57 PM sankalp kohli  wrote:

> Most of the discussions have been around whether we take some side car or
> do a cherry-pick approach where we add one feature at a time to the side car
> and use the best implementation.
> We first discussed this in April and have now been at it again for several
> days. I think we all want some progress here. We will start a vote soon,
> maybe next week, to decide which approach we will take. I don't see any
> other option to make progress besides a vote!!
>
> PS: I think these discussions are very good for the community. They show
> that people care about Apache Cassandra, which is a sign of a healthy
> community. Several people offering to donate a side car or help build one
> shows the interest everyone has in it.
>
>
> On Sat, Sep 8, 2018 at 11:17 AM Joseph Lynch 
> wrote:
>
> > On Fri, Sep 7, 2018 at 10:00 PM Blake Eggleston 
> > wrote:
> > >
> > > > Right, I understand the arguments for starting a new project. I’m not
> > > > saying reaper is, technically speaking, the best place to start. The
> > > > point I’m trying to make is that the non-technical advantages of using
> > > > an existing project as a starting point may outweigh the technical
> > > > benefits of a clean slate. Whether that’s the case or not, it’s not a
> > > > strictly technical decision, and the non-technical advantages of
> > > > starting with reaper need to be weighed.
> > >
> >
> > Technical debt doesn't just refer to the technical solutions; having
> > an existing user base means that a project has made promises in the
> > past (in the case of Priam 5+ years ago) that the new project would
> > have to keep if we make keeping users of those sidecars a goal (which
> > for the record I don't think should be a goal, I think the goal is to
> > make Cassandra the database work out of the box in the 4.x series).
> >
> > For example, Priam has to continue supporting the following as users
> > actively use them (including Netflix):
> > * Parallel token assignment and creation allowing parallel bootstrap
> > and parallel doubling of hundred node clusters at once (as long as you
> > use single tokens and run in AWS with asgs).
> > * 3+ backup solutions, as well as assumptions about where in e.g. S3
> > backups are present and what format they're present in.
> > * Numerous configuration options and UI elements (mostly 5 year

Proposing an Apache Cassandra Management process

2018-09-09 Thread Stefan Podkowinski
Does it have to be a single project with functionality provided by
multiple plugins? Designing a plugin API at this point seems to be a bit
early and comes with additional complexity around managing plugins in
general.

I was thinking more in the direction of: "what can we do to enable
people to create any kind of side car or tooling solution?". Things like:

Common cluster discovery and management API
* Detect local Cassandra processes
* Discover and receive events on cluster topology
* Get assigned tokens for nodes
* Read node configuration
* Health checks (as already proposed)
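
As a rough illustration of the discovery items above, here is a minimal
sketch of what such an API could look like to a tool author. Everything in
it is an assumption made for illustration: the names InMemoryClusterView,
TopologyEvent, tokens_for and the "joined"/"left" event kinds are
hypothetical and not part of Cassandra or any existing side car. The state
lives in a plain dict, so the same class could also serve as a fake for the
"mocking cluster state" item further down.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class TopologyEvent:
    """Hypothetical topology change notification."""
    kind: str      # "joined" or "left" (names invented for this sketch)
    address: str


class InMemoryClusterView:
    """Hypothetical discovery API backed by a dict.

    A real implementation would learn this state from Cassandra itself;
    this one is fed by hand so it can run anywhere.
    """

    def __init__(self) -> None:
        self._tokens: Dict[str, Tuple[int, ...]] = {}
        self._listeners: List[Callable[[TopologyEvent], None]] = []

    def subscribe(self, listener: Callable[[TopologyEvent], None]) -> None:
        """Register a callback that receives every topology event."""
        self._listeners.append(listener)

    def node_joined(self, address: str, tokens: Iterable[int]) -> None:
        """Record a node and its tokens, then notify listeners."""
        self._tokens[address] = tuple(sorted(tokens))
        self._emit(TopologyEvent("joined", address))

    def node_left(self, address: str) -> None:
        """Forget a node; only notify if it was actually known."""
        if self._tokens.pop(address, None) is not None:
            self._emit(TopologyEvent("left", address))

    def nodes(self) -> List[str]:
        """Return the known node addresses in sorted order."""
        return sorted(self._tokens)

    def tokens_for(self, address: str) -> Tuple[int, ...]:
        """Return the sorted tokens assigned to a node."""
        return self._tokens[address]

    def _emit(self, event: TopologyEvent) -> None:
        for listener in list(self._listeners):
            listener(event)
```

A tool would call subscribe() once and then react to events, while a test
would drive node_joined() and node_left() directly.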

Any side cars should be easy to install on nodes that already run Cassandra
* Scripts for packaging (tar, deb, rpm)
* Templates for systemd support, optionally with auto-startup dependency
on the Cassandra main process

Integration testing
* Provide basic testing framework for mocking cluster state and messages

Support for other languages / avoid having to use JMX
* JMX bridge (HTTP? gRPC?, already implemented in #14346?)

Obviously the whole side car discussion is not moving in a direction
everyone's happy with. Would it be an option to take a step back and
start implementing such a tooling framework with scripts and libraries
for the features described above, as a small GitHub project, instead of
putting an existing side-car solution up for vote? If that works
and we get people collaborating on code shared between existing
side-cars, then we could take the next step and think about either
revisiting the "official Cassandra side-car" topic, or adding the created
client tooling framework as an official sub-project to the Cassandra
project (maybe via Apache incubator).


On 08.09.18 02:49, Joseph Lynch wrote:
> On Fri, Sep 7, 2018 at 5:03 PM Jonathan Haddad  wrote:
>> We haven’t even defined any requirements for an admin tool. It’s hard to
>> make a case for anything without agreement on what we’re trying to build.
>>
> We were/are trying to sketch out scope/requirements in the #14395 and
> #14346 tickets as well as their associated design documents. I think
> the general proposed direction is a distributed 1:1 management sidecar
> process similar in architecture to Netflix's Priam except explicitly
> built to be general and pluggable by anyone rather than tightly
> coupled to AWS.
>
> Dinesh, Vinay and I were aiming for a small scope at first and
> taking an iterative approach, with just enough upfront design
> but not so much that we are unable to make any progress at all. For example
> maybe something like:
>
> 1. Get a super simple and non controversial sidecar process that ships
> with Cassandra and exposes a lightweight HTTP interface to e.g. some
> basic JMX endpoints
> 2a. Add a pluggable execution engine for cron/oneshot/scheduled jobs
> with the basic interfaces and state store and such
> 2b. Start scoping and implementing the full HTTP interface, e.g.
> backup status, cluster health status, etc ...
> 3a. Start integrating implementations of the jobs from 2a such as
> snapshot, backup, cluster restart, daemon + sstable upgrade, repair,
> etc
> 3b. Start integrating UI components that pair with the HTTP interface from 2b
> 4. ?? Perhaps start unlocking next generation operations like moving
> "background" activities like compaction, streaming, repair etc into
> one or more sidecar contained processes to ensure the main daemon only
> handles read+write requests
>
> There are going to be a lot of questions to answer, and I think trying
> to answer them all up front will mean that we get nowhere or make
> unfortunate compromises that cripple the project from the start. If
> people think we need to do more design and discussion than we have
> been doing then we can spend more time on the design, but personally
> I'd rather start iterating on code and prove value incrementally. If
> it doesn't work out we won't release it GA to the community ...
>
> -Joey
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
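
Step 2a in the quoted plan (a pluggable execution engine with basic
interfaces and a state store) could start out very small. The sketch below
is only an illustration under stated assumptions: JobEngine, JobResult,
register and run_once are hypothetical names, the "state store" is just an
in-memory list, and scheduling (cron or otherwise) is left out entirely.
It is not taken from any of the proposed side cars.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class JobResult:
    """Outcome of one job run, kept in the engine's history."""
    name: str
    ok: bool
    detail: str


class JobEngine:
    """Hypothetical plugin registry that runs named jobs on demand."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Callable[[], str]] = {}
        # Stand-in for a real state store (assumption: in-memory only).
        self.history: List[JobResult] = []

    def register(self, name: str, job: Callable[[], str]) -> None:
        """Add a job plugin; duplicate names are rejected."""
        if name in self._jobs:
            raise ValueError(f"job already registered: {name}")
        self._jobs[name] = job

    def run_once(self, name: str) -> JobResult:
        """Run a registered job once and record whether it succeeded."""
        job = self._jobs[name]
        try:
            result = JobResult(name, True, job())
        except Exception as exc:  # a failing plugin must not kill the engine
            result = JobResult(name, False, str(exc))
        self.history.append(result)
        return result
```

Jobs such as snapshot, backup or repair (step 3a) would then be plain
callables registered under a name, which keeps the engine itself free of
any feature-specific code.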





Re: Proposing an Apache Cassandra Management process

2018-09-09 Thread Elliott Sims
A big one to add to your list there, IMO as a user:
* API for determining detailed repair state (and history?).  Essentially,
something beyond just "Is some sort of repair running?" so that tools like
Reaper can parallelize better.
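
To make that concrete, here is a minimal sketch of the kind of information
such an API might return and how a tool could use it. It is purely
hypothetical: RepairSegment, RepairStatus, busy_nodes and can_start are
names invented here, not an existing Cassandra or Reaper interface, and the
rule "a node takes part in at most one running repair" is an assumption
chosen to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, Set


class RepairStatus(Enum):
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass(frozen=True)
class RepairSegment:
    """Hypothetical per-range repair record exposed by the API."""
    keyspace: str
    start_token: int
    end_token: int
    replicas: FrozenSet[str]
    status: RepairStatus


def busy_nodes(segments: Iterable[RepairSegment]) -> Set[str]:
    """Collect every node that is part of a currently running repair."""
    busy: Set[str] = set()
    for segment in segments:
        if segment.status is RepairStatus.RUNNING:
            busy |= segment.replicas
    return busy


def can_start(candidate_replicas: Iterable[str],
              segments: Iterable[RepairSegment]) -> bool:
    """True if none of the candidate's replicas are already repairing."""
    return busy_nodes(segments).isdisjoint(candidate_replicas)
```

With only a yes/no "is a repair running?" answer, a scheduler has to wait
for the whole cluster. With per-range state like this it can start another
segment as soon as its replicas are idle.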



Re: Proposing an Apache Cassandra Management process

2018-09-09 Thread sankalp kohli
I agree with Jon, and I think folks who are talking about tech debt in
Reaper should elaborate with concrete examples. Can we be more precise
and list them? I see them spread out over this long email thread!!

> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For addition