Re: Proposing an Apache Cassandra Management process

Mick Semb Wever Mon, 20 Aug 2018 04:24:01 -0700


On Fri, 17 Aug 2018, at 14:27, Sankalp Kohli wrote:
> I am bumping this thread because patch has landed for this with repair 
> functionality.



We are looking to contribute Reaper to the Cassandra project. 

Looking at the patch, it is already very similar in its base design, but Reaper 
does have a lot more to offer. We have all been working hard to make it also 
run as a side-car so that it can be contributed. This raises a number of 
questions relevant to this thread: would we then accept both works into the 
Cassandra project, and what burden would it put on the current PMC to maintain 
both works?

I share Stefan's concern that consensus had not been reached around a side-car, 
and that it was somehow accepted by default before a patch landed. This seems 
at odds with the fact that we're already struggling to keep up with the 
incoming patches/contributions, and there could be other git repos in the 
project we will need to support in the future too. But I'm also curious about 
the whole "Community over Code" angle to this: how do we encourage multiple 
external works to collaborate, building value in both the technology and the 
community?
 
The Reaper project has worked hard at building both its user and contributor 
base. And I would have thought these, including having the contributor base 
overlap with the C* PMC, were prerequisites before moving a larger body of work 
into the project (separate git repo or not). I guess this isn't so much 
"Community over Code", but it illustrates a concern regarding abandoned code 
when there's no existing track record of maintaining it as OSS, as opposed to 
expecting an existing "show, don't tell" culture. Reaper, for example, has 
stronger indicators of ongoing support and an existing OSS user base: today the 
C* committers who have contributed to Reaper are Jon, Stefan, Nate, and myself, 
among the 40 contributors in total. And we've been taking steps to involve it 
more in the C* community (e.g. the users ML), without being too presumptuous.

On the technical side: Reaper supports (or can easily support) everything that 
the proposal here raises: distributed nodetool commands, centralising JMX 
interfacing, scheduling ops (repairs, snapshots, compactions, cleanups, etc), 
monitoring and diagnostics, and so on. It's designed so that it can run as a 
single instance, instance-per-datacenter, or side-car (per process). When there 
are multiple instances in a datacenter you get HA. You have a choice of 
different storage backends (memory, Postgres, C*). You can of course use a 
separate C* cluster as a backend so as to separate infrastructure data from 
production data. And it already has a UI for C* Diagnostics (which requires a 
different JMX interface: polling for events rather than subscribing to JMX 
notifications, which we know are problematic, thanks to Stefan). Anyway, that's 
my plug for Reaper :-)

There's been little effort in evaluating these two bodies of work, one of which 
is largely unknown to us, and my concern is: how would we fairly support both 
going into the future?

Another option would be for this side-car patch to first exist as a GitHub 
project for a period of time, on par with how Reaper has been. This would help 
evaluate its use and build up its contributor base first. It would make it 
easier for the C* PMC to choose which projects it wants to formally maintain, 
and to do so based on factors beyond technical merit. We may even see it 
converge (or collaborate more) with Reaper, a win for everyone.

regards,
Mick

