We haven’t even defined any requirements for an admin tool. It’s hard to make a case for anything without agreement on what we’re trying to build.
On Fri, Sep 7, 2018 at 7:17 PM Jeff Jirsa <jji...@gmail.com> wrote:

> How can we continue moving this forward?
>
> Mick/Jon/TLP folks, is there a path here where we commit the Netflix-provided management process, and you augment Reaper to work with it?
> Is there a way we can make a larger umbrella that's modular and can support either/both?
> Does anyone believe there's a clear, objective argument that one is strictly better than the other? I haven't seen one.
>
> On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala <rtangir...@netflix.com.invalid> wrote:
>
> > +1 to everything that Joey articulated, with emphasis on the fact that contributions should be evaluated based on the merit of the code and its value add to the whole offering. I hope it does not matter whether that contribution comes from a PMC member or a person who is not a committer. I would like the process to be such that it encourages new members to be a part of the community and not shy away from contributing to the code, assuming their contributions are valued differently than those of committers or PMC members. It would be sad to see the contributions decrease if we go down that path.
> >
> > *Regards,*
> > *Roopa Tangirala*
> > Engineering Manager CDE
> > *(408) 438-3156 - mobile*
> >
> > On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> >
> > > > We are looking to contribute Reaper to the Cassandra project.
> > >
> > > Just to clarify: are you proposing contributing Reaper as a project via donation, or are you planning on contributing the features of Reaper as patches to Cassandra? If the former, how far along are you on the donation process? If the latter, when do you think you would have patches ready for consideration / review?
> > >
> > > > Looking at the patch it's very similar in its base design already, but Reaper does have a lot more to offer. We have all been working hard to move it to also being a side-car so it can be contributed. This raises a number of relevant questions to this thread: would we then accept both works in the Cassandra project, and what burden would it put on the current PMC to maintain both works.
> > >
> > > I would hope that we would collaborate on merging the best parts of all into the official Cassandra sidecar, taking the always-on, shared-nothing, highly available system that we've contributed a patchset for and adding in many of the repair features (e.g. schedules, a nice web UI) that Reaper has.
> > >
> > > > I share Stefan's concern that consensus had not been met around a side-car, and that it was somehow default accepted before a patch landed.
> > >
> > > I feel this is not correct or fair. The sidecar and repair discussions have been anything _but_ "default accepted". The timeline of consensus building involving the management sidecar and repair scheduling plans:
> > >
> > > Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to come up with design goals for a repair scheduler that could work at Netflix scale.
> > >
> > > ~Feb 2017: Netflix believes that fundamental design gaps prevented us from using Reaper, as it relies heavily on remote JMX connections and central coordination.
> > >
> > > Sep 2017: Vinay gives a lightning talk at NGCC about a highly available and distributed repair scheduling sidecar/tool. He is encouraged by multiple committers to build repair scheduling into the daemon itself and not as a sidecar, so the database is truly eventually consistent.
> > >
> > > ~Jun 2017 - Feb 2018: Based on internal need and the positive feedback at NGCC, Vinay and I prototype the distributed repair scheduler within Priam and roll it out at Netflix scale.
> > >
> > > Mar 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20-page design document for adding repair scheduling to the daemon itself and open the design up for feedback from the community. We get feedback from Alex, Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals to contribute Reaper at this point. We hear the consensus that the community would prefer repair scheduling in a separate distributed sidecar rather than in the daemon itself, and we re-work the design to match this consensus, re-aligning with our original proposal at NGCC.
> > >
> > > Apr 2018: Blake brings the discussion of repair scheduling to the dev list (https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E). Many community members give positive feedback that we should solve it as part of Cassandra, and there is still no mention of contributing Reaper at this point. The last message is my attempted summary giving context on how we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and ship them with Cassandra.
> > >
> > > Apr 2018: Dinesh opens CASSANDRA-14395 along with a public design document for gathering feedback on a general management sidecar. Sankalp and Dinesh encourage Vinay and me to kickstart that sidecar using the repair scheduler patch.
> > >
> > > Apr 2018: Dinesh reaches out to the dev list (https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E) about the general management process to gain further feedback. All feedback remains positive, as it is a potential place for multiple community members to contribute their various sidecar functionality.
> > >
> > > May-Jul 2018: Vinay and I work on creating a basic sidecar for running the repair scheduler based on the feedback from the community in CASSANDRA-14346 and CASSANDRA-14395.
> > >
> > > Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this; nobody objects.
> > >
> > > Jul 2018: Sankalp asks on the dev list if anyone has feature Jiras they need reviewed before 4.0. I mention again that we've nearly got the basic sidecar and repair scheduling work done and will need help with review. No one responds.
> > >
> > > Aug 2018: We submit a patch that brings a basic distributed sidecar and robust distributed repair to Cassandra itself. Dinesh mentions that he will try to review. Now folks appear concerned about it being in tree, and that instead maybe it should go in a different repo altogether. I don't think we have consensus on the repo choice yet.
> > > > This seems at odds when we're already struggling to keep up with the incoming patches/contributions, and there could be other git repos in the project we will need to support in the future too. But I'm also curious about the whole "Community over Code" angle to this: how do we encourage multiple external works to collaborate together, building value in both the technical and the community.
> > >
> > > I viewed this management sidecar as a way for us to stop, as a community, building the same thing over and over again. Netflix maintains Priam, The Last Pickle maintains Reaper, Datastax maintains OpsCenter. Why can't we take the best of Reaper (e.g. schedules, diagnostic events, UI) and leave the worst (e.g. centralized design with lots of locking), combine it with the best of Priam (a robust shared-nothing sidecar that makes Cassandra management easy) and leave the worst (a bunch of technical debt), and iterate towards one sidecar that allows Cassandra users to just run their database?
> > >
> > > > The Reaper project has worked hard in building both its user and contributor base. And I would have thought these, including having the contributor base overlap with the C* PMC, were prerequisites before moving a larger body of work into the project (separate git repo or not). I guess this isn't so much "Community over Code", but it illustrates a concern regarding abandoned code when there's no existing track record of maintaining it as OSS, as opposed to expecting an existing "show, don't tell" culture. Reaper for example has stronger indicators for ongoing support and an existing OSS user base: today the C* committers having contributed to Reaper are Jon, Stefan, Nate, and myself, amongst 40 contributors in total. And we've been making steps to involve it more in the C* community (e.g. the users ML), without being too presumptuous.
> > >
> > > I worry about this logic, to be frank. Why do significant contributions need to come only from established C* PMC members? Shouldn't we strive to consider the relative merits of code that has actually been submitted to the project on the basis of the code, and not who sent the patches?
> > >
> > > > On the technical side: Reaper supports (or easily could) all the concerns that the proposal here raises: distributed nodetool commands, centralising JMX interfacing, scheduling ops (repairs, snapshots, compactions, cleanups, etc.), monitoring and diagnostics, and so on. It's designed so that it can be a single instance, an instance per datacenter, or a side-car (per process). When there are multiple instances in a datacenter you get HA. You have a choice of different storage backends (memory, postgres, C*). You can of course use a separate C* cluster as a backend so as to separate infrastructure data from production data. And it's got a UI for C* diagnostics already (which imposes a different JMX interface of polling for events rather than subscribing to JMX notifications, which we know is problematic; thanks to Stefan). Anyway, that's my plug for Reaper :-)
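As a side note for anyone who hasn't hit the notification problem Mick mentions: the difference between the two access patterns is roughly the sketch below. This is only an illustrative Java fragment assuming a node reachable on the default JMX port; the StorageService MBean and its OperationMode attribute are stand-ins for whatever a management tool would actually watch, not a description of how Reaper or the sidecar patch does it.

    // Illustrative only: two ways a management tool can watch a Cassandra node
    // over JMX. Host, port, MBean, and attribute names are assumptions for the
    // sketch, not what either tool actually does.
    import javax.management.MBeanServerConnection;
    import javax.management.Notification;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxWatchSketch {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ObjectName storageService =
                        new ObjectName("org.apache.cassandra.db:type=StorageService");

                // Pattern 1: subscribe to remote JMX notifications. Convenient, but
                // remote delivery can drop events (buffer overflow, reconnects),
                // which is the problem referenced above.
                mbsc.addNotificationListener(
                        storageService,
                        (Notification n, Object handback) ->
                                System.out.println("event: " + n.getType() + " " + n.getMessage()),
                        null, null);

                // Pattern 2: poll state on an interval. Chattier, but a missed poll
                // is simply re-read on the next tick; nothing is lost for good.
                for (int i = 0; i < 3; i++) {
                    Object mode = mbsc.getAttribute(storageService, "OperationMode");
                    System.out.println("polled OperationMode: " + mode);
                    Thread.sleep(5_000);
                }
            } finally {
                connector.close();
            }
        }
    }

Remote notification delivery can lose events when buffers overflow or connections bounce, while a poller just reads the current state again on its next pass, which is why polling tends to be the more robust pattern despite the extra chatter.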
> > > Could we get some of these suggestions into the CASSANDRA-14346/CASSANDRA-14395 Jiras so we can debate the technical merits there?
> > >
> > > > There's been little effort in evaluating these two bodies of work, one of which is largely unknown to us, and my concern is how we would fairly support both going into the future?
> > > >
> > > > Another option would be that this side-car patch first exists as a github project for a period of time, on par with how Reaper has been. This will help evaluate its use and first build up its contributors. This makes it easier for the C* PMC to choose which projects it would want to formally maintain, and to do so based on factors beyond the technical merits. We may even see it converge (or collaborate more) with Reaper, a win for everyone.
> > >
> > > We could have put our distributed repair scheduler into Priam ages ago, which would have been much easier for us, and Priam also has an existing community. But we don't want to, because that would encourage the community to remain fractured on the most important management processes. Instead we seek to work with the community to take the lessons learned from all the various available sidecars owned by different organizations (Datastax, Netflix, TLP) and fix this once for the whole community. Can we work together to make Cassandra just work for our users out of the box?
> > >
> > > -Joey

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade