We haven’t even defined any requirements for an admin tool. It’s hard to make a case for anything without agreement on what we’re trying to build.
On Fri, Sep 7, 2018 at 7:17 PM Jeff Jirsa <jji...@gmail.com> wrote:

> How can we continue moving this forward?
>
> Mick/Jon/TLP folks, is there a path here where we commit the Netflix-provided management process, and you augment Reaper to work with it?
> Is there a way we can make a larger umbrella that's modular and can support either/both?
> Does anyone believe there's a clear, objective argument that one is strictly better than the other? I haven't seen one.
>
> On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala <rtangir...@netflix.com.invalid> wrote:
>
> > +1 to everything that Joey articulated, with emphasis on the fact that contributions should be evaluated based on the merit of the code and its value add to the whole offering. I hope it does not matter whether that contribution comes from a PMC member or a person who is not a committer. I would like the process to be such that it encourages new members to be a part of the community and not shy away from contributing to the code, assuming their contributions are valued differently than those of committers or PMC members. It would be sad to see the contributions decrease if we go down that path.
> >
> > *Regards,*
> > *Roopa Tangirala*
> > Engineering Manager CDE
> > *(408) 438-3156 - mobile*
> >
> > On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> >
> > > > We are looking to contribute Reaper to the Cassandra project.
> > >
> > > Just to clarify: are you proposing contributing Reaper as a project via donation, or are you planning on contributing the features of Reaper as patches to Cassandra? If the former, how far along are you on the donation process? If the latter, when do you think you would have patches ready for consideration / review?
> > >
> > > > Looking at the patch it's very similar in its base design already, but Reaper does have a lot more to offer. We have all been working hard to move it to also being a side-car so it can be contributed. This raises a number of relevant questions to this thread: would we then accept both works in the Cassandra project, and what burden would it put on the current PMC to maintain both works.
> > >
> > > I would hope that we would collaborate on merging the best parts of all into the official Cassandra sidecar, taking the always-on, shared-nothing, highly available system that we've contributed a patchset for and adding in many of the repair features (e.g. schedules, a nice web UI) that Reaper has.
> > >
> > > > I share Stefan's concern that consensus had not been met around a side-car, and that it was somehow default accepted before a patch landed.
> > >
> > > I feel this is not correct or fair. The sidecar and repair discussions have been anything _but_ "default accepted". The timeline of consensus building involving the management sidecar and repair scheduling plans:
> > >
> > > Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to come up with design goals for a repair scheduler that could work at Netflix scale.
> > >
> > > ~Feb 2017: Netflix believes that fundamental design gaps prevented us from using Reaper, as it relies heavily on remote JMX connections and central coordination.
> > >
> > > Sep 2017: Vinay gives a lightning talk at NGCC about a highly available and distributed repair scheduling sidecar/tool. He is encouraged by multiple committers to build repair scheduling into the daemon itself and not as a sidecar, so the database is truly eventually consistent.
> > >
> > > ~Jun 2017 - Feb 2018: Based on internal need and the positive feedback at NGCC, Vinay and I prototype the distributed repair scheduler within Priam and roll it out at Netflix scale.
> > >
> > > Mar 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20-page design document for adding repair scheduling to the daemon itself and open the design up for feedback from the community. We get feedback from Alex, Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals to contribute Reaper at this point. We hear the consensus that the community would prefer repair scheduling in a separate distributed sidecar rather than in the daemon itself, and we re-work the design to match this consensus, re-aligning with our original proposal at NGCC.
> > >
> > > Apr 2018: Blake brings the discussion of repair scheduling to the dev list (https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E). Many community members give positive feedback that we should solve it as part of Cassandra, and there is still no mention of contributing Reaper at this point. The last message is my attempted summary giving context on how we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and ship them with Cassandra.
> > >
> > > Apr 2018: Dinesh opens CASSANDRA-14395 along with a public design document for gathering feedback on a general management sidecar. Sankalp and Dinesh encourage Vinay and me to kickstart that sidecar using the repair scheduler patch.
> > >
> > > Apr 2018: Dinesh reaches out to the dev list (https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E) about the general management process to gain further feedback. All feedback remains positive, as it is a potential place for multiple community members to contribute their various sidecar functionality.
> > >
> > > May-Jul 2018: Vinay and I work on creating a basic sidecar for running the repair scheduler based on the feedback from the community in CASSANDRA-14346 and CASSANDRA-14395.
> > >
> > > Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this; nobody objects.
> > >
> > > Jul 2018: Sankalp asks on the dev list if anyone has feature Jiras they need reviewed before 4.0. I mention again that we've nearly got the basic sidecar and repair scheduling work done and will need help with review. No one responds.
> > >
> > > Aug 2018: We submit a patch that brings a basic distributed sidecar and robust distributed repair to Cassandra itself. Dinesh mentions that he will try to review. Now folks appear concerned about it being in tree, and that instead maybe it should go in a different repo altogether. I don't think we have consensus on the repo choice yet.
> > > > This seems at odds when we're already struggling to keep up with the incoming patches/contributions, and there could be other git repos in the project we will need to support in the future too. But I'm also curious about the whole "Community over Code" angle to this: how do we encourage multiple external works to collaborate together, building value in both the technical and the community.
> > >
> > > I viewed this management sidecar as a way for us to stop, as a community, building the same thing over and over again. Netflix maintains Priam, The Last Pickle maintains Reaper, Datastax maintains OpsCenter. Why can't we take the best of Reaper (e.g. schedules, diagnostic events, UI) and leave the worst (e.g. centralized design with lots of locking), combine it with the best of Priam (a robust shared-nothing sidecar that makes Cassandra management easy) and leave the worst (a bunch of technical debt), and iterate towards one sidecar that allows Cassandra users to just run their database?
> > >
> > > > The Reaper project has worked hard in building both its user and contributor base. And I would have thought these, including having the contributor base overlap with the C* PMC, were prerequisites before moving a larger body of work into the project (separate git repo or not). I guess this isn't so much "Community over Code", but it illustrates a concern regarding abandoned code when there's no existing track record of maintaining it as OSS, as opposed to expecting an existing "show, don't tell" culture. Reaper for example has stronger indicators for ongoing support and an existing OSS user base: today the C* committers having contributed to Reaper are Jon, Stefan, Nate, and myself, amongst 40 contributors in total. And we've been making steps to involve it more in the C* community (e.g. the users ML), without being too presumptuous.
> > >
> > > I worry about this logic, to be frank. Why do significant contributions need to come only from established C* PMC members? Shouldn't we strive to consider the relative merits of code that has actually been submitted to the project on the basis of the code, and not who sent the patches?
> > >
> > > > On the technical side: Reaper supports (or easily could) all the concerns that the proposal here raises: distributed nodetool commands, centralising JMX interfacing, scheduling ops (repairs, snapshots, compactions, cleanups, etc.), monitoring and diagnostics, and so on. It's designed so that it can be a single instance, an instance per datacenter, or a side-car (per process). When there are multiple instances in a datacenter you get HA. You have a choice of different storage backends (memory, postgres, C*). You can of course use a separate C* cluster as a backend so as to separate infrastructure data from production data. And it's got a UI for C* diagnostics already (which imposes a different JMX interface of polling for events rather than subscribing to JMX notifications, which we know is problematic; thanks to Stefan). Anyway, that's my plug for Reaper :-)
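As a side note for anyone who hasn't hit the notification problem Mick mentions: the difference between the two access patterns is roughly the sketch below. This is only an illustrative Java fragment assuming a node reachable on the default JMX port; the StorageService MBean and its OperationMode attribute are stand-ins for whatever a management tool would actually watch, not a description of how Reaper or the sidecar patch does it.

    // Illustrative only: two ways a management tool can watch a Cassandra node
    // over JMX. Host, port, MBean, and attribute names are assumptions for the
    // sketch, not what either tool actually does.
    import javax.management.MBeanServerConnection;
    import javax.management.Notification;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxWatchSketch {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ObjectName storageService =
                        new ObjectName("org.apache.cassandra.db:type=StorageService");

                // Pattern 1: subscribe to remote JMX notifications. Convenient, but
                // remote delivery can drop events (buffer overflow, reconnects),
                // which is the problem referenced above.
                mbsc.addNotificationListener(
                        storageService,
                        (Notification n, Object handback) ->
                                System.out.println("event: " + n.getType() + " " + n.getMessage()),
                        null, null);

                // Pattern 2: poll state on an interval. Chattier, but a missed poll
                // is simply re-read on the next tick; nothing is lost for good.
                for (int i = 0; i < 3; i++) {
                    Object mode = mbsc.getAttribute(storageService, "OperationMode");
                    System.out.println("polled OperationMode: " + mode);
                    Thread.sleep(5_000);
                }
            } finally {
                connector.close();
            }
        }
    }

Remote notification delivery can lose events when buffers overflow or connections bounce, while a poller just reads the current state again on its next pass, which is why polling tends to be the more robust pattern despite the extra chatter.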
> > > Could we get some of these suggestions into the CASSANDRA-14346/CASSANDRA-14395 Jiras so we can debate the technical merits there?
> > >
> > > > There's been little effort in evaluating these two bodies of work, one of which is largely unknown to us, and my concern is how we would fairly support both going into the future?
> > > >
> > > > Another option would be that this side-car patch first exists as a github project for a period of time, on par with how Reaper has been. This will help evaluate its use and first build up its contributors. This makes it easier for the C* PMC to choose which projects it would want to formally maintain, and to do so based on factors beyond the technical merits. We may even see it converge (or collaborate more) with Reaper, a win for everyone.
> > >
> > > We could have put our distributed repair scheduler into Priam ages ago, which would have been much easier for us, and Priam also has an existing community. But we don't want to, because that would encourage the community to remain fractured on the most important management processes. Instead we seek to work with the community to take the lessons learned from all the various available sidecars owned by different organizations (Datastax, Netflix, TLP) and fix this once for the whole community. Can we work together to make Cassandra just work for our users out of the box?
> > >
> > > -Joey

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade