I'd also like to see the end state you describe: the Reaper UI wrapping the Netflix management process with pluggable scheduling (either as Reaper does it now, or using the Netflix scheduler). But I don't think that means we need to start with Reaper - I'd personally prefer the opposite direction: starting with something small and isolated and layering on top.
-- 
Jeff Jirsa

On Sep 7, 2018, at 5:42 PM, Blake Eggleston <beggles...@apple.com> wrote:

I think we should accept the Reaper project as-is and make that the Cassandra management process 1.0, then integrate the Netflix scheduler (and other new features) into that.

The ultimate goal would be for the Netflix scheduler to become the default repair scheduler, but I think using Reaper as the starting point makes it easier to get there.

Reaper would bring a production user base that would realistically take 2-3 years to build up with a new project. As an operator, switching to a Cassandra management process that's basically a re-brand of an existing and commonly used management process isn't super risky. Asking operators to switch to a new process is a much harder sell.

On September 7, 2018 at 4:17:10 PM, Jeff Jirsa (jji...@gmail.com) wrote:

How can we continue moving this forward?

Mick/Jon/TLP folks, is there a path here where we commit the Netflix-provided management process, and you augment Reaper to work with it? Is there a way we can make a larger umbrella that's modular and can support either/both? Does anyone believe there's a clear, objective argument that one is strictly better than the other? I haven't seen one.

On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala <rtangir...@netflix.com.invalid> wrote:

+1 to everything that Joey articulated, with emphasis on the fact that contributions should be evaluated based on the merit of the code and the value it adds to the whole offering. I hope it does not matter whether that contribution comes from a PMC member or from someone who is not a committer. I would like the process to be such that it encourages new members to be part of the community and not shy away from contributing, rather than assuming their contributions are valued differently from those of committers or PMC members. It would be sad to see contributions decrease if we go down that path.

Regards,
Roopa Tangirala
Engineering Manager CDE
(408) 438-3156 - mobile

On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch <joe.e.ly...@gmail.com> wrote:

> We are looking to contribute Reaper to the Cassandra project.

Just to clarify: are you proposing contributing Reaper as a project via donation, or are you planning on contributing the features of Reaper as patches to Cassandra? If the former, how far along are you in the donation process? If the latter, when do you think you would have patches ready for consideration / review?

> Looking at the patch it's very similar in its base design already, but
> Reaper does have a lot more to offer. We have all been working hard to
> move it to also being a side-car so it can be contributed. This raises a
> number of questions relevant to this thread: would we then accept both
> works into the Cassandra project, and what burden would that put on the
> current PMC to maintain both?

I would hope that we would collaborate on merging the best parts of all of them into the official Cassandra sidecar: taking the always-on, shared-nothing, highly available system that we've contributed a patchset for, and adding in many of the repair features (e.g. schedules, a nice web UI) that Reaper has.
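To make "shared nothing" concrete: instead of a central coordinator holding locks, every sidecar instance can compete for a short-lived lease stored in Cassandra itself, using lightweight transactions. Whoever wins the lease runs the next unit of repair work; if that node dies, the lease expires and any surviving sidecar picks the work back up. A rough sketch of the idea (the keyspace, table, and owner names here are made up for illustration; this is not the actual patch's code):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    public class RepairLeaseSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .build();
                 Session session = cluster.connect("repair_admin")) {

                // IF NOT EXISTS makes this a lightweight transaction
                // (Paxos), so at most one sidecar in the cluster acquires
                // the lease. The TTL bounds how long a dead node holds it.
                ResultSet rs = session.execute(
                    "INSERT INTO repair_lease (lease_name, owner) " +
                    "VALUES ('ring-repair', 'sidecar-a') " +
                    "IF NOT EXISTS USING TTL 300");

                if (rs.one().getBool("[applied]")) {
                    // We own the lease: run the next repair split,
                    // periodically re-writing the row to renew the TTL.
                } else {
                    // Another sidecar owns it: back off and retry later.
                }
            }
        }
    }

No locks to break and no single scheduler to fail over: the database the sidecars already manage doubles as the coordination substrate.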
> I share Stefan's concern that consensus had not been met around a
> side-car, and that it was somehow default accepted before a patch landed.

I feel this is not correct or fair. The sidecar and repair discussions have been anything _but_ "default accepted". The timeline of consensus building involving the management sidecar and repair scheduling plans:

Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to come up with design goals for a repair scheduler that could work at Netflix scale.

~Feb 2017: Netflix concludes that fundamental design gaps prevent us from using Reaper, as it relies heavily on remote JMX connections and central coordination.

Sep 2017: Vinay gives a lightning talk at NGCC about a highly available and distributed repair scheduling sidecar/tool. He is encouraged by multiple committers to build repair scheduling into the daemon itself, not as a sidecar, so the database is truly eventually consistent.

~Jun 2017 - Feb 2018: Based on internal need and, later, the positive feedback at NGCC, Vinay and I prototype the distributed repair scheduler within Priam and roll it out at Netflix scale.

Mar 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20-page design document for adding repair scheduling to the daemon itself, and open the design up for feedback from the community. We get feedback from Alex, Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals to contribute Reaper at this point. We hear the consensus that the community would prefer repair scheduling in a separate distributed sidecar rather than in the daemon itself, and we re-work the design to match this consensus, re-aligning with our original proposal at NGCC.

Apr 2018: Blake brings the discussion of repair scheduling to the dev list (https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E). Many community members give positive feedback that we should solve it as part of Cassandra, and there is still no mention of contributing Reaper at this point. The last message is my attempted summary giving context on how we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and ship them with Cassandra.

Apr 2018: Dinesh opens CASSANDRA-14395 along with a public design document for gathering feedback on a general management sidecar. Sankalp and Dinesh encourage Vinay and me to kickstart that sidecar using the repair scheduler patch.

Apr 2018: Dinesh reaches out to the dev list (https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E) about the general management process to gain further feedback. All feedback remains positive, as it is a potential place for multiple community members to contribute their various sidecar functionality.

May-Jul 2018: Vinay and I work on creating a basic sidecar for running the repair scheduler, based on the feedback from the community in CASSANDRA-14346 and CASSANDRA-14395.

Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this; nobody objects.

Jul 2018: Sankalp asks on the dev list if anyone has feature Jiras that need review before 4.0. I mention again that we've nearly got the basic sidecar and repair scheduling work done and will need help with review. No one responds.
Aug 2018: We submit a patch that brings a basic distributed sidecar and robust distributed repair to Cassandra itself. Dinesh mentions that he will try to review. Now folks appear concerned about it being in-tree, suggesting instead that maybe it should go in a different repo altogether. I don't think we have consensus on the repo choice yet.

> This seems at odds when we're already struggling to keep up with the
> incoming patches/contributions, and there could be other git repos in the
> project we will need to support in the future too. But I'm also curious
> about the whole "Community over Code" angle to this: how do we encourage
> multiple external works to collaborate, building value in both the
> technical and the community senses?

I viewed this management sidecar as a way for us to stop, as a community, building the same thing over and over again. Netflix maintains Priam, The Last Pickle maintains Reaper, DataStax maintains OpsCenter. Why can't we take the best of Reaper (e.g. schedules, diagnostic events, UI) and leave the worst (e.g. centralized design with lots of locking), combine it with the best of Priam (a robust shared-nothing sidecar that makes Cassandra management easy) and leave the worst (a bunch of technical debt), and iterate towards one sidecar that allows Cassandra users to just run their database?

> The Reaper project has worked hard to build both its user and contributor
> base. And I would have thought these, including having the contributor
> base overlap with the C* PMC, were prerequisites before moving a larger
> body of work into the project (separate git repo or not). I guess this
> isn't so much "Community over Code", but it illustrates a concern
> regarding abandoned code when there's no existing track record of
> maintaining it as OSS, as opposed to an existing "show, don't tell"
> culture. Reaper, for example, has stronger indicators of ongoing support
> and an existing OSS user base: today the C* committers having contributed
> to Reaper are Jon, Stefan, Nate, and myself, amongst 40 contributors in
> total. And we've been making steps to involve it more in the C* community
> (e.g. the users ML), without being too presumptuous.

I worry about this logic, to be frank. Why do significant contributions need to come only from established C* PMC members? Shouldn't we strive to consider the relative merits of code that has actually been submitted to the project on the basis of the code, and not who sent the patches?

> On the technical side: Reaper supports (or easily can) all the concerns
> that the proposal here raises: distributed nodetool commands,
> centralising the JMX interfacing, scheduling ops (repairs, snapshots,
> compactions, cleanups, etc), monitoring and diagnostics, and so on. It's
> designed so that it can be a single instance, an instance per datacenter,
> or a side-car (per process). When there are multiple instances in a
> datacenter you get HA. You have a choice of different storage backends
> (memory, postgres, C*). You can of course use a separate C* cluster as a
> backend so as to separate infrastructure data from production data. And
> it's got a UI for C* diagnostics already (which imposes a different JMX
> interface of polling for events, rather than subscribing to JMX
> notifications, which we know is problematic, thanks to Stefan). Anyway,
> that's my plug for Reaper :-)
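For anyone who hasn't run into the polling-vs-notifications distinction: the two JMX styles look roughly like this. This is only a minimal sketch against Cassandra's StorageService MBean; the polled attribute and the five-second interval are placeholders picked for illustration, not what Reaper or our patch actually does:

    import javax.management.MBeanServerConnection;
    import javax.management.Notification;
    import javax.management.NotificationListener;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxStylesSketch {
        public static void main(String[] args) throws Exception {
            // 7199 is Cassandra's default JMX port.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs =
                    connector.getMBeanServerConnection();
                ObjectName ss = new ObjectName(
                    "org.apache.cassandra.db:type=StorageService");

                // Style 1: subscribe. Push-based and low latency, but any
                // notification emitted while the JMX connection is down is
                // lost for good (the problem Stefan has documented).
                NotificationListener listener =
                    (Notification n, Object handback) ->
                        System.out.println(n.getType() + ": " + n.getMessage());
                mbs.addNotificationListener(ss, listener, null, null);

                // Style 2: poll. Higher latency and more chatter, but a
                // poll after a reconnect still observes the current state,
                // so nothing is silently dropped.
                while (true) {
                    Object mode = mbs.getAttribute(ss, "OperationMode");
                    System.out.println("operation mode: " + mode);
                    Thread.sleep(5_000);
                }
            }
        }
    }

Neither style is free: subscriptions lose events across reconnects, and polling trades latency for robustness, which is why the interface choice matters for a diagnostics UI.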
Could we get some of these suggestions into the CASSANDRA-14346/CASSANDRA-14395 Jiras so we can debate the technical merits there?

> There's been little effort in evaluating these two bodies of work, one of
> which is largely unknown to us, and my concern is how we would fairly
> support both going into the future.

> Another option would be that this side-car patch first exists as a github
> project for a period of time, on par with how Reaper has been. This would
> help evaluate its use and first build up its contributors. It would make
> it easier for the C* PMC to choose which projects it wants to formally
> maintain, and to do so based on factors beyond the technical merits. We
> may even see it converge (or collaborate more) with Reaper - a win for
> everyone.

We could have put our distributed repair scheduler into Priam ages ago, which would have been much easier for us, and Priam also has an existing community. But we don't want to, because that would encourage the community to remain fractured on the most important management processes. Instead we seek to work with the community, taking the lessons learned from all the various available sidecars owned by different organizations (DataStax, Netflix, TLP), and fix this once for the whole community. Can we work together to make Cassandra just work for our users out of the box?

-Joey