Right, I understand the arguments for starting a new project. I’m not saying Reaper is, technically speaking, the best place to start. The point I’m trying to make is that the non-technical advantages of using an existing project as a starting point may outweigh the technical benefits of a clean slate. Whether or not that’s the case, it isn’t a strictly technical decision, and the non-technical advantages of starting with Reaper need to be weighed.
On September 7, 2018 at 8:19:50 PM, Jeff Jirsa (jji...@gmail.com) wrote: The benefit is that it more closely matches the design doc from 5 months ago, which is decidedly not about coordinating repair - it’s about a general-purpose management tool, where repair is one of many proposed tasks https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit By starting with a tool that is built to run repair, you’re sacrificing generality and accepting something purpose-built for one subtask. It’s an important subtask, and it’s a nice tool, but it’s not an implementation of the proposal; it’s an alternative that happens to do some of what was proposed. -- Jeff Jirsa > On Sep 7, 2018, at 6:53 PM, Blake Eggleston <beggles...@apple.com> wrote: > > What’s the benefit of doing it that way vs starting with Reaper and > integrating the Netflix scheduler? If Reaper were just a really inappropriate > choice for the Cassandra management process, I could see that being a better > approach, but I don’t think that’s the case. > > If our management process isn’t a drop-in replacement for Reaper, then Reaper > will continue to exist, which will split the user and developer base between > the 2 projects. That won't be good for either project. > > On September 7, 2018 at 6:12:01 PM, Jeff Jirsa (jji...@gmail.com) wrote: > > I’d also like to see the end state you describe: the Reaper UI wrapping the > Netflix management process with pluggable scheduling (either as-is with > Reaper now, or using the Netflix scheduler), but I don’t think that means we > need to start with Reaper - I’d personally prefer the opposite direction, > starting with something small and isolated and layering on top. > > -- > Jeff Jirsa > > >> On Sep 7, 2018, at 5:42 PM, Blake Eggleston <beggles...@apple.com> wrote: >> >> I think we should accept the Reaper project as-is and make that the Cassandra >> management process 1.0, then integrate the Netflix scheduler (and other new >> features) into that. >> >> The ultimate goal would be for the Netflix scheduler to become the default >> repair scheduler, but I think using Reaper as the starting point makes it >> easier to get there. >> >> Reaper would bring a prod user base that would realistically take 2-3 years >> to build up with a new project. As an operator, switching to a Cassandra >> management process that’s basically a re-brand of an existing and commonly >> used management process isn’t super risky. Asking operators to switch to a >> new process is a much harder sell. >> >> On September 7, 2018 at 4:17:10 PM, Jeff Jirsa (jji...@gmail.com) wrote: >> >> How can we continue moving this forward? >> >> Mick/Jon/TLP folks, is there a path here where we commit the >> Netflix-provided management process, and you augment Reaper to work with it? >> >> Is there a way we can make a larger, modular umbrella that can >> support either/both? >> Does anyone believe there's a clear, objective argument that one is >> strictly better than the other? I haven't seen one. >> >> >> >> On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala >> <rtangir...@netflix.com.invalid> wrote: >> >>> +1 to everything that Joey articulated, with emphasis on the fact that >>> contributions should be evaluated based on the merit of the code and its >>> value-add to the whole offering. I hope it does not matter whether that >>> contribution comes from a PMC member or from someone who is not a committer. 
I >>> would like the process to be such that it encourages new members to be >>> a part of the community and not shy away from contributing to the code >>> for fear that their contributions will be valued differently than those of >>> committers or PMC members. It would be sad to see contributions decrease if we go down >>> that path. >>> >>> *Regards,* >>> >>> *Roopa Tangirala* >>> >>> Engineering Manager CDE >>> >>> *(408) 438-3156 - mobile* >>> >>> >>> >>> >>> >>> >>> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch <joe.e.ly...@gmail.com> >>> wrote: >>> >>>>> We are looking to contribute Reaper to the Cassandra project. >>>>> >>>> Just to clarify: are you proposing to contribute Reaper as a project via >>>> donation, or are you planning to contribute the features of Reaper as >>>> patches to Cassandra? If the former, how far along are you in the donation >>>> process? If the latter, when do you think you would have patches ready >>> for >>>> consideration / review? >>>> >>>> >>>>> Looking at the patch, it's very similar in its base design already, but >>>>> Reaper does have a lot more to offer. We have all been working hard to >>>> move >>>>> it to also being a side-car so it can be contributed. This raises a >>>> number >>>>> of questions relevant to this thread: would we then accept both works >>> in >>>>> the Cassandra project, and what burden would it put on the current PMC >>> to >>>>> maintain both works? >>>>> >>>> I would hope that we would collaborate on merging the best parts of all >>>> into the official Cassandra sidecar, taking the always-on, shared-nothing, >>>> highly available system that we've contributed a patchset for and adding >>> in >>>> many of the repair features (e.g. schedules, a nice web UI) that Reaper >>>> has. >>>> >>>> >>>>> I share Stefan's concern that consensus had not been reached around a >>>>> side-car, and that it was somehow accepted by default before a patch >>> landed. >>>> >>>> >>>> I feel this is not correct or fair. The sidecar and repair discussions >>> have >>>> been anything _but_ "accepted by default". The timeline of consensus >>> building >>>> involving the management sidecar and repair scheduling plans: >>>> >>>> Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper >>> to >>>> come up with design goals for a repair scheduler that could work at >>> Netflix >>>> scale. >>>> >>>> ~Feb 2017: Netflix concludes that fundamental design gaps prevent us >>>> from using Reaper, as it relies heavily on remote JMX connections and >>>> central coordination. >>>> >>>> Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available >>>> and distributed repair scheduling sidecar/tool. He is encouraged by >>>> multiple committers to build repair scheduling into the daemon itself and >>>> not as a sidecar, so that the database is truly eventually consistent. >>>> >>>> ~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback >>> at >>>> NGCC, Vinay and I prototype the distributed repair scheduler within >>>> Priam and roll it out at Netflix scale. >>>> >>>> Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20-page >>>> design document for adding repair scheduling to the daemon itself and >>> open >>>> the design up for feedback from the community. We get feedback from Alex, >>>> Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals >>>> to contribute Reaper at this point. 
We hear the consensus that the >>>> community would prefer repair scheduling in a separate distributed >>> sidecar >>>> rather than in the daemon itself, and we re-work the design to match this >>>> consensus, re-aligning with our original proposal at NGCC. >>>> >>>> Apr 2018: Blake brings the discussion of repair scheduling to the dev >>> list >>>> ( >>>> >>>> >>> https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E >>> >>>> ). >>>> Many community members give positive feedback that we should solve it as >>>> part of Cassandra, and there is still no mention of contributing Reaper at >>>> this point. The last message is my attempted summary giving context on >>> how >>>> we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) >>> and >>>> ship them with Cassandra. >>>> >>>> Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design >>> document >>>> for gathering feedback on a general management sidecar. Sankalp and >>> Dinesh >>>> encourage Vinay and me to kickstart that sidecar using the repair >>>> scheduler patch. >>>> >>>> Apr 2018: Dinesh reaches out to the dev list ( >>>> >>>> >>> https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E >>> >>>> ) >>>> about the general management process to gain further feedback. All >>> feedback >>>> remains positive, as it is a potential place for multiple community >>> members >>>> to contribute their various sidecar functionality. >>>> >>>> May-Jul 2018: Vinay and I work on creating a basic sidecar for running >>> the >>>> repair scheduler based on the feedback from the community in >>>> CASSANDRA-14346 and CASSANDRA-14395. >>>> >>>> Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this; >>>> nobody objects. >>>> >>>> Jul 2018: Sankalp asks on the dev list if anyone has feature Jiras they >>>> need reviewed before 4.0. I mention again that we've nearly got the >>>> basic sidecar and repair scheduling work done and will need help with >>>> review. No one responds. >>>> >>>> Aug 2018: We submit a patch that brings a basic distributed sidecar and >>>> robust distributed repair to Cassandra itself. Dinesh mentions that he >>> will >>>> try to review. Now folks appear concerned about it being in-tree, suggesting >>>> instead that maybe it should go in a different repo altogether. I don't >>> think >>>> we have consensus on the repo choice yet. >>>> >>>>> This seems at odds with a project that's already struggling to keep up with the >>>>> incoming patches/contributions, and there could be other git repos in >>> the >>>>> project we will need to support in the future too. But I'm also curious >>>>> about the whole "Community over Code" angle to this: how do we >>> encourage >>>>> multiple external works to collaborate, building value both >>>> technically and in the community? >>>>> >>>> I viewed this management sidecar as a way for us to stop, as a community, >>>> building the same thing over and over again. Netflix maintains Priam, The >>> Last >>>> Pickle maintains Reaper, DataStax maintains OpsCenter. Why can't we take >>>> the best of Reaper (e.g. schedules, diagnostic events, UI) and leave the >>>> worst (e.g. centralized design with lots of locking), combine it with >>>> the best of Priam (a robust shared-nothing sidecar that makes Cassandra >>>> management easy) and leave the worst (a bunch of technical debt), and >>>> iterate towards one sidecar that allows Cassandra users to just run their >>>> database?
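To make that centralized-vs-shared-nothing contrast concrete, here is a rough sketch in Java. It is illustrative only - not Reaper's, Priam's, or the patch's actual code. The repairAsync JMX operation on StorageService is the entry point nodetool repair has used since 2.2, but everything else here (class and method names, the elided options map) is invented for the example:

    import java.util.HashMap;
    import java.util.Map;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RepairStyles {
        // Centralized style: one coordinator holds remote JMX connections to
        // every node in the cluster and drives repair from a single place.
        static void centralizedRepair(String[] allNodes, String keyspace) throws Exception {
            for (String node : allNodes) {        // remote hosts, cluster-wide
                invokeRepair(node, keyspace);
            }
        }

        // Shared-nothing sidecar style: each sidecar only ever talks to the
        // node it sits next to; there is no central coordinator to lose.
        static void sidecarRepair(String keyspace) throws Exception {
            invokeRepair("127.0.0.1", keyspace);  // the local node only
        }

        static void invokeRepair(String host, String keyspace) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            try (JMXConnector conn = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = conn.getMBeanServerConnection();
                ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
                Map<String, String> options = new HashMap<>();  // repair options elided
                mbs.invoke(ss, "repairAsync",
                           new Object[] { keyspace, options },
                           new String[] { String.class.getName(), Map.class.getName() });
            }
        }
    }

The operational difference is the failure domain: in the first style a dead coordinator (or a partitioned JMX link) stalls repair for the whole cluster, while in the second each node's schedule survives on its own.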
>>>>> The Reaper project has worked hard at building both its user and >>>>> contributor base, and I would have thought these, including having the >>>>> contributor base overlap with the C* PMC, were prerequisites before >>>> moving >>>>> a larger body of work into the project (separate git repo or not). I >>>> guess >>>>> this isn't so much "Community over Code", but it illustrates a concern >>>>> regarding abandoned code when there's no existing track record of >>>>> maintaining it as OSS, as opposed to an existing "show, don't >>>>> tell" culture. Reaper, for example, has stronger indicators of ongoing >>>>> support and an existing OSS user base: today the C* committers who have >>>>> contributed to Reaper are Jon, Stefan, Nate, and myself, amongst 40 >>>>> contributors in total. And we've been taking steps to involve it more >>>> in >>>>> the C* community (e.g. the users ML), without being too presumptuous. >>>> >>>> I worry about this logic, to be frank. Why do significant contributions >>> need >>>> to come only from established C* PMC members? Shouldn't we strive to >>>> consider the relative merits of code that has actually been submitted to the >>>> project on the basis of the code, and not who sent the patches? >>>> >>>> >>>>> On the technical side: Reaper supports (or easily could) all the concerns >>>>> that the proposal here raises: distributed nodetool commands, >>>> centralising >>>>> JMX interfacing, scheduling ops (repairs, snapshots, compactions, >>>> cleanups, >>>>> etc.), monitoring and diagnostics, and so on. It's designed so that it can >>>> be >>>>> a single instance, an instance per datacenter, or a side-car (per process). >>>> When >>>>> there are multiple instances in a datacenter you get HA. You have a >>>> choice >>>>> of different storage backends (memory, postgres, C*; see the sketch further >>>>> below). You can of course use a >>>>> separate C* cluster as a backend so as to separate infrastructure data >>> from >>>>> production data. And it's got a UI for C* Diagnostics already (which >>>>> uses a different JMX interface, polling for events rather than >>>>> subscribing to JMX notifications, which we know is problematic - thanks >>> to >>>>> Stefan). Anyway, that's my plug for Reaper :-) >>>>> >>>> Could we get some of these suggestions into the >>>> CASSANDRA-14346/CASSANDRA-14395 Jiras so we can debate the technical >>>> merits there? >>>> >>>>> There's been little effort in evaluating these two bodies of work, one >>>>> of which is largely unknown to us, and my concern is how we would fairly >>>>> support both going into the future. >>>>> >>>> >>>>> Another option would be for this side-car patch to first exist as a >>> github >>>>> project for a period of time, on par with how Reaper has been. This will >>>> help >>>>> evaluate its use and first build up its contributor base. This makes it >>>>> easier for the C* PMC to choose which projects it would want to >>> formally >>>>> maintain, and to do so based on factors beyond the technical merits. >>>> We >>>>> may even see it converge (or collaborate more) with Reaper, a win for >>>>> everyone. 
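As a footnote on the pluggable-backend point referenced above: here is a minimal sketch of the shape such an abstraction might take. The names (RepairStateStore, InMemoryStore) are invented for illustration and are not Reaper's actual API:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;
    import java.util.UUID;

    // Hypothetical sketch: the scheduler codes against one storage interface,
    // and the deployment picks memory, Postgres, or a separate C* cluster,
    // keeping infrastructure data out of the production cluster.
    interface RepairStateStore {
        UUID saveSchedule(String keyspace, String cron);
        Optional<String> scheduleState(UUID id);
    }

    class InMemoryStore implements RepairStateStore {
        private final Map<UUID, String> state = new HashMap<>();

        public UUID saveSchedule(String keyspace, String cron) {
            UUID id = UUID.randomUUID();
            state.put(id, "SCHEDULED " + keyspace + " @ " + cron);
            return id;
        }

        public Optional<String> scheduleState(UUID id) {
            return Optional.ofNullable(state.get(id));
        }
    }

    // A PostgresStore (JDBC-backed) or CassandraStore (driver-backed) would
    // implement the same interface; which one runs is a config choice.

Swapping backends then never touches scheduler logic, which is what makes the memory/postgres/C* choice described above cheap to offer.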
>>>>> >>>> We could have put our distributed repair scheduler into Priam ages >>>> ago, which would have been much easier for us, and Priam also has an existing >>>> community; but we don't want to, because that would encourage the community >>>> to remain fractured on the most important management processes. Instead >>> we >>>> seek to work with the community, taking the lessons learned from all the >>>> various available sidecars owned by different organizations (DataStax, >>>> Netflix, TLP) and fixing this once for the whole community. Can we work >>>> together to make Cassandra just work for our users out of the box? >>>> >>>> -Joey >>>> >>> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org >