How can we continue moving this forward? Mick/Jon/TLP folks, is there a path here where we commit the Netflix-provided management process, and you augment Reaper to work with it? Is there a way we can make a larger umbrella that's modular that can support either/both? Does anyone believe there's a clear, objective argument that one is strictly better than the other? I haven't seen one.
On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala <rtangir...@netflix.com.invalid> wrote:

> +1 to everything that Joey articulated with emphasis on the fact that contributions should be evaluated based on the merit of code and their value add to the whole offering. I hope it does not matter whether that contribution comes from a PMC member or a person who is not a committer. I would like the process to be such that it encourages the new members to be a part of the community and not shy away from contributing to the code assuming their contributions are valued differently than committers or PMC members. It would be sad to see the contributions decrease if we go down that path.
>
> *Regards,*
> *Roopa Tangirala*
> Engineering Manager CDE
> *(408) 438-3156 - mobile*
>
> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch <joe.e.ly...@gmail.com> wrote:
>
> > > We are looking to contribute Reaper to the Cassandra project.
> >
> > Just to clarify are you proposing contributing Reaper as a project via donation or you are planning on contributing the features of Reaper as patches to Cassandra? If the former how far along are you on the donation process? If the latter, when do you think you would have patches ready for consideration / review?
> >
> > > Looking at the patch it's very similar in its base design already, but Reaper does have a lot more to offer. We have all been working hard to move it to also being a side-car so it can be contributed. This raises a number of relevant questions to this thread: would we then accept both works in the Cassandra project, and what burden would it put on the current PMC to maintain both works.
> >
> > I would hope that we would collaborate on merging the best parts of all into the official Cassandra sidecar, taking the always on, shared nothing, highly available system that we've contributed a patchset for and adding in many of the repair features (e.g. schedules, a nice web UI) that Reaper has.
> >
> > > I share Stefan's concern that consensus had not been met around a side-car, and that it was somehow default accepted before a patch landed.
> >
> > I feel this is not correct or fair. The sidecar and repair discussions have been anything _but_ "default accepted". The timeline of consensus building involving the management sidecar and repair scheduling plans:
> >
> > Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to come up with design goals for a repair scheduler that could work at Netflix scale.
> >
> > ~Feb 2017: Netflix believes that the fundamental design gaps prevented us from using Reaper as it relies heavily on remote JMX connections and central coordination.
> >
> > Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available and distributed repair scheduling sidecar/tool. He is encouraged by multiple committers to build repair scheduling into the daemon itself and not as a sidecar so the database is truly eventually consistent.
> >
> > ~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback at NGCC, Vinay and myself prototype the distributed repair scheduler within Priam and roll it out at Netflix scale.
> >
> > Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page design document for adding repair scheduling to the daemon itself and open the design up for feedback from the community. We get feedback from Alex, Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals to contribute Reaper at this point.
> > We hear the consensus that the community would prefer repair scheduling in a separate distributed sidecar rather than in the daemon itself and we re-work the design to match this consensus, re-aligning with our original proposal at NGCC.
> >
> > Apr 2018: Blake brings the discussion of repair scheduling to the dev list (https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E). Many community members give positive feedback that we should solve it as part of Cassandra and there is still no mention of contributing Reaper at this point. The last message is my attempted summary giving context on how we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and ship them with Cassandra.
> >
> > Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design document for gathering feedback on a general management sidecar. Sankalp and Dinesh encourage Vinay and myself to kickstart that sidecar using the repair scheduler patch
> >
> > Apr 2018: Dinesh reaches out to the dev list (https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E) about the general management process to gain further feedback. All feedback remains positive as it is a potential place for multiple community members to contribute their various sidecar functionality.
> >
> > May-Jul 2018: Vinay and I work on creating a basic sidecar for running the repair scheduler based on the feedback from the community in CASSANDRA-14346 and CASSANDRA-14395
> >
> > Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this, nobody objects
> >
> > Jul 2018: Sankalp asks on the dev list if anyone has feature Jiras that need review before 4.0, I mention again that we've nearly got the basic sidecar and repair scheduling work done and will need help with review. No one responds.
> >
> > Aug 2018: We submit a patch that brings a basic distributed sidecar and robust distributed repair to Cassandra itself. Dinesh mentions that he will try to review. Now folks appear concerned about it being in tree and instead maybe it should go in a different repo altogether. I don't think we have consensus on the repo choice yet.
> >
> > > This seems at odds when we're already struggling to keep up with the incoming patches/contributions, and there could be other git repos in the project we will need to support in the future too. But I'm also curious about the whole "Community over Code" angle to this, how do we encourage multiple external works to collaborate together building value in both the technical and community.
> >
> > I viewed this management sidecar as a way for us to stop, as a community, building the same thing over and over again. Netflix maintains Priam, The Last Pickle maintains Reaper, DataStax maintains OpsCenter. Why can't we take the best of Reaper (e.g. schedules, diagnostic events, UI) and leave the worst (e.g. centralized design with lots of locking) and combine it with the best of Priam (robust shared nothing sidecar that makes Cassandra management easy) and leave the worst (a bunch of technical debt), and iterate towards one sidecar that allows Cassandra users to just run their database?
> >
> > > The Reaper project has worked hard in building both its user and contributor base. And I would have thought these, including having the contributor base overlap with the C* PMC, were prerequisites before moving a larger body of work into the project (separate git repo or not). I guess this isn't so much "Community over Code", but it illustrates a concern regarding abandoned code when there's no existing track record of maintaining it as OSS, as opposed to expecting an existing "show, don't tell" culture. Reaper for example has stronger indicators for ongoing support and an existing OSS user base: today C* committers having contributed to Reaper are Jon, Stefan, Nate, and myself, amongst the 40 contributors in total. And we've been making steps to involve it more into the C* community (eg users ML), without being too presumptuous.
> >
> > I worry about this logic to be frank. Why do significant contributions need to come only from established C* PMC members? Shouldn't we strive to consider relative merits of code that has actually been submitted to the project on the basis of the code and not who sent the patches?
> >
> > > On the technical side: Reaper supports (or can easily) all the concerns that the proposal here raises: distributed nodetool commands, centralising jmx interfacing, scheduling ops (repairs, snapshots, compactions, cleanups, etc), monitoring and diagnostics, etc etc. It's designed so that it can be a single instance, instance-per-datacenter, or side-car (per process). When there are multiple instances in a datacenter you get HA. You have a choice of different storage backends (memory, postgres, c*). You can ofc use a separate C* cluster as a backend so to separate infrastructure data from production data.
> > > And it's got a UI for C* Diagnostics already (which imposes a different jmx interface of polling for events rather than subscribing to jmx notifications which we know is problematic, thanks to Stefan). Anyway, that's my plug for Reaper :-)
> >
> > Could we get some of these suggestions into the CASSANDRA-14346/CASSANDRA-14395 jiras and we can debate the technical merits there?
> >
> > > There's been little effort in evaluating these two bodies of work, one which is largely unknown to us, and my concern is how we would fairly support both going into the future?
> > >
> > > Another option would be that this side-car patch first exists as a github project for a period of time, on par to how Reaper has been. This will help evaluate its use and to first build up its contributors. This makes it easier for the C* PMC to choose which projects it would want to formally maintain, and to do so based on factors beyond merits of the technical. We may even see it converge (or collaborate more) with Reaper, a win for everyone.
> >
> > We could have put our distributed repair scheduler as part of Priam ages ago which would have been much easier for us and also has an existing community, but we don't want to because that will encourage the community to remain fractured on the most important management processes. Instead we seek to work with the community to take the lessons learned from all the various available sidecars owned by different organizations (DataStax, Netflix, TLP) and fix this once for the whole community. Can we work together to make Cassandra just work for our users out of the box?
> >
> > -Joey