On Mon, Jul 17, 2017 at 4:31 PM, Edward Capriolo <[email protected]> wrote:
>
> On Mon, Jul 17, 2017 at 3:57 PM, Gary Dusbabek <[email protected]> wrote:
>
>> Sorry for the late reply. I am on holiday.
>>
>> I think part of the problem is that the community is so small. It's
>> difficult right now to get PRs merged for lack of reviewers. And in cases
>> where participants disagree, and there is no consensus, no real work can
>> get done.
>>
>> For example, I would love to push through a big refactoring that improves
>> the coupling problem in the code base. It is near impossible to write good
>> unit tests currently. And it's difficult to write features if you cannot
>> easily test them. However, I don't feel like there is support for this
>> kind of change.
>>
>> So in short, when there are competing visions, and not a small community,
>> it will be difficult to make headway.
>>
>> As for the CRDTs, etc. I don't think there is any need for them right now,
>> personally. They are a scratch with no itch. :)
>>
>> Gary.
>>
>> On Tue, Jul 11, 2017 at 11:58 AM, Edward Capriolo <[email protected]> wrote:
>>
>> > On Tue, Jul 11, 2017 at 11:15 AM, Русак Максим <[email protected]> wrote:
>> >
>> > > Hello, Gossip community.
>> > > Today I want to discuss our vision of the Gossip project, its purpose
>> > > and future steps.
>> > > I think the main problem is that even I do not have a clear vision of
>> > > our goals and future steps. I think all members of our small community
>> > > (5-10 members) have their own unique vision - it's illy.
>> > > Are we just an implementation of Gossip? Or do we want to implement
>> > > many more algorithms and solve more problems? If yes, what problems?
>> > > Who is our user in both cases?
>> > > I think without this understanding, and without obtaining first users
>> > > quickly, the community can fall apart.
>> > >
>> > > I think our goals now are:
>> > > 1. Formulate the goals and principles of Apache Gossip.
>> > > 2. After that we'll understand who our exemplary user is and which
>> > > problems we can solve for him.
>> > > 3. Then we'll understand the shortest path to real adoption. We'll get
>> > > one real user and do everything needed to make Gossip decent for him.
>> > >
>> > > I'm a GSoC participant, I have a lot of time to work now, and I want to
>> > > move Gossip to the next level. My tasks are CRDTs, SWIM and Consensus.
>> > > For example, I can't understand which of these tasks will lead us to
>> > > users, and to what kind of users?
>> > > The CRDT umbrella task (GOSSIP-67) has a lot of CRDTs. I implemented
>> > > almost all of them; the two remaining CRDTs are so rare and complicated
>> > > that I think there is no need to implement them. I think even some of
>> > > the already implemented ones are not necessary.
>> > > The same situation with SWIM. We have some algorithm now, but the
>> > > system in general is not usable by anybody, so how can we tell whether
>> > > this algorithm is good or not? We mustn't fabricate needs of our users;
>> > > we should analyze the problems of real users.
>> > > The same with Consensus. Is it in our plan and does it correspond to
>> > > our vision? Is there anybody who is interested in it?
>> > >
>> > > "Features for features" is not our goal. "Features for solving users'
>> > > pain" make sense.
>> > > I want to give you one example of a strong community and successful
>> > > company: Hashicorp. They have SWIM implemented and running in
>> > > production on thousands of machines. And they do not just implement the
>> > > most modern algorithms; they do research and innovation. And it's not
>> > > only due to their passion for algorithms; it's due to the pain of their
>> > > users, their clear vision, and their desire to solve users' problems.
>> > > It's the only way to build a big robust community (and a company like
>> > > Hashicorp): formulate a purpose and aim at obtaining users.
>> > > So let's think about our understanding of Apache Gossip and decide
>> > > whether SWIM or Consensus is highest priority to obtain first users or
>> > > not?
>> > > If we decide that it is, I'll do it with pleasure. If not, let's
>> > > compose a plan to get first users and I'll bring them.
>> > >
>> > > Thanks, Maxim Rusak
>> > >
>> >
>> > Maxim,
>> >
>> > Apache follows a "Scratch an itch" philosophy:
>> > https://commons.apache.org/volunteering.html. This is different from a
>> > traditional software product or consulting company. We do not need to
>> > make a "road map" or decide who our "users" are. You and I are both
>> > volunteers to the Gossip effort.
>> >
>> > If you say that the other two CRDT types we have tickets for are rare
>> > and complicated, we can close them as WONT_FIX, or we can leave them
>> > open in case someone else wants to work on them. That is a discussion we
>> > should have, possibly on a case-by-case basis, possibly inside the
>> > ticket.
>> >
>> > Implicitly we understand that for Gossip to be successful, people have
>> > to use it. A key part of that is having features that matter to people.
>> >
>> > "So let's think about our understanding of Apache Gossip and decide
>> > whether SWIM or Consensus is highest priority to obtain first users or
>> > not?"
>> >
>> > I am not a business analyst. These are things I know:
>> >
>> > 1) Riak has CRDT support
>> > 2) Spark uses a gossip layer
>> > 3) Cassandra uses a gossip layer
>> > 4) ZooKeeper has watchers (close to our event listeners)
>> > 5) Hashicorp has a product you mentioned
>> > 6) Akka has CRDT support
>> >
>> > We outlined some possible end-goal use cases for Gossip when we proposed
>> > it to the incubator. I have also worked with some other Apache projects
>> > looking for possible implementations of Gossip, such as:
>> > https://issues.apache.org/jira/browse/IGNITE-4837.
>> >
>> > I do not understand YOUR confusion about YOUR GSOC proposal for SWIM.
>> > The ticket is self-explanatory: we want to implement SWIM so that Gossip
>> > can scale to larger numbers of nodes. We DO not need to do it expressly
>> > to "find users" because Gossip is not a for-profit company. YOU are
>> > working on the ticket because YOU find it interesting, the committers
>> > agreed it was interesting enough to make a GSOC proposal for it, and
>> > GSOC found it interesting enough not to reject it as spam.
>> >
>> > Gossip is not a for-profit company, but that does not mean we should not
>> > attempt to solve problems of real users, have a road map, or get the
>> > software into many people's hands. How do we do that?
>> >
>> > There is no simple answer. I think the primary vehicle is blogging and
>> > community. For example, I asked everyone to write up their GSOC work
>> > into blogs:
>> >
>> >    - Wrote a blog regarding Data Change Event Listeners
>> >       - https://medium.com/@mirage20/listening-to-data-change-events-in-apache-gossip-a0f0a4ea4c21
>> >    - Wrote a blog regarding Data Replication Control
>> >       - https://medium.com/@mirage20/data-replication-control-in-apache-gossip-35777771e2bb
>> >
>> > I can tweet out these blogs; some people follow me, they might re-tweet,
>> > and by word of mouth we get users who try the software or committers
>> > interested in scratching their own itch.
>> >
>> > I suggest some reading about the Apache Way:
>> > https://www.apache.org/foundation/how-it-works.html .
>> >
>> > Also I suggest starting to fill out details of your tickets and creating
>> > specific threads on the message board. I.e., what are you researching
>> > about SWIM? What were the conclusions? What are other alternatives? The
>> > ticket is basically empty: https://issues.apache.org/jira/browse/GOSSIP-51.
>> >
>>
>
> The CRDTs are important for downstream applications. For example, the CRDT
> types are going to make it much easier to do ...anything. ZooKeeper has
> features like writing EPHEMERAL_SEQUENTIAL nodes so you can mix and match
> writes and reads with different semantics and glue together a lock, or a
> leader election, etc.
>
> Shared and per-node data provides only a put(x,y) and get(x). Because
> Gossip replication happens lazily, the scheme to acquire a lock or elect a
> leader might be something like a structure that crosses a number of keys.
> CRDTs give us the key building block to manipulate complex types in a
> masterless way.
>
> I.e., if I am writing "storm" I need a place to store topology. Great, I
> can denormalize that to key/value and store it in shared data. Next, I
> need a way that 10-100 storm nodes can agree on who is doing what
> topology. With the CRDTs and the voting (Mirage) in flight we will have
> that.
>
> "For example, I would love to push through a big refactoring that improves
> the coupling problem in the code base. It is near impossible to write good
> unit tests currently. And it's difficult to write features if you cannot
> easily test them. However, I don't feel like there is support for this kind
> of change."
>
> From my perspective, the refactoring tickets are a slight bit hard to
> track mentally. They either tend to be a series of small ones that eat up
> a lot of admin bandwidth or a bulky one that starts small and gets large.
> I do not have a problem with refactor tickets specifically, but I would
> rather see them in the scope of features. For example, with the change to
> support both SWIM and our current Gossiper, we are forced to think about
> the problem differently and have two concrete cases so that we can design
> the correct API. Sorry if something was hanging out there that you felt
> was un-acked.
>

No, nothing was unacked. I'm not a fan of having large or medium refactors
as part of features. It makes them harder to review.
IMO the refactors we need do not constitute the small variety either.

>
> "So in short, when there are competing visions, and not a small community,
> it will be difficult to make headway."
>
> I am not sure I agree. Early on in Gossip I was approaching things like
> the larger (apache) projects I worked on. I was kinda used to "hey
> committers here is a patch": someone would roll around and review and then
> tell me a fix, repeat a few times, merge.
>
> We just do not have the bodies for that. If you want to make a change (as
> a committer) you do not really have to wait around for consensus. For a
> committer there is an implicit "WILL MERGE IN 2 DAYS IF NO COMMENT". If you
> are not a committer (or want to wait for my blessing) you are probably
> going to have to send me a singing telegram or two.
>

This is the part I needed to hear. :) I think we're all good now.

Cheers,

Gary.

> Doing the apache releases takes cycles, mentoring the GSOC proposals takes
> cycles, life, jobs, etc.
>
> As for Gossip having a direction, I don't want Gossip to follow the lead
> of other "owned" apache projects. "Hey we are 'EdTech' the commercial
> consulting/solutions arm in the engine room of 'apache gossip' we have all
> the committers and we have a ROAD MAP and our CTO KNOWS WHAT TO DO BASED ON
> WHAT OUR INVESTORS WANT TO HEAR and if you're working on something else
> ......tough crap." :)
>
> I am trying to move at a pace that others can follow/play along. I _COULD_
> have implemented all the CRDTs, but I did one and left the rest open. This
> leaves a door open for others to make meaningful contributions. That is
> essentially what I am trying to do, guide. If I had more time (job, gossip
> (reviews, releases, gsoc), other apache pmc roles, 2 year old) I would
> probably do more outreach like meetups and blogs. There are fewer bodies on
> deck than I expected at this phase, but such is life.
> I see some projects are in the incubator for 3-4 years, not trying to go
> that long, but not trying to rush either.
>
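
For readers who have not met CRDTs before, here is a rough sketch of the point made in the thread that CRDTs let nodes manipulate a shared value "in a masterless way" even though replication is lazy. It shows a grow-only counter (G-Counter), one of the simplest CRDT types. This is an illustration only: the class name `GCounter`, its methods, and the node ids are hypothetical and are not taken from the Apache Gossip code base or its API.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT (hypothetical sketch, not Apache Gossip code).

    Each node only ever increments its own slot. Replicas are reconciled by
    taking the per-node maximum, so no master or ordering is required.
    """
    counts: dict = field(default_factory=dict)

    def increment(self, node_id: str, amount: int = 1) -> None:
        # A grow-only counter must never go down, or merge would lose updates.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def value(self) -> int:
        # The observed total is the sum of every node's slot.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative and idempotent, which
        # is what makes lazy, out-of-order gossip exchange safe.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter(
            {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        )


# Two replicas that have not yet gossiped with each other.
a = GCounter()
b = GCounter()
a.increment("node-a", 2)
b.increment("node-b", 3)

# Merging in either order, or more than once, yields the same state.
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

Because the merge is order-independent and safe to repeat, it does not matter which node gossips first or how often the same state is exchanged. That property is what a plain put(x,y)/get(x) store lacks when two nodes write the same key concurrently.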
