Re: Vision letter, request for discussion

Gary Dusbabek Tue, 18 Jul 2017 14:31:09 -0700

On Mon, Jul 17, 2017 at 4:31 PM, Edward Capriolo <[email protected]>
wrote:


>
> On Mon, Jul 17, 2017 at 3:57 PM, Gary Dusbabek <[email protected]>
> wrote:
>
>> Sorry for the late reply. I am on holiday.
>>
>> I think part of the problem is that the community is so small. It's
>> difficult right now to get PRs merged for lack of reviewers. And in cases
>> where participants disagree, and there is no consensus, no real work can
>> get done.
>>
>> For example, I would love to push through a big refactoring that improves
>> the coupling problem in the code base. It is near impossible to write good
>> unit tests currently. And it's difficult to write features if you cannot
>> easily test them. However, I don't feel like there is support for this kind
>> of change.
>>
>> So in short, when there are competing visions, and not a small community,
>> it will be difficult to make headway.
>>
>> As for the CRDTs, etc. I don't think there is any need for them right now,
>> personally. They are a scratch with no itch. :)
>>
>> Gary.
>>
>>
>> On Tue, Jul 11, 2017 at 11:58 AM, Edward Capriolo <[email protected]>
>> wrote:
>>
>> > On Tue, Jul 11, 2017 at 11:15 AM, Русак Максим <[email protected]>
>> > wrote:
>> >
>> > > Hello, Gossip community.
>> > > Today I want to discuss our vision of the Gossip project, its purpose and
>> > > future steps.
>> > > I think the main problem is that even I do not have a clear vision of our
>> > > goals and future steps, and I think every member of our small community
>> > > (5-10 members) has their own unique vision. That is unhealthy.
>> > > Are we just an implementation of Gossip? Or do we want to implement many
>> > > more algorithms and solve more problems? If so, which problems?
>> > > Who is our user in each case?
>> > > I think that without this understanding, and without quickly obtaining
>> > > first users, the community can fall apart.
>> > >
>> > > I think our goals now are:
>> > > 1. Formulate the goals and principles of Apache Gossip.
>> > > 2. After that, we'll understand who our exemplary user is and which
>> > > problems we can solve for him.
>> > > 3. Then we'll understand the shortest path to real adoption. We'll get
>> > > one real user and do everything needed to make Gossip decent for him.
>> > >
>> > > I'm a GSoC participant. I have a lot of time to work now, and I want to
>> > > move Gossip to the next level. My tasks are CRDTs, SWIM and Consensus.
>> > > But I can't tell which of these tasks will lead us to users, and to
>> > > what kind of users.
>> > > The CRDT umbrella task (GOSSIP-67) has a lot of CRDTs. I implemented
>> > > almost all of them; the two remaining CRDTs are so rare and complicated
>> > > that I think there is no need to implement them. I think even some of the
>> > > already implemented ones are not necessary.
>> > > The same goes for SWIM. We have an algorithm now, but the system in
>> > > general is not usable by anybody, so we can't tell whether this algorithm
>> > > is good or not. We mustn't fabricate our users' needs; we should analyze
>> > > the problems of real users.
>> > > The same goes for Consensus. Is it in our plan, and does it correspond to
>> > > our vision? Is anybody interested in it?
>> > >
>> > > "Features for features' sake" is not our goal. "Features that solve
>> > > users' pain" make sense.
>> > > I want to give you one example of a strong community and a successful
>> > > company: Hashicorp. They have SWIM implemented and running in
>> > > production on thousands of machines. And they do not just implement the
>> > > most modern algorithms; they do research and innovate. That is not only
>> > > due to their passion for algorithms; it is due to their users' pain, their
>> > > clear vision and their desire to solve users' problems.
>> > > It's the only way to build a big, robust community (and a company like
>> > > Hashicorp): formulate a purpose and aim at obtaining users.
>> > > So let's think about our understanding of Apache Gossip and decide
>> > > whether SWIM or Consensus is highest priority to obtain first users or
>> > > not?
>> > > If we decide that it is, I'll do it with pleasure. If not, let's compose a
>> > > plan for reaching first users, and I'll bring them.
>> > >
>> > > Thanks, Maxim Rusak
>> > >
>> >
>> > Maxim,
>> >
>> > Apache follows a "Scratch an itch" philosophy.
>> > https://commons.apache.org/volunteering.html. This is different from a
>> > traditional software product or consulting company. We do not need to make
>> > a "road map" or decide who our "users" are. You and I are both volunteers
>> > in the Gossip effort.
>> >
>> > If you say that the other two CRDT types we have tickets for are rare
>> > and complicated, we can close them as WONT_FIX, or we can leave them open
>> > in case someone else wants to work on them. That is a discussion we should
>> > have on a case-by-case basis, possibly inside each ticket.
>> >
>> > Implicitly, we understand that for Gossip to be successful, people have
>> > to use it. A key part of that is having features that matter to people.
>> >
>> > "So let's think about our understanding of Apache Gossip and decide
>> whether
>> > SWIM or Consensus is highest priority to obtain first users or not?"
>> >
>> > I am not a business analyst. These are things I know:
>> >
>> > 1) Riak has CRDT support
>> > 2) Spark uses a gossip layer
>> > 3) Cassandra uses a gossip layer
>> > 4) ZooKeeper has watchers (close to our event listeners)
>> > 5) Hashicorp has a product you mentioned
>> > 6) Akka has CRDT support
>> >
>> > We outlined some possible end-goal use cases for Gossip when we proposed
>> > it to the Incubator. I have also worked with some other Apache projects
>> > looking for possible implementations of Gossip, such as:
>> > https://issues.apache.org/jira/browse/IGNITE-4837.
>> >
>> > I do not understand YOUR confusion about YOUR GSOC proposal for SWIM. The
>> > ticket is self-explanatory: we want to implement SWIM so that Gossip can
>> > scale to larger numbers of nodes. We DO not need to do it expressly to
>> > "find users", because Gossip is not a for-profit company. YOU are working
>> > on the ticket because YOU find it interesting, the committers agreed it
>> > was interesting enough to make a GSOC proposal for it, and GSOC found it
>> > interesting enough not to reject it as spam.
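For readers unfamiliar with why SWIM helps with scale: instead of every node heartbeating every other node, each node probes one random member per protocol period, and only if that direct ping fails does it ask up to k other members to ping the target on its behalf before marking it suspect. The following is a minimal single-process sketch of one such probe round, assuming a `reachable(a, b)` callback that stands in for the network; `probe`, `reachable` and `k` are hypothetical names, not Apache Gossip's API.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for the network: reachable(a, b) is True when a
# ping from node a to node b (and the ack back) gets through this period.
Reachable = Callable[[str, str], bool]


def probe(prober: str, members: Sequence[str], reachable: Reachable,
          k: int = 3, rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Run one SWIM-style protocol period for `prober`.

    Returns (target, status), where status is "alive" or "suspect".
    """
    if rng is None:
        rng = random.Random()
    others = [m for m in members if m != prober]
    if not others:
        raise ValueError("no other members to probe")
    target = rng.choice(others)

    # 1. Direct ping to one randomly chosen member.
    if reachable(prober, target):
        return target, "alive"

    # 2. Indirect ping (ping-req): ask up to k other members to ping the
    #    target on our behalf and relay the ack back to us.
    helpers = [m for m in others if m != target]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        if reachable(prober, helper) and reachable(helper, target):
            return target, "alive"

    # 3. No ack this period: mark the target suspect. Full SWIM would
    #    disseminate the suspicion and later confirm or refute it.
    return target, "suspect"
```

Each node contacts at most 1 + k peers per period, so per-node load stays roughly constant as the cluster grows; the full protocol also piggybacks membership updates on these ping and ack messages.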
>> >
>> > Gossip is not a for-profit company, but that does not mean we should not
>> > attempt to solve problems of real users, have a road map, or get the
>> > software in many peoples hands. How do we do that?
>> >
>> > There is no simple answer. I think the primary vehicle is blogging and
>> > community. For example I asked everyone to write up their GSOC work into
>> > blogs:
>> >
>> >
>> >    - Wrote a blog regarding Data Change Event Listeners
>> >       - https://medium.com/@mirage20/listening-to-data-change-events-in-apache-gossip-a0f0a4ea4c21
>> >    - Wrote a blog regarding Data Replication Control
>> >       - https://medium.com/@mirage20/data-replication-control-in-apache-gossip-35777771e2bb
>> >
>> > I can tweet out these blogs; some people follow me, and they might retweet
>> > them. Through word of mouth we get users who try the software, or
>> > committers interested in scratching their own itch.
>> >
>> > I suggest some reading about the Apache Way:
>> > https://www.apache.org/foundation/how-it-works.html .
>> >
>> > Also, I suggest filling out the details of your tickets and creating
>> > specific threads on the message board, e.g., what are you researching
>> > about SWIM? What were the conclusions? What are the alternatives? The
>> > ticket is basically empty: https://issues.apache.org/jira/browse/GOSSIP-51.
>> >
>>
>
> The CRDTs are important for downstream applications. For example, the CRDT
> types are going to make it much easier to do ...anything. ZooKeeper has
> features like writing ephemeral sequential (EPHEMERAL_SEQUENTIAL) nodes, so
> you can mix and match writes and reads with different semantics and glue
> together a lock, a leader election, etc.
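To show how ZooKeeper's node semantics glue into a lock, here is a minimal in-memory simulation of the standard lock recipe: each contender creates an ephemeral sequential child, the lowest sequence number holds the lock, each waiter watches only its immediate predecessor, and a crashed holder's node vanishes with its session. `FakeZk` and its methods are hypothetical stand-ins, not the ZooKeeper client API.

```python
import itertools
from typing import Dict, Optional


class FakeZk:
    """Hypothetical in-memory stand-in for one ZooKeeper lock directory.

    Models only what the recipe needs: ephemeral sequential children
    that disappear when their owning session ends.
    """

    def __init__(self) -> None:
        self._seq = itertools.count()
        self.children: Dict[str, str] = {}  # znode name -> session id

    def create_ephemeral_sequential(self, prefix: str, session: str) -> str:
        # Zero-padded counter, so string order matches sequence order.
        name = f"{prefix}{next(self._seq):010d}"
        self.children[name] = session
        return name

    def close_session(self, session: str) -> None:
        # Ephemeral nodes are deleted when their session goes away.
        self.children = {n: s for n, s in self.children.items()
                         if s != session}


def lock_holder(zk: FakeZk) -> Optional[str]:
    """The session owning the lowest-numbered child holds the lock."""
    if not zk.children:
        return None
    return zk.children[min(zk.children)]


def predecessor(zk: FakeZk, node: str) -> Optional[str]:
    """The node a waiting contender should watch (next lower sequence)."""
    lower = [n for n in zk.children if n < node]
    return max(lower) if lower else None
```

Watching only the predecessor, rather than the whole directory, means a release wakes exactly one waiter.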
>
> Shared and per-node data provide only put(x, y) and get(x). Because
> Gossip replication happens lazily, a scheme to acquire a lock or elect a
> leader might need a structure that spans a number of keys.
> CRDTs give us the key building block for manipulating complex types in a
> masterless way.
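As a concrete illustration of "manipulating complex types in a masterless way", here is a minimal state-based grow-only counter (G-Counter), one of the simplest CRDTs. Each node increments only its own slot, and merge takes the per-node maximum, so replicas converge no matter how lazily or in what order gossiped states arrive. The class and method names are illustrative, not Apache Gossip's CRDT API.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter (G-Counter) CRDT sketch."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}  # node id -> increments seen

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum is commutative, associative and idempotent,
        # so duplicate or reordered gossip messages are harmless.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

No node ever coordinates with a master: any two replicas that have exchanged state report the same value.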
>
> For example, if I am writing "storm", I need a place to store topologies;
> great, I can denormalize them to key/value pairs and store them in shared
> data. Next, I need a way for 10-100 storm nodes to agree on who is running
> which topology. With the CRDTs and the voting work (Mirage) in flight, we
> will have that.
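One way the "agree on who is running which topology" step could sit on top of mergeable shared data is a majority vote. Each node publishes its vote for a topology's owner in its own slot, slots merge last-writer-wins by a per-voter version, and an owner counts only once a strict majority of the known cluster agrees. This is a hypothetical sketch; `VoteMap`, `merge` and `winner` are made-up names, and it is not the voting mechanism being built in Gossip.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical per-topology vote map: voter id -> (version, candidate).
VoteMap = Dict[str, Tuple[int, str]]


def merge(a: VoteMap, b: VoteMap) -> VoteMap:
    """Merge two replicas' votes; each voter's higher (version, candidate)
    wins, so the result does not depend on merge order."""
    out = dict(a)
    for voter, vote in b.items():
        if voter not in out or vote > out[voter]:
            out[voter] = vote
    return out


def winner(votes: VoteMap, cluster_size: int) -> Optional[str]:
    """Return the candidate backed by a strict majority, if any."""
    if not votes:
        return None
    candidate, count = Counter(c for _, c in votes.values()).most_common(1)[0]
    return candidate if count > cluster_size // 2 else None
```

Requiring a strict majority of the whole cluster, not just of the votes seen so far, keeps a node with a partial view from declaring an owner too early.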
>
> "For example, I would love to push through a big refactoring that improves
> the coupling problem in the code base. It is near impossible to write good
> unit tests currently. And it's difficult to write features if you cannot
> easily test them. However, I don't feel like there is support for this kind
> of change."
>
> From my perspective, the refactoring tickets are a bit hard to track.
> They tend to be either a series of small ones that eat up a lot of admin
> bandwidth or a bulky one that starts small and gets large. I do not have a
> problem with refactoring tickets specifically, but I would rather see them
> in the scope of features. For example, with the change to support both SWIM
> and our current Gossiper, we are forced to think about the problem
> differently and have two concrete cases, so we can design the correct API.
> Sorry if something was hanging out there that you felt was un-acked.
>

No, nothing was unacked. I'm not a fan of having large or medium refactors
as part of features. It makes them harder to review. IMO the refactors we
need are not of the small variety either.


>
> "So in short, when there are competing visions, and not a small community,
> it will be difficult to make headway."
>
> I am not sure I agree. Early on in Gossip, I was approaching things like
> the larger (Apache) projects I worked on. I was kinda used to "hey
> committers, here is a patch": someone would roll around, review it and
> tell me a fix; repeat a few times; merge.
>
> We just do not have the bodies for that. If you want to make a change (as
> a committer), you do not really have to wait around for consensus. For a
> committer there is an implicit "WILL MERGE IN 2 DAYS IF NO COMMENT". If you
> are not a committer (or want to wait for my blessing), you are probably
> going to have to send me a singing telegram or two.
>

This is the part I needed to hear. :) I think we're all good now.

Cheers,

Gary.


> Doing the apache releases takes cycles, mentoring the GSOC proposals takes
> cycles, life, jobs, etc.
>
> As for Gossip having a direction, I don't want Gossip to follow the lead
> of other "owned" Apache projects: "Hey, we are 'EdTech', the commercial
> consulting/solutions arm in the engine room of 'Apache Gossip'. We have all
> the committers, we have a ROAD MAP, and our CTO KNOWS WHAT TO DO BASED ON
> WHAT OUR INVESTORS WANT TO HEAR, and if you're working on something else,
> ......tough crap."  :)
>
> I am trying to move at a pace that others can follow and play along with. I
> _COULD_ have implemented all the CRDTs, but I did one and left the rest
> open. This leaves a door open for others to make meaningful contributions.
> That is essentially what I am trying to do: guide. If I had more time (job,
> Gossip (reviews, releases, GSOC), other Apache PMC roles, a 2-year-old), I
> would probably do more outreach like meetups and blogs. There are fewer
> bodies on deck than I expected at this phase, but such is life. I see some
> projects stay in the incubator for 3-4 years; I'm not trying to go that
> long, but I'm not trying to rush either.
>
