Re: [DISCUSS] Druid incubation proposal

Jitendra Pandey Wed, 21 Feb 2018 19:04:47 -0800

+1  
Druid will be a great addition to ASF.

On 2/21/18, 5:06 PM, "Ashutosh Chauhan" <hashut...@apache.org> wrote:


    +1 for Druid in ASF.
    I have been involved with Hive Druid integration. If you are looking for
    mentors, happy to help.
    
    Thanks,
    Ashutosh
    
    On Fri, Feb 16, 2018 at 2:20 PM, Tom Barber <t...@spicule.co.uk> wrote:
    
    > I can second most of that from the peanut gallery, my high level
    > interactions with a few Druid folk and keeping a watchful eye on a very
    > exciting project over the last few years.
    >
    > I think the Druid project would make an excellent addition to the ASF
    > portfolio.
    >
    > Tom
    >
    >
    > On 16/02/18 22:17, Julian Hyde wrote:
    >
    >> As Champion for this proposal, let me say that the Druid project will be
    >> an excellent addition to the ASF. I have been an observer of the project
    >> for a couple of years, and in many respects it is already operating in the
    >> Apache Way. Druid had paid developers from a number of companies, some of
    >> whom were in competition, and its governance was strong enough to navigate
    >> the choppy waters that that can create.
    >>
    >> A number of Druid committers subsequently started to work on Apache
    >> projects (Gian on Calcite, and Slim and Nishant on Hive) and so already
    >> know what to expect.
    >>
    >> You can get a sense of the project dynamic by reading the archives of
    >> their dev list: https://groups.google.com/forum/#!forum/druid-development
    >>
    >> Julian
    >>
    >>
    >> On Feb 16, 2018, at 12:15 PM, Gian Merlino <g...@apache.org> wrote:
    >>>
    >>> Hi all,
    >>>
    >>> I would like to open up a discussion about incubating Druid at Apache.
    >>> I've
    >>> included a proposal in this mail and have also posted a draft at
    >>> https://wiki.apache.org/incubator/DruidProposal. More information about
    >>> Druid is also available on our project web site at: http://druid.io/
    >>>
    >>> Thanks for your consideration!
    >>>
    >>> Gian
    >>>
    >>> = Druid Proposal =
    >>>
    >>> == Abstract ==
    >>>
    >>> Druid is a high-performance, column-oriented, distributed data store.
    >>>
    >>> == Proposal ==
    >>>
    >>> Druid is an open source data store designed for real-time exploratory
    >>> analytics on large data sets. Druid's key features are a column-oriented
    >>> storage layout, a distributed shared-nothing architecture, and the ability to
    >>> generate and leverage indexing and caching structures. Druid is typically
    >>> deployed in clusters of tens to hundreds of nodes, and has the ability to
    >>> load data from Apache Kafka and Apache Hadoop, among other data sources.
    >>> Druid offers two query languages: a SQL dialect (powered by Apache
    >>> Calcite)
    >>> and a JSON-over-HTTP API.
    >>>
    >>> Druid was originally developed to power a slice-and-dice analytical UI
    >>> built on top of large event streams. The original use case for Druid
    >>> targeted ingest rates of millions of records/sec, retention of over a
    >>> year
    >>> of data, and query latencies of sub-second to a few seconds. Many people
    >>> can benefit from such capability, and many already have (see
    >>> http://druid.io/druid-powered.html). In addition, new use cases have
    >>> emerged since Druid's original development, such as OLAP acceleration of
    >>> data warehouse tables and more highly concurrent applications operating
    >>> with relatively narrower queries.
    >>>
    >>> == Background ==
    >>>
    >>> Druid is a data store designed for fast analytics. It would typically be
    >>> used in lieu of more general purpose query systems like Hadoop !MapReduce
    >>> or Spark when query latency is of the utmost importance. Druid is often
    >>> used as a data store for powering GUI analytical applications.
    >>>
    >>> The buzzwordy description of Druid is a high-performance,
    >>> column-oriented,
    >>> distributed data store. What we mean by this is:
    >>>
    >>> * "high performance": Druid aims to provide low query latency and high
    >>> ingest rates.
    >>> * "column-oriented": Druid stores data in a column-oriented format, like
    >>> most other systems designed for analytics. It can also store indexes
    >>> along
    >>> with the columns.
    >>> * "distributed": Druid is deployed in clusters, typically of tens to
    >>> hundreds of nodes.
    >>> * "data store": Druid loads your data and stores a copy of it on the
    >>> cluster's local disks (and may cache it in memory). It doesn't query your
    >>> data from some other storage system.
    >>>
    >>> == Rationale ==
    >>>
    >>> Druid is a mature, active project with a large number of production
    >>> installations, dozens of contributors to each release, and multiple
    >>> vendors
    >>> offering professional support. Given Druid's strong community, its close
    >>> integration with many other Apache projects (such as Kafka, Hadoop, and
    >>> Calcite), and its pre-existing Apache-inspired governance structure, we
    >>> feel that Apache is the best home for the project on a long-term basis.
    >>>
    >>> == Current Status ==
    >>>
    >>> === Meritocracy ===
    >>> Since Druid was first open sourced the original developers have solicited
    >>> contributions from others, including through our blog, the project
    >>> mailing
    >>> lists, and through accepting !GitHub pull requests. We have an
    >>> Apache-inspired governance structure with a PMC and committers, and our
    >>> committer ranks include a good number of people from outside the original
    >>> development team.
    >>>
    >>> === Community ===
    >>>
    >>> The Druid core developers have sought to nurture a community throughout
    >>> the
    >>> life of the project. We use !GitHub as the focal point for bug reports
    >>> and
    >>> code contributions, and the mailing lists for most other discussion. To
    >>> try
    >>> to make people feel welcome, we've also spelled this out on a
    >>> "CONTRIBUTE"
    >>> link from the project page: http://druid.io/community/. Today we have an
    >>> active contributor base (a typical release has ~40 contributors) and
    >>> mailing list.
    >>>
    >>> === Core Developers ===
    >>>
    >>> Druid enjoys good diversity of committer affiliation. The most active
    >>> developers over the past year are affiliated with four different
    >>> companies:
    >>> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are
    >>> also
    >>> committers on other ASF projects, including Apache Airflow,
    >>> Apache
    >>> Curator, and Apache Calcite. The original developers of Druid remain
    >>> involved in the project.
    >>>
    >>> === Alignment ===
    >>>
    >>> Druid's current governance structure is Apache-inspired with a PMC and
    >>> committers chosen by a meritocratic process. Additionally, Druid
    >>> integrates
    >>> with a number of other Apache projects, including Kafka, Hadoop, Hive,
    >>> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
    >>>
    >>> == Known Risks ==
    >>>
    >>> === Orphaned products ===
    >>>
    >>> The risk of Druid becoming orphaned is low, due to a diverse committer
    >>> base
    >>> that is invested in the future of the project.
    >>>
    >>> === Inexperience with Open Source ===
    >>>
    >>> Druid's core developers have been running it as a community-oriented open
    >>> source project for some time now, and many of them are committers on
    >>> other
    >>> open source projects as well, including Apache Airflow, Apache Curator,
    >>> and
    >>> Apache Calcite.
    >>>
    >>> === Homogenous Developers ===
    >>>
    >>> Druid's current diversity of committer affiliation means that we have
    >>> become accustomed to working collaboratively and in the open. We hope
    >>> that
    >>> a transition to the ASF helps Druid's contributor base become even more
    >>> diverse.
    >>>
    >>> === Reliance on Salaried Developers ===
    >>>
    >>> Druid's user base and contributor base skew heavily towards salaried
    >>> developers. We believe this is natural since Druid is a technology
    >>> designed
    >>> to be deployed on large clusters, and due to this, tends to be deployed
    >>> by
    >>> organizations rather than by individuals. Nevertheless, many current
    >>> Druid
    >>> developers have continued working on the project even through job
    >>> changes,
    >>> which we take to be a good sign of developer commitment and personal
    >>> interest.
    >>>
    >>> === Relationships with Other Apache Products ===
    >>>
    >>> Druid integrates with a number of other Apache projects. Druid internally
    >>> uses Calcite for SQL planning, and Curator and !ZooKeeper for
    >>> coordination.
    >>> Druid can read data in Avro or Parquet format. Druid can load data from
    >>> streams in Kafka or from files in Hadoop. Druid integrates with Hive as
    >>> an
    >>> option for SQL query acceleration. Druid data can be visualized by
    >>> Superset
    >>> (incubating).
    >>>
    >>> === An Excessive Fascination with the Apache Brand ===
    >>>
    >>> Druid is a successful project with a diverse community. The main reason
    >>> for
    >>> pursuing incubation is to find a stable, long term home for the project
    >>> with a well known governance philosophy.
    >>>
    >>> == Required Resources ==
    >>>
    >>> === Mailing lists ===
    >>>
    >>> We would like to migrate the existing Druid mailing lists from Google
    >>> Groups to Apache.
    >>>
    >>> * druid-user@googlegroups -> us...@druid.incubator.apache.org
    >>> * druid-development@googlegroups -> d...@druid.incubator.apache.org
    >>>
    >>> === Source control ===
    >>>
    >>> Druid development currently takes place on !GitHub. We would like to
    >>> continue using !GitHub, if possible, in order to preserve the workflows
    >>> the
    >>> community has developed around !GitHub pull requests.
    >>>
    >>> === Issue tracking ===
    >>> Druid currently uses !GitHub issues for issue tracking. We would like to
    >>> migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
    >>>
    >>> == Documentation ==
    >>>
    >>> Druid's documentation can be found at http://druid.io/docs/latest/.
    >>>
    >>> == Initial Source ==
    >>>
    >>> Druid was initially open-sourced by Metamarkets in 2012 and has been run
    >>> in
    >>> a community-governed fashion since then. The code is currently hosted at
    >>> https://github.com/druid-io/ and includes the following repositories:
    >>>
    >>> * druid (primary repository)
    >>> * druid-console (web console for Druid)
    >>> * druid-io.github.io (source for Druid's website at http://druid.io/)
    >>> * tranquility (realtime stream push client for Druid)
    >>> * docker-druid (Docker image for Druid)
    >>> * pydruid (Python library)
    >>> * RDruid (R library)
    >>> * oss-parent (Maven POM files)
    >>>
    >>> == Source and Intellectual Property Submission Plan ==
    >>>
    >>> A complete set of the open source code needs to be licensed from the
    >>> owning
    >>> organization to the Foundation. Commercial legal counsel for the owning
    >>> organization will review the standard Foundation licensing paperwork and
    >>> propose any updates as needed. This license will enable Apache to
    >>> incubate
    >>> and manage the Druid project moving forward.
    >>>
    >>> Other Druid paraphernalia to be transferred to Apache consists of:
    >>>
    >>> * !GitHub organization at https://github.com/druid-io/
    >>> * Twitter account at https://twitter.com/druidio
    >>> * "druid.io" domain name
    >>> * "Druid" trademark assignment per Foundation standard paper.  The
    >>> trademark assignment paperwork shall be reviewed by the owning
    >>> organization's commercial and IP counsel
    >>> * CLAs - all rights in the code licensed above should encompass the CLAs
    >>> that existed between developers and owning organization
    >>>
    >>> A copyright license to the code, trademark assignment of Druid, and
    >>> transfer of other paraphernalia to Apache should be sufficient to cover
    >>> all
    >>> rights required by Apache to operate the project.
    >>>
    >>> == External Dependencies ==
    >>> External dependencies distributed with Druid currently all have one of
    >>> the
    >>> following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with
    >>> one
    >>> exception: the optional Druid MySQL metadata store extension depends on
    >>> MySQL Connector/J, which is GPL licensed. Druid currently packages this
    >>> as
    >>> a separate download; see our current presentation on:
    >>> http://druid.io/downloads.html. As part of incubation we intend to
    >>> determine the best strategy for handling the MySQL extension.
    >>>
    >>> == Cryptography ==
    >>> Not applicable.
    >>>
    >>> == Initial Committers ==
    >>>
    >>> The initial committers for incubation are the current set of committers
    >>> on
    >>> Druid who have expressed interest in being involved in Apache incubation.
    >>> Affiliations are listed where relevant. We may seek to add other
    >>> committers
    >>> during incubation; for example, we would want to add any current Druid
    >>> committers who express an interest after incubation begins.
    >>>
    >>> * Charles Allen (char...@allen-net.com) (Snap)
    >>> * David Lim (david.clarence....@gmail.com) (Imply)
    >>> * Eric Tschetter (ched...@apache.org) (Splunk)
    >>> * Fangjin Yang (f...@imply.io) (Imply)
    >>> * Gian Merlino (g...@apache.org) (Imply)
    >>> * Himanshu Gupta (g.himan...@gmail.com) (Oath)
    >>> * Jihoon Son (jihoon...@apache.org) (Imply)
    >>> * Jonathan Wei (jon....@imply.io) (Imply)
    >>> * Maxime Beauchemin (maximebeauche...@gmail.com) (Lyft)
    >>> * Mohamed Slim Bouguerra (slim.bougue...@gmail.com) (Hortonworks)
    >>> * Nishant Bangarwa (nish...@apache.org) (Hortonworks)
    >>> * Parag Jain (paragjai...@gmail.com) (Oath)
    >>> * Roman Leventov (leventov...@gmail.com) (Metamarkets)
    >>> * Xavier Léauté (xav...@leaute.com) (Confluent)
    >>>
    >>> == Sponsors ==
    >>>
    >>> * Champion: Julian Hyde
    >>> * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
    >>> * Sponsoring entity: Apache Incubator
    >>>
    >>
    >>
    >