Re: [DISCUSS] Druid incubation proposal

Ashutosh Chauhan Wed, 21 Feb 2018 17:07:29 -0800

+1 for Druid in ASF.
I have been involved with Hive Druid integration. If you are looking for
mentors, happy to help.


Thanks,
Ashutosh

On Fri, Feb 16, 2018 at 2:20 PM, Tom Barber <t...@spicule.co.uk> wrote:

> I can second most of that from the peanut gallery, my high level
> interactions with a few Druid folk and keeping a watchful eye on a very
> exciting project over the last few years.
>
> I think the Druid project would make an excellent addition to the ASF
> portfolio.
>
> Tom
>
>
> On 16/02/18 22:17, Julian Hyde wrote:
>
>> As Champion for this proposal, let me say that the Druid project will be
>> an excellent addition to the ASF. I have been an observer of the project
>> for a couple of years, and in many respects it is already operating in the
>> Apache Way. Druid had paid developers from a number of companies, some of
>> whom were in competition, and its governance was strong enough to navigate
>> the choppy waters that that can create.
>>
>> A number of Druid committers subsequently started to work on Apache
>> projects (Gian on Calcite, and Slim and Nishant on Hive) and so already
>> know what to expect.
>>
>> You can get a sense of the project dynamic by reading the archives of
>> their dev list: https://groups.google.com/forum/#!forum/druid-development
>>
>> Julian
>>
>>
>> On Feb 16, 2018, at 12:15 PM, Gian Merlino <g...@apache.org> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to open up a discussion about incubating Druid at Apache.
>>> I've
>>> included a proposal in this mail and have also posted a draft at
>>> https://wiki.apache.org/incubator/DruidProposal. More information about
>>> Druid is also available on our project web site at: http://druid.io/
>>>
>>> Thanks for your consideration!
>>>
>>> Gian
>>>
>>> = Druid Proposal =
>>>
>>> == Abstract ==
>>>
>>> Druid is a high-performance, column-oriented, distributed data store.
>>>
>>> == Proposal ==
>>>
>>> Druid is an open source data store designed for real-time exploratory
>>> analytics on large data sets. Druid's key features are a column-oriented
>>> storage layout, a distributed shared-nothing architecture, and the ability to
>>> generate and leverage indexing and caching structures. Druid is typically
>>> deployed in clusters of tens to hundreds of nodes, and has the ability to
>>> load data from Apache Kafka and Apache Hadoop, among other data sources.
>>> Druid offers two query languages: a SQL dialect (powered by Apache
>>> Calcite)
>>> and a JSON-over-HTTP API.
>>>
>>> Druid was originally developed to power a slice-and-dice analytical UI
>>> built on top of large event streams. The original use case for Druid
>>> targeted ingest rates of millions of records/sec, retention of over a
>>> year
>>> of data, and query latencies of sub-second to a few seconds. Many people
>>> can benefit from such capability, and many already have (see
>>> http://druid.io/druid-powered.html). In addition, new use cases have
>>> emerged since Druid's original development, such as OLAP acceleration of
>>> data warehouse tables and more highly concurrent applications operating
>>> with relatively narrower queries.
>>>
>>> == Background ==
>>>
>>> Druid is a data store designed for fast analytics. It would typically be
>>> used in lieu of more general purpose query systems like Hadoop !MapReduce
>>> or Spark when query latency is of the utmost importance. Druid is often
>>> used as a data store for powering GUI analytical applications.
>>>
>>> The buzzwordy description of Druid is a high-performance,
>>> column-oriented,
>>> distributed data store. What we mean by this is:
>>>
>>> * "high-performance": Druid aims to provide the lowest query latency and
>>> highest ingest rates possible.
>>> * "column-oriented": Druid stores data in a column-oriented format, like
>>> most other systems designed for analytics. It can also store indexes
>>> along
>>> with the columns.
>>> * "distributed": Druid is deployed in clusters, typically of tens to
>>> hundreds of nodes.
>>> * "data store": Druid loads your data and stores a copy of it on the
>>> cluster's local disks (and may cache it in memory). It doesn't query your
>>> data from some other storage system.
>>>
>>> == Rationale ==
>>>
>>> Druid is a mature, active project with a large number of production
>>> installations, dozens of contributors to each release, and multiple
>>> vendors
>>> offering professional support. Given Druid's strong community, its close
>>> integration with many other Apache projects (such as Kafka, Hadoop, and
>>> Calcite), and its pre-existing Apache-inspired governance structure, we
>>> feel that Apache is the best home for the project on a long-term basis.
>>>
>>> == Current Status ==
>>>
>>> === Meritocracy ===
>>> Since Druid was first open sourced, the original developers have solicited
>>> contributions from others through our blog, the project mailing lists, and
>>> !GitHub pull requests. We have an
>>> Apache-inspired governance structure with a PMC and committers, and our
>>> committer ranks include a good number of people from outside the original
>>> development team.
>>>
>>> === Community ===
>>>
>>> The Druid core developers have sought to nurture a community throughout
>>> the
>>> life of the project. We use !GitHub as the focal point for bug reports
>>> and
>>> code contributions, and the mailing lists for most other discussion. To
>>> try
>>> to make people feel welcome, we've also spelled this out on a
>>> "CONTRIBUTE"
>>> link from the project page: http://druid.io/community/. Today we have an
>>> active contributor base (a typical release has ~40 contributors) and
>>> mailing list.
>>>
>>> === Core Developers ===
>>>
>>> Druid enjoys good diversity of committer affiliation. The most active
>>> developers over the past year are affiliated with four different
>>> companies:
>>> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also
>>> committers on other ASF projects, including Apache Airflow,
>>> Apache
>>> Curator, and Apache Calcite. The original developers of Druid remain
>>> involved in the project.
>>>
>>> === Alignment ===
>>>
>>> Druid's current governance structure is Apache-inspired with a PMC and
>>> committers chosen by a meritocratic process. Additionally, Druid
>>> integrates
>>> with a number of other Apache projects, including Kafka, Hadoop, Hive,
>>> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned products ===
>>>
>>> The risk of Druid becoming orphaned is low, due to a diverse committer
>>> base
>>> that is invested in the future of the project.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> Druid's core developers have been running it as a community-oriented open
>>> source project for some time now, and many of them are committers on
>>> other
>>> open source projects as well, including Apache Airflow, Apache Curator,
>>> and
>>> Apache Calcite.
>>>
>>> === Homogenous Developers ===
>>>
>>> Druid's current diversity of committer affiliation means that we have
>>> become accustomed to working collaboratively and in the open. We hope
>>> that
>>> a transition to the ASF helps Druid's contributor base become even more
>>> diverse.
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> Druid's user base and contributor base skew heavily towards salaried
>>> developers. We believe this is natural since Druid is a technology
>>> designed
>>> to be deployed on large clusters, and due to this, tends to be deployed
>>> by
>>> organizations rather than by individuals. Nevertheless, many current
>>> Druid
>>> developers have continued working on the project even through job
>>> changes,
>>> which we take to be a good sign of developer commitment and personal
>>> interest.
>>>
>>> === Relationships with Other Apache Products ===
>>>
>>> Druid integrates with a number of other Apache projects. Druid internally
>>> uses Calcite for SQL planning, and Curator and !ZooKeeper for
>>> coordination.
>>> Druid can read data in Avro or Parquet format. Druid can load data from
>>> streams in Kafka or from files in Hadoop. Druid integrates with Hive as
>>> an
>>> option for SQL query acceleration. Druid data can be visualized by
>>> Superset
>>> (incubating).
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> Druid is a successful project with a diverse community. The main reason
>>> for
>>> pursuing incubation is to find a stable, long-term home for the project
>>> with a well-known governance philosophy.
>>>
>>> == Required Resources ==
>>>
>>> === Mailing lists ===
>>>
>>> We would like to migrate the existing Druid mailing lists from Google
>>> Groups to Apache.
>>>
>>> * druid-user@googlegroups -> us...@druid.incubator.apache.org
>>> * druid-development@googlegroups -> d...@druid.incubator.apache.org
>>>
>>> === Source control ===
>>>
>>> Druid development currently takes place on !GitHub. We would like to
>>> continue using !GitHub, if possible, in order to preserve the workflows
>>> the
>>> community has developed around !GitHub pull requests.
>>>
>>> === Issue tracking ===
>>> Druid currently uses !GitHub issues for issue tracking. We would like to
>>> migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
>>>
>>> == Documentation ==
>>>
>>> Druid's documentation can be found at http://druid.io/docs/latest/.
>>>
>>> == Initial Source ==
>>>
>>> Druid was initially open-sourced by Metamarkets in 2012 and has been run
>>> in
>>> a community-governed fashion since then. The code is currently hosted at
>>> https://github.com/druid-io/ and includes the following repositories:
>>>
>>> * druid (primary repository)
>>> * druid-console (web console for Druid)
>>> * druid-io.github.io (source for Druid's website at http://druid.io/)
>>> * tranquility (realtime stream push client for Druid)
>>> * docker-druid (Docker image for Druid)
>>> * pydruid (Python library)
>>> * RDruid (R library)
>>> * oss-parent (Maven POM files)
>>>
>>> == Source and Intellectual Property Submission Plan ==
>>>
>>> A complete set of the open source code needs to be licensed from the
>>> owning
>>> organization to the Foundation. Commercial legal counsel for the owning
>>> organization will review the standard Foundation licensing paperwork and
>>> propose any updates as needed. This license will enable Apache to
>>> incubate
>>> and manage the Druid project moving forward.
>>>
>>> Other Druid paraphernalia to be transferred to Apache consists of:
>>>
>>> * !GitHub organization at https://github.com/druid-io/
>>> * Twitter account at https://twitter.com/druidio
>>> * "druid.io" domain name
>>> * "Druid" trademark assignment per Foundation standard paperwork. The
>>> trademark assignment paperwork shall be reviewed by the owning
>>> organization's commercial and IP counsel
>>> * CLAs - all rights in the code licensed above should encompass the CLAs
>>> that existed between developers and owning organization
>>>
>>> A copyright license to the code, trademark assignment of Druid, and
>>> transfer of other paraphernalia to Apache should be sufficient to cover
>>> all
>>> rights required by Apache to operate the project.
>>>
>>> == External Dependencies ==
>>> External dependencies distributed with Druid currently all have one of
>>> the
>>> following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with
>>> one
>>> exception: the optional Druid MySQL metadata store extension depends on
>>> MySQL Connector/J, which is GPL licensed. Druid currently packages this
>>> as
>>> a separate download; see our current presentation on:
>>> http://druid.io/downloads.html. As part of incubation we intend to
>>> determine the best strategy for handling the MySQL extension.
>>>
>>> == Cryptography ==
>>> Not applicable.
>>>
>>> == Initial Committers ==
>>>
>>> The initial committers for incubation are the current set of committers
>>> on
>>> Druid who have expressed interest in being involved in Apache incubation.
>>> Affiliations are listed where relevant. We may seek to add other
>>> committers
>>> during incubation; for example, we would want to add any current Druid
>>> committers who express an interest after incubation begins.
>>>
>>> * Charles Allen (char...@allen-net.com) (Snap)
>>> * David Lim (david.clarence....@gmail.com) (Imply)
>>> * Eric Tschetter (ched...@apache.org) (Splunk)
>>> * Fangjin Yang (f...@imply.io) (Imply)
>>> * Gian Merlino (g...@apache.org) (Imply)
>>> * Himanshu Gupta (g.himan...@gmail.com) (Oath)
>>> * Jihoon Son (jihoon...@apache.org) (Imply)
>>> * Jonathan Wei (jon....@imply.io) (Imply)
>>> * Maxime Beauchemin (maximebeauche...@gmail.com) (Lyft)
>>> * Mohamed Slim Bouguerra (slim.bougue...@gmail.com) (Hortonworks)
>>> * Nishant Bangarwa (nish...@apache.org) (Hortonworks)
>>> * Parag Jain (paragjai...@gmail.com) (Oath)
>>> * Roman Leventov (leventov...@gmail.com) (Metamarkets)
>>> * Xavier Léauté (xav...@leaute.com) (Confluent)
>>>
>>> == Sponsors ==
>>>
>>> * Champion: Julian Hyde
>>> * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
>>> * Sponsoring entity: Apache Incubator
>>>
>>
>>
>
