It seems that we have consensus (and indeed, an ectopic vote is happening in this discuss thread). I will start a formal vote. To everyone who replied '+1' on this thread: thanks for your support, and please cast your vote on the formal thread.
Julian

On Thu, Feb 22, 2018 at 9:36 AM, Pramod Immaneni <[email protected]> wrote:

> +1
>
> On Fri, Feb 16, 2018 at 12:15 PM, Gian Merlino <[email protected]> wrote:
>
>> Hi all,
>>
>> I would like to open up a discussion about incubating Druid at Apache. I've included a proposal in this mail and have also posted a draft at https://wiki.apache.org/incubator/DruidProposal. More information about Druid is also available on our project website at http://druid.io/
>>
>> Thanks for your consideration!
>>
>> Gian
>>
>> = Druid Proposal =
>>
>> == Abstract ==
>>
>> Druid is a high-performance, column-oriented, distributed data store.
>>
>> == Proposal ==
>>
>> Druid is an open source data store designed for real-time exploratory analytics on large data sets. Druid's key features are a column-oriented storage layout, a distributed shared-nothing architecture, and the ability to generate and leverage indexing and caching structures. Druid is typically deployed in clusters of tens to hundreds of nodes, and can load data from Apache Kafka and Apache Hadoop, among other data sources. Druid offers two query languages: a SQL dialect (powered by Apache Calcite) and a JSON-over-HTTP API.
>>
>> Druid was originally developed to power a slice-and-dice analytical UI built on top of large event streams. The original use case for Druid targeted ingest rates of millions of records per second, retention of over a year of data, and query latencies ranging from sub-second to a few seconds. Many people can benefit from such capability, and many already have (see http://druid.io/druid-powered.html). In addition, new use cases have emerged since Druid's original development, such as OLAP acceleration of data warehouse tables and more highly concurrent applications issuing relatively narrow queries.
>>
>> == Background ==
>>
>> Druid is a data store designed for fast analytics.
>> It would typically be used in lieu of more general-purpose query systems like Hadoop MapReduce or Spark when query latency is of the utmost importance. Druid is often used as a data store for powering GUI analytical applications.
>>
>> The buzzwordy description of Druid is a high-performance, column-oriented, distributed data store. What we mean by this is:
>>
>> * "high performance": Druid aims to provide the lowest query latency and highest ingest rates possible.
>> * "column-oriented": Druid stores data in a column-oriented format, like most other systems designed for analytics. It can also store indexes along with the columns.
>> * "distributed": Druid is deployed in clusters, typically of tens to hundreds of nodes.
>> * "data store": Druid loads your data and stores a copy of it on the cluster's local disks (and may cache it in memory). It doesn't query your data from some other storage system.
>>
>> == Rationale ==
>>
>> Druid is a mature, active project with a large number of production installations, dozens of contributors to each release, and multiple vendors offering professional support. Given Druid's strong community, its close integration with many other Apache projects (such as Kafka, Hadoop, and Calcite), and its pre-existing Apache-inspired governance structure, we feel that Apache is the best home for the project on a long-term basis.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>>
>> Since Druid was first open sourced, the original developers have solicited contributions from others through our blog and the project mailing lists, and by accepting GitHub pull requests. We have an Apache-inspired governance structure with a PMC and committers, and our committer ranks include a good number of people from outside the original development team.
>>
>> === Community ===
>>
>> The Druid core developers have sought to nurture a community throughout the life of the project.
>> We use GitHub as the focal point for bug reports and code contributions, and the mailing lists for most other discussion. To help people feel welcome, we've also spelled this out on a "CONTRIBUTE" link from the project page: http://druid.io/community/. Today we have an active contributor base (a typical release has ~40 contributors) and an active mailing list.
>>
>> === Core Developers ===
>>
>> Druid enjoys good diversity of committer affiliation. The most active developers over the past year are affiliated with four different companies: Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also committers on other ASF projects, including Apache Airflow, Apache Curator, and Apache Calcite. The original developers of Druid remain involved in the project.
>>
>> === Alignment ===
>>
>> Druid's current governance structure is Apache-inspired, with a PMC and committers chosen by a meritocratic process. Additionally, Druid integrates with a number of other Apache projects, including Kafka, Hadoop, Hive, Calcite, Superset (incubating), Spark, Curator, and ZooKeeper.
>>
>> == Known Risks ==
>>
>> === Orphaned products ===
>>
>> The risk of Druid becoming orphaned is low, due to a diverse committer base that is invested in the future of the project.
>>
>> === Inexperience with Open Source ===
>>
>> Druid's core developers have been running it as a community-oriented open source project for some time now, and many of them are committers on other open source projects as well, including Apache Airflow, Apache Curator, and Apache Calcite.
>>
>> === Homogeneous Developers ===
>>
>> Druid's current diversity of committer affiliation means that we have become accustomed to working collaboratively and in the open. We hope that a transition to the ASF helps Druid's contributor base become even more diverse.
>>
>> === Reliance on Salaried Developers ===
>>
>> Druid's user base and contributor base skew heavily towards salaried developers. We believe this is natural, since Druid is a technology designed to be deployed on large clusters and therefore tends to be deployed by organizations rather than by individuals. Nevertheless, many current Druid developers have continued working on the project even through job changes, which we take to be a good sign of developer commitment and personal interest.
>>
>> === Relationships with Other Apache Products ===
>>
>> Druid integrates with a number of other Apache projects. Druid internally uses Calcite for SQL planning, and Curator and ZooKeeper for coordination. Druid can read data in Avro or Parquet format. Druid can load data from streams in Kafka or from files in Hadoop. Druid integrates with Hive as an option for SQL query acceleration. Druid data can be visualized by Superset (incubating).
>>
>> === An Excessive Fascination with the Apache Brand ===
>>
>> Druid is a successful project with a diverse community. The main reason for pursuing incubation is to find a stable, long-term home for the project with a well-known governance philosophy.
>>
>> == Required Resources ==
>>
>> === Mailing lists ===
>>
>> We would like to migrate the existing Druid mailing lists from Google Groups to Apache.
>>
>> * druid-user@googlegroups -> [email protected]
>> * druid-development@googlegroups -> [email protected]
>>
>> === Source control ===
>>
>> Druid development currently takes place on GitHub. We would like to continue using GitHub, if possible, in order to preserve the workflows the community has developed around GitHub pull requests.
>>
>> === Issue tracking ===
>>
>> Druid currently uses GitHub issues for issue tracking. We would like to migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
>>
>> == Documentation ==
>>
>> Druid's documentation can be found at http://druid.io/docs/latest/.
>>
>> == Initial Source ==
>>
>> Druid was initially open-sourced by Metamarkets in 2012 and has been run in a community-governed fashion since then. The code is currently hosted at https://github.com/druid-io/ and includes the following repositories:
>>
>> * druid (primary repository)
>> * druid-console (web console for Druid)
>> * druid-io.github.io (source for Druid's website at http://druid.io/)
>> * tranquility (real-time stream push client for Druid)
>> * docker-druid (Docker image for Druid)
>> * pydruid (Python library)
>> * RDruid (R library)
>> * oss-parent (Maven POM files)
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> A complete set of the open source code needs to be licensed from the owning organization to the Foundation. Commercial legal counsel for the owning organization will review the standard Foundation licensing paperwork and propose any updates as needed. This license will enable Apache to incubate and manage the Druid project moving forward.
>>
>> Other Druid paraphernalia to be transferred to Apache consists of:
>>
>> * GitHub organization at https://github.com/druid-io/
>> * Twitter account at https://twitter.com/druidio
>> * "druid.io" domain name
>> * "Druid" trademark assignment per Foundation standard paper. The trademark assignment paperwork shall be reviewed by the owning organization's commercial and IP counsel.
>> * CLAs: all rights in the code licensed above should encompass the CLAs that existed between developers and the owning organization.
>>
>> A copyright license to the code, trademark assignment of Druid, and transfer of other paraphernalia to Apache should be sufficient to cover all rights required by Apache to operate the project.
>>
>> == External Dependencies ==
>>
>> External dependencies distributed with Druid currently all have one of the following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL. There is one exception: the optional Druid MySQL metadata store extension depends on MySQL Connector/J, which is GPL licensed. Druid currently packages this extension as a separate download; see the current presentation at http://druid.io/downloads.html. As part of incubation, we intend to determine the best strategy for handling the MySQL extension.
>>
>> == Cryptography ==
>>
>> Not applicable.
>>
>> == Initial Committers ==
>>
>> The initial committers for incubation are the current set of committers on Druid who have expressed interest in being involved in Apache incubation. Affiliations are listed where relevant. We may seek to add other committers during incubation; for example, we would want to add any current Druid committers who express an interest after incubation begins.
>>
>> * Charles Allen ([email protected]) (Snap)
>> * David Lim ([email protected]) (Imply)
>> * Eric Tschetter ([email protected]) (Splunk)
>> * Fangjin Yang ([email protected]) (Imply)
>> * Gian Merlino ([email protected]) (Imply)
>> * Himanshu Gupta ([email protected]) (Oath)
>> * Jihoon Son ([email protected]) (Imply)
>> * Jonathan Wei ([email protected]) (Imply)
>> * Maxime Beauchemin ([email protected]) (Lyft)
>> * Mohamed Slim Bouguerra ([email protected]) (Hortonworks)
>> * Nishant Bangarwa ([email protected]) (Hortonworks)
>> * Parag Jain ([email protected]) (Oath)
>> * Roman Leventov ([email protected]) (Metamarkets)
>> * Xavier Léauté ([email protected]) (Confluent)
>>
>> == Sponsors ==
>>
>> * Champion: Julian Hyde
>> * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
>> * Sponsoring entity: Apache Incubator

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
