+1 Druid will be a great addition to ASF.

On 2/21/18, 5:06 PM, "Ashutosh Chauhan" <hashut...@apache.org> wrote:
+1 for Druid in ASF. I have been involved with Hive Druid integration. If you are looking for mentors, happy to help.

Thanks,
Ashutosh

On Fri, Feb 16, 2018 at 2:20 PM, Tom Barber <t...@spicule.co.uk> wrote:

> I can second most of that from the peanut gallery, my high level interactions with a few Druid folk and keeping a watchful eye on a very exciting project over the last few years.
>
> I think the Druid project would make an excellent addition to the ASF portfolio.
>
> Tom
>
> On 16/02/18 22:17, Julian Hyde wrote:
>
>> As Champion for this proposal, let me say that the Druid project will be an excellent addition to the ASF. I have been an observer of the project for a couple of years, and in many respects it is already operating in the Apache Way. Druid had paid developers from a number of companies, some of whom were in competition, and its governance was strong enough to navigate the choppy waters that that can create.
>>
>> A number of Druid committers subsequently started to work on Apache projects (Gian on Calcite, and Slim and Nishant on Hive) and so already know what to expect.
>>
>> You can get a sense of the project dynamic by reading the archives of their dev list: https://groups.google.com/forum/#!forum/druid-development
>>
>> Julian
>>
>> On Feb 16, 2018, at 12:15 PM, Gian Merlino <g...@apache.org> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to open up a discussion about incubating Druid at Apache. I've included a proposal in this mail and have also posted a draft at https://wiki.apache.org/incubator/DruidProposal. More information about Druid is also available on our project web site at: http://druid.io/
>>>
>>> Thanks for your consideration!
>>>
>>> Gian
>>>
>>> = Druid Proposal =
>>>
>>> == Abstract ==
>>>
>>> Druid is a high-performance, column-oriented, distributed data store.
>>>
>>> == Proposal ==
>>>
>>> Druid is an open source data store designed for real-time exploratory analytics on large data sets. Druid's key features are a column-oriented storage layout, a distributed shared-nothing architecture, and the ability to generate and leverage indexing and caching structures. Druid is typically deployed in clusters of tens to hundreds of nodes, and has the ability to load data from Apache Kafka and Apache Hadoop, among other data sources. Druid offers two query languages: a SQL dialect (powered by Apache Calcite) and a JSON-over-HTTP API.
>>>
>>> Druid was originally developed to power a slice-and-dice analytical UI built on top of large event streams. The original use case for Druid targeted ingest rates of millions of records/sec, retention of over a year of data, and query latencies of sub-second to a few seconds. Many people can benefit from such capability, and many already have (see http://druid.io/druid-powered.html). In addition, new use cases have emerged since Druid's original development, such as OLAP acceleration of data warehouse tables and more highly concurrent applications operating with relatively narrower queries.
>>>
>>> == Background ==
>>>
>>> Druid is a data store designed for fast analytics. It would typically be used in lieu of more general purpose query systems like Hadoop !MapReduce or Spark when query latency is of the utmost importance. Druid is often used as a data store for powering GUI analytical applications.
>>>
>>> The buzzwordy description of Druid is a high-performance, column-oriented, distributed data store. What we mean by this is:
>>>
>>> * "high performance": Druid aims to provide low query latency and high ingest rates.
>>> * "column-oriented": Druid stores data in a column-oriented format, like most other systems designed for analytics. It can also store indexes along with the columns.
>>> * "distributed": Druid is deployed in clusters, typically of tens to hundreds of nodes.
>>> * "data store": Druid loads your data and stores a copy of it on the cluster's local disks (and may cache it in memory). It doesn't query your data from some other storage system.
>>>
>>> == Rationale ==
>>>
>>> Druid is a mature, active project with a large number of production installations, dozens of contributors to each release, and multiple vendors offering professional support. Given Druid's strong community, its close integration with many other Apache projects (such as Kafka, Hadoop, and Calcite), and its pre-existing Apache-inspired governance structure, we feel that Apache is the best home for the project on a long-term basis.
>>>
>>> == Current Status ==
>>>
>>> === Meritocracy ===
>>> Since Druid was first open sourced, the original developers have solicited contributions from others, including through our blog, the project mailing lists, and through accepting !GitHub pull requests. We have an Apache-inspired governance structure with a PMC and committers, and our committer ranks include a good number of people from outside the original development team.
>>>
>>> === Community ===
>>>
>>> The Druid core developers have sought to nurture a community throughout the life of the project. We use !GitHub as the focal point for bug reports and code contributions, and the mailing lists for most other discussion. To try to make people feel welcome, we've also spelled this out on a "CONTRIBUTE" link from the project page: http://druid.io/community/. Today we have an active contributor base (a typical release has ~40 contributors) and mailing list.
>>>
>>> === Core Developers ===
>>>
>>> Druid enjoys good diversity of committer affiliation.
>>> The most active developers over the past year are affiliated with four different companies: Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also committers on other ASF projects, including Apache Airflow, Apache Curator, and Apache Calcite. The original developers of Druid remain involved in the project.
>>>
>>> === Alignment ===
>>>
>>> Druid's current governance structure is Apache-inspired with a PMC and committers chosen by a meritocratic process. Additionally, Druid integrates with a number of other Apache projects, including Kafka, Hadoop, Hive, Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned products ===
>>>
>>> The risk of Druid becoming orphaned is low, due to a diverse committer base that is invested in the future of the project.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> Druid's core developers have been running it as a community-oriented open source project for some time now, and many of them are committers on other open source projects as well, including Apache Airflow, Apache Curator, and Apache Calcite.
>>>
>>> === Homogeneous Developers ===
>>>
>>> Druid's current diversity of committer affiliation means that we have become accustomed to working collaboratively and in the open. We hope that a transition to the ASF helps Druid's contributor base become even more diverse.
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> Druid's user base and contributor base skew heavily towards salaried developers. We believe this is natural since Druid is a technology designed to be deployed on large clusters, and due to this, tends to be deployed by organizations rather than by individuals.
>>> Nevertheless, many current Druid developers have continued working on the project even through job changes, which we take to be a good sign of developer commitment and personal interest.
>>>
>>> === Relationships with Other Apache Products ===
>>>
>>> Druid integrates with a number of other Apache projects. Druid internally uses Calcite for SQL planning, and Curator and !ZooKeeper for coordination. Druid can read data in Avro or Parquet format. Druid can load data from streams in Kafka or from files in Hadoop. Druid integrates with Hive as an option for SQL query acceleration. Druid data can be visualized by Superset (incubating).
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> Druid is a successful project with a diverse community. The main reason for pursuing incubation is to find a stable, long-term home for the project with a well-known governance philosophy.
>>>
>>> == Required Resources ==
>>>
>>> === Mailing lists ===
>>>
>>> We would like to migrate the existing Druid mailing lists from Google Groups to Apache.
>>>
>>> * druid-user@googlegroups -> us...@druid.incubator.apache.org
>>> * druid-development@googlegroups -> d...@druid.incubator.apache.org
>>>
>>> === Source control ===
>>>
>>> Druid development currently takes place on !GitHub. We would like to continue using !GitHub, if possible, in order to preserve the workflows the community has developed around !GitHub pull requests.
>>>
>>> === Issue tracking ===
>>> Druid currently uses !GitHub issues for issue tracking. We would like to migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
>>>
>>> == Documentation ==
>>>
>>> Druid's documentation can be found at http://druid.io/docs/latest/.
>>>
>>> == Initial Source ==
>>>
>>> Druid was initially open-sourced by Metamarkets in 2012 and has been run in a community-governed fashion since then.
>>> The code is currently hosted at https://github.com/druid-io/ and includes the following repositories:
>>>
>>> * druid (primary repository)
>>> * druid-console (web console for Druid)
>>> * druid-io.github.io (source for Druid's website at http://druid.io/)
>>> * tranquility (realtime stream push client for Druid)
>>> * docker-druid (Docker image for Druid)
>>> * pydruid (Python library)
>>> * RDruid (R library)
>>> * oss-parent (Maven POM files)
>>>
>>> == Source and Intellectual Property Submission Plan ==
>>>
>>> A complete set of the open source code needs to be licensed from the owning organization to the Foundation. Commercial legal counsel for the owning organization will review the standard Foundation licensing paperwork and propose any updates as needed. This license will enable Apache to incubate and manage the Druid project moving forward.
>>>
>>> Other Druid paraphernalia to be transferred to Apache consists of:
>>>
>>> * !GitHub organization at https://github.com/druid-io/
>>> * Twitter account at https://twitter.com/druidio
>>> * "druid.io" domain name
>>> * "Druid" trademark assignment per Foundation standard paper. The trademark assignment paperwork shall be reviewed by the owning organization's commercial and IP counsel
>>> * CLAs - all rights in the code licensed above should encompass the CLAs that existed between developers and owning organization
>>>
>>> A copyright license to the code, trademark assignment of Druid, and transfer of other paraphernalia to Apache should be sufficient to cover all rights required by Apache to operate the project.
>>>
>>> == External Dependencies ==
>>> External dependencies distributed with Druid currently all have one of the following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with one exception: the optional Druid MySQL metadata store extension depends on MySQL Connector/J, which is GPL licensed.
>>> Druid currently packages this as a separate download; see our current presentation on: http://druid.io/downloads.html. As part of incubation we intend to determine the best strategy for handling the MySQL extension.
>>>
>>> == Cryptography ==
>>> Not applicable.
>>>
>>> == Initial Committers ==
>>>
>>> The initial committers for incubation are the current set of committers on Druid who have expressed interest in being involved in Apache incubation. Affiliations are listed where relevant. We may seek to add other committers during incubation; for example, we would want to add any current Druid committers who express an interest after incubation begins.
>>>
>>> * Charles Allen (char...@allen-net.com) (Snap)
>>> * David Lim (david.clarence....@gmail.com) (Imply)
>>> * Eric Tschetter (ched...@apache.org) (Splunk)
>>> * Fangjin Yang (f...@imply.io) (Imply)
>>> * Gian Merlino (g...@apache.org) (Imply)
>>> * Himanshu Gupta (g.himan...@gmail.com) (Oath)
>>> * Jihoon Son (jihoon...@apache.org) (Imply)
>>> * Jonathan Wei (jon....@imply.io) (Imply)
>>> * Maxime Beauchemin (maximebeauche...@gmail.com) (Lyft)
>>> * Mohamed Slim Bouguerra (slim.bougue...@gmail.com) (Hortonworks)
>>> * Nishant Bangarwa (nish...@apache.org) (Hortonworks)
>>> * Parag Jain (paragjai...@gmail.com) (Oath)
>>> * Roman Leventov (leventov...@gmail.com) (Metamarkets)
>>> * Xavier Léauté (xav...@leaute.com) (Confluent)
>>>
>>> == Sponsors ==
>>>
>>> * Champion: Julian Hyde
>>> * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
>>> * Sponsoring entity: Apache Incubator
>
> --
>
> Spicule Limited is registered in England & Wales. Company Number: 09954122. Registered office: First Floor, Telecom House, 125-135 Preston Road, Brighton, England, BN1 6AF. VAT No. 251478891.
>
> All engagements are subject to Spicule Terms and Conditions of Business.