+1 (binding) On Wed, Nov 25, 2015 at 6:26 AM, Patrick Angeles <patrickange...@gmail.com> wrote: > +1 (non-binding) > > On Tue, Nov 24, 2015 at 4:23 PM, Jake Farrell <jfarr...@apache.org> wrote: > >> +1 (binding) >> >> -Jake >> >> On Tue, Nov 24, 2015 at 2:32 PM, Todd Lipcon <t...@apache.org> wrote: >> >> > Hi all, >> > >> > Discussion on the [DISCUSS] thread seems to have wound down, so I'd like >> to >> > call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is >> > pasted below and also available on the wiki at: >> > https://wiki.apache.org/incubator/KuduProposal >> > >> > The proposal is unchanged since the original version, except for the >> > addition of Carl Steinbach as a Mentor. >> > >> > Please cast your votes: >> > >> > [] +1, accept Kudu into the Incubator >> > [] +/-0, positive/negative non-counted expression of feelings >> > [] -1, do not accept Kudu into the incubator (please state reasoning) >> > >> > Given the US holiday this week, I imagine many folks are traveling or >> > otherwise offline. So, let's run the vote for a full week rather than the >> > traditional 72 hours. Unless the IPMC objects to the extended voting >> > period, the vote will close on Tues, Dec 1st at noon PST. >> > >> > Thanks >> > -Todd >> > ----- >> > >> > = Kudu Proposal = >> > >> > == Abstract == >> > >> > Kudu is a distributed columnar storage engine built for the Apache Hadoop >> > ecosystem. >> > >> > == Proposal == >> > >> > Kudu is an open source storage engine for structured data which supports >> > low-latency random access together with efficient analytical access >> > patterns. Kudu distributes data using horizontal partitioning and >> > replicates each partition using Raft consensus, providing low >> > mean-time-to-recovery and low tail latencies. 
Kudu is designed within the >> > context of the Apache Hadoop ecosystem and supports many integrations >> with >> > other data analytics projects both inside and outside of the Apache >> > Software Foundation. >> > >> > >> > >> > We propose to incubate Kudu as a project of the Apache Software >> Foundation. >> > >> > == Background == >> > >> > In recent years, explosive growth in the amount of data being generated >> and >> > captured by enterprises has resulted in the rapid adoption of open source >> > technology which is able to store massive data sets at scale and at low >> > cost. In particular, the Apache Hadoop ecosystem has become a focal point >> > for such “big data” workloads, because many traditional open source >> > database systems have lagged in offering a scalable alternative. >> > >> > >> > >> > Structured storage in the Hadoop ecosystem has typically been achieved in >> > two ways: for static data sets, data is typically stored on Apache HDFS >> > using binary data formats such as Apache Avro or Apache Parquet. However, >> > neither HDFS nor these formats has any provision for updating individual >> > records, or for efficient random access. Mutable data sets are typically >> > stored in semi-structured stores such as Apache HBase or Apache >> Cassandra. >> > These systems allow for low-latency record-level reads and writes, but >> lag >> > far behind the static file formats in terms of sequential read throughput >> > for applications such as SQL-based analytics or machine learning. >> > >> > >> > >> > Kudu is a new storage system designed and implemented from the ground up >> to >> > fill this gap between high-throughput sequential-access storage systems >> > such as HDFS and low-latency random-access systems such as HBase or >> > Cassandra. While these existing systems continue to hold advantages in >> some >> > situations, Kudu offers a “happy medium” alternative that can >> dramatically >> > simplify the architecture of many common workloads. 
In particular, Kudu >> > offers a simple API for row-level inserts, updates, and deletes, while >> > providing table scans at throughputs similar to Parquet, a commonly-used >> > columnar format for static data. >> > >> > >> > >> > More information on Kudu can be found at the existing open source project >> > website: http://getkudu.io and in particular in the Kudu white-paper >> PDF: >> > http://getkudu.io/kudu.pdf from which the above was excerpted. >> > >> > == Rationale == >> > >> > As described above, Kudu fills an important gap in the open source >> storage >> > ecosystem. After our initial open source project release in September >> 2015, >> > we have seen a great amount of interest across a diverse set of users and >> > companies. We believe that, as a storage system, it is critical to build >> an >> > equally diverse set of contributors in the development community. Our >> > experiences as committers and PMC members on other Apache projects have >> > taught us the value of diverse communities in ensuring both longevity and >> > high quality for such foundational systems. >> > >> > == Initial Goals == >> > >> > * Move the existing codebase, website, documentation, and mailing lists >> to >> > Apache-hosted infrastructure >> > * Work with the infrastructure team to implement and approve our code >> > review, build, and testing workflows in the context of the ASF >> > * Incremental development and releases per Apache guidelines >> > >> > == Current Status == >> > >> > ==== Releases ==== >> > >> > Kudu has undergone one public release, tagged here >> > https://github.com/cloudera/kudu/tree/kudu0.5.0-release >> > >> > This initial release was not performed in the typical ASF fashion -- no >> > source tarball was released, but rather only convenience binaries made >> > available in Cloudera’s repositories. We will adopt the ASF source >> release >> > process upon joining the incubator. 
>> > >> > >> > ==== Source ==== >> > >> > Kudu’s source is currently hosted on GitHub at >> > https://github.com/cloudera/kudu >> > >> > This repository will be transitioned to Apache’s git hosting during >> > incubation. >> > >> > >> > >> > ==== Code review ==== >> > >> > Kudu’s code reviews are currently public and hosted on Gerrit at >> > http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu >> > >> > The Kudu developer community is very happy with gerrit and hopes to work >> > with the Apache Infrastructure team to figure out how we can continue to >> > use Gerrit within ASF policies. >> > >> > >> > >> > ==== Issue tracking ==== >> > >> > Kudu’s bug and feature tracking is hosted on JIRA at: >> > https://issues.cloudera.org/projects/KUDU/summary >> > >> > This JIRA instance contains bugs and development discussion dating back 2 >> > years prior to Kudu’s open source release and will provide an initial >> seed >> > for the ASF JIRA. >> > >> > >> > >> > ==== Community discussion ==== >> > >> > Kudu has several public discussion forums, linked here: >> > http://getkudu.io/community.html >> > >> > >> > >> > ==== Build Infrastructure ==== >> > >> > The Kudu Gerrit instance is configured to only allow patches to be >> > committed after running them through an extensive set of pre-commit tests >> > and code lints. The project currently makes use of elastic public cloud >> > resources to perform these tests. Until this point, these resources have >> > been internal to Cloudera, though we are currently investing in moving >> to a >> > publicly accessible infrastructure. >> > >> > >> > >> > ==== Development practices ==== >> > >> > Given that Kudu is a persistent storage engine, the community has a high >> > quality bar for contributions to its core. We have a firm belief that >> high >> > quality is achieved through automation, not manual inspection, and hence >> > put a focus on thorough testing and build infrastructure to ensure that >> > bar. 
The development community also practices review-then-commit for all >> > changes to ensure that changes are accompanied by appropriate tests, are >> > well commented, etc. >> > >> > Rather than seeing these practices as barriers to contribution, we >> believe >> > that a fully automated and standardized review and testing practice makes >> > it easier for new contributors to have patches accepted. Any new >> developer >> > may post a patch to Gerrit using the same workflow as a seasoned >> > contributor, and the same suite of tests will be automatically run. If >> the >> > tests pass, a committer can quickly review and commit the contribution >> from >> > their web browser. >> > >> > === Meritocracy === >> > >> > We believe strongly in meritocracy in electing committers and PMC >> members. >> > We believe that contributions can come in forms other than just code: for >> > example, one of our initial proposed committers has contributed solely in >> > the area of project documentation. We will encourage contributions and >> > participation of all types, and ensure that contributors are >> appropriately >> > recognized. >> > >> > === Community === >> > >> > Though Kudu is relatively new as an open source project, it has already >> > seen promising growth in its community across several organizations: >> > >> > * '''Cloudera''' is the original development sponsor for Kudu. >> > * '''Xiaomi''' has been helping to develop and optimize Kudu for a new >> > production use case, contributing code, benchmarks, feedback, and >> > conference talks. >> > * '''Intel''' has contributed optimizations related to their hardware >> > technologies. >> > * '''Dropbox''' has been experimenting with Kudu for a machine >> monitoring >> > use case, and has been contributing bug reports and product feedback. >> > * '''Dremio''' is working on integration with Apache Drill and exploring >> > using Kudu in a production use case. 
>> > * Several community-built Docker images, tutorials, and blog posts have >> > sprouted up since Kudu’s release. >> > >> > >> > >> > By bringing Kudu to Apache, we hope to encourage further contribution >> from >> > the above organizations as well as to engage new users and contributors >> in >> > the community. >> > >> > === Core Developers === >> > >> > Kudu was initially developed as a project at Cloudera. Most of the >> > contributions to date have been by developers employed by Cloudera. >> > >> > >> > >> > Many of the developers are committers or PMC members on other Apache >> > projects. >> > >> > === Alignment === >> > >> > As a project in the big data ecosystem, Kudu is aligned with several >> other >> > ASF projects. Kudu includes input/output format integration with Apache >> > Hadoop, and this integration can also provide a bridge to Apache Spark. >> We >> > are planning to integrate with Apache Hive in the near future. We also >> > integrate closely with Cloudera Impala, which is also currently being >> > proposed for incubation. We have also scheduled a hackathon with the >> Apache >> > Drill team to work on integration with that query engine. >> > >> > == Known Risks == >> > >> > === Orphaned Products === >> > >> > The risk of Kudu being abandoned is low. Cloudera has invested a great >> deal >> > in the initial development of the project, and intends to grow its >> > investment over time as Kudu becomes a product adopted by its customer >> > base. Several other organizations are also experimenting with Kudu for >> > production use cases which would live for many years. >> > >> > === Inexperience with Open Source === >> > >> > Kudu has been released in the open for less than two months. 
However, >> from >> > our very first public announcement we have been committed to open-source >> > style development: >> > >> > * our code reviews are fully public and documented on a mailing list >> > * our daily development chatter is in a public chat room >> > * we send out weekly “community status” reports highlighting news and >> > contributions >> > * we published our entire JIRA history and discuss bugs in the open >> > * we published our entire Git commit history, going back three years (no >> > squashing) >> > >> > >> > >> > Several of the initial committers are experienced open source developers, >> > several being committers and/or PMC members on other ASF projects >> (Hadoop, >> > HBase, Thrift, Flume, et al). Those who are not ASF committers have >> > experience on non-ASF open source projects (Kiji, open-vm-tools, et al). >> > >> > === Homogenous Developers === >> > >> > The initial committers are employees or former employees of Cloudera. >> > However, the committers are spread across multiple offices (Palo Alto, >> San >> > Francisco, Melbourne), so the team is familiar with working in a >> > distributed environment across varied time zones. >> > >> > >> > >> > The project has received some contributions from developers outside of >> > Cloudera, and is starting to attract a ''user'' community as well. We >> hope >> > to continue to encourage contributions from these developers and >> community >> > members and grow them into committers after they have had time to >> continue >> > their contributions. >> > >> > === Reliance on Salaried Developers === >> > >> > As mentioned above, the majority of development up to this point has been >> > sponsored by Cloudera. We have seen several community users participate >> in >> > discussions who are hobbyists interested in distributed systems and >> > databases, and hope that they will continue their participation in the >> > project going forward. 
>> > >> > === Relationships with Other Apache Products === >> > >> > Kudu is currently related to the following other Apache projects: >> > >> > * Hadoop: Kudu provides MapReduce input/output formats for integration >> > * Spark: Kudu integrates with Spark via the above-mentioned input >> formats, >> > and work is progressing on support for Spark Data Frames and Spark SQL. >> > >> > >> > >> > The Kudu team has reached out to several other Apache projects to start >> > discussing integrations, including Flume, Kafka, Hive, and Drill. >> > >> > >> > >> > Kudu integrates with Impala, which is also being proposed for incubation. >> > >> > >> > >> > Kudu is already collaborating on ValueVector, a proposed TLP spinning out >> > from the Apache Drill community. >> > >> > >> > >> > We look forward to continuing to integrate and collaborate with these >> > communities. >> > >> > === An Excessive Fascination with the Apache Brand === >> > >> > Many of the initial committers are already experienced Apache committers, >> > and understand the true value provided by the Apache Way and the >> principles >> > of the ASF. We believe that this development and contribution model is >> > especially appropriate for storage products, where Apache’s >> > community-over-code philosophy ensures long term viability and >> > consensus-based participation. >> > >> > == Documentation == >> > >> > * Documentation is written in AsciiDoc and committed in the Kudu source >> > repository: >> > >> > * https://github.com/cloudera/kudu/tree/master/docs >> > >> > >> > >> > * The Kudu web site is version-controlled on the ‘gh-pages’ branch of >> the >> > above repository. >> > >> > * A LaTeX whitepaper is also published, and the source is available >> within >> > the same repository. >> > * APIs are documented within the source code as JavaDoc or C++-style >> > documentation comments. 
>> > * Many design documents are stored within the source code repository as >> > text files next to the code being documented. >> > >> > == Source and Intellectual Property Submission Plan == >> > >> > The Kudu codebase and web site is currently hosted on GitHub and will be >> > transitioned to the ASF repositories during incubation. Kudu is already >> > licensed under the Apache 2.0 license. >> > >> > >> > >> > Some portions of the code are imported from other open source projects >> > under the Apache 2.0, BSD, or MIT licenses, with copyrights held by >> authors >> > other than the initial committers. These copyright notices are maintained >> > in those files as well as a top-level NOTICE.txt file. We believe this to >> > be permissible under the license terms and ASF policies, and confirmed >> via >> > a recent thread on general@incubator.apache.org . >> > >> > >> > >> > The “Kudu” name is not a registered trademark, though before the initial >> > release of the project, we performed a trademark search and Cloudera’s >> > legal counsel deemed it acceptable in the context of a data storage >> engine. >> > There exists an unrelated open source project by the same name related to >> > deployments on Microsoft’s Azure cloud service. We have been in contact >> > with legal counsel from Microsoft and have obtained their approval for >> the >> > use of the Kudu name. >> > >> > >> > >> > Cloudera currently owns several domain names related to Kudu (getkudu.io >> , >> > kududb.io, et al) which will be transferred to the ASF and redirected to >> > the official page during incubation. >> > >> > >> > >> > Portions of Kudu are protected by pending or published patents owned by >> > Cloudera. Given the protections already granted by the Apache License, we >> > do not anticipate any explicit licensing or transfer of this intellectual >> > property. 
>> > >> > == External Dependencies == >> > >> > The full set of dependencies and licenses are listed in >> > https://github.com/cloudera/kudu/blob/master/LICENSE.txt >> > >> > and summarized here: >> > >> > * '''Twitter Bootstrap''': Apache 2.0 >> > * '''d3''': BSD 3-clause >> > * '''epoch JS library''': MIT >> > * '''lz4''': BSD 2-clause >> > * '''gflags''': BSD 3-clause >> > * '''glog''': BSD 3-clause >> > * '''gperftools''': BSD 3-clause >> > * '''libev''': BSD 2-clause >> > * '''squeasel''':MIT license >> > * '''protobuf''': BSD 3-clause >> > * '''rapidjson''': MIT >> > * '''snappy''': BSD 3-clause >> > * '''trace-viewer''': BSD 3-clause >> > * '''zlib''': zlib license >> > * '''llvm''': University of Illinois/NCSA Open Source (BSD-alike) >> > * '''bitshuffle''': MIT >> > * '''boost''': Boost license >> > * '''curl''': MIT >> > * '''libunwind''': MIT >> > * '''nvml''': BSD 3-clause >> > * '''cyrus-sasl''': Cyrus SASL license (BSD-alike) >> > * '''openssl''': OpenSSL License (BSD-alike) >> > >> > * '''Guava''': Apache 2.0 >> > * '''StumbleUpon Async''': BSD >> > * '''Apache Hadoop''': Apache 2.0 >> > * '''Apache log4j''': Apache 2.0 >> > * '''Netty''': Apache 2.0 >> > * '''slf4j''': MIT >> > * '''Apache Commons''': Apache 2.0 >> > * '''murmur''': Apache 2.0 >> > >> > >> > '''Build/test-only dependencies''': >> > >> > * '''CMake''': BSD 3-clause >> > * '''gcovr''': BSD 3-clause >> > * '''gmock''': BSD 3-clause >> > * '''Apache Maven''': Apache 2.0 >> > * '''JUnit''': EPL >> > * '''Mockito''': MIT >> > >> > == Cryptography == >> > >> > Kudu does not currently include any cryptography-related code. 
>> > >> > == Required Resources == >> > >> > === Mailing lists === >> > >> > * priv...@kudu.incubator.apache.org (PMC) >> > * comm...@kudu.incubator.apache.org (git push emails) >> > * iss...@kudu.incubator.apache.org (JIRA issue feed) >> > * d...@kudu.incubator.apache.org (Gerrit code reviews plus dev >> discussion) >> > * u...@kudu.incubator.apache.org (User questions) >> > >> > >> > === Repository === >> > >> > * git://git.apache.org/kudu >> > >> > === Gerrit === >> > >> > We hope to continue using Gerrit for our code review and commit workflow. >> > The Kudu team has already been in contact with Jake Farrell to start >> > discussions on how Gerrit can fit into the ASF. We know that several >> other >> > ASF projects and podlings are also interested in Gerrit. >> > >> > >> > >> > If the Infrastructure team does not have the bandwidth to support Gerrit, >> > we will continue to support our own instance of Gerrit for Kudu, and make >> > the necessary integrations such that commits are properly authenticated >> and >> > maintain sufficient provenance to uphold the ASF standards (e.g. via the >> > solution adopted by the AsterixDB podling). >> > >> > == Issue Tracking == >> > >> > We would like to import our current JIRA project into the ASF JIRA, such >> > that our historical commit messages and code comments continue to >> reference >> > the appropriate bug numbers. 
>> > >> > == Initial Committers == >> > >> > * Adar Dembo a...@cloudera.com >> > * Alex Feinberg a...@strlen.net >> > * Andrew Wang w...@apache.org >> > * Dan Burkert d...@cloudera.com >> > * David Alves dral...@apache.org >> > * Jean-Daniel Cryans jdcry...@apache.org >> > * Mike Percy mpe...@apache.org >> > * Misty Stanley-Jones mi...@apache.org >> > * Todd Lipcon t...@apache.org >> > >> > The initial list of committers was seeded by listing those contributors >> who >> > have contributed 20 or more patches in the last 12 months, indicating >> that >> > they are active and have achieved merit through participation on the >> > project. We chose not to include other contributors who either have not >> yet >> > contributed a significant number of patches, or whose contributions are >> far >> > in the past and we don’t expect to be active within the ASF. >> > >> > == Affiliations == >> > >> > * Adar Dembo - Cloudera >> > * Alex Feinberg - Forward Networks >> > * Andrew Wang - Cloudera >> > * Dan Burkert - Cloudera >> > * David Alves - Cloudera >> > * Jean-Daniel Cryans - Cloudera >> > * Mike Percy - Cloudera >> > * Misty Stanley-Jones - Cloudera >> > * Todd Lipcon - Cloudera >> > >> > == Sponsors == >> > >> > === Champion === >> > >> > * Todd Lipcon >> > >> > === Nominated Mentors === >> > >> > * Jake Farrell - ASF Member and Infra team member, Acquia >> > * Brock Noland - ASF Member, StreamSets >> > * Michael Stack - ASF Member, Cloudera >> > * Jarek Jarcec Cecho - ASF Member, Cloudera >> > * Chris Mattmann - ASF Member, NASA JPL and USC >> > * Julien Le Dem - Incubator PMC, Dremio >> > * Carl Steinbach - ASF Member, LinkedIn >> > >> > === Sponsoring Entity === >> > >> > The Apache Incubator >> > >>
-- Best Regards, Edward J. Yoon