Forgot to send this email earlier (thanks Henry). With 19 +1 votes (and 10 as binding votes), I'll consider this vote a success.
+1 votes (binding)
Todd Lipcon
Henry Saputra
Lewis John McGibbney
Chris Mattmann
Jake Farrell
Arvind Prabhakar
Mark Struberg
Andrei Savu
Andrew Purtell
Roman Shaposhnik

+1 votes (non-binding)
Jarek Jarcec Cecho
Timothy Chen
Olivier Lamy
Hitesh Shah
Bertrand Delacretaz
Tom White
Brock Noland
Julien Le Dem
Hyunsik Choi

Onwards to:
http://incubator.apache.org/projects/parquet.html

On Thu, May 22, 2014 at 3:49 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:

> Hi Chris, could you re-send the tally up VOTE result with subject
> prefixed with [RESULT] ?
>
> - Henry
>
> On Wed, May 21, 2014 at 3:56 PM, Chris Aniszczyk <caniszc...@gmail.com>
> wrote:
> > With 18 +1 votes (and 10+ as binding votes), I'll consider this vote a
> > success.
> >
> > I'll proceed with the next steps.
> >
> > Thank you!
> >
> > On Sun, May 18, 2014 at 3:57 PM, Todd Lipcon <t...@cloudera.com> wrote:
> >
> >> +1 from me (the proposed Champion)
> >>
> >> -Todd
> >>
> >> On Sun, May 18, 2014 at 2:15 PM, Chris Aniszczyk <caniszc...@gmail.com>
> >> wrote:
> >>
> >> > Based on the results of the discussion thread:
> >> >
> >> > http://mail-archives.apache.org/mod_mbox/incubator-general/201405.mbox/%3CCAJg1wMRGhLu4P7LeVQB%2B5K0C-fr-pw2448uj%3D6-3zHag4F1EbA%40mail.gmail.com%3E
> >> >
> >> > I would like to call a vote on accepting Parquet into the incubator.
> >> > https://wiki.apache.org/incubator/ParquetProposal
> >> >
> >> > [ ] +1 Accept Parquet into the Incubator
> >> > [ ] +0 Indifferent to the acceptance of Parquet
> >> > [ ] -1 Do not accept Parquet because ...
> >> >
> >> > The vote will be open until Thursday May 22nd 18:00 UTC.
> >> >
> >> > = Parquet Proposal =
> >> >
> >> > == Abstract ==
> >> > Parquet is a columnar storage format for Hadoop.
> >> >
> >> > == Proposal ==
> >> >
> >> > We created Parquet to make the advantages of compressed, efficient columnar
> >> > data representation available to any project in the Hadoop ecosystem,
> >> > regardless of the choice of data processing framework, data model, or
> >> > programming language.
> >> >
> >> > == Background ==
> >> >
> >> > Parquet is built from the ground up with complex nested data structures in
> >> > mind, and uses the repetition/definition level approach to encoding such
> >> > data structures, as popularized by Google Dremel (
> >> > https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe
> >> > this approach is superior to simple flattening of nested name spaces.
> >> >
> >> > Parquet is built to support very efficient compression and encoding
> >> > schemes. Parquet allows compression schemes to be specified on a per-column
> >> > level, and is future-proofed to allow adding more encodings as they are
> >> > invented and implemented. We separate the concepts of encoding and
> >> > compression, allowing parquet consumers to implement operators that work
> >> > directly on encoded data without paying decompression and decoding penalty
> >> > when possible.
> >> >
> >> > == Rationale ==
> >> >
> >> > Parquet is built to be used by anyone. We believe that an efficient,
> >> > well-implemented columnar storage substrate should be useful to all
> >> > frameworks without the cost of extensive and difficult to set up
> >> > dependencies.
> >> >
> >> > Furthermore, the rapid growth of Parquet community is empowered by open
> >> > source. We believe the Apache foundation is a great fit as the long-term
> >> > home for Parquet, as it provides an established process for
> >> > community-driven development and decision making by consensus. This is
> >> > exactly the model we want for future Parquet development.
> >> >
> >> > == Initial Goals ==
> >> >
> >> > * Move the existing codebase to Apache
> >> > * Integrate with the Apache development process
> >> > * Ensure all dependencies are compliant with Apache License version 2.0
> >> > * Incremental development and releases per Apache guidelines
> >> >
> >> > == Current Status ==
> >> >
> >> > Parquet has undergone 2 major releases:
> >> > https://github.com/Parquet/parquet-format/releases of the core format and
> >> > 22 releases: https://github.com/Parquet/parquet-mr/releases of the
> >> > supporting set of Java libraries.
> >> >
> >> > The Parquet source is currently hosted at GitHub, which will seed the
> >> > Apache git repository.
> >> >
> >> > === Meritocracy ===
> >> >
> >> > We plan to invest in supporting a meritocracy. We will discuss the
> >> > requirements in an open forum. Several companies have already expressed
> >> > interest in this project, and we intend to invite additional developers to
> >> > participate. We will encourage and monitor community participation so that
> >> > privileges can be extended to those that contribute.
> >> >
> >> > === Community ===
> >> >
> >> > There is a large need for an advanced columnar storage format for Hadoop.
> >> > Parquet is being used in production by many organizations (see
> >> > https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md)
> >> >
> >> > * Cloudera: https://twitter.com/HenryR/statuses/324222874011451392
> >> > * Criteo: https://twitter.com/julsimon/statuses/312114074911666177
> >> > * Salesforce: https://twitter.com/TwitterOSS/statuses/392734610116726784
> >> > * Stripe: https://twitter.com/avibryant/statuses/391339949250715648
> >> > * Twitter: https://twitter.com/J_/statuses/315844725611581441
> >> >
> >> > By bringing Parquet into Apache, we believe that the community will grow
> >> > even bigger.
> >> >
> >> > === Core Developers ===
> >> >
> >> > Parquet was initially developed as a collaboration between Twitter,
> >> > Cloudera and Criteo.
> >> >
> >> > See
> >> > https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
> >> >
> >> > === Alignment ===
> >> >
> >> > We believe that having Parquet at Apache will help further the growth of
> >> > the big-data community, as it will encourage cooperation within the greater
> >> > ecosystem of projects spawned by Apache Hadoop. The alignment is also
> >> > beneficial to other Apache communities (such as Hadoop, Hive, Avro).
> >> >
> >> > == Known Risks ==
> >> >
> >> > === Orphaned Products ===
> >> >
> >> > The risk of the Parquet project being abandoned is minimal. There are many
> >> > organizations using Parquet in production, including Twitter, Cloudera,
> >> > Stripe, and Salesforce (
> >> > http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/).
> >> >
> >> > === Inexperience with Open Source ===
> >> >
> >> > Parquet has existed as a healthy open source for one year. During that
> >> > time, we have curated an open-source community successfully, attracting
> >> > over 40 contributors (see
> >> > https://github.com/Parquet/parquet-mr/graphs/contributors) from a diverse
> >> > group of companies.
> >> > Several of the core contributors to the project are deeply familiar with
> >> > OSS and Apache specifically: Julien Le Dem was until recently the PMC Chair
> >> > for Apache Pig, and Dmitriy Ryaboy, Aniket Mokashi, and Jonathan Coveney
> >> > are also Apache Pig committers with contributions to several other Apache
> >> > projects. Todd Lipcon and Tom White are committers to Apache Hadoop and
> >> > multiple other related projects. Brock Noland is a Hive committer.
> >> >
> >> > === Homogenous Developers ===
> >> >
> >> > The initial committers come from a number of companies and countries.
> >> > Parquet has an active community of developers, and we are committed to
> >> > recruiting additional committers based on their contributions to the
> >> > project. The java library component alone has contributions from 31
> >> > individual github accounts, 14 of which contributed over 1000 lines of
> >> > code.
> >> >
> >> > === Reliance on Salaried Developers ===
> >> >
> >> > It is expected that Parquet development will occur on both salaried time
> >> > and on volunteer time, after hours. The majority of initial committers are
> >> > paid by their employers to contribute to this project. However, they are
> >> > all passionate about the project, and we are confident that the project
> >> > will continue even if no salaried developers contribute to the project. As
> >> > evidence of this statement, we present the GitHub punchcard (see
> >> > https://github.com/Parquet/parquet-mr/graphs/punch-card) showing that a lot
> >> > of activity happens on weekends. We are committed to recruiting additional
> >> > committers including non-salaried developers.
> >> >
> >> > === Relationships with Other Apache Products ===
> >> >
> >> > As mentioned in the Alignment section, Parquet is closely related to
> >> > Hadoop. It provides an API that allowed it to be easily integrated with
> >> > many other apache projects: Pig, Hive, Avro, Thrift, Spark, Drill, Crunch,
> >> > Tajo. Some of the features it provides are similar to the ORC file format
> >> > which is part of the Hive project. However Parquet focused on being
> >> > framework agnostic and language independent and has been really successful
> >> > to that end. On top of the Apache projects mentioned above, Parquet is also
> >> > integrated with other open source projects, including Protocol Buffers,
> >> > Cloudera Impala or Scrooge. We look forward to continue collaborating with
> >> > those communities, as well as other Apache communities.
> >> >
> >> > === An Excessive Fascination with the Apache Brand ===
> >> >
> >> > Parquet is an already healthy and well known open source project. This
> >> > proposal is not for the purpose of generating publicity. Rather, the
> >> > primary benefits to joining Apache are those outlined in the Rationale
> >> > section.
> >> >
> >> > == Documentation ==
> >> >
> >> > Documentation is currently located as README markdown files:
> >> >
> >> > * https://github.com/Parquet/parquet-format
> >> > * https://github.com/Parquet/parquet-mr
> >> >
> >> > == Source and Intellectual Property Submission Plan ==
> >> >
> >> > The Parquet codebase is currently hosted on Github:
> >> > https://github.com/Parquet.
> >> >
> >> > These are the codebases that we would migrate to the Apache foundation.
> >> >
> >> > == External Dependencies ==
> >> >
> >> > * Junit: EPL
> >> > * Apache Commons: ALv2
> >> > * Apache Thrift: ALv2
> >> > * Apache Maven: ALv2
> >> > * Apache Avro: ALv2
> >> > * Apache Hadoop: ALv2
> >> > * Google Guava: ALv2
> >> > * Google Protobuf: New BSD License
> >> >
> >> > == Cryptography ==
> >> >
> >> > We do not expect Parquet to be a controlled export item due to the use of
> >> > encryption.
> >> >
> >> > == Required Resources ==
> >> >
> >> > === Mailing lists ===
> >> >
> >> > * priv...@parquet.incubator.apache.org
> >> > * comm...@parquet.incubator.apache.org
> >> > * d...@parquet.incubator.apache.org
> >> >
> >> > == Subversion Directory ==
> >> >
> >> > Git is the preferred source control system:
> >> >
> >> > * git://git.apache.org/parquet-format
> >> > * git://git.apache.org/parquet-mr
> >> >
> >> > == Issue Tracking ==
> >> >
> >> > We'd like to keep using the Git review and issue tracking tools.
> >> > Controlling Pull requests closing through git commit messages in
> >> > git.apache.org
> >> >
> >> > == Initial Committers ==
> >> >
> >> > * Aniket Mokashi <aniket...@gmail.com>
> >> > * Brock Noland <br...@apache.org>
> >> > * Chris Aniszczyk <caniszc...@gmail.com>
> >> > * Dmitriy Ryaboy <dvrya...@apache.org>
> >> > * Jake Farrell <jfarr...@apache.org>
> >> > * Jonathan Coveney <jcove...@gmail.com>
> >> > * Julien Le Dem <jul...@apache.org>
> >> > * Lukas Nalezenec <lukas.naleze...@gmail.com>
> >> > * Marcel Kornacker <mar...@cloudera.com>
> >> > * Mickael Lacour
> >> > * Nong Li <n...@cloudera.com>
> >> > * Remy Pecqueur
> >> > * Ryan Blue <b...@cloudera.com>
> >> > * Tianshuo Deng <dengtians...@gmail.com>
> >> > * Tom White <tomwh...@apache.org>
> >> > * Wesley Peck
> >> >
> >> > == Affiliations ==
> >> >
> >> > * Aniket Mokashi - Twitter
> >> > * Brock Noland - Cloudera
> >> > * Chris Aniszczyk - Twitter
> >> > * Dmitriy Ryaboy - Twitter
> >> > * Jake Farrell
> >> > * Jonathan Coveney - Twitter
> >> > * Julien Le Dem - Twitter
> >> > * Lukas Nalezenec
> >> > * Marcel Kornacker - Cloudera
> >> > * Mickael Lacour - Criteo
> >> > * Nong Li - Cloudera
> >> > * Remy Pecqueur - Criteo
> >> > * Ryan Blue - Cloudera
> >> > * Tianshuo Deng - Twitter
> >> > * Tom White - Cloudera
> >> > * Wesley Peck - ARRIS, Inc.
> >> >
> >> > == Sponsors ==
> >> >
> >> > === Champion ===
> >> >
> >> > * Todd Lipcon
> >> >
> >> > === Nominated Mentors ===
> >> >
> >> > * Tom White
> >> > * Chris Mattmann
> >> > * Jake Farrell
> >> > * Roman Shaposhnik
> >> >
> >> > === Sponsoring Entity ===
> >> >
> >> > The Apache Incubator
> >> >
> >> > --
> >> > Cheers,
> >> >
> >> > Chris Aniszczyk
> >> > http://aniszczyk.org
> >> > +1 512 961 6719
> >> >
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> > --
> > Cheers,
> >
> > Chris Aniszczyk
> > http://aniszczyk.org
> > +1 512 961 6719
>

--
Cheers,

Chris Aniszczyk
http://aniszczyk.org
+1 512 961 6719