+1
On Tue, Sep 1, 2015 at 7:39 PM, Julian Hyde <jh...@apache.org> wrote:
> +1
>
> Julian
>
>
> On Sep 1, 2015, at 6:45 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
> >
> > On Mon, Aug 31, 2015 at 11:47 AM, Roman Shaposhnik <r...@apache.org> wrote:
> >
> >> Following the discussion earlier:
> >> http://s.apache.org/Gaf
> >>
> >> I would like to call a VOTE for accepting HAWQ
> >> as a new incubator project.
> >>
> >> The proposal is available at:
> >> https://wiki.apache.org/incubator/HAWQProposal
> >> and is also included at the bottom of this email.
> >>
> >> Vote is open until at least Thu, 3 September 2015, 23:59:00 PST
> >>
> >> [ ] +1 accept HAWQ into the Apache Incubator
> >> [ ] ±0
> >> [ ] -1 because...
> >>
> >> Thanks,
> >> Roman.
> >>
> >> == Abstract ==
> >>
> >> HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
> >> around a robust and high-performance massively-parallel processing
> >> (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
> >>
> >> HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
> >> with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
> >> Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
> >> managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
> >> compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
> >> extensions) and supports open database connectivity (ODBC) and Java
> >> database connectivity (JDBC), as well. Most business intelligence,
> >> data analysis and data visualization tools work with HAWQ out of the
> >> box without the need for specialized drivers.
> >>
> >> A unique aspect of HAWQ is its integration of statistical and machine
> >> learning capabilities that can be natively invoked from SQL or (in the
> >> context of PL/Python, PL/Java or PL/R) in massively parallel modes and
> >> applied to large data sets across a Hadoop cluster.
> >> These capabilities
> >> are provided through MADlib – an existing open source, parallel
> >> machine-learning library. Given the close ties between the two
> >> development communities, the MADlib community has expressed interest
> >> in joining HAWQ on its journey into the ASF Incubator and will be
> >> submitting a separate, concurrent proposal.
> >>
> >> HAWQ will provide more robust and higher performing options for Hadoop
> >> environments that demand best-in-class data analytics for business
> >> critical purposes. HAWQ is implemented in C and C++.
> >>
> >> HAWQ has a few runtime dependencies licensed under the Cat X list:
> >> * gperf (GPL Version 3)
> >> * libgsasl (LGPL Version 2.1)
> >> * libuuid-2.26 (LGPL Version 2)
> >> However, given the runtime (dynamic linking) nature of these
> >> dependencies it doesn't represent a problem for HAWQ to be considered
> >> an ASF project.
> >>
> >> == Proposal ==
> >> The goal of this proposal is to bring the core of Pivotal Software,
> >> Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
> >> Foundation (ASF) in order to build a vibrant, diverse and
> >> self-governed open source community around the technology. Pivotal has
> >> agreed to transfer the brand name "HAWQ" to Apache Software Foundation
> >> and will stop using HAWQ to refer to this software if the project gets
> >> accepted into the ASF Incubator under the name of "Apache HAWQ
> >> (incubating)". Pivotal will continue to market and sell an analytic
> >> engine product that includes Apache HAWQ (incubating). While HAWQ is
> >> our primary choice for a name of the project, in anticipation of any
> >> potential issues with PODLINGNAMESEARCH we have come up with two
> >> alternative names: (1) Hornet; or (2) Grove.
> >>
> >> Pivotal is submitting this proposal to donate the HAWQ source code and
> >> associated artifacts (documentation, web site content, wiki, etc.)
> >> to the Apache Software Foundation Incubator under the Apache License,
> >> Version 2.0 and is asking Incubator PMC to establish an open source
> >> community.
> >>
> >> == Background ==
> >> While the ecosystem of open source SQL-on-Hadoop solutions is fairly
> >> developed by now, HAWQ has several unique features that will set it
> >> apart from existing ASF and non-ASF projects. HAWQ made its debut in
> >> 2013 as a closed source product leveraging a decade's worth of product
> >> development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
> >> has rapidly gained a solid customer base and become available on
> >> non-Pivotal distributions of Hadoop.
> >> In 2015 HAWQ still leverages the rock solid foundation of Greenplum
> >> Database, while at the same time embracing elasticity and resource
> >> management native to Hadoop applications. This allows HAWQ to provide
> >> superior SQL on Hadoop performance, scalability and coverage while
> >> also providing massively-parallel machine learning capabilities and
> >> support for native Hadoop file formats. In addition, HAWQ's advanced
> >> features include support for complex joins, rich and compliant SQL
> >> dialect and industry-differentiating data federation capabilities.
> >> Dynamic pipelining and pluggable query optimizer architecture enable
> >> HAWQ to perform queries on Hadoop with the speed and scalability
> >> required for enterprise data warehouse (EDW) workloads. HAWQ provides
> >> strong support for low-latency analytic SQL queries, coupled with
> >> massively parallel machine learning capabilities. This enables
> >> discovery-based analysis of large data sets and rapid, iterative
> >> development of data analytics applications that apply deep machine
> >> learning – significantly shortening data-driven innovation cycles for
> >> the enterprise.
> >>
> >> Hundreds of companies and thousands of servers are running
> >> mission-critical applications today on HAWQ, managing petabytes of data.
> >>
> >> == Rationale ==
> >> Hadoop and HDFS-based data management architectures continue their
> >> expansion into the enterprise. As the amount of data stored on Hadoop
> >> clusters grows, unlocking the analytics capabilities and democratizing
> >> access to that treasure trove of data becomes one of the key concerns.
> >> While Hadoop has no shortage of purposefully designed analytical
> >> frameworks, the easiest and most cost-effective way to onboard the
> >> largest amount of data consumers is provided by offering SQL APIs for
> >> data retrieval at scale. Of course, given the high velocity of
> >> innovation happening in the underlying Hadoop ecosystem, any
> >> SQL-on-Hadoop solution has to keep up with the community. We strongly
> >> believe that in the Big Data space, this can be optimally achieved
> >> through a vibrant, diverse, self-governed community collectively
> >> innovating around a single codebase while at the same time
> >> cross-pollinating with various other data management communities.
> >> Apache Software Foundation is the ideal place to meet those ambitious
> >> goals. We also believe that our initial experience of bringing Pivotal
> >> GemfireⓇ into ASF as Apache Geode (incubating) could be leveraged thus
> >> improving the chances of HAWQ becoming a vibrant Apache community.
> >>
> >> == Initial Goals ==
> >> Our initial goals are to bring HAWQ into the ASF, transition internal
> >> engineering processes into the open, and foster a collaborative
> >> development model according to the "Apache Way." Pivotal and its
> >> partners plan to develop new functionality in an open,
> >> community-driven way. To get there, the existing internal build, test
> >> and release processes will be refactored to support open development.
> >>
> >> == Current Status ==
> >> Currently, the project code base is commercially licensed and is not
> >> available to the general public. The documentation and wiki pages are
> >> available at FIXME.
> >> Although Pivotal HAWQ was developed as a
> >> proprietary, closed-source product, its roots are in the PostgreSQL
> >> community and the internal engineering practices adopted by the
> >> development team lend themselves well to an open, collaborative and
> >> meritocratic environment.
> >>
> >> The Pivotal HAWQ team has always focused on building a robust end user
> >> community of paying and non-paying customers. The existing
> >> documentation along with StackOverflow and other similar forums are
> >> expected to facilitate conversations between our existing users so as to
> >> transform them into an active community of HAWQ members, stakeholders
> >> and developers.
> >>
> >> === Meritocracy ===
> >> Our proposed list of initial committers includes the current HAWQ R&D
> >> team, Pivotal Field Engineers, and several existing partners. This
> >> group will form a base for the broader community we will invite to
> >> collaborate on the codebase. We intend to radically expand the initial
> >> developer and user community by running the project in accordance with
> >> the "Apache Way". Users and new contributors will be treated with
> >> respect and welcomed. By participating in the community and providing
> >> quality patches/support that move the project forward, contributors
> >> will earn merit. They also will be encouraged to provide non-code
> >> contributions (documentation, events, community management, etc.) and
> >> will gain merit for doing so. Those with a proven support and quality
> >> track record will be encouraged to become committers.
> >>
> >> === Community ===
> >> If HAWQ is accepted for incubation, the primary initial goal will be
> >> transitioning the core community towards embracing the Apache Way of
> >> project governance. We would solicit major existing contributors to
> >> become committers on the project from the start.
> >>
> >> === Core Developers ===
> >>
> >> A few of HAWQ's core developers are skilled in working as part of
> >> openly governed Apache communities (mainly around Hadoop ecosystem).
> >> That said, most of the core developers are currently NOT affiliated
> >> with the ASF and would require new ICLAs before committing to the
> >> project.
> >>
> >> === Alignment ===
> >> The following existing ASF projects can be considered when reviewing
> >> the HAWQ proposal:
> >>
> >> Apache Hadoop is a distributed storage and processing framework for
> >> very large datasets, focusing primarily on batch processing for
> >> analytic purposes. HAWQ builds on top of two key pieces of Hadoop:
> >> YARN and HDFS. HAWQ's community roadmap includes plans for
> >> contributing to Hadoop around HDFS features and increasing support for C
> >> and C++ clients.
> >>
> >> Apache Spark™ is a fast engine for processing large datasets,
> >> typically from a Hadoop cluster, and performing batch, streaming,
> >> interactive, or machine learning workloads. Recently, Apache Spark
> >> has embraced SQL-like APIs around DataFrames at its core. Because of
> >> that we would expect a level of collaboration between the two projects
> >> when it comes to query optimization and exposing HAWQ tables to Spark
> >> analytical pipelines.
> >>
> >> Apache Hive™ is data warehouse software that facilitates querying
> >> and managing large datasets residing in distributed storage. Hive
> >> provides a mechanism to project structure onto this data and query the
> >> data using a SQL-like language called HiveQL. Hive is also providing
> >> HCatalog capabilities as table and storage management layer for
> >> Hadoop, enabling users with different data processing tools to more
> >> easily define structure for the data on the grid.
> >> Currently the core
> >> Hive and HAWQ are viewed as complementary solutions, but we expect
> >> close integration with HCatalog given its dominant position for
> >> metadata management on Hadoop clusters.
> >>
> >> Apache Phoenix is a high performance relational database layer over
> >> HBase for low latency applications. Given Phoenix's exclusive focus on
> >> HBase for its data management backend and its overall architecture
> >> around HBase's co-processors, it is unlikely that there will be much
> >> collaboration between the two projects.
> >>
> >> == Known Risks ==
> >> Development has been sponsored mostly by a single company (or its
> >> predecessors) thus far and coordinated mainly by the core Pivotal HAWQ
> >> team.
> >>
> >> For the project to fully transition to the Apache Way governance
> >> model, development must shift towards the meritocracy-centric model of
> >> growing a community of contributors balanced with the needs for
> >> extreme stability and core implementation coherency.
> >>
> >> The tools and development practices in place for the Pivotal HAWQ
> >> product are compatible with the ASF infrastructure and thus we do not
> >> anticipate any on-boarding pains.
> >>
> >> The project currently includes a modified version of PostgreSQL 8.3
> >> source code. Given the ASF's position that the PostgreSQL License is
> >> compatible with the Apache License version 2.0, we do NOT anticipate
> >> any issues with licensing the code base. However, any new capabilities
> >> developed by the HAWQ team once part of the ASF would need to be
> >> consumed by the PostgreSQL community under the Apache License version
> >> 2.0.
> >>
> >> === Orphaned products ===
> >> Pivotal is fully committed to maintaining its position as one of the
> >> leading providers of SQL-on-Hadoop solutions and the corresponding
> >> Pivotal commercial product will continue to be based on the HAWQ
> >> project.
> >> Moreover, Pivotal has a vested interest in making HAWQ
> >> successful by driving its close integration with both existing
> >> projects contributed by Pivotal including Apache Geode (incubating)
> >> and MADlib (which is requesting Incubation), and sister ASF projects.
> >> We expect this to further reduce the risk of orphaning the product.
> >>
> >> === Inexperience with Open Source ===
> >> Pivotal has embraced open source software since its formation by
> >> employing contributors/committers and by shepherding open source
> >> projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
> >> working at Pivotal have experience with the formation of vibrant
> >> communities around open technologies with the Cloud Foundry
> >> Foundation, and continuing with the creation of a community around
> >> Apache Geode (incubating). Although some of the initial committers
> >> have not had the experience of developing entirely open source,
> >> community-driven projects, we expect to bring to bear the open
> >> development practices that have proven successful on longstanding
> >> Pivotal open source projects to the HAWQ community. Additionally,
> >> several ASF veterans have agreed to mentor the project and are listed
> >> in this proposal. The project will rely on their collective guidance
> >> and wisdom to quickly transition the entire team of initial committers
> >> towards practicing the Apache Way.
> >>
> >> === Homogeneous Developers ===
> >> While most of the initial committers are employed by Pivotal, we have
> >> already seen a healthy level of interest from existing customers and
> >> partners. We intend to convert that interest directly into
> >> participation and will be investing in activities to recruit
> >> additional committers from other companies.
> >>
> >> === Reliance on Salaried Developers ===
> >> Most of the contributors are paid to work in the Big Data space.
> >> While
> >> they might wander from their current employers, they are unlikely to
> >> venture far from their core expertise and thus will continue to be
> >> engaged with the project regardless of their current employers.
> >>
> >> === Relationships with Other Apache Products ===
> >> As mentioned in the Alignment section, HAWQ may consider various
> >> degrees of integration and code exchange with Apache Hadoop, Apache
> >> Spark and Apache Hive projects. We expect integration points to be
> >> inside and outside the project. We look forward to collaborating with
> >> these communities as well as other communities under the Apache
> >> umbrella.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> While we intend to leverage the Apache ‘branding’ when talking to
> >> other projects as a testament to our project’s ‘neutrality’, we have no
> >> plans for making use of the Apache brand in press releases nor posting
> >> billboards advertising acceptance of HAWQ into the Apache Incubator.
> >>
> >> == Documentation ==
> >> The documentation is currently available at http://hawq.docs.pivotal.io/
> >>
> >> == Initial Source ==
> >> Initial source code will be available immediately after Incubator PMC
> >> approves HAWQ joining the Incubator and will be licensed under the
> >> Apache License v2.
> >>
> >> == Source and Intellectual Property Submission Plan ==
> >> As soon as HAWQ is approved to join the Incubator, the source code
> >> will be transitioned via an exhibit to Pivotal's current Software
> >> Grant Agreement onto ASF infrastructure and in turn made available
> >> under the Apache License, version 2.0. We know of no legal
> >> encumbrances that would inhibit the transfer of source code to the
> >> ASF.
> >>
> >> == External Dependencies ==
> >>
> >> Runtime dependencies:
> >> * gimli (BSD)
> >> * openldap (The OpenLDAP Public License)
> >> * openssl (OpenSSL License and the Original SSLeay License, BSD style)
> >> * proj (MIT)
> >> * yaml (Creative Commons Attribution 2.0 License)
> >> * python (Python Software Foundation License Version 2)
> >> * apr-util (Apache Version 2.0)
> >> * bzip2 (BSD-style License)
> >> * curl (MIT/X Derivate License)
> >> * gperf (GPL Version 3)
> >> * protobuf (Google)
> >> * libevent (BSD)
> >> * json-c (https://github.com/json-c/json-c/blob/master/COPYING)
> >> * krb5 (MIT)
> >> * pcre (BSD)
> >> * libedit (BSD)
> >> * libxml2 (MIT)
> >> * zlib (Permissive Free Software License)
> >> * libgsasl (LGPL Version 2.1)
> >> * thrift (Apache Version 2.0)
> >> * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD)
> >> * libuuid-2.26 (LGPL Version 2)
> >> * apache hadoop (Apache Version 2.0)
> >> * apache avro (Apache Version 2.0)
> >> * glog (BSD)
> >> * googlemock (BSD)
> >>
> >> Build only dependencies:
> >> * ant (Apache Version 2.0)
> >> * maven (Apache Version 2.0)
> >> * cmake (BSD)
> >>
> >> Test only dependencies:
> >> * googletest (BSD)
> >>
> >> Cryptography N/A
> >>
> >> == Required Resources ==
> >>
> >> === Mailing lists ===
> >> * priv...@hawq.incubator.apache.org (moderated subscriptions)
> >> * comm...@hawq.incubator.apache.org
> >> * d...@hawq.incubator.apache.org
> >> * iss...@hawq.incubator.apache.org
> >> * u...@hawq.incubator.apache.org
> >>
> >> === Git Repository ===
> >> https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
> >>
> >> === Issue Tracking ===
> >> JIRA Project HAWQ (HAWQ)
> >>
> >> === Other Resources ===
> >>
> >> Means of setting up regular builds for HAWQ on builds.apache.org will
> >> require integration with Docker support.
> >>
> >> == Initial Committers ==
> >> * Lirong Jian
> >> * Hubert Huan Zhang
> >> * Radar Da Lei
> >> * Ivan Yanqing Weng
> >> * Zhanwei Wang
> >> * Yi Jin
> >> * Lili Ma
> >> * Jiali Yao
> >> * Zhenglin Tao
> >> * Ruilong Huo
> >> * Ming Li
> >> * Wen Lin
> >> * Lei Chang
> >> * Alexander V Denissov
> >> * Newton Alex
> >> * Oleksandr Diachenko
> >> * Jun Aoki
> >> * Bhuvnesh Chaudhary
> >> * Vineet Goel
> >> * Shivram Mani
> >> * Noa Horn
> >> * Sujeet S Varakhedi
> >> * Junwei (Jimmy) Da
> >> * Ting (Goden) Yao
> >> * Mohammad F (Foyzur) Rahman
> >> * Entong Shen
> >> * George C Caragea
> >> * Amr El-Helw
> >> * Mohamed F Soliman
> >> * Venkatesh (Venky) Raghavan
> >> * Carlos Garcia
> >> * Zixi (Jesse) Zhang
> >> * Michael P Schubert
> >> * C.J. Jameson
> >> * Jacob Frank
> >> * Ben Calegari
> >> * Shoabe Shariff
> >> * Rob Day-Reynolds
> >> * Mel S Kiyama
> >> * Charles Alan Litzell
> >> * David Yozie
> >> * Ed Espino
> >> * Caleb Welton
> >> * Parham Parvizi
> >> * Dan Baskette
> >> * Christian Tzolov
> >> * Tushar Pednekar
> >> * Greg Chase
> >> * Chloe Jackson
> >> * Michael Nixon
> >> * Roman Shaposhnik
> >> * Alan Gates
> >> * Owen O'Malley
> >> * Thejas Nair
> >> * Don Bosco Durai
> >> * Konstantin Boudnik
> >> * Sergey Soldatov
> >> * Atri Sharma
> >>
> >> == Affiliations ==
> >> * Barclays: Atri Sharma
> >> * Bloomberg: Justin Erenkrantz
> >> * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco Durai
> >> * WANDisco: Konstantin Boudnik, Sergey Soldatov
> >> * Pivotal: everyone else on this proposal
> >>
> >> == Sponsors ==
> >>
> >> === Champion ===
> >> Roman Shaposhnik
> >>
> >> === Nominated Mentors ===
> >>
> >> The initial mentors are listed below:
> >> * Alan Gates - Apache Member, Hortonworks
> >> * Owen O'Malley - Apache Member, Hortonworks
> >> * Thejas Nair - Apache Member, Hortonworks
> >> * Konstantin Boudnik - Apache Member, WANDisco
> >> * Roman Shaposhnik - Apache Member, Pivotal
> >> * Justin Erenkrantz - Apache Member,
> >>   Bloomberg
> >>
> >> === Sponsoring Entity ===
> >> We would like to propose the Apache Incubator to sponsor this project.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org
> >>
> >>
> > +1 accept HAWQ into the Apache Incubator
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/