+1

Julian
> On Sep 1, 2015, at 6:45 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
>
> On Mon, Aug 31, 2015 at 11:47 AM, Roman Shaposhnik <r...@apache.org> wrote:
>
>> Following the discussion earlier:
>> http://s.apache.org/Gaf
>>
>> I would like to call a VOTE for accepting HAWQ
>> as a new incubator project.
>>
>> The proposal is available at:
>> https://wiki.apache.org/incubator/HAWQProposal
>> and is also included at the bottom of this email.
>>
>> Vote is open until at least Thu, 3 September 2015, 23:59:00 PST
>>
>> [ ] +1 accept HAWQ into the Apache Incubator
>> [ ] ±0
>> [ ] -1 because...
>>
>> Thanks,
>> Roman.
>>
>> == Abstract ==
>>
>> HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
>> around a robust and high-performance massively-parallel processing
>> (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
>>
>> HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
>> with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
>> Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
>> managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
>> compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
>> extensions) and supports open database connectivity (ODBC) and Java
>> database connectivity (JDBC), as well. Most business intelligence,
>> data analysis and data visualization tools work with HAWQ out of the
>> box without the need for specialized drivers.
>>
>> A unique aspect of HAWQ is its integration of statistical and machine
>> learning capabilities that can be natively invoked from SQL or (in the
>> context of PL/Python, PL/Java or PL/R) in massively parallel modes and
>> applied to large data sets across a Hadoop cluster. These capabilities
>> are provided through MADlib – an existing open source, parallel
>> machine-learning library.
>> Given the close ties between the two
>> development communities, the MADlib community has expressed interest
>> in joining HAWQ on its journey into the ASF Incubator and will be
>> submitting a separate, concurrent proposal.
>>
>> HAWQ will provide more robust and higher performing options for Hadoop
>> environments that demand best-in-class data analytics for business
>> critical purposes. HAWQ is implemented in C and C++.
>>
>> HAWQ has a few runtime dependencies licensed under the Cat X list:
>> * gperf (GPL Version 3)
>> * libgsasl (LGPL Version 2.1)
>> * libuuid-2.26 (LGPL Version 2)
>> However, given the runtime (dynamic linking) nature of these
>> dependencies, they do not present a problem for HAWQ being considered
>> an ASF project.
>>
>> == Proposal ==
>> The goal of this proposal is to bring the core of Pivotal Software,
>> Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
>> Foundation (ASF) in order to build a vibrant, diverse and
>> self-governed open source community around the technology. Pivotal has
>> agreed to transfer the brand name "HAWQ" to the Apache Software
>> Foundation and will stop using HAWQ to refer to this software if the
>> project gets accepted into the ASF Incubator under the name of "Apache
>> HAWQ (incubating)". Pivotal will continue to market and sell an
>> analytic engine product that includes Apache HAWQ (incubating). While
>> HAWQ is our primary choice for the project name, in anticipation of
>> any potential issues with PODLINGNAMESEARCH we have come up with two
>> alternative names: (1) Hornet; or (2) Grove.
>>
>> Pivotal is submitting this proposal to donate the HAWQ source code and
>> associated artifacts (documentation, web site content, wiki, etc.) to
>> the Apache Software Foundation Incubator under the Apache License,
>> Version 2.0, and is asking the Incubator PMC to establish an open
>> source community.
>>
>> == Background ==
>> While the ecosystem of open source SQL-on-Hadoop solutions is fairly
>> developed by now, HAWQ has several unique features that will set it
>> apart from existing ASF and non-ASF projects. HAWQ made its debut in
>> 2013 as a closed source product leveraging a decade's worth of product
>> development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
>> has rapidly gained a solid customer base and has become available on
>> non-Pivotal distributions of Hadoop.
>> In 2015 HAWQ still leverages the rock solid foundation of Greenplum
>> Database, while at the same time embracing elasticity and resource
>> management native to Hadoop applications. This allows HAWQ to provide
>> superior SQL on Hadoop performance, scalability and coverage while
>> also providing massively-parallel machine learning capabilities and
>> support for native Hadoop file formats. In addition, HAWQ's advanced
>> features include support for complex joins, a rich and compliant SQL
>> dialect, and industry-differentiating data federation capabilities.
>> Dynamic pipelining and a pluggable query optimizer architecture enable
>> HAWQ to perform queries on Hadoop with the speed and scalability
>> required for enterprise data warehouse (EDW) workloads. HAWQ provides
>> strong support for low-latency analytic SQL queries, coupled with
>> massively parallel machine learning capabilities. This enables
>> discovery-based analysis of large data sets and rapid, iterative
>> development of data analytics applications that apply deep machine
>> learning – significantly shortening data-driven innovation cycles for
>> the enterprise.
>>
>> Hundreds of companies and thousands of servers are running
>> mission-critical applications on HAWQ today, managing petabytes of
>> data.
>>
>> == Rationale ==
>> Hadoop and HDFS-based data management architectures continue their
>> expansion into the enterprise.
>> As the amount of data stored on Hadoop
>> clusters grows, unlocking the analytics capabilities and democratizing
>> access to that treasure trove of data becomes one of the key concerns.
>> While Hadoop has no shortage of purposefully designed analytical
>> frameworks, the easiest and most cost-effective way to onboard the
>> largest number of data consumers is to offer SQL APIs for data
>> retrieval at scale. Of course, given the high velocity of
>> innovation happening in the underlying Hadoop ecosystem, any
>> SQL-on-Hadoop solution has to keep up with the community. We strongly
>> believe that in the Big Data space, this can be optimally achieved
>> through a vibrant, diverse, self-governed community collectively
>> innovating around a single codebase while at the same time
>> cross-pollinating with various other data management communities.
>> The Apache Software Foundation is the ideal place to meet those
>> ambitious goals. We also believe that our initial experience of
>> bringing Pivotal GemFireⓇ into the ASF as Apache Geode (incubating)
>> could be leveraged, thus improving the chances of HAWQ becoming a
>> vibrant Apache community.
>>
>> == Initial Goals ==
>> Our initial goals are to bring HAWQ into the ASF, transition internal
>> engineering processes into the open, and foster a collaborative
>> development model according to the "Apache Way." Pivotal and its
>> partners plan to develop new functionality in an open,
>> community-driven way. To get there, the existing internal build, test
>> and release processes will be refactored to support open development.
>>
>> == Current Status ==
>> Currently, the project code base is commercially licensed and is not
>> available to the general public. The documentation and wiki pages are
>> available at FIXME.
>> Although Pivotal HAWQ was developed as a
>> proprietary, closed-source product, its roots are in the PostgreSQL
>> community and the internal engineering practices adopted by the
>> development team lend themselves well to an open, collaborative and
>> meritocratic environment.
>>
>> The Pivotal HAWQ team has always focused on building a robust end user
>> community of paying and non-paying customers. The existing
>> documentation, along with StackOverflow and other similar forums, is
>> expected to facilitate conversations among our existing users so as to
>> transform them into an active community of HAWQ members, stakeholders
>> and developers.
>>
>> === Meritocracy ===
>> Our proposed list of initial committers includes the current HAWQ R&D
>> team, Pivotal Field Engineers, and several existing partners. This
>> group will form a base for the broader community we will invite to
>> collaborate on the codebase. We intend to radically expand the initial
>> developer and user community by running the project in accordance with
>> the "Apache Way". Users and new contributors will be treated with
>> respect and welcomed. By participating in the community and providing
>> quality patches/support that move the project forward, contributors
>> will earn merit. They also will be encouraged to provide non-code
>> contributions (documentation, events, community management, etc.) and
>> will gain merit for doing so. Those with a proven support and quality
>> track record will be encouraged to become committers.
>>
>> === Community ===
>> If HAWQ is accepted for incubation, the primary initial goal will be
>> transitioning the core community towards embracing the Apache Way of
>> project governance. We would solicit major existing contributors to
>> become committers on the project from the start.
>>
>> === Core Developers ===
>>
>> A few of HAWQ's core developers are skilled in working as part of
>> openly governed Apache communities (mainly around the Hadoop
>> ecosystem).
>> That said, most of the core developers are currently NOT affiliated
>> with the ASF and would require new ICLAs before committing to the
>> project.
>>
>> === Alignment ===
>> The following existing ASF projects can be considered when reviewing
>> the HAWQ proposal:
>>
>> Apache Hadoop is a distributed storage and processing framework for
>> very large datasets, focusing primarily on batch processing for
>> analytic purposes. HAWQ builds on top of two key pieces of Hadoop:
>> YARN and HDFS. HAWQ's community roadmap includes plans for
>> contributing to Hadoop around HDFS features and increasing support for
>> C and C++ clients.
>>
>> Apache Spark™ is a fast engine for processing large datasets,
>> typically from a Hadoop cluster, and performing batch, streaming,
>> interactive, or machine learning workloads. Recently, Apache Spark
>> has embraced SQL-like APIs around DataFrames at its core. Because of
>> that, we would expect a level of collaboration between the two projects
>> when it comes to query optimization and exposing HAWQ tables to Spark
>> analytical pipelines.
>>
>> Apache Hive™ is data warehouse software that facilitates querying
>> and managing large datasets residing in distributed storage. Hive
>> provides a mechanism to project structure onto this data and query the
>> data using a SQL-like language called HiveQL. Hive also provides
>> HCatalog capabilities as a table and storage management layer for
>> Hadoop, enabling users with different data processing tools to more
>> easily define structure for the data on the grid. Currently, core
>> Hive and HAWQ are viewed as complementary solutions, but we expect
>> close integration with HCatalog given its dominant position for
>> metadata management on Hadoop clusters.
>>
>> Apache Phoenix is a high performance relational database layer over
>> HBase for low latency applications.
>> Given Phoenix's exclusive focus on
>> HBase for its data management backend and its overall architecture
>> around HBase's co-processors, it is unlikely that there will be much
>> collaboration between the two projects.
>>
>> == Known Risks ==
>> Development has been sponsored mostly by a single company (or its
>> predecessors) thus far and coordinated mainly by the core Pivotal HAWQ
>> team.
>>
>> For the project to fully transition to the Apache Way governance
>> model, development must shift towards the meritocracy-centric model of
>> growing a community of contributors balanced with the needs for
>> extreme stability and core implementation coherency.
>>
>> The tools and development practices in place for the Pivotal HAWQ
>> product are compatible with the ASF infrastructure and thus we do not
>> anticipate any on-boarding pains.
>>
>> The project currently includes a modified version of PostgreSQL 8.3
>> source code. Given the ASF's position that the PostgreSQL License is
>> compatible with the Apache License version 2.0, we do NOT anticipate
>> any issues with licensing the code base. However, any new capabilities
>> developed by the HAWQ team once part of the ASF would need to be
>> consumed by the PostgreSQL community under the Apache License version
>> 2.0.
>>
>> === Orphaned products ===
>> Pivotal is fully committed to maintaining its position as one of the
>> leading providers of SQL-on-Hadoop solutions and the corresponding
>> Pivotal commercial product will continue to be based on the HAWQ
>> project. Moreover, Pivotal has a vested interest in making HAWQ
>> successful by driving its close integration with both existing
>> projects contributed by Pivotal, including Apache Geode (incubating)
>> and MADlib (which is requesting incubation), and sister ASF projects.
>> We expect this to further reduce the risk of orphaning the product.
>>
>> === Inexperience with Open Source ===
>> Pivotal has embraced open source software since its formation by
>> employing contributors/committers and by shepherding open source
>> projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
>> working at Pivotal have experience with the formation of vibrant
>> communities around open technologies, starting with the Cloud Foundry
>> Foundation and continuing with the creation of a community around
>> Apache Geode (incubating). Although some of the initial committers
>> have not had the experience of developing entirely open source,
>> community-driven projects, we expect to bring to bear the open
>> development practices that have proven successful on longstanding
>> Pivotal open source projects to the HAWQ community. Additionally,
>> several ASF veterans have agreed to mentor the project and are listed
>> in this proposal. The project will rely on their collective guidance
>> and wisdom to quickly transition the entire team of initial committers
>> towards practicing the Apache Way.
>>
>> === Homogeneous Developers ===
>> While most of the initial committers are employed by Pivotal, we have
>> already seen a healthy level of interest from existing customers and
>> partners. We intend to convert that interest directly into
>> participation and will be investing in activities to recruit
>> additional committers from other companies.
>>
>> === Reliance on Salaried Developers ===
>> Most of the contributors are paid to work in the Big Data space. While
>> they might wander from their current employers, they are unlikely to
>> venture far from their core expertise and thus will continue to be
>> engaged with the project regardless of their current employers.
>>
>> === Relationships with Other Apache Products ===
>> As mentioned in the Alignment section, HAWQ may consider various
>> degrees of integration and code exchange with the Apache Hadoop,
>> Apache Spark and Apache Hive projects.
>> We expect integration points to be
>> inside and outside the project. We look forward to collaborating with
>> these communities as well as other communities under the Apache
>> umbrella.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> While we intend to leverage the Apache ‘branding’ when talking to
>> other projects as a testament to our project’s ‘neutrality’, we have
>> no plans to use the Apache brand in press releases or to post
>> billboards advertising the acceptance of HAWQ into the Apache
>> Incubator.
>>
>> == Documentation ==
>> The documentation is currently available at http://hawq.docs.pivotal.io/
>>
>> == Initial Source ==
>> Initial source code will be available immediately after the Incubator
>> PMC approves HAWQ joining the Incubator and will be licensed under the
>> Apache License v2.
>>
>> == Source and Intellectual Property Submission Plan ==
>> As soon as HAWQ is approved to join the Incubator, the source code
>> will be transitioned via an exhibit to Pivotal's current Software
>> Grant Agreement onto ASF infrastructure and in turn made available
>> under the Apache License, version 2.0. We know of no legal
>> encumbrances that would inhibit the transfer of source code to the
>> ASF.
>>
>> == External Dependencies ==
>>
>> Runtime dependencies:
>> * gimli (BSD)
>> * openldap (The OpenLDAP Public License)
>> * openssl (OpenSSL License and the Original SSLeay License, BSD style)
>> * proj (MIT)
>> * yaml (Creative Commons Attribution 2.0 License)
>> * python (Python Software Foundation License Version 2)
>> * apr-util (Apache Version 2.0)
>> * bzip2 (BSD-style License)
>> * curl (MIT/X Derivate License)
>> * gperf (GPL Version 3)
>> * protobuf (Google)
>> * libevent (BSD)
>> * json-c (https://github.com/json-c/json-c/blob/master/COPYING)
>> * krb5 (MIT)
>> * pcre (BSD)
>> * libedit (BSD)
>> * libxml2 (MIT)
>> * zlib (Permissive Free Software License)
>> * libgsasl (LGPL Version 2.1)
>> * thrift (Apache Version 2.0)
>> * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD)
>> * libuuid-2.26 (LGPL Version 2)
>> * apache hadoop (Apache Version 2.0)
>> * apache avro (Apache Version 2.0)
>> * glog (BSD)
>> * googlemock (BSD)
>>
>> Build only dependencies:
>> * ant (Apache Version 2.0)
>> * maven (Apache Version 2.0)
>> * cmake (BSD)
>>
>> Test only dependencies:
>> * googletest (BSD)
>>
>> Cryptography N/A
>>
>> == Required Resources ==
>>
>> === Mailing lists ===
>> * priv...@hawq.incubator.apache.org (moderated subscriptions)
>> * comm...@hawq.incubator.apache.org
>> * d...@hawq.incubator.apache.org
>> * iss...@hawq.incubator.apache.org
>> * u...@hawq.incubator.apache.org
>>
>> === Git Repository ===
>> https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
>>
>> === Issue Tracking ===
>> JIRA Project HAWQ (HAWQ)
>>
>> === Other Resources ===
>>
>> Means of setting up regular builds for HAWQ on builds.apache.org will
>> require integration with Docker support.
>>
>> == Initial Committers ==
>> * Lirong Jian
>> * Hubert Huan Zhang
>> * Radar Da Lei
>> * Ivan Yanqing Weng
>> * Zhanwei Wang
>> * Yi Jin
>> * Lili Ma
>> * Jiali Yao
>> * Zhenglin Tao
>> * Ruilong Huo
>> * Ming Li
>> * Wen Lin
>> * Lei Chang
>> * Alexander V Denissov
>> * Newton Alex
>> * Oleksandr Diachenko
>> * Jun Aoki
>> * Bhuvnesh Chaudhary
>> * Vineet Goel
>> * Shivram Mani
>> * Noa Horn
>> * Sujeet S Varakhedi
>> * Junwei (Jimmy) Da
>> * Ting (Goden) Yao
>> * Mohammad F (Foyzur) Rahman
>> * Entong Shen
>> * George C Caragea
>> * Amr El-Helw
>> * Mohamed F Soliman
>> * Venkatesh (Venky) Raghavan
>> * Carlos Garcia
>> * Zixi (Jesse) Zhang
>> * Michael P Schubert
>> * C.J. Jameson
>> * Jacob Frank
>> * Ben Calegari
>> * Shoabe Shariff
>> * Rob Day-Reynolds
>> * Mel S Kiyama
>> * Charles Alan Litzell
>> * David Yozie
>> * Ed Espino
>> * Caleb Welton
>> * Parham Parvizi
>> * Dan Baskette
>> * Christian Tzolov
>> * Tushar Pednekar
>> * Greg Chase
>> * Chloe Jackson
>> * Michael Nixon
>> * Roman Shaposhnik
>> * Alan Gates
>> * Owen O'Malley
>> * Thejas Nair
>> * Don Bosco Durai
>> * Konstantin Boudnik
>> * Sergey Soldatov
>> * Atri Sharma
>>
>> == Affiliations ==
>> * Barclays: Atri Sharma
>> * Bloomberg: Justin Erenkrantz
>> * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco Durai
>> * WANDisco: Konstantin Boudnik, Sergey Soldatov
>> * Pivotal: everyone else on this proposal
>>
>> == Sponsors ==
>>
>> === Champion ===
>> Roman Shaposhnik
>>
>> === Nominated Mentors ===
>>
>> The initial mentors are listed below:
>> * Alan Gates - Apache Member, Hortonworks
>> * Owen O'Malley - Apache Member, Hortonworks
>> * Thejas Nair - Apache Member, Hortonworks
>> * Konstantin Boudnik - Apache Member, WANDisco
>> * Roman Shaposhnik - Apache Member, Pivotal
>> * Justin Erenkrantz - Apache Member, Bloomberg
>>
>> === Sponsoring Entity ===
>> We would like to propose the Apache Incubator to sponsor this project.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>>
> +1 accept HAWQ into the Apache Incubator
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/