Re: [VOTE] Accept HAWQ into the Apache Incubator

Christian Tzolov Tue, 01 Sep 2015 16:33:44 -0700

+1


On Tue, Sep 1, 2015 at 7:39 PM, Julian Hyde <jh...@apache.org> wrote:

> +1
>
> Julian
>
> > On Sep 1, 2015, at 6:45 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
> >
> > On Mon, Aug 31, 2015 at 11:47 AM, Roman Shaposhnik <r...@apache.org> wrote:
> >
> >> Following the discussion earlier:
> >>   http://s.apache.org/Gaf
> >>
> >> I would like to call a VOTE for accepting HAWQ
> >> as a new incubator project.
> >>
> >> The proposal is available at:
> >>    https://wiki.apache.org/incubator/HAWQProposal
> >> and is also included at the bottom of this email.
> >>
> >> Vote is open until at least Thu, 3 September 2015, 23:59:00 PST
> >>
> >> [ ] +1 accept HAWQ into the Apache Incubator
> >> [ ] ±0
> >> [ ] -1 because...
> >>
> >> Thanks,
> >> Roman.
> >>
> >> == Abstract ==
> >>
> >> HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
> >> around a robust and high-performance massively-parallel processing
> >> (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
> >>
> >> HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
> >> with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
> >> Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
> >> managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
> >> compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
> >> extensions) and supports open database connectivity (ODBC) and Java
> >> database connectivity (JDBC), as well. Most business intelligence,
> >> data analysis and data visualization tools work with HAWQ out of the
> >> box without the need for specialized drivers.
> >>
> >> A unique aspect of HAWQ is its integration of statistical and machine
> >> learning capabilities that can be natively invoked from SQL or (in the
> >> context of PL/Python, PL/Java or PL/R) in massively parallel modes and
> >> applied to large data sets across a Hadoop cluster. These capabilities
> >> are provided through MADlib – an existing open source, parallel
> >> machine-learning library. Given the close ties between the two
> >> development communities, the MADlib community has expressed interest
> >> in joining HAWQ on its journey into the ASF Incubator and will be
> >> submitting a separate, concurrent proposal.
> >>
> >> HAWQ will provide more robust and higher performing options for Hadoop
> >> environments that demand best-in-class data analytics for business
> >> critical purposes. HAWQ is implemented in C and C++.
> >>
> >> HAWQ has a few runtime dependencies whose licenses are on the ASF
> >> Category X list:
> >>  * gperf (GPL Version 3)
> >>  * libgsasl (LGPL Version 2.1)
> >>  * libuuid-2.26 (LGPL Version 2)
> >> However, given the runtime (dynamic linking) nature of these
> >> dependencies, they do not pose a problem for HAWQ being considered
> >> an ASF project.
> >>
> >> == Proposal ==
> >> The goal of this proposal is to bring the core of Pivotal Software,
> >> Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
> >> Foundation (ASF) in order to build a vibrant, diverse and
> >> self-governed open source community around the technology. Pivotal has
> >> agreed to transfer the brand name "HAWQ" to Apache Software Foundation
> >> and will stop using HAWQ to refer to this software if the project gets
> >> accepted into the ASF Incubator under the name of "Apache HAWQ
> >> (incubating)". Pivotal will continue to market and sell an analytic
> >> engine product that includes Apache HAWQ (incubating). While HAWQ is
> >> our primary choice for a name of the project, in anticipation of any
> >> potential issues with PODLINGNAMESEARCH we have come up with two
> >> alternative names: (1) Hornet; or (2) Grove.
> >>
> >> Pivotal is submitting this proposal to donate the HAWQ source code and
> >> associated artifacts (documentation, web site content, wiki, etc.) to
> >> the Apache Software Foundation Incubator under the Apache License,
> >> Version 2.0 and is asking Incubator PMC to establish an open source
> >> community.
> >>
> >> == Background ==
> >> While the ecosystem of open source SQL-on-Hadoop solutions is fairly
> >> developed by now, HAWQ has several unique features that will set it
> >> apart from existing ASF and non-ASF projects. HAWQ made its debut in
> >> 2013 as a closed source product leveraging a decade's worth of product
> >> development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
> >> has rapidly gained a solid customer base and became available on
> >> non-Pivotal distributions of Hadoop.
> >> In 2015 HAWQ still leverages the rock solid foundation of Greenplum
> >> Database, while at the same time embracing elasticity and resource
> >> management native to Hadoop applications. This allows HAWQ to provide
> >> superior SQL on Hadoop performance, scalability and coverage while
> >> also providing massively-parallel machine learning capabilities and
> >> support for native Hadoop file formats. In addition, HAWQ's advanced
> >> features include support for complex joins, rich and compliant SQL
> >> dialect and industry-differentiating data federation capabilities.
> >> Dynamic pipelining and pluggable query optimizer architecture enable
> >> HAWQ to perform queries on Hadoop with the speed and scalability
> >> required for enterprise data warehouse (EDW) workloads. HAWQ provides
> >> strong support for low-latency analytic SQL queries, coupled with
> >> massively parallel machine learning capabilities. This enables
> >> discovery-based analysis of large data sets and rapid, iterative
> >> development of data analytics applications that apply deep machine
> >> learning – significantly shortening data-driven innovation cycles for
> >> the enterprise.
> >>
> >> Hundreds of companies run mission-critical applications on HAWQ
> >> today, across thousands of servers managing petabytes of data.
> >>
> >> == Rationale ==
> >> Hadoop and HDFS-based data management architectures continue their
> >> expansion into the enterprise. As the amount of data stored on Hadoop
> >> clusters grows, unlocking the analytics capabilities and democratizing
> >> access to that treasure trove of data becomes one of the key concerns.
> >> While Hadoop has no shortage of purposefully designed analytical
> >> frameworks, the easiest and most cost-effective way to onboard the
> >> largest amount of data consumers is provided by offering SQL APIs for
> >> data retrieval at scale. Of course, given the high velocity of
> >> innovation happening in the underlying Hadoop ecosystem, any
> >> SQL-on-Hadoop solution has to keep up with the community. We strongly
> >> believe that in the Big Data space, this can be optimally achieved
> >> through a vibrant, diverse, self-governed community collectively
> >> innovating around a single codebase while at the same time
> >> cross-pollinating with various other data management communities.
> >> Apache Software Foundation is the ideal place to meet those ambitious
> >> goals. We also believe that our initial experience of bringing Pivotal
> >> GemfireⓇ into ASF as Apache Geode (incubating) could be leveraged thus
> >> improving the chances of HAWQ becoming a vibrant Apache community.
> >>
> >> == Initial Goals ==
> >> Our initial goals are to bring HAWQ into the ASF, transition internal
> >> engineering processes into the open, and foster a collaborative
> >> development model according to the "Apache Way." Pivotal and its
> >> partners plan to develop new functionality in an open,
> >> community-driven way. To get there, the existing internal build, test
> >> and release processes will be refactored to support open development.
> >>
> >> == Current Status ==
> >> Currently, the project code base is commercially licensed and is not
> >> available to the general public. The documentation and wiki pages are
> >> available at FIXME. Although Pivotal HAWQ was developed as a
> >> proprietary, closed-source product, its roots are in the PostgreSQL
> >> community and the internal engineering practices adopted by the
> >> development team lend themselves well to an open, collaborative and
> >> meritocratic environment.
> >>
> >> The Pivotal HAWQ team has always focused on building a robust end user
> >> community of paying and non-paying customers. The existing
> >> documentation along with StackOverflow and other similar forums are
> >> expected to facilitate conversations among our existing users so as to
> >> transform them into an active community of HAWQ members, stakeholders
> >> and developers.
> >>
> >> === Meritocracy ===
> >> Our proposed list of initial committers includes the current HAWQ R&D
> >> team, Pivotal Field Engineers, and several existing partners. This
> >> group will form a base for the broader community we will invite to
> >> collaborate on the codebase. We intend to radically expand the initial
> >> developer and user community by running the project in accordance with
> >> the "Apache Way". Users and new contributors will be treated with
> >> respect and welcomed. By participating in the community and providing
> >> quality patches/support that move the project forward, contributors
> >> will earn merit. They also will be encouraged to provide non-code
> >> contributions (documentation, events, community management, etc.) and
> >> will gain merit for doing so. Those with a proven support and quality
> >> track record will be encouraged to become committers.
> >>
> >> === Community ===
> >> If HAWQ is accepted for incubation, the primary initial goal will be
> >> transitioning the core community towards embracing the Apache Way of
> >> project governance. We would solicit major existing contributors to
> >> become committers on the project from the start.
> >>
> >> === Core Developers ===
> >>
> >> A few of HAWQ's core developers are skilled in working as part of
> >> openly governed Apache communities (mainly around Hadoop ecosystem).
> >> That said, most of the core developers are currently NOT affiliated
> >> with the ASF and would require new ICLAs before committing to the
> >> project.
> >>
> >> === Alignment ===
> >> The following existing ASF projects can be considered when reviewing
> >> the HAWQ proposal:
> >>
> >> Apache Hadoop is a distributed storage and processing framework for
> >> very large datasets, focusing primarily on batch processing for
> >> analytic purposes. HAWQ builds on top of two key pieces of Hadoop:
> >> YARN and HDFS. HAWQ's community roadmap includes plans for
> >> contributing to Hadoop around HDFS features and increasing support
> >> for C and C++ clients.
> >>
> >> Apache Spark™ is a fast engine for processing large datasets,
> >> typically from a Hadoop cluster, and performing batch, streaming,
> >> interactive, or machine learning workloads.  Recently, Apache Spark
> >> has embraced SQL-like APIs around DataFrames at its core. Because of
> >> that we would expect a level of collaboration between the two projects
> >> when it comes to query optimization and exposing HAWQ tables to Spark
> >> analytical pipelines.
> >>
> >> Apache Hive™ is data warehouse software that facilitates querying
> >> and managing large datasets residing in distributed storage. Hive
> >> provides a mechanism to project structure onto this data and query the
> >> data using a SQL-like language called HiveQL. Hive also provides
> >> HCatalog, a table and storage management layer for Hadoop that
> >> enables users with different data processing tools to more easily
> >> define structure for the data on the grid. Currently the core
> >> Hive and HAWQ are viewed as complementary solutions, but we expect
> >> close integration with HCatalog given its dominant position for
> >> metadata management on Hadoop clusters.
> >>
> >> Apache Phoenix is a high performance relational database layer over
> >> HBase for low latency applications. Given Phoenix's exclusive focus on
> >> HBase for its data management backend and its overall architecture
> >> around HBase's co-processors, it is unlikely that there will be much
> >> collaboration between the two projects.
> >>
> >> == Known Risks ==
> >> Development has been sponsored mostly by a single company (or its
> >> predecessors) thus far and coordinated mainly by the core Pivotal HAWQ
> >> team.
> >>
> >> For the project to fully transition to the Apache Way governance
> >> model, development must shift towards the meritocracy-centric model of
> >> growing a community of contributors balanced with the needs for
> >> extreme stability and core implementation coherency.
> >>
> >> The tools and development practices in place for the Pivotal HAWQ
> >> product are compatible with the ASF infrastructure and thus we do not
> >> anticipate any on-boarding pains.
> >>
> >> The project currently includes a modified version of PostgreSQL 8.3
> >> source code. Given the ASF's position that the PostgreSQL License is
> >> compatible with the Apache License version 2.0, we do NOT anticipate
> >> any issues with licensing the code base. However, any new capabilities
> >> developed by the HAWQ team once part of the ASF would need to be
> >> consumed by the PostgreSQL community under the Apache License version
> >> 2.0.
> >>
> >> === Orphaned products ===
> >> Pivotal is fully committed to maintaining its position as one of the
> >> leading providers of SQL-on-Hadoop solutions and the corresponding
> >> Pivotal commercial product will continue to be based on the HAWQ
> >> project. Moreover, Pivotal has a vested interest in making HAWQ
> >> successful by driving its close integration with both existing
> >> projects contributed by Pivotal including Apache Geode (incubating)
> >> and MADlib (which is requesting Incubation), and sister ASF projects.
> >> We expect this to further reduce the risk of orphaning the product.
> >>
> >> === Inexperience with Open Source ===
> >> Pivotal has embraced open source software since its formation by
> >> employing contributors/committers and by shepherding open source
> >> projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
> >> working at Pivotal have experience with the formation of vibrant
> >> communities around open technologies with the Cloud Foundry
> >> Foundation, and continuing with the creation of a community around
> >> Apache Geode (incubating).  Although some of the initial committers
> >> have not had the experience of developing entirely open source,
> >> community-driven projects, we expect to bring to bear the open
> >> development practices that have proven successful on longstanding
> >> Pivotal open source projects to the HAWQ community.  Additionally,
> >> several ASF veterans have agreed to mentor the project and are listed
> >> in this proposal. The project will rely on their collective guidance
> >> and wisdom to quickly transition the entire team of initial committers
> >> towards practicing the Apache Way.
> >>
> >> === Homogeneous Developers ===
> >> While most of the initial committers are employed by Pivotal, we have
> >> already seen a healthy level of interest from existing customers and
> >> partners. We intend to convert that interest directly into
> >> participation and will be investing in activities to recruit
> >> additional committers from other companies.
> >>
> >> === Reliance on Salaried Developers ===
> >> Most of the contributors are paid to work in the Big Data space. While
> >> they might wander from their current employers, they are unlikely to
> >> venture far from their core expertise and thus will continue to be
> >> engaged with the project regardless of their current employers.
> >>
> >> === Relationships with Other Apache Products ===
> >> As mentioned in the Alignment section, HAWQ may consider various
> >> degrees of integration and code exchange with Apache Hadoop, Apache
> >> Spark and Apache Hive projects. We expect integration points to be
> >> inside and outside the project. We look forward to collaborating with
> >> these communities as well as other communities under the Apache
> >> umbrella.
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> While we intend to leverage the Apache ‘branding’ when talking to
> >> other projects as a testament to our project’s ‘neutrality’, we have
> >> no plans to use the Apache brand in press releases or to post
> >> billboards advertising acceptance of HAWQ into the Apache Incubator.
> >>
> >> == Documentation ==
> >> The documentation is currently available at
> >> http://hawq.docs.pivotal.io/
> >>
> >> == Initial Source ==
> >> Initial source code will be available immediately after Incubator PMC
> >> approves HAWQ joining the Incubator and will be licensed under the
> >> Apache License v2.
> >>
> >> == Source and Intellectual Property Submission Plan ==
> >> As soon as HAWQ is approved to join the Incubator, the source code
> >> will be transitioned via an exhibit to Pivotal's current Software
> >> Grant Agreement onto ASF infrastructure and in turn made available
> >> under the Apache License, version 2.0.  We know of no legal
> >> encumbrances that would inhibit the transfer of source code to the
> >> ASF.
> >>
> >> == External Dependencies ==
> >>
> >> Runtime dependencies:
> >>  * gimli (BSD)
> >>  * openldap (The OpenLDAP Public License)
> >>  * openssl (OpenSSL License and the Original SSLeay License, BSD style)
> >>  * proj (MIT)
> >>  * yaml (Creative Commons Attribution 2.0 License)
> >>  * python (Python Software Foundation License Version 2)
> >>  * apr-util (Apache Version 2.0)
> >>  * bzip2 (BSD-style License)
> >>  * curl (MIT/X Derivate License)
> >>  * gperf (GPL Version 3)
> >>  * protobuf (Google)
> >>  * libevent (BSD)
> >>  * json-c (https://github.com/json-c/json-c/blob/master/COPYING)
> >>  * krb5 (MIT)
> >>  * pcre (BSD)
> >>  * libedit (BSD)
> >>  * libxml2 (MIT)
> >>  * zlib (Permissive Free Software License)
> >>  * libgsasl (LGPL Version 2.1)
> >>  * thrift (Apache Version 2.0)
> >>  * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD)
> >>  * libuuid-2.26 (LGPL Version 2)
> >>  * apache hadoop (Apache Version 2.0)
> >>  * apache avro (Apache Version 2.0)
> >>  * glog (BSD)
> >>  * googlemock (BSD)
> >>
> >> Build only dependencies:
> >>  * ant (Apache Version 2.0)
> >>  * maven (Apache Version 2.0)
> >>  * cmake (BSD)
> >>
> >> Test only dependencies:
> >>  * googletest (BSD)
> >>
> >> Cryptography N/A
> >>
> >> == Required Resources ==
> >>
> >> === Mailing lists ===
> >>  * priv...@hawq.incubator.apache.org (moderated subscriptions)
> >>  * comm...@hawq.incubator.apache.org
> >>  * d...@hawq.incubator.apache.org
> >>  * iss...@hawq.incubator.apache.org
> >>  * u...@hawq.incubator.apache.org
> >>
> >> === Git Repository ===
> >> https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
> >>
> >> === Issue Tracking ===
> >> JIRA Project HAWQ (HAWQ)
> >>
> >> === Other Resources ===
> >>
> >> Setting up regular builds for HAWQ on builds.apache.org will
> >> require Docker support.
> >>
> >> == Initial Committers ==
> >>  * Lirong Jian
> >>  * Hubert Huan Zhang
> >>  * Radar Da Lei
> >>  * Ivan Yanqing Weng
> >>  * Zhanwei Wang
> >>  * Yi Jin
> >>  * Lili Ma
> >>  * Jiali Yao
> >>  * Zhenglin Tao
> >>  * Ruilong Huo
> >>  * Ming Li
> >>  * Wen Lin
> >>  * Lei Chang
> >>  * Alexander V Denissov
> >>  * Newton Alex
> >>  * Oleksandr Diachenko
> >>  * Jun Aoki
> >>  * Bhuvnesh Chaudhary
> >>  * Vineet Goel
> >>  * Shivram Mani
> >>  * Noa Horn
> >>  * Sujeet S Varakhedi
> >>  * Junwei (Jimmy) Da
> >>  * Ting (Goden) Yao
> >>  * Mohammad F (Foyzur) Rahman
> >>  * Entong Shen
> >>  * George C Caragea
> >>  * Amr El-Helw
> >>  * Mohamed F Soliman
> >>  * Venkatesh (Venky) Raghavan
> >>  * Carlos Garcia
> >>  * Zixi (Jesse) Zhang
> >>  * Michael P Schubert
> >>  * C.J. Jameson
> >>  * Jacob Frank
> >>  * Ben Calegari
> >>  * Shoabe Shariff
> >>  * Rob Day-Reynolds
> >>  * Mel S Kiyama
> >>  * Charles Alan Litzell
> >>  * David Yozie
> >>  * Ed Espino
> >>  * Caleb Welton
> >>  * Parham Parvizi
> >>  * Dan Baskette
> >>  * Christian Tzolov
> >>  * Tushar Pednekar
> >>  * Greg Chase
> >>  * Chloe Jackson
> >>  * Michael Nixon
> >>  * Roman Shaposhnik
> >>  * Alan Gates
> >>  * Owen O'Malley
> >>  * Thejas Nair
> >>  * Don Bosco Durai
> >>  * Konstantin Boudnik
> >>  * Sergey Soldatov
> >>  * Atri Sharma
> >>
> >> == Affiliations ==
> >>  * Barclays:  Atri Sharma
> >>  * Bloomberg: Justin Erenkrantz
> >>  * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco Durai
> >>  * WANDisco: Konstantin Boudnik, Sergey Soldatov
> >>  * Pivotal: everyone else on this proposal
> >>
> >> == Sponsors ==
> >>
> >> === Champion ===
> >> Roman Shaposhnik
> >>
> >> === Nominated Mentors ===
> >>
> >> The initial mentors are listed below:
> >>  * Alan Gates - Apache Member, Hortonworks
> >>  * Owen O'Malley - Apache Member, Hortonworks
> >>  * Thejas Nair - Apache Member, Hortonworks
> >>  * Konstantin Boudnik - Apache Member, WANDisco
> >>  * Roman Shaposhnik - Apache Member, Pivotal
> >>  * Justin Erenkrantz - Apache Member, Bloomberg
> >>
> >> === Sponsoring Entity ===
> >> We would like to propose the Apache Incubator as the sponsor of this project.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org
> >>
> >>
> > +1 accept HAWQ into the Apache Incubator
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
>
