+1 Sounds like a welcome addition to the Hadoop space.
LieGrue,
strub

--- On Wed, 2/23/11, Alan Gates <ga...@yahoo-inc.com> wrote:

> From: Alan Gates <ga...@yahoo-inc.com>
> Subject: [VOTE] Accept Howl as an Incubator Project
> To: general@incubator.apache.org
> Date: Wednesday, February 23, 2011, 12:20 AM
>
> I would like to call a vote on accepting Howl as an Incubator project. The proposal is available at http://wiki.apache.org/incubator/HowlProposal. You can see the discussion from the proposal thread at http://tinyurl.com/5w7y9p9.
>
> Alan.
>
> ----------------------
>
> Abstract
> Howl is a table and storage management service for data created using Apache Hadoop.
>
> Proposal
> The vision of Howl is to provide table management and storage management layers for Apache Hadoop. This includes:
>
> • Providing a shared schema and data type mechanism.
> • Providing a table abstraction so that users need not be concerned with where or how their data is stored.
> • Providing interoperability across data processing tools such as Pig, Map Reduce, Streaming, and Hive.
>
> Background
> Data processors using Apache Hadoop have a common need for table management services. The goal of a table management service is to track data that exists in a Hadoop grid and present that data to users in a tabular format. Such a table management service needs to provide a single input and output format to users so that individual users need not be concerned with the storage formats that are chosen for particular data sets. As part of having a single format, the data will need to be described by one type of schema and have a single datatype system.
>
> Additionally, users should be free to choose the best tools for their use cases. The Hadoop project includes Map Reduce, Streaming, Pig, and Hive, and additional tools exist such as Cascading. Each of these tools has users who prefer it, and there are use cases best addressed by each of these tools.
> Two users on the same grid who need to share data should not be constrained to use the same tool but rather should be free to choose the best tool for their use case. A table management service that presents data in the same way to all of the tools can alleviate this problem by providing interfaces to each of the data processing tools.
>
> There are also a few other features a table management service should provide, such as notification of when data arrives.
>
> A couple of developers at Yahoo! started the project. It is based on the Hive MetaStore component. There is a good amount of interest in such a service from Yahoo!, Facebook, LinkedIn, and others. We are therefore proposing to place Howl in the Apache incubator and to build an open source community around it.
>
> Rationale
> There is a strong need for a table management service, especially for large grids with petabytes of data, and where the data volume is increasing by the day. Hadoop users need to find data to read and have a place to store their data. Currently users must understand the location of data to read, the storage format, compression techniques used, etc. To write data they need to understand where on HDFS their data belongs, the best compression format to use, how their data should be serialized, etc.
>
> Most users do not want to be concerned with these issues. They want these managed for them.
>
> Having it as an Apache Open Source project will highly benefit Howl from the point of view of getting a large community that currently uses Hadoop and the other products built around Hadoop (like Pig, Hive, etc.). Users of the Hadoop ecosystem can influence Howl’s roadmap, and contribute to it. Looking at it in another way, we believe having Howl as part of the Hadoop ecosystem will be a great benefit to the current Hadoop/Pig/Hive community too.
>
> Current Status
>
> Meritocracy
> Our intent with this incubator proposal is to start building a diverse developer community around Howl following the Apache meritocracy model. We have wanted to make the project open source and encourage contributors from multiple organizations from the start. We plan to provide plenty of support to new developers and to quickly recruit those who make solid contributions to committer status.
>
> Community
> Howl is currently being used by developers at Yahoo! and there has been expressed interest from LinkedIn and Facebook. Yahoo! also plans to deploy the current version of Howl in production soon. We hope to extend the user and developer base further in the future. The current developers and users are all interested in building a solid open source community around Howl.
>
> To work towards an open source community, we have started using the GitHub issue tracker and mailing lists at Yahoo! for development discussions within our group.
>
> Core Developers
> Howl is currently being developed by four engineers from Yahoo! - Devaraj Das, Ashutosh Chauhan, Sushanth Sowmyan, and Mac Yang. All the engineers have deep expertise in Hadoop and the Hadoop Ecosystem in general.
>
> Alignment
> The ASF is a natural host for Howl given that it is already the home of Hadoop, Pig, HBase, Cassandra, and other emerging cloud software projects. Howl was designed to support Hadoop from the beginning in order to solve data management challenges in Hadoop clusters. Howl complements the existing Apache cloud computing projects by providing a unified way to manage data.
>
> Known Risks
>
> Orphaned Products
> The core developers plan to work full time on the project. There is very little risk of Howl getting orphaned since large companies like Yahoo! are planning to deploy this in their production Hadoop clusters.
> We believe we can build an active developer community around Howl (companies like Facebook and LinkedIn have also expressed interest).
>
> Inexperience with Open Source
> All of the core developers are active users and followers of open source. Devaraj Das is an Apache Hadoop committer and Apache Hadoop PMC member, and has experience with the Apache infrastructure and development process. Ashutosh Chauhan is an Apache Pig committer and Apache Pig PMC member. Sushanth Sowmyan and Mac Yang made contributions to the Apache Hive and the Apache Chukwa projects.
>
> Homogeneous Developers
> The current core developers are all from Yahoo! However, we hope to establish a developer community that includes contributors from several corporations, and we are starting to work towards this with Facebook and LinkedIn.
>
> Reliance on Salaried Developers
> Currently, the developers are paid to do work on Howl. However, once the project has a community built around it, we expect to get committers and developers from outside the current core developers. Companies like Yahoo! are invested in Howl being a solution to the data management problem in Hadoop clusters, and that is not likely to change.
>
> Relationships with Other Apache Products
> Howl is going to be used by users of Hadoop, Pig, and Hive. See section Initial Source below for more information about Howl's relationship to Hive.
>
> An Excessive Fascination with the Apache Brand
> While we respect the reputation of the Apache brand and have no doubts that it will attract contributors and users, our interest is primarily to give Howl a solid home as an open source project following an established development model. We have also given reasons in the Rationale and Alignment sections.
>
> Documentation
> Information about Howl can be found at http://wiki.apache.org/pig/Howl.
> The following sources may be useful to start with:
>
> • The GitHub site: https://github.com/yahoo/howl
> • The roadmap: http://wiki.apache.org/pig/HowlJournal
>
> Initial Source
> Howl has been under development since Summer 2010 by a team of engineers at Yahoo!. It is currently hosted on GitHub under an Apache license at https://github.com/yahoo/howl.
>
> The initial development of Howl has consisted of:
>
> • maintaining a branch of the entire Hive codebase
> • getting Howl-related patches committed to Hive
> • developing Howl-specific plugins and wrappers to customize Hive behavior
>
> At runtime, Howl executes Hive code for metastore and CLI+DDL, disabling anything related to Hadoop map/reduce execution. It also makes use of the RCFile storage format contained in Hive.
>
> This approach was taken as a first step in order to validate the required functionality and get a production version working. However, in the long term, maintaining a clone of Hive is undesirable. One possible resolution is to factor the metastore+CLI+DDL components out of Hive and move them into Howl (making Hive dependent on Howl). Another possible resolution is to remove the copy of Hive from Howl and do the build/release engineering necessary to make Howl depend on Hive. As part of the incubation process, we plan to work towards resolution of these issues.
>
> External Dependencies
> The dependencies all have Apache compatible licenses.
>
> Cryptography
> Not applicable.
>
> Required Resources
>
> Mailing Lists
> • howl-private for private PMC discussions (with moderated subscriptions)
> • howl-dev
> • howl-commits
> • howl-user
>
> Subversion Directory
> https://svn.apache.org/repos/asf/incubator/howl
>
> Issue Tracking
> JIRA Howl (HOWL)
>
> Other Resources
> The existing code already has unit tests, so we would like a Hudson instance to run them whenever a new patch is submitted. This can be added after project creation.
>
> Initial Committers
> • Devaraj Das
> • Ashutosh Chauhan
> • Sushanth Sowmyan
> • Mac Yang
> • Paul Yang
> • Alan Gates
>
> A CLA is already on file for Sushanth.
>
> Affiliations
> • Devaraj Das (Yahoo!)
> • Ashutosh Chauhan (Yahoo!)
> • Sushanth Sowmyan (Yahoo!)
> • Mac Yang (Yahoo!)
> • Paul Yang (Facebook)
> • Alan Gates (Yahoo!)
>
> Sponsors
>
> Champion
> Owen O’Malley
>
> Nominated Mentors
> • Olga Natkovich (Pig PMC member and Apache VP for Pig)
> • Alan Gates (Pig PMC member)
> • John Sichi (Hive PMC member)
>
> Sponsoring Entity
> We are requesting the Incubator to sponsor this project.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org