+1 (non-binding). I would be interested in helping out with HBase integration and Bigtop packaging.
On Thu, Feb 28, 2013 at 10:11 AM, Hyunsik Choi <hyun...@apache.org> wrote:

> Hi Folks,
>
> I'd like to call a VOTE for acceptance of Tajo into the Apache incubator.
> The vote will close on Mar 7 at 6:00 PM (PST).
>
> [] +1 Accept Tajo into the Apache incubator
> [] +0 Don't care.
> [] -1 Don't accept Tajo into the incubator because...
>
> The full proposal is pasted at the bottom of this email, and the
> corresponding wiki page is http://wiki.apache.org/incubator/TajoProposal.
>
> Only VOTEs from Incubator PMC members are binding, but all are welcome to
> express their thoughts.
>
> Thanks,
> Hyunsik
>
> PS: Since the initial discussion, the main change is that I've added 4 new
> committers. I've also revised part of the Known Risks description because
> the set of initial committers has become diverse.
>
> ----------------
> Tajo Proposal
>
> = Abstract =
>
> Tajo is a distributed data warehouse system for Hadoop.
>
> = Proposal =
>
> Tajo is a relational and distributed data warehouse system for Hadoop. Tajo
> is designed for low-latency and scalable ad-hoc queries, online aggregation,
> and ETL on large data sets by leveraging advanced database techniques. It
> supports SQL standards. Tajo is inspired by Dryad, MapReduce, Dremel,
> Scope, and parallel databases. Tajo uses HDFS as its primary storage layer,
> and it has its own query engine, which allows direct control of distributed
> execution and data flow. As a result, Tajo has a variety of query
> evaluation strategies and more optimization opportunities. In addition,
> Tajo will have native columnar execution and its own optimizer. Tajo will
> be an alternative to Hive/Pig running on top of MapReduce.
>
> = Background =
>
> Big data analysis has gained much attention in industry. Open source
> communities have proposed scalable and distributed solutions for ad-hoc
> queries on big data.
> However, there is still room for improvement. Markets need faster and more
> efficient solutions. Recently, some alternatives (e.g., Cloudera's Impala
> and Amazon Redshift) have come out.
>
> = Rationale =
>
> There are a variety of open source distributed execution engines (e.g.,
> Hive and Pig) running on top of MapReduce. They are limited by the MR
> framework: they cannot directly control distributed execution and data
> flow, because they simply use the MR framework. As a result, they have
> limited query evaluation strategies and optimization opportunities, and it
> is hard to optimize them for a particular type of data processing.
>
> = Initial Goals =
>
> The initial goal is to write more documentation describing Tajo's
> internals. This will help us recruit more committers and build a solid
> community. Then, we will set milestones for short- and long-term plans.
>
> = Current Status =
>
> Tajo is in the alpha stage. Users can execute common SQL queries (e.g.,
> selection, projection, group-by, join, union, and sort) except for nested
> queries. Tajo provides various row/column storage formats, such as CSV,
> RowFile (a row-store file format we have implemented), RCFile, and Trevni,
> and it also has a rudimentary ETL feature to transform data from one format
> to another. In addition, Tajo provides hash and range repartitioning. Using
> both repartition methods, Tajo processes aggregation, join, and sort
> queries over a number of cluster nodes. To evaluate performance, we have
> carried out a benchmark test using TPC-H 1TB on 32 cluster nodes.
>
> == Meritocracy ==
>
> We will discuss the milestones and future plans in an open forum. We plan
> to encourage an environment that supports a meritocracy. Contributors will
> have different privileges according to their contributions.
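The hash and range repartitioning mentioned under Current Status can be illustrated with a small, self-contained Python sketch of the general technique. This is not Tajo's implementation; the function names `hash_repartition` and `range_repartition`, the dict-based rows, and the split points are hypothetical. In general, hash repartitioning keeps rows with equal keys together, which is what distributed aggregation and equi-joins need, while range repartitioning gives each partition a contiguous key range, which is what a distributed sort needs.

```python
import zlib
from bisect import bisect_right


def hash_repartition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its key column (hypothetical helper).

    Rows with equal keys always land in the same partition, so each node
    can finish a group-by or an equi-join on its share of the data alone.
    """
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is stable across processes, unlike Python's salted str hash().
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % num_partitions].append(row)
    return parts


def range_repartition(rows, key, split_points):
    """Assign each row by comparing its key to sorted split points (hypothetical helper).

    With split_points [s0, s1, ...], partition 0 gets keys < s0, partition 1
    gets s0 <= key < s1, and so on; the last partition gets keys >= the last
    split point. Sorting each partition locally and concatenating the
    partitions in order therefore yields a globally sorted result.
    """
    parts = [[] for _ in range(len(split_points) + 1)]
    for row in rows:
        parts[bisect_right(split_points, row[key])].append(row)
    return parts


if __name__ == "__main__":
    rows = [{"v": v} for v in [42, 7, 19, 3, 25, 10]]
    by_range = range_repartition(rows, "v", [10, 20])
    # Local sort per partition, then concatenate in partition order.
    merged = [r["v"] for p in by_range for r in sorted(p, key=lambda row: row["v"])]
    print(merged)  # [3, 7, 10, 19, 25, 42]

    by_hash = hash_repartition(rows, "v", 3)
    print(sum(len(p) for p in by_hash))  # 6: every row is assigned exactly once
```

The split points here are fixed by hand; an engine would typically derive them from a sample of the data so that partitions come out roughly balanced.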
>
> == Community ==
>
> Big data analysis has gained attention from open source communities,
> industry, and academia. Some Hadoop-related projects already have very
> large and active communities. We expect that Tajo will also establish an
> active community. Since Tajo already supports a number of features and is
> in the alpha stage, it will attract a large community soon.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)