Re: [VOTE] Accept Tajo into the Apache Incubator

Andrew Purtell Sat, 02 Mar 2013 06:06:00 -0800

+1 (non binding)

Would be interested in helping out with HBase integration, and Bigtop
packaging.



On Thu, Feb 28, 2013 at 10:11 AM, Hyunsik Choi <hyun...@apache.org> wrote:
> > Hi Folks,
> >
> > I'd like to call a VOTE for acceptance of Tajo into the Apache incubator.
> > The vote will close on Mar 7 at 6:00 PM (PST).
> >
> > [] +1 Accept Tajo into the Apache incubator
> > [] +0 Don't care.
> > [] -1 Don't accept Tajo into the incubator because...
> >
> > The full proposal is pasted at the bottom of this email, and the
> > corresponding wiki is http://wiki.apache.org/incubator/TajoProposal.
> >
> > Only VOTEs from Incubator PMC members are binding, but all are welcome to
> > express their thoughts.
> >
> > Thanks,
> > Hyunsik
> >
> > PS: Since the initial discussion, the main change is that I've added 4 new
> > committers. I've also revised the description of Known Risks because the
> > initial committers have become more diverse.
> >
> > ----------------
> > Tajo Proposal
> >
> > = Abstract =
> >
> > Tajo is a distributed data warehouse system for Hadoop.
> >
> >
> > = Proposal =
> >
> > Tajo is a relational and distributed data warehouse system for Hadoop. Tajo
> > is designed for low-latency and scalable ad-hoc queries, online aggregation
> > and ETL on large data sets by leveraging advanced database techniques. It
> > supports SQL standards. Tajo is inspired by Dryad, MapReduce, Dremel,
> > Scope, and parallel databases. Tajo uses HDFS as a primary storage layer,
> > and it has its own query engine which allows direct control of distributed
> > execution and data flow. As a result, Tajo has a variety of query
> > evaluation strategies and more optimization opportunities. In addition,
> > Tajo will have native columnar execution and its own optimizer. Tajo will
> > be an alternative to Hive/Pig on top of MapReduce.
> >
> >
> > = Background =
> >
> > Big data analysis has gained much attention in industry. Open source
> > communities have proposed scalable and distributed solutions for ad-hoc
> > queries on big data. However, there is still room for improvement. Markets
> > need faster and more efficient solutions. Recently, some alternatives
> > (e.g., Cloudera's Impala and Amazon Redshift) have come out.
> >
> >
> > = Rationale =
> >
> > There are a variety of open source distributed execution engines (e.g.,
> > Hive and Pig) running on top of MapReduce. They are limited by the MR
> > framework. They cannot directly control distributed execution and data
> > flow, and they just use the MR framework. So, they have limited query
> > evaluation strategies and optimization opportunities. It is hard for them
> > to be optimized for a certain type of data processing.
> >
> >
> > = Initial Goals =
> >
> > The initial goal is to write more documents describing Tajo's internals.
> > This will help recruit more committers and build a solid community.
> > Then, we will set milestones for short- and long-term plans.
> >
> >
> > = Current Status =
> >
> > Tajo is in the alpha stage. Users can execute common SQL queries (e.g.,
> > selection, projection, group-by, join, union and sort) except for nested
> > queries. Tajo provides various row/column storage formats, such as CSV,
> > RowFile (a row-store file we have implemented), RCFile, and Trevni, and it
> > also has a rudimentary ETL feature to transform one data format into
> > another. In addition, Tajo provides hash and range repartitioning. By
> > using both repartition methods, Tajo processes aggregation, join, and sort
> > queries over a number of cluster nodes. To evaluate the performance, we
> > have carried out a benchmark test using TPC-H 1TB on 32 cluster nodes.
> >
> >
> > == Meritocracy ==
> >
> > We will discuss the milestone and the future plan in an open forum. We plan
> > to encourage an environment that supports a meritocracy. The contributors
> > will have different privileges according to their contributions.
> >
> >
> > == Community ==
> >
> > Big data analysis has gained attention from open source communities,
> > industry, and academia. Some projects related to Hadoop already have
> > very large and active communities. We expect that Tajo will also establish
> > an active community. Since some Tajo features already work and it is in
> > the alpha stage, we expect it to attract a large community soon.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
