[PROPOSAL] Apache Spark for the Incubator

Henry Saputra Fri, 31 May 2013 13:04:36 -0700

I believe it is more of a framework, but you can take a look at Shark, which
uses Spark to do data warehousing and supports Hive queries
(http://shark.cs.berkeley.edu).


- Henry

On Friday, May 31, 2013, Chen, Pei wrote:

> +1 (non-binding)
> This seems like a really interesting project.
> Q- Is Spark just a framework/API or does it also have some tools
> implemented for data analytics?
> --Pei
>
> > -----Original Message-----
> > From: Mattmann, Chris A (398J) [mailto:chris.a.mattm...@jpl.nasa.gov]
> > Sent: Friday, May 31, 2013 2:04 PM
> > To: general@incubator.apache.org
> > Subject: [PROPOSAL] Apache Spark for the Incubator
> >
> > Hi Folks,
> >
> > I'm pleased to bring you a proposal to the Apache Incubator for the Apache
> > Spark project: https://wiki.apache.org/incubator/SparkProposal
> >
> > The work originates from the Berkeley AMPLab, a number of industry
> > participants, and other institutions. Spark is a framework for
> > large-scale data analysis on clusters, with a particular focus on
> > low-latency operations. The source code is written in Scala and provides
> > a number of APIs and bindings in various programming languages.
> >
> > The proposal text is copied to the bottom of this email. I'm going to
> > leave this thread open for the next week for discussion. Once it's died
> > down, I'll call an official VOTE.
> >
> > Suresh, Ross G. -- heads up -- this project may be of interest to you
> > both, and we would welcome you as additional mentors. We currently have 3
> > mentors committed to the project, but would love to have more. People
> > interested in contributing should declare their interest here on the
> > general@incubator thread, and those potential contributors will be
> > discussed by the incoming Spark community.
> >
> > Questions -- let's hear 'em! :)
> >
> > Cheers,
> > Chris
> > ("Champion", incoming Apache Spark)
> >
> > === Abstract ===
> > Spark is an open source system for large-scale data analysis on clusters.
> >
> > === Proposal ===
> > Spark is an open source system for fast and flexible large-scale data
> > analysis. Spark provides a general purpose runtime that supports
> > low-latency execution in several forms. These include interactive
> > exploration of very large datasets, near real-time stream processing, and
> > ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
> > with HDFS, HBase, Cassandra and several other storage layers, and exposes
> > APIs in Scala, Java and Python.
> >
> > == Background ==
> > Spark started as a U.C. Berkeley research project, designed to
> > efficiently run machine learning algorithms on large datasets. Over time,
> > it has evolved into a general computing engine as outlined above. Spark's
> > developer community has also grown to include additional institutions,
> > such as universities, research labs, and corporations. Funding has been
> > provided by various institutions including the U.S. National Science
> > Foundation, DARPA, and a number of industry sponsors. See:
> > https://amplab.cs.berkeley.edu/sponsors/ for full details.
> >
> > === Rationale ===
> > As the number of contributors to Spark has grown, we have sought a
> > long-term home for the project, and we believe the Apache foundation
> > would be a great fit. Spark is a natural fit for the Apache foundation:
> > Spark already interoperates with several existing Apache projects (HDFS,
> > HBase, Hive, Cassandra, Avro and Flume to name a few). The Spark team is
> > familiar with the Apache process and subscribes to the Apache mission -
> > the team includes multiple Apache committers already. Finally, joining
> > Apache will help coordinate the development effort of the growing number
> > of organizations which contribute to Spark.
> >
> > == Initial Goals ==
> > The initial goals will most likely be to move the existing codebase to
> > Apache and integrate with the Apache development process. Furthermore, we
> > plan for incremental development and releases in line with the Apache
> > guidelines.
> >
> > === Current Status ===
> > == Meritocracy ==
> > The Spark project already operates on meritocratic principles. Today,
> > Spark has several developers and has accepted multiple major patches from
> > outside of U.C. Berkeley. While this process has remained mostly informal
> > (we do not have an official committer list), an implicit organization
> > exists in which individuals who contribute major components act as
> > maintainers for those modules. If accepted, the Spark project would
> > include several of these participants as committers from the outset. We
> > will work to identify all committers and PPMC members for the project and
> > to operate under the ASF meritocratic principles.
> >
> > === Community ===
> > Acceptance into the Apache foundation would bolster the already strong
> > user and developer community around Spark. That community includes
> > dozens of contributors from several institutions, a meetup group with
> > several hundred members, and an active mailing list composed of hundreds
> > of users.
> >
> > == Core Developers ==
> > The core developers of our project are listed in our contributors and
> > initial PPMC below. Though many are at UC Berkeley, there is a
> > representative cross-section of other organizations including Quantifind,
> > Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
> >
> >
> > === Alignment ===
> > Our proposed ef
