I believe it is more of a framework, but you can take a look at Shark, which uses Spark to do data warehousing and supports Hive queries (http://shark.cs.berkeley.edu).
- Henry

On Friday, May 31, 2013, Chen, Pei wrote:

> +1 (non-binding)
> This seems like a really interesting project.
> Q: Is Spark just a framework/API, or does it also have some tools
> implemented for data analytics?
> --Pei
>
> > -----Original Message-----
> > From: Mattmann, Chris A (398J) [mailto:chris.a.mattm...@jpl.nasa.gov]
> > Sent: Friday, May 31, 2013 2:04 PM
> > To: general@incubator.apache.org
> > Subject: [PROPOSAL] Apache Spark for the Incubator
> >
> > Hi Folks,
> >
> > I'm pleased to bring you a proposal to the Apache Incubator for the
> > Apache Spark project: https://wiki.apache.org/incubator/SparkProposal
> >
> > The work originates from the Berkeley AMPLab and through a number of
> > industry participants, and other institutions. Spark is a framework for
> > large-scale data analysis on clusters, with a particular focus on
> > low-latency operations. The source code is written in Scala, and
> > provides a number of APIs and bindings in various programming languages.
> >
> > The proposal text is copied to the bottom of this email. I'm going to
> > leave this thread open for the next week for discussion. Once it's died
> > down, I'll call an official VOTE.
> >
> > Suresh, Ross G. -- heads up -- this project may be of interest to you
> > both, and we would welcome you guys as additional mentors. We currently
> > have 3 mentors committed to the project, but would love to have more.
> > People interested in contributing should declare their interest here on
> > the general@incubator thread, and those potential contributors will be
> > discussed by the incoming Spark community.
> >
> > Questions -- let's hear 'em! :)
> >
> > Cheers,
> > Chris
> > ("Champion", incoming Apache Spark)
> >
> > === Abstract ===
> > Spark is an open source system for large-scale data analysis on clusters.
> >
> > === Proposal ===
> > Spark is an open source system for fast and flexible large-scale data
> > analysis.
> > Spark provides a general-purpose runtime that supports low-latency
> > execution in several forms. These include interactive exploration of very
> > large datasets, near real-time stream processing, and ad-hoc SQL analytics
> > (through higher-layer extensions). Spark interfaces with HDFS, HBase,
> > Cassandra and several other storage layers, and exposes APIs in Scala,
> > Java and Python.
> > Background
> > Spark started as a U.C. Berkeley research project, designed to efficiently
> > run machine learning algorithms on large datasets. Over time, it has
> > evolved into a general computing engine as outlined above. Spark's
> > developer community has also grown to include additional institutions,
> > such as universities, research labs, and corporations. Funding has been
> > provided by various institutions including the U.S. National Science
> > Foundation, DARPA, and a number of industry sponsors. See
> > https://amplab.cs.berkeley.edu/sponsors/ for full details.
> >
> > === Rationale ===
> > As the number of contributors to Spark has grown, we have sought a
> > long-term home for the project, and we believe the Apache foundation
> > would be a great fit. Spark is a natural fit for the Apache foundation:
> > Spark already interoperates with several existing Apache projects (HDFS,
> > HBase, Hive, Cassandra, Avro and Flume to name a few). The Spark team is
> > familiar with the Apache process and subscribes to the Apache mission;
> > the team already includes multiple Apache committers. Finally, joining
> > Apache will help coordinate the development effort of the growing number
> > of organizations that contribute to Spark.
> >
> > == Initial Goals ==
> > The initial goals will most likely be to move the existing codebase to
> > Apache and integrate with the Apache development process. Furthermore, we
> > plan for incremental development, and releases in line with the Apache
> > guidelines.
> >
> > === Current Status ===
> > == Meritocracy ==
> > The Spark project already operates on meritocratic principles. Today,
> > Spark has several developers and has accepted multiple major patches
> > from outside of U.C. Berkeley. While this process has remained mostly
> > informal (we do not have an official committer list), an implicit
> > organization exists in which individuals who contribute major components
> > act as maintainers for those modules. If accepted, the Spark project
> > would include several of these participants as committers from the
> > outset. We will work to identify all committers and PPMC members for the
> > project and to operate under the ASF meritocratic principles.
> >
> > === Community ===
> > Acceptance into the Apache foundation would bolster the already strong
> > user and developer community around Spark. That community includes
> > dozens of contributors from several institutions, a meetup group with
> > several hundred members, and an active mailing list composed of hundreds
> > of users.
> > Core Developers
> > The core developers of our project are listed in our contributors and
> > initial PPMC below. Though many are at UC Berkeley, there is a
> > representative cross-sampling of other organizations including
> > Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and
> > Webtrends.
> >
> > === Alignment ===
> > Our proposed ef