Re: [VOTE] Accept Gora into the Apache Incubator

Craig L Russell Wed, 22 Sep 2010 09:30:09 -0700

+1

Craig


On Sep 19, 2010, at 8:21 PM, Mattmann, Chris A (388J) wrote:

Hi Folks,
Over the past week or so we've been discussing the Gora project and
bringing it into the Apache Incubator [1]. It's time to call a VOTE thread
on the issue. Please VOTE below:

[ ] +1 Accept Gora into the Apache Incubator.
[ ] +0 Don't care.
[ ] -1 Don't accept Gora into the Apache Incubator because...
I'll leave the VOTE open for the remainder of the week (ending on 9/24).
Here's my +1 (IPMC binding).

[1] http://s.apache.org/MPw

Cheers,
Chris

P.S. The wiki text for the proposal is pasted below.

----------
Gora Proposal for Apache Incubation

Abstract
Gora is an ORM framework for column stores such as Apache HBase and Apache
Cassandra with a specific focus on Hadoop.

Proposal
Although there are various excellent ORM frameworks for relational
databases, data modeling in NoSQL data stores differs profoundly from
their relational cousins. Moreover, data-model agnostic frameworks such as
JDO are not sufficient for use cases where one needs to use the full power
of the data models in column stores. Gora fills this gap by giving the
user an easy-to-use ORM framework with data store specific mappings and
built in Apache Hadoop support.
The overall goal for Gora is to become the standard data representation
and persistence framework for big data. The roadmap of Gora can be grouped
as follows.
* Data Persistence : Persisting objects to column stores such as HBase,
Cassandra, Hypertable; key-value stores such as Voldemort, Redis, etc; SQL
databases, such as MySQL, HSQLDB; flat files in local file system or Hadoop
HDFS.
* Data Access : An easy to use Java-friendly common API for accessing the
data regardless of its location.
* Indexing : Persisting objects to Lucene and Solr indexes,
accessing/querying the data with Gora API.
* Analysis : Accessing the data and analyzing it through adapters for
Apache Pig, Apache Hive and Cascading
* MapReduce <http://wiki.apache.org/incubator/MapReduce> support :
Out-of-the-box and extensive MapReduce
<http://wiki.apache.org/incubator/MapReduce> (Apache Hadoop) support for
data in the data store.

Background
ORM stands for Object-Relational Mapping. It is a technology which
abstracts the persistency layer (mostly relational databases) so that plain
domain level objects can be used, without the cumbersome effort to
save/load the data to and from the database. Gora differs from current
solutions in that:
* Gora is specially focused on NoSQL data stores, but also has limited
support for SQL databases
* The main use case for Gora is to access/analyze big data using Hadoop.
* Gora uses Avro for bean definition, not byte code enhancement or
annotations
* Object-to-data store mappings are backend specific, so that the full data
model can be utilized.
* Gora is simple since it ignores complex SQL mappings
* Gora will support persistence, indexing and analysis of data, using Pig,
Lucene, Hive, etc

Rationale
ORM frameworks are nothing new. But with the explosion of data generated in
Terabytes and even Petabytes, NoSQL data stores are gaining ever-increasing
popularity. Coupled with the limited support for the already-proven Apache
Hadoop in current ORM frameworks, there was a need for a new project.
Gora is currently hosted at Github. However, Gora has ties to ASF in many
ways. As detailed in the proposal section, Gora will be a high level client
for many Apache projects and subprojects including Hadoop (common, hdfs,
and mapreduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive. Gora
already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started its
life inside the Apache Nutch project, and now Nutch trunk uses Gora as a
library. Even more, the initial set of committers are all ASF members.
Therefore, we think that Apache will be an excellent home for Gora.

Initial Goals
Initial goals for Gora can be summarized as:
* Iron out the remaining issues with HBase, Cassandra and SQL support.
* Make the first release before the end of the year.
* Improve documentation
* Support for Cascading

Current Status
Meritocracy
Current commit rights belong to the initial list of committers, four of
whom are also ASF members. All the developers have extensive experience
with Apache projects. We honor the meritocracy policy of the ASF.

Community
Gora’s community mostly overlaps with that of Nutch, Hadoop, HBase, Avro
and Cassandra. We have a small community for now (5 initial committers, 18
people tracking the project at Github), but have been piggybacking on the
Nutch community for a while. If Gora is accepted to the Apache Incubator,
we expect more traction. Moreover, with the increasing popularity of NoSQL
databases, we expect more users.

Core Developers
Gora was started as the initial code base inside Apache Nutch by Doğacan
Güney. Then Enis Söztutar refactored and re-architected the project out of
Nutch. Later Julien Nioche, Andrzej Bialecki and Doğacan ported Nutch to
use the newly formed project. Later, Sertan Alkan joined. Doğacan and
Julien are Nutch PMC members, Andrzej is the Nutch PMC chair. Enis is an
Apache Hadoop PMC member.

Alignment
As discussed in the second paragraph of the Rationale section, all of the
current developers are Apache people, and four of them are PMC members,
which shows that we have some experience with the Apache way. Moreover,
Gora is tightly related to lots of Apache projects: Nutch, Hadoop, HBase,
Cassandra, Avro, Pig, Hive, Lucene to name a few. Gora started its life
inside Nutch, and now Nutch trunk uses Gora to persist web crawl data to
HBase, Cassandra and MySQL, which means that Gora is a very critical
component in Nutch.

Known Risks
Orphaned Products
Most of the development depends on Enis and Doğacan for now. Both of them
intend to continue Gora development. However, we also acknowledge that more
core developers are needed for the project to be truly successful. The
general strategy to acquire more developers will be to acquire more users,
and encourage users to be active in the community and develop patches.
Moreover, the next release of Nutch, planned before the end of 2010, has
extensive Gora support. We expect more interest from the Nutch community,
and we will continue to post Gora announcements to the Hadoop, HBase and
Cassandra mailing lists.

Inexperience with Open Source
We believe that all of the developers have extensive open source
experience. Four of the initial committers are Apache members. The codebase
has also been open source since April 2010. We also have some
documentation, wiki pages, an issue tracker and a dev mailing list.

Homogeneous Developers
We have a semi-distributed development environment where Doğacan, Enis and
Sertan share the same office, but Andrzej and Julien are independent. With
the aim of acquiring more developers, we expect more heterogeneous
development.

Reliance on Salaried Developers
Gora development has been supported by the ant.com
<http://wiki.apache.org/incubator/ant.com> search engine as contract work.
It is expected that this contract will continue in the future. However,
even without sponsors, we are committed to continuing Gora development,
since we believe in the technology it brings and its vital role in Nutch
and our other closed-source projects.

Relationships with Other Apache Products
Gora will be tightly related to lots of Apache projects:
* Nutch : Apache Nutch was home to Gora’s initial code base. Now, Nutch
trunk uses Gora as a library. The next release of Nutch, planned before the
end of 2010, will be using Gora’s first release.
* Hadoop : Gora has extensive support for Hadoop MapReduce
<http://wiki.apache.org/incubator/MapReduce>. Gora defines all the
necessary data structures for working with Hadoop. Data stored in column
oriented data stores can be analyzed with Gora using Hadoop.
* Avro : Gora uses and extends Avro. Data beans in Gora are defined using
Avro schemas, and compiled into Java code with the extended version of the
Avro compiler. Avro is also used in data serialization.
* HBase : Gora supports HBase as a persistency backend.
* Cassandra : Gora supports Cassandra as a persistency backend.
* Lucene/Solr : Gora intends to support Lucene/Solr as a persistency and
indexing backend.
* Pig : Gora intends to support Pig for data analysis
* Hive : Gora intends to support Hive for data analysis

An Excessive Fascination with the Apache Brand
Gora is a natural fit for Apache due to its current committers and
depending projects.

Documentation
* The project is currently hosted at http://github.com/enis/gora/.
* Wiki pages can be found at http://wiki.github.com/enis/gora/.
* List of issues can be found at http://github.com/enis/gora/issues/.
* Current web address: http://groups.google.com/group/gora-dev.
* Current email address: gora-...@googlegroups.com.

Initial Source
The initial source was developed as a patch to the Apache Nutch project.
But the storage abstraction layer was orthogonal to the web crawler, and we
decided to extract it to a separate project with much wider goals. Thus
Gora, as a project, was born. The initial code was developed by Enis and
Dogacan with ant.com’s sponsorship.

The code can be found at http://github.com/enis/gora/.

External Dependencies
External dependencies excluding Apache projects are as follows:
* JDOM - http://jdom.org/ - Apache-style license
* SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
License, LGPL. SQL Builder is intended to be removed from the source due to
technical reasons anyway.
* HSQLDB - http://hsqldb.org/ - BSD-style license
* JUnit - http://junit.org - Common Public License 1.0
* SLF4J - http://www.slf4j.org/ - MIT License
* Google Guava Libraries - http://code.google.com/p/guava-libraries/ -
Apache License 2.0

Required Resources
Mailing Lists
* gora-private (with moderated subscriptions)
* gora-dev
* gora-commits

Subversion Directory
* http://svn.apache.org/repos/asf/incubator/gora

Issue Tracking
* JIRA (GORA)

Other Resources
We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
Github. Since there are not a lot of pages there, we can manually move the
pages to the wiki at wiki.apache.org.

Initial Committers
*    Name               email                        Affiliation     Timezone
*    Enis Söztutar      enis [at] apache.org         Konneka         +3
*    Doğacan Güney      dogacan [at] apache.org      Konneka         +3
*    Sertan Alkan       sertanalkan [at] gmail.com   Konneka         +3
*    Julien Nioche      jnioche [at] apache.org      DigitalPebble
<http://wiki.apache.org/incubator/DigitalPebble>        +1
*    Andrzej Bialecki   ab [at] apache.org           Sigram
*    Andrew Hart        ahart [at] apache.org        NASA JPL        -8
*    Dave Woollard      woollard [at] apache.org     NASA JPL        -8
*    Henry Saputra      hsaputra [at] apache.org     Yahoo!          -8

Affiliations
All of the parties are affiliated with companies and organizations that are
familiar with the development of open source. Most of the original Gora
development was sponsored by ant.com, however we expect that the amount of
volunteer work will increase, and more developers will come on board.

Sponsors
Champion
* Chris Mattmann (mattmann AT apache DOT org)

Nominated Mentors
* Chris Mattmann (mattmann AT apache DOT org)
* Andrzej Bialecki (ab AT apache DOT org)
* Tom White (tomwhite AT apache DOT org)

Sponsoring Entity
Apache Incubator. Successful graduation can result in either being a TLP,
or a subproject of Hadoop, since most of the community is projected to
overlap.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Craig L Russell
Architect, Oracle
http://db.apache.org/jdo
408 276-5638 mailto:craig.russ...@oracle.com
P.S. A good JDO? O, Gasp!


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
