Re: [PROPOSAL] Gora to enter Incubator

Tom White Tue, 14 Sep 2010 15:37:16 -0700

+1 Sounds very interesting. I'd be happy to help out as a mentor.

Cheers,
Tom


On Mon, Sep 13, 2010 at 6:10 AM, Enis Soztutar <enis.soz.nu...@gmail.com> wrote:
> Hi all,
>
> We would like to announce the proposal for Gora, an ORM for column stores,
> to enter the Apache Incubator. We believe that Gora can find a nice home at
> Apache.
>
> The wiki page for the proposal can be found at
> http://wiki.apache.org/incubator/GoraProposal
>
> The proposal is as below.
>
>
> = Gora Proposal for Apache Incubation =
>
> == Abstract ==
> Gora is an ORM framework for column stores such as Apache HBase and Apache
> Cassandra with a specific focus on Hadoop.
>
> == Proposal ==
> Although there are various excellent ORM frameworks for relational
> databases, data modeling in NoSQL data stores differs profoundly from that in
> their relational cousins. Moreover, data-model-agnostic frameworks such as
> JDO are not sufficient for use cases where one needs to use the full power of
> the data models in column stores. Gora fills this gap by giving the user an
> easy-to-use ORM framework with data-store-specific mappings and built-in
> Apache Hadoop support.
>
> The overall goal for Gora is to become the standard data representation and
> persistence framework for big data. The roadmap of Gora can be grouped as
> follows.
>
>  * Data Persistence : Persisting objects to column stores such as HBase,
> Cassandra and Hypertable; key-value stores such as Voldemort, Redis, etc.;
> SQL databases such as MySQL and HSQLDB; and flat files in the local file
> system or Hadoop HDFS.
>  * Data Access : An easy-to-use, Java-friendly common API for accessing the
> data regardless of its location.
>  * Indexing : Persisting objects to Lucene and Solr indexes, and
> accessing/querying the data with the Gora API.
>  * Analysis : Accessing the data and performing analysis through adapters for
> Apache Pig, Apache Hive and Cascading.
>  * MapReduce support : Out-of-the-box and extensive MapReduce (Apache
> Hadoop) support for data in the data store.
>
> == Background ==
> ORM stands for Object-Relational Mapping. It is a technology that abstracts
> the persistence layer (mostly relational databases) so that plain
> domain-level objects can be used without the cumbersome effort of saving and
> loading data to and from the database. Gora differs from current solutions
> in that:
>  * Gora is specifically focused on NoSQL data stores, but also has limited
> support for SQL databases.
>  * The main use case for Gora is to access/analyze big data using Hadoop.
>  * Gora uses Avro for bean definition, not byte code enhancement or
> annotations.
>  * Object-to-data store mappings are backend specific, so that the full data
> model can be utilized.
>  * Gora is simple since it ignores complex SQL mappings.
>  * Gora will support persistence, indexing and analysis of data, using Pig,
> Lucene, Hive, etc.
>
> == Rationale ==
> ORM frameworks are nothing new. But with the explosion of data, now measured
> in terabytes and even petabytes, NoSQL data stores are gaining ever-increasing
> popularity. Coupled with the limited support for the already-proven Apache
> Hadoop in current ORM frameworks, this created a need for a new project.
>
> Gora is currently hosted at GitHub. However, Gora has ties to the ASF in many
> ways. As detailed in the Proposal section, Gora will be a high-level client
> for many Apache projects and subprojects, including Hadoop (Common, HDFS, and
> MapReduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive. Gora
> already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started its
> life inside the Apache Nutch project, and Nutch trunk now uses Gora as a
> library. Furthermore, the initial set of committers are all ASF members.
> Therefore, we think that Apache will be an excellent home for Gora.
>
> == Initial Goals ==
> Initial goals for Gora can be summarized as:
>  * Iron out the remaining issues with HBase, Cassandra and SQL support.
>  * Make the first release before the end of the year.
>  * Improve documentation
>  * Support for Cascading
>
> == Current Status ==
> === Meritocracy ===
> Current commit rights belong to the initial list of committers, four of whom
> are also ASF members. All the developers have extensive experience with
> Apache projects. We honor the meritocracy policy of the ASF.
>
> === Community ===
> Gora’s community mostly overlaps with those of Nutch, Hadoop, HBase, Avro and
> Cassandra. We have a small community for now (5 initial committers, 18 people
> tracking the project at GitHub), but have been piggybacking on the Nutch
> community for a while. If Gora is accepted into the Apache Incubator, we
> expect more traction. Moreover, with the increasing popularity of NoSQL
> databases, we expect more users.
>
> === Core Developers ===
> Gora started from an initial code base written inside Apache Nutch by Doğacan
> Güney. Enis Söztutar then refactored and re-architected the project out of
> Nutch. Later, Julien Nioche, Andrzej Bialecki and Doğacan ported Nutch to use
> the newly formed project, and Sertan Alkan joined after that. Doğacan and
> Julien are Nutch PMC members, Andrzej is the Nutch PMC chair, and Enis is an
> Apache Hadoop PMC member.
>
> === Alignment ===
> As discussed in the second paragraph of the Rationale section, all of the
> current developers are Apache people, and four of them are PMC members,
> which shows that we have some experience with the Apache way. Moreover, Gora
> is closely related to many Apache projects: Nutch, Hadoop, HBase,
> Cassandra, Avro, Pig, Hive and Lucene, to name a few. Gora started its life
> inside Nutch, and Nutch trunk now uses Gora to persist web crawl data to
> HBase, Cassandra and MySQL, which means that Gora is a critical
> component of Nutch.
>
> == Known Risks ==
> === Orphaned Products ===
> Most of the development depends on Enis and Doğacan for now. Both of them
> intend to continue Gora development. However, we also acknowledge that more
> core developers are needed for the project to be truly successful. The
> general strategy to acquire more developers will be to acquire more users,
> and to encourage users to be active in the community and develop patches.
> Moreover, the next release of Nutch, planned before the end of 2010, has
> extensive Gora support. We expect more interest from the Nutch community,
> and we will continue to post Gora announcements to the Hadoop, HBase and
> Cassandra mailing lists.
>
> === Inexperience with Open Source ===
> We believe that all of the developers have extensive open source experience.
> Four of the initial committers are Apache members. The codebase has also been
> open source since April 2010. We also have some documentation, wiki pages, an
> issue tracker and a dev mailing list.
>
> === Homogeneous Developers ===
> We have a semi-distributed development environment where Doğacan, Enis and
> Sertan share the same office, but Andrzej and Julien are independent. With
> the aim of acquiring more developers, we expect more heterogeneous
> development.
>
> === Reliance on Salaried Developers ===
> Gora development has been supported by the [[ant.com]] search engine as
> contract work. It is expected that this contract will continue in the
> future. However, even without sponsors, we are committed to continuing Gora
> development, since we believe in the technology it brings and its vital
> role in Nutch and in our other closed-source projects.
>
> === Relationships with Other Apache Products ===
> Gora will be tightly related to lots of Apache projects:
>
>  * Nutch : Apache Nutch was home to Gora’s initial code base. Now, Nutch
> trunk uses Gora as a library. The next release of Nutch, planned before the
> end of 2010, will use Gora’s first release.
>  * Hadoop : Gora has extensive support for Hadoop MapReduce. Gora defines all
> the necessary data structures for working with Hadoop. Data stored in
> column-oriented data stores can be analyzed with Gora using Hadoop.
>  * Avro : Gora uses and extends Avro. Data beans in Gora are defined using
> Avro schemas, and compiled into Java code with the extended version of the
> Avro compiler. Avro is also used in data serialization.
>  * HBase : Gora supports HBase as a persistence backend.
>  * Cassandra : Gora supports Cassandra as a persistence backend.
>  * Lucene/Solr : Gora intends to support Lucene/Solr as a persistence and
> indexing backend.
>  * Pig : Gora intends to support Pig for data analysis.
>  * Hive : Gora intends to support Hive for data analysis.
>
> === An Excessive Fascination with the Apache Brand ===
> Gora is a natural fit for Apache due to its current committers and dependent
> projects.
>
> == Documentation ==
>  * The project is currently hosted at http://github.com/enis/gora/.
>  * Wiki pages can be found at http://wiki.github.com/enis/gora/.
>  * List of issues can be found at http://github.com/enis/gora/issues/.
>  * Current mailing list web address: http://groups.google.com/group/gora-dev.
>  * Current mailing list email address: gora-...@googlegroups.com.
>
> == Initial Source ==
> The initial source was developed as a patch to the Apache Nutch project. But
> the storage abstraction layer was orthogonal to the web crawler, and we
> decided to extract it to a separate project with much wider goals. Thus
> Gora, as a project, was born. The initial code was developed by Enis and
> Doğacan with ant.com’s sponsorship.
>
> The code can be found at http://github.com/enis/gora/.
>
> == External Dependencies ==
> External dependencies, excluding Apache projects, are as follows:
>  * JDOM - http://jdom.org/ - Apache-style license
>  * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
> License, LGPL. SQL Builder is intended to be removed from the source due to
> technical reasons anyway.
>  * HSQLDB - http://hsqldb.org/ - BSD-style license
>  * JUnit - http://junit.org - Common Public License 1.0
>  * SLF4J - http://www.slf4j.org/ - MIT License
>  * Google Guava Libraries - http://code.google.com/p/guava-libraries/ -
> Apache License 2.0
>
>
> == Required Resources ==
>
> === Mailing Lists ===
>
>  * gora-private (with moderated subscriptions)
>  * gora-dev
>  * gora-commits
>
> === Subversion Directory ===
>
>  * [[http://svn.apache.org/repos/asf/incubator/gora]]
>
> === Issue Tracking ===
>  * JIRA (GORA)
>
> === Other Resources ===
> We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
> GitHub. Since there are not many pages there, we can manually move the
> pages to the wiki at wiki.apache.org.
>
> == Initial Committers ==
>
> Name               Email                        Affiliation     Timezone
> Enis Söztutar      enis [at] apache.org         Konneka         +3
> Doğacan Güney      dogacan [at] apache.org      Konneka         +3
> Sertan Alkan       sertanalkan [at] gmail.com   Konneka         +3
> Julien Nioche      jnioche [at] apache.org      DigitalPebble   +1
> Andrzej Bialecki   ab [at] apache.org           Sigram
>
>
> === Affiliations ===
> All of the parties are affiliated with open source consulting shops. Most of
> the development was sponsored by ant.com; however, we expect that the amount
> of volunteer work will increase and that more developers will come on board.
>
> == Sponsors ==
>
> === Champion ===
>  * Chris Mattmann (mattmann AT apache DOT org)
>
> === Nominated Mentors ===
>  * Chris Mattmann (mattmann AT apache DOT org)
>  * Andrzej Bialecki (ab AT apache DOT org)
>
> === Sponsoring Entity ===
> Apache Incubator. Successful graduation can result in Gora becoming either a
> TLP or a subproject of Hadoop, since most of the community is projected to
> overlap.
>
