Re: [DISCUSS] S2Graph Incubator Proposal

Hyunsik Choi Mon, 09 Nov 2015 10:55:54 -0800

This project is looking for mentors. Can anyone help? We are also
looking forward to any feedback.

I have also included the proposal below; I forgot it earlier.

----------------

= S2Graph Proposal =

== Abstract ==
S2Graph is a distributed and scalable OLTP graph database built on
HBase to support fast traversal of extremely large graphs.

Here are additional materials to introduce S2Graph.
 * HBaseCon 2015 - http://www.slideshare.net/HBaseCon/use-cases-session-5
 * Apache: Big Data 2015 -
http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf

== Proposal ==
S2Graph aims to provide a scalable distributed graph database engine
over key/value storage such as HBase. S2Graph provides a fully
asynchronous API to manipulate data as a property graph model and fast
breadth-first search queries on the graph.
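
As a rough illustration of the idea above, here is a minimal Python
sketch of breadth-first traversal over edges kept in a key/value map.
It is not S2Graph code: the `store` dict, `insert_edge`, and `bfs` are
hypothetical names, the in-memory dict only stands in for a store such
as HBase, and the sketch is synchronous for brevity.

```python
from collections import deque

# Hypothetical in-memory stand-in for a key/value store such as HBase.
# Key: (source vertex, edge label); value: list of (target vertex, properties).
store = {}

def insert_edge(src, label, dst, **props):
    """Append one directed, labeled edge together with its properties."""
    store.setdefault((src, label), []).append((dst, props))

def bfs(start, label, max_depth):
    """Return vertices reachable from `start` over `label` edges, grouped by depth."""
    visited = {start}
    frontier = deque([start])
    levels = []
    for _ in range(max_depth):
        next_frontier = deque()
        level = []
        while frontier:
            vertex = frontier.popleft()
            # One key/value lookup per vertex in the current frontier.
            for dst, _props in store.get((vertex, label), []):
                if dst not in visited:
                    visited.add(dst)
                    level.append(dst)
                    next_frontier.append(dst)
        if not level:
            break
        levels.append(level)
        frontier = next_frontier
    return levels

insert_edge("alice", "friend", "bob", since=2013)
insert_edge("alice", "friend", "carol", since=2014)
insert_edge("bob", "friend", "dave", since=2015)
insert_edge("carol", "friend", "alice", since=2014)

# Two steps from "alice": direct friends first, then friends of friends.
friends_of_friends = bfs("alice", "friend", max_depth=2)
```

Each traversal step costs one lookup per frontier vertex, which is why
a store with fast keyed reads suits this kind of query.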

== Background ==
S2Graph initially started as an internal project at Kakao.com to
efficiently store user relations and user activities as one large graph
and to provide a unified query interface for traversing that graph. It
was open sourced on Github in June 2015.

Over time S2Graph, together with HBase as its storage tier, has begun to
be adopted in various applications at Kakao, such as messaging, social
feeds, and realtime recommendations.

Users can benefit from S2Graph's generalized high-level API for graph
abstraction instead of a low-level key/value API, just as Phoenix
provides a SQL layer over HBase.

== Rationale ==
Graph data (highly interconnected data) is abundant and important
these days.
When users have a multitude of relationships, each with complex
properties associated with them, a graph model is more intuitive and
efficient than a tabular format (RDBMS).
There are many ASF projects that provide a SQL layer, but there is no
ASF project that provides a scalable graph layer on the existing Hadoop
ecosystem.
When graph data grows to trillion-edge scale, traversal becomes slow
and costly. However, with the benefit of HBase's scalable
architecture, S2Graph can traverse large graphs efficiently in a
breadth-first search manner.

S2Graph also interoperates with several existing Apache
projects (HBase, Spark) to provide a way to merge real-time events and
batch-processed data using the property graph data model.

Many developers run their own domain-specific API servers to
serve their data products, but the graph model is general and the
S2Graph API fully supports graph traversal, so it can be used as a
scalable general-purpose API serving layer for various domains.
As long as data can be modeled as a graph, users can avoid the tedious
work of developing customized API servers by using S2Graph.

== Initial Goals ==
The initial goals will be to move the existing codebase to Apache and
integrate with the Apache development process. Once this is
accomplished, we plan for incremental development and releases that
follow the Apache guidelines.

== Current Status ==

=== Meritocracy ===
S2Graph operated on meritocratic principles from the get go.
Currently, all the discussions pertaining to S2Graph development are
public on Github. The current incubation
proposal includes the major code contributors to S2Graph. Several
additional people have worked on the S2Graph codebase for industry use
cases and would be interested in becoming committers. We are starting
with a small committer group and we plan to add additional committers
following an open merit-based decision process during the incubation
phase.

=== Community ===
We have already begun building a community but at this time the
community consists only of S2Graph developers – all Kakao employees –
and prospective users.
S2Graph seeks to develop developer and user communities during incubation.

=== Core Developers ===
S2Graph is currently being designed and developed by 2 engineers from
Kakao: Doyung Yoon and Daewon Jeong.

=== Alignment ===
Our proposed S2Graph effort aligns closely with Apache HBase. The
HBase project perimeter is denoted by simple byte-array based
Create, Read, Update, Delete and Scan APIs, with no current plans to
extend beyond these bounds.

S2Graph complements this with a higher level API for property graph model.

S2Graph was designed from the beginning to offer a scalable distributed
graph database layer over HBase, providing a property graph model and
breadth-first search, and it continues to focus on providing the graph
model.
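
To make the layering concrete, here is a minimal Python sketch of a
property-graph API built on a byte-array put/get/delete/scan store. It
is an assumption-laden illustration, not S2Graph's or HBase's actual
key layout or API: `SortedKV`, `edge_key`, `add_edge`, and `out_edges`
are hypothetical names, and the NUL-separated key format was chosen
only for this example.

```python
import bisect
import json

class SortedKV:
    """Hypothetical sorted byte-array store with put/get/delete/scan."""

    def __init__(self):
        self._keys = []      # keys kept in sorted order for prefix scans
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.remove(key)

    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with `prefix`, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1

SEP = b"\x00"  # assumed separator; vertex ids and labels must not contain it

def edge_key(src, label, dst):
    """Encode an edge as src NUL label NUL dst so edges of one vertex sort together."""
    return SEP.join(part.encode("utf-8") for part in (src, label, dst))

def add_edge(kv, src, label, dst, props):
    """Insert or update one edge; properties are stored as JSON bytes."""
    kv.put(edge_key(src, label, dst), json.dumps(props).encode("utf-8"))

def out_edges(kv, src, label):
    """Yield (target vertex, properties) for all `label` edges leaving `src`."""
    prefix = src.encode("utf-8") + SEP + label.encode("utf-8") + SEP
    for key, value in kv.scan(prefix):
        yield key[len(prefix):].decode("utf-8"), json.loads(value)

kv = SortedKV()
add_edge(kv, "u1", "likes", "post9", {"score": 3})
add_edge(kv, "u1", "likes", "post2", {"score": 1})
add_edge(kv, "u10", "likes", "post5", {})
likes_of_u1 = list(out_edges(kv, "u1", "likes"))
```

Because the source vertex and label form the key prefix, fetching the
neighbors of a vertex becomes a single range scan in this sketch.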

== Known Risks ==
=== Orphaned Products ===
The core developers of the S2Graph team plan to work full time on this
project. There is very little risk of S2Graph getting orphaned since
at least one large company (Kakao) is extensively using it in their
production HBase clusters. For example, there are currently 20+ use
cases with more than 1 trillion edges and 140 million breadth-first
search query requests per minute using S2Graph in production.
We plan to extend and diversify this community further through Apache.

=== Inexperience with Open Source ===
The core developers are all active users and followers of open source.
They are already committers and contributors to the S2Graph Github
project. All have been involved with the source code that has been
released under an open source license. Though the core set of
developers do not have Apache open source experience, there are plans
to onboard individuals with Apache open source experience on to the
project.

=== Homogeneous Developers ===
Most committers in this proposal belong to the same institution
(Kakao). The engagement of these committers goes well beyond the
necessary development to support research, and all committers work on
S2Graph full time.
Several people from other institutions are working on and are familiar
with the S2Graph codebase. We will work to attract them as future
committers during the incubation phase, following a merit-based
approach.

=== Reliance on Salaried Developers ===
Kakao invested in S2Graph as the distributed graph database solution
on top of HBase and some of its key engineers are working full time on
the project.
We look forward to other Apache developers and researchers
contributing to the project.
Key to addressing the risk associated with relying on salaried
developers from a single entity is to increase the diversity of the
contributors and actively lobby for domain experts in the graph
database space to contribute. Apache S2Graph intends to do this.

=== Relationships with Other Apache Products ===
S2Graph has a strong relationship with, and dependency on, Apache
Hadoop, HBase, and Spark.
Being part of Apache’s Incubation community could help with closer
collaboration among these projects as well as others.

In terms of graph processing frameworks, S2Graph and Apache Giraph
look similar. However, their goals differ. Giraph aims at analytical
batch processing on immutable graph data sets. In contrast, S2Graph is
designed for OLTP-like workloads on graph data sets, and S2Graph
provides INSERT/UPDATE operations too.

=== An Excessive Fascination with the Apache Brand ===
S2Graph is proposing to enter incubation at Apache in order to help
efforts to diversify the committer-base, not so much to capitalize on
the Apache brand. The S2Graph project is in production use already
inside Kakao, but is not expected to be a Kakao product for external
customers. As such, the S2Graph project is not seeking to use the
Apache brand as a marketing tool.

== Documentation ==
Information about S2Graph can be found at
https://github.com/kakao/s2graph. The following links provide more
information about S2Graph in open source:
 * S2Graph web site: https://steamshon.gitbooks.io/s2graph-book/content/
 * Codebase at Github: https://github.com/kakao/s2graph
 * Issue Tracking: https://github.com/kakao/s2graph/issues
 * User community: https://groups.google.com/forum/#!forum/s2graph

== Initial Source ==

The S2Graph codebase is currently hosted on Github:
https://github.com/kakao/s2graph

=== Source and Intellectual Property Submission Plan ===

Currently, the S2Graph codebase is distributed under the Apache 2.0 License.

== External Dependencies ==

Beyond relying on Apache HBase, S2Graph has the following external dependencies:
 * Asynchbase (BSD license: http://www.antlr3.org/license.html)
 * Mysql (BSD license:
https://github.com/julianhyde/sqlline/blob/master/LICENSE)
 * Play Framework (Apache 2.0 license:
https://github.com/playframework/playframework)
 * Scala (https://github.com/scala/scala)
 * Spark
 * Kafka

== Required Resources ==

=== Mailing list ===

We will migrate our mailing lists to the following:
 * [email protected]
 * [email protected]
 * [email protected]
 * [email protected]

=== Source control ===

The S2Graph team would like to use Git for source control, due to our
current use of Git. We request a writeable Git repo for S2Graph, and
mirroring to be set up to Github through INFRA.

=== Issue Tracking ===

S2Graph currently uses the Github issue tracking system associated
with its Github repo: https://github.com/kakao/s2graph/issues. We will
migrate to the Apache JIRA:
http://issues.apache.org/jira/browse/S2Graph

=== Other Resources ===

 * Jenkins/Hudson for builds and test running
 * Wiki for documentation purposes
 * Blog to improve project dissemination

== Initial Committers ==

 * Doyung Yoon <shom83 at gmail.com>
 * Daewon Jeong <blueiur at gmail.com>
 * Jaesang Kim <honeysleep at gmail.com>
 * Hwansung Yu <deejayfwan at gmail.com>
 * Min-Seok Kim <mskim.org at gmail.com>
 * Chul Kang <miralchul at gmail.com>

== Affiliations ==

The initial committers are from one organization: Kakao.
 * Doyung Yoon, Kakao
 * Daewon Jeong, Kakao
 * Jaesang Kim, Kakao
 * Hwansung Yu, Kakao
 * Min-Seok Kim, Kakao
 * Chul Kang, Kakao

== Sponsors ==

=== Champion ===
Hyunsik Choi

=== Nominated Mentors ===

=== Sponsoring Entity ===

 * The Apache Incubator

On Fri, Nov 6, 2015 at 4:05 PM, Hyunsik Choi <[email protected]> wrote:
> Hi Seetharam,
>
> Thank you for a good question. That seems to be a frequent question about
> this project.
>
> Here is the answer to your question.
> https://steamshon.gitbooks.io/s2graph-book/content/what_is_different_to_titan.html
>
> I hope that this link is helpful to your understanding.
>
> Best regards,
> Hyunsik
>
>
>
> On Fri, Nov 6, 2015 at 3:07 PM, Seetharam Venkatesh
> <[email protected]> wrote:
>> Hi Hyunsik,
>>
>> The proposal looks interesting. I would like to know how this differs from
>> existing solutions in the same space, such as Titan.
>>
>> Thanks!
>> Venkatesh
>>
>>
>> On Fri, Nov 6, 2015 at 1:36 PM Hyunsik Choi <[email protected]> wrote:
>>
>>> Hi folks,
>>>
>>> We would like to start a discussion on S2Graph as an incubation project.
>>>
>>> S2Graph is a distributed and scalable OLTP graph database built on
>>> HBase. It provides interactive queries for vertex/edge/sub-graphs on
>>> extremely large graph data sets as well as insertion and update
>>> operations.
>>>
>>> S2Graph was already introduced in Apache BigData and HBaseCon this year.
>>>
>>> The proposal is available at :
>>> https://wiki.apache.org/incubator/S2GraphProposal
>>>
>>> We are looking forward to any feedback. In addition, we are looking
>>> for volunteers as mentors.
>>>
>>> Best regards,
>>> Hyunsik
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
