Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun Thu, 01 Feb 2018 06:06:08 -0800

Thank you all for the feedback.
We changed the project name to Coral.
You can find the proposal at https://wiki.apache.org/incubator/CoralProposal


I will soon send out a voting email.

Thanks.
-Gon




On Wed, Jan 31, 2018 at 11:54 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Thanks, much appreciated !
>
> Regards
> JB
>
> On 01/31/2018 09:50 AM, Byung-Gon Chun wrote:
> > On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> Hi,
> >>
> >> Coral is a good name !
> >>
> >
> > Thanks!
> >
> >
> >>
> >> Does the code belong to Seoul National University? In that case, in
> >> addition to your ICLA, we would need an SGA (it's not a blocker for the
> >> project bootstrapping or code donation, but we will need it later, at
> >> least for graduation). On the other hand, if the committers are all part
> >> of the university, you can also sign a CCLA.
> >>
> >
> > I will figure this out.
> >
> >
> >>
> >> Happy to be mentor on the project if you want me ! ;)
> >>
> >>
> > Thanks! I will add you to the mentor list.
> >
> > -Gon
> >
> >
> >> Thanks,
> >> Regards
> >> JB
> >>
> >> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
> >>> Thanks for the comments, JB!
> >>> My replies are inlined below.
> >>>
> >>> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> sorry to be a little bit late on this.
> >>>>
> >>>> It's a very interesting proposal. It sounds pretty close to the
> >>>> portability layer we want to add in Apache Beam. I would love to see
> >>>> interaction between the two communities.
> >>>>
> >>>> I have two minor questions:
> >>>>
> >>>> 1. about the name: Onyx sounds very generic and the name is used in
> >>>> other technologies. A more distinctive name might be more appropriate.
> >>>>
> >>>
> >>> We proposed Coral instead. How does this sound?
> >>>
> >>>
> >>>> 2. the Onyx code is on github right now, under the Apache 2.0 license.
> >>>> Does this code have any affiliation with companies? If so, we would
> >>>> need an SGA for the code donation.
> >>>>
> >>> It does not. The developers are affiliated with Seoul National University.
> >>> In this case, do we still need an SGA?
> >>>
> >>>
> >>>> If you need any help for the incubation, I would be more than happy to
> >>>> help !
> >>>>
> >>>>
> >>> Thanks for the offer. Would you be interested in being a mentor of the
> >>> project?
> >>>
> >>> Thanks.
> >>> -Gon
> >>>
> >>>
> >>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> >>>>> Dear Apache Incubator Community,
> >>>>>
> >>>>> Please accept the following proposal for presentation and discussion:
> >>>>> https://wiki.apache.org/incubator/OnyxProposal
> >>>>>
> >>>>> Onyx is a data processing system that aims to flexibly control the runtime
> >>>>> behaviors of a job to adapt to varying deployment characteristics (e.g.,
> >>>>> harnessing transient resources in datacenters, cross-datacenter deployment,
> >>>>> changing runtime based on job characteristics, etc.). Onyx provides ways to
> >>>>> extend the system’s capabilities and incorporate the extensions to the
> >>>>> flexible job execution.
> >>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
> >>>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
> >>>>> based on a deployment policy.
> >>>>>
> >>>>> I've attached the proposal below.
> >>>>>
> >>>>> Best regards,
> >>>>> Byung-Gon Chun
> >>>>>
> >>>>> = OnyxProposal =
> >>>>>
> >>>>> == Abstract ==
> >>>>> Onyx is a data processing system for flexible employment with
> >>>>> different execution scenarios for various deployment characteristics
> >>>>> on clusters.
> >>>>>
> >>>>> == Proposal ==
> >>>>> Today, there is a wide variety of data processing systems with
> >>>>> different designs for better performance and datacenter efficiency.
> >>>>> They include processing data on specific resource environments and
> >>>>> running jobs with specific attributes. Although each system
> >>>>> successfully solves the problems it targets, most systems are designed
> >>>>> in such a way that runtime behaviors are built tightly inside the system
> >>>>> core to hide the complexity of distributed computing. This makes it
> >>>>> hard for a single system to support different deployment
> >>>>> characteristics with different runtime behaviors without substantial
> >>>>> effort.
> >>>>>
> >>>>> Onyx is a data processing system that aims to flexibly control the
> >>>>> runtime behaviors of a job to adapt to varying deployment
> >>>>> characteristics. Moreover, it provides a means of extending the
> >>>>> system’s capabilities and incorporating the extensions to the
> >>>>> flexible job execution.
> >>>>>
> >>>>> In order to be able to easily modify runtime behaviors to adapt to
> >>>>> varying deployment characteristics, Onyx exposes runtime behaviors to
> >>>>> be flexibly configured and modified at both compile-time and runtime
> >>>>> through a set of high-level graph pass interfaces.
> >>>>>
> >>>>> We hope to contribute to the big data processing community by enabling
> >>>>> more flexibility and extensibility in job executions. Furthermore, we
> >>>>> can all benefit more when we work together as a community to mature
> >>>>> the system with more use cases and a better understanding of diverse
> >>>>> deployment characteristics. The Apache Software Foundation
> >>>>> is the perfect place to achieve these aspirations.
> >>>>>
> >>>>> == Background ==
> >>>>> Many data processing systems have distinctive runtime behaviors
> >>>>> optimized and configured for specific deployment characteristics like
> >>>>> different resource environments and for handling special job
> >>>>> attributes.
> >>>>>
> >>>>> For example, much research has been conducted to overcome the
> >>>>> challenge of running data processing jobs on cheap, unreliable
> >>>>> transient resources. Likewise, techniques for disaggregating different
> >>>>> types of resources, like memory, CPU and GPU, are being actively
> >>>>> developed to use datacenter resources more efficiently. Many
> >>>>> researchers are also working to run data processing jobs in even more
> >>>>> diverse environments, such as across distant datacenters. Similarly,
> >>>>> for special job attributes, many works take different approaches, such
> >>>>> as runtime optimization, to solve problems like data skew, and to
> >>>>> optimize systems for data processing jobs with small-scale input data.
> >>>>>
> >>>>> Although each of these systems performs well with the jobs and in the
> >>>>> environments it targets, it performs poorly in cases it was not
> >>>>> designed for, and its design does not consider supporting multiple
> >>>>> deployment characteristics on a single system.
> >>>>>
> >>>>> For an application writer to optimize an application to perform well
> >>>>> on a system with hard-wired underlying behaviors requires a deep
> >>>>> understanding of the system itself, which often takes a lot of time
> >>>>> and effort. Moreover, for a developer to modify such system behaviors
> >>>>> requires modifications of the system core, which requires an even
> >>>>> deeper understanding of the system itself.
> >>>>>
> >>>>> With this background, Onyx is designed to represent all of its jobs as
> >>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> >>>>> applications from various programming models (e.g., Apache Beam) are
> >>>>> submitted, transformed to an IR DAG, and optimized/customized for the
> >>>>> deployment characteristics. In the IR DAG optimization phase, the DAG
> >>>>> is modified through a series of compiler “passes” which reshape or
> >>>>> annotate the DAG with an expression of the underlying runtime
> >>>>> behaviors. The IR DAG is then submitted as an execution plan for the
> >>>>> Onyx runtime. The runtime includes the unmodified parts of data
> >>>>> processing in the backbone which is transparently integrated with
> >>>>> configurable components exposed for further extension.
> >>>>>
> >>>>> == Rationale ==
> >>>>> Onyx’s vision lies in providing means for flexibly supporting a wide
> >>>>> variety of job execution scenarios for users while at the same time
> >>>>> helping system developers extend the execution framework with various
> >>>>> functionalities. The capabilities of the system can be extended as it
> >>>>> grows to meet a wider variety of execution scenarios.
> >>>>> We require input from users and developers from diverse domains in
> >>>>> order to make it a more thriving and useful project. The Apache
> >>>>> Software Foundation provides the best tools and community to support
> >>>>> this vision.
> >>>>>
> >>>>> == Initial Goals ==
> >>>>> Initial goals will be to move the existing codebase to Apache and
> >>>>> integrate with the Apache development process. We further plan to
> >>>>> develop our system to meet the needs of more execution scenarios for
> >>>>> a wider variety of deployment characteristics.
> >>>>>
> >>>>> == Current Status ==
> >>>>> The Onyx codebase is currently hosted in a repository at github.com. The
> >>>>> current version has been developed by system developers at Seoul
> >>>>> National University, Viva Republica, Samsung, and LG.
> >>>>>
> >>>>> == Meritocracy ==
> >>>>> We plan to strongly support meritocracy. We will discuss the
> >>>>> requirements in an open forum, and those that continuously contribute
> >>>>> to Onyx with the passion to strengthen the system will be invited as
> >>>>> committers. Contributors that enrich Onyx by providing various use
> >>>>> cases, various implementations of the configurable components
> >>>>> including ideas for optimization techniques will be especially
> >>>>> welcome. Committers with a deep understanding of the system’s
> >>>>> technical aspects as a whole and its philosophy will definitely be
> >>>>> voted as the PMC. We will monitor community participation so that
> >>>>> privileges can be extended to those that contribute.
> >>>>>
> >>>>> == Community ==
> >>>>> We hope to expand our contribution community by becoming an Apache
> >>>>> incubator project. The contributions will come from both users and
> >>>>> system developers interested in flexibility and extensibility of job
> >>>>> executions that Onyx can support. We expect users to mainly contribute
> >>>>> by diversifying the use cases and deployment characteristics, and
> >>>>> developers to contribute by implementing them.
> >>>>>
> >>>>> == Alignment ==
> >>>>> Apache Spark is one of many popular data processing frameworks. The
> >>>>> system is designed towards optimizing jobs using RDDs in memory and
> >>>>> many other optimizations built tightly within the framework. In
> >>>>> contrast to Spark, Onyx aims to provide more flexibility for job
> >>>>> execution in an easy manner.
> >>>>>
> >>>>> Apache Tez enables developers to build complex task DAGs with control
> >>>>> over the control plane of job execution. In Onyx, a high-level
> >>>>> programming layer (e.g., Apache Beam) is automatically converted to a
> >>>>> basic IR DAG and can be converted to any IR DAG through a series of
> >>>>> easy, user-writable passes that can both reshape the DAG and modify its
> >>>>> annotation (of execution properties). Moreover, Onyx leaves
> >>>>> more parts of the job execution configurable, such as the scheduler
> >>>>> and the data plane. As opposed to providing a set of properties for
> >>>>> solid optimization, Onyx’s configurable parts can be easily extended
> >>>>> and explored by implementing the pre-defined interfaces. For example,
> >>>>> an arbitrary intermediate data store can be added.
> >>>>>
> >>>>> Onyx currently supports Apache Beam programs and we are working on
> >>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> >>>>> REEF for container management, which allows Onyx to run in Apache YARN
> >>>>> and Apache Mesos clusters. If necessary, we plan to contribute to and
> >>>>> collaborate with these other Apache projects for the benefit of all.
> >>>>> We plan to extend such integrations to more Apache software. The Apache
> >>>>> Software Foundation already hosts many major big-data systems, and we
> >>>>> expect to help further growth of the big-data community by having Onyx
> >>>>> within the Apache Software Foundation.
> >>>>>
> >>>>> == Known Risks ==
> >>>>> === Orphaned Products ===
> >>>>> The risk of the Onyx project being orphaned is minimal. There is
> >>>>> already plenty of work that arduously supports different deployment
> >>>>> characteristics, and we propose a general way to implement them with
> >>>>> flexible and extensible configuration knobs. The domain of data
> >>>>> processing is already of high interest, and this domain is expected to
> >>>>> evolve continuously with various other purposes, such as resource
> >>>>> disaggregation and using transient resources for better datacenter
> >>>>> resource utilization.
> >>>>>
> >>>>> === Inexperience with Open Source ===
> >>>>> The initial committers include PMC members and committers of other
> >>>>> Apache projects. They have experience with open source projects,
> >>>>> from incubation through to top-level status. They have been
> >>>>> involved in the open source development process, and are familiar with
> >>>>> releasing code under an open source license.
> >>>>>
> >>>>> === Homogeneous Developers ===
> >>>>> The initial set of committers is from a limited set of organizations,
> >>>>> but we expect to attract new contributors from diverse organizations
> >>>>> and will thus grow organically once approved for incubation. Our prior
> >>>>> experience with other open source projects will help various
> >>>>> contributors to actively participate in our project.
> >>>>>
> >>>>> === Reliance on Salaried Developers ===
> >>>>> Many developers are from Seoul National University. This is not
> >>>>> applicable.
> >>>>>
> >>>>> === Relationships with Other Apache Products ===
> >>>>> Onyx positions itself among multiple Apache products. It runs on
> >>>>> Apache REEF for container management. It also utilizes many useful
> >>>>> development tools including Apache Maven, Apache Log4J, and multiple
> >>>>> Apache Commons components. Onyx supports the Apache Beam programming
> >>>>> model for user applications. We are currently working on supporting
> >>>>> the Apache Spark programming APIs as well.
> >>>>>
> >>>>> === An Excessive Fascination with the Apache Brand ===
> >>>>> We hope to make Onyx a powerful system for data processing, meeting
> >>>>> various needs for different deployment characteristics, under a wider
> >>>>> variety of environments. We see the limitations of simply putting code
> >>>>> on GitHub, and we believe the Apache community will help Onyx grow
> >>>>> into positively impactful and innovative open source software. We
> >>>>> believe Onyx is a great fit for the Apache Software Foundation due to
> >>>>> the collaboration it aims to achieve from the big data processing
> >>>>> community.
> >>>>>
> >>>>> == Documentation ==
> >>>>> The current documentation for Onyx is at https://snuspl.github.io/onyx/.
> >>>>>
> >>>>> == Initial Source ==
> >>>>> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
> >>>>>
> >>>>> == External Dependencies ==
> >>>>> To the best of our knowledge, all Onyx dependencies are distributed
> >>>>> under Apache compatible licenses. Upon acceptance to the incubator, we
> >>>>> would begin a thorough analysis of all transitive dependencies to
> >>>>> verify this fact and further introduce license checking into the build
> >>>>> and release process.
> >>>>>
> >>>>> == Cryptography ==
> >>>>> Not applicable.
> >>>>>
> >>>>> == Required Resources ==
> >>>>> === Mailing Lists ===
> >>>>> We will operate two mailing lists as follows:
> >>>>>    * Onyx PMC discussions: priv...@onyx.incubator.apache.org
> >>>>>    * Onyx developers: d...@onyx.incubator.apache.org
> >>>>>
> >>>>> === Git Repositories ===
> >>>>> Upon incubation: https://github.com/apache/incubator-onyx.
> >>>>> After the incubation, we would like to move the existing repo
> >>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
> >>>>>
> >>>>> === Issue Tracking ===
> >>>>> Onyx currently tracks its issues using the GitHub issue tracker:
> >>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> >>>>> JIRA.
> >>>>>
> >>>>> == Initial Committers ==
> >>>>>   * Byung-Gon Chun
> >>>>>   * Jeongyoon Eo
> >>>>>   * Geon-Woo Kim
> >>>>>   * Joo Yeon Kim
> >>>>>   * Gyewon Lee
> >>>>>   * Jung-Gil Lee
> >>>>>   * Sanha Lee
> >>>>>   * Wooyeon Lee
> >>>>>   * Yunseong Lee
> >>>>>   * JangHo Seo
> >>>>>   * Won Wook Song
> >>>>>   * Taegeon Um
> >>>>>   * Youngseok Yang
> >>>>>
> >>>>> == Affiliations ==
> >>>>>   * SNU (Seoul National University)
> >>>>>     * Byung-Gon Chun
> >>>>>     * Jeongyoon Eo
> >>>>>     * Geon-Woo Kim
> >>>>>     * Gyewon Lee
> >>>>>     * Sanha Lee
> >>>>>     * Wooyeon Lee
> >>>>>     * Yunseong Lee
> >>>>>     * JangHo Seo
> >>>>>     * Won Wook Song
> >>>>>     * Taegeon Um
> >>>>>     * Youngseok Yang
> >>>>>
> >>>>>   * LG
> >>>>>     * Jung-Gil Lee
> >>>>>
> >>>>>   * Samsung
> >>>>>     * Joo Yeon Kim
> >>>>>
> >>>>>   * Viva Republica
> >>>>>     * Geon-Woo Kim
> >>>>>
> >>>>> == Sponsors ==
> >>>>> === Champions ===
> >>>>> Byung-Gon Chun
> >>>>>
> >>>>> === Mentors ===
> >>>>>   * Hyunsik Choi
> >>>>>   * Byung-Gon Chun
> >>>>>   * Markus Weimer
> >>>>>   * Reynold Xin
> >>>>>
> >>>>> === Sponsoring Entity ===
> >>>>> The Apache Incubator
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> jbono...@apache.org
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >>>> For additional commands, e-mail: general-h...@incubator.apache.org
> >>>>
> >>>>
> >>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >>
> >>
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>


-- 
Byung-Gon Chun
