Thank you all for the feedback. We changed the project name to Coral. You can find the proposal at https://wiki.apache.org/incubator/CoralProposal .
I will soon send out a voting email.

Thanks.
-Gon

On Wed, Jan 31, 2018 at 11:54 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Thanks, much appreciated!
>
> Regards
> JB
>
> On 01/31/2018 09:50 AM, Byung-Gon Chun wrote:
> > On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> Hi,
> >>
> >> Coral is a good name!
> >>
> >
> > Thanks!
> >
> >
> >>
> >> Does the code belong to Seoul National University? In that case, in
> >> addition to your ICLA, we would need an SGA (it's not a blocker for the
> >> project bootstrapping or code donation, but we will at least need it
> >> later for graduation). On the other hand, if the committers are all
> >> part of the university, you can also sign a CCLA.
> >>
> >
> > I will figure this out.
> >
> >
> >>
> >> Happy to be a mentor on the project if you want me! ;)
> >>
> >>
> > Thanks! I will add you to the mentor list.
> >
> > -Gon
> >
> >
> >> Thanks,
> >> Regards
> >> JB
> >>
> >> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
> >>> Thanks for the comments, JB!
> >>> My replies are inlined below.
> >>>
> >>> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Sorry to be a little bit late on this.
> >>>>
> >>>> It's a very interesting proposal. It sounds pretty close to the
> >>>> portability layer we want to add in Apache Beam. I would love to see
> >>>> interaction between the two communities.
> >>>>
> >>>> I have two minor questions:
> >>>>
> >>>> 1. About the name: Onyx sounds very generic and the name is used in
> >>>> other technologies. Maybe another, unique name would be more accurate.
> >>>>
> >>>
> >>> We proposed Coral instead. How does this sound?
> >>>
> >>>
> >>>> 2. The Onyx code is on GitHub right now, under the Apache 2.0 license.
> >>>> Does this code have any affiliation with companies?
> >>>> That would mean we need an SGA for the code donation.
> >>>>
> >>> It does not. The developers are affiliated with Seoul National
> >>> University. In this case, do we still need an SGA?
> >>>
> >>>
> >>>> If you need any help with the incubation, I would be more than happy
> >>>> to help!
> >>>>
> >>>>
> >>> Thanks for the offer. Would you be interested in being a mentor for the
> >>> project?
> >>>
> >>> Thanks.
> >>> -Gon
> >>>
> >>>
> >>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> >>>>> Dear Apache Incubator Community,
> >>>>>
> >>>>> Please accept the following proposal for presentation and discussion:
> >>>>> https://wiki.apache.org/incubator/OnyxProposal
> >>>>>
> >>>>> Onyx is a data processing system that aims to flexibly control the
> >>>>> runtime behaviors of a job to adapt to varying deployment
> >>>>> characteristics (e.g., harnessing transient resources in datacenters,
> >>>>> cross-datacenter deployment, changing runtime based on job
> >>>>> characteristics, etc.). Onyx provides ways to extend the system’s
> >>>>> capabilities and incorporate the extensions into flexible job
> >>>>> execution.
> >>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
> >>>>> an Intermediate Representation (IR) DAG, which Onyx optimizes and
> >>>>> deploys based on a deployment policy.
> >>>>>
> >>>>> I've attached the proposal below.
> >>>>>
> >>>>> Best regards,
> >>>>> Byung-Gon Chun
> >>>>>
> >>>>> = OnyxProposal =
> >>>>>
> >>>>> == Abstract ==
> >>>>> Onyx is a data processing system for flexible deployment across
> >>>>> different execution scenarios with various deployment characteristics
> >>>>> on clusters.
> >>>>>
> >>>>> == Proposal ==
> >>>>> Today, there is a wide variety of data processing systems with
> >>>>> different designs for better performance and datacenter efficiency.
> >>>>> They include systems that process data in specific resource
> >>>>> environments and systems that run jobs with specific attributes.
> >>>>> Although each system successfully solves the problems it targets, most
> >>>>> systems are designed so that runtime behaviors are built tightly into
> >>>>> the system core to hide the complexity of distributed computing. This
> >>>>> makes it hard for a single system to support different deployment
> >>>>> characteristics with different runtime behaviors without substantial
> >>>>> effort.
> >>>>>
> >>>>> Onyx is a data processing system that aims to flexibly control the
> >>>>> runtime behaviors of a job to adapt to varying deployment
> >>>>> characteristics. Moreover, it provides a means of extending the
> >>>>> system’s capabilities and incorporating the extensions into flexible
> >>>>> job execution.
> >>>>>
> >>>>> To make runtime behaviors easy to modify for varying deployment
> >>>>> characteristics, Onyx exposes them so that they can be flexibly
> >>>>> configured and modified at both compile time and runtime through a set
> >>>>> of high-level graph pass interfaces.
> >>>>>
> >>>>> We hope to contribute to the big data processing community by enabling
> >>>>> more flexibility and extensibility in job execution. Furthermore, we
> >>>>> can all benefit when we work together as a community to mature the
> >>>>> system with more use cases and a better understanding of diverse
> >>>>> deployment characteristics. The Apache Software Foundation is the
> >>>>> perfect place to achieve these aspirations.
> >>>>>
> >>>>> == Background ==
> >>>>> Many data processing systems have distinctive runtime behaviors
> >>>>> optimized and configured for specific deployment characteristics, like
> >>>>> different resource environments, and for handling special job
> >>>>> attributes.
> >>>>>
> >>>>> For example, much research has been conducted to overcome the
> >>>>> challenge of running data processing jobs on cheap, unreliable
> >>>>> transient resources. Likewise, techniques for disaggregating different
> >>>>> types of resources, like memory, CPU and GPU, are being actively
> >>>>> developed to use datacenter resources more efficiently. Many
> >>>>> researchers are also working to run data processing jobs in even more
> >>>>> diverse environments, such as across distant datacenters. Similarly,
> >>>>> for special job attributes, many projects take different approaches,
> >>>>> such as runtime optimization, to solve problems like data skew and to
> >>>>> optimize systems for data processing jobs with small-scale input data.
> >>>>>
> >>>>> Although each of these systems performs well with the jobs and in the
> >>>>> environments it targets, it performs poorly in cases it was not
> >>>>> designed for, and none of their designs consider supporting multiple
> >>>>> deployment characteristics in a single system.
> >>>>>
> >>>>> Optimizing an application to perform well on a system whose runtime
> >>>>> behaviors are hard-wired into it requires a deep understanding of that
> >>>>> system, an overhead that often costs a lot of time and effort.
> >>>>> Moreover, modifying such system behaviors requires changes to the
> >>>>> system core, which demands an even deeper understanding of the system.
> >>>>>
> >>>>> With this background, Onyx is designed to represent all of its jobs as
> >>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> >>>>> applications from various programming models (e.g., Apache Beam) are
> >>>>> submitted, transformed into an IR DAG, and optimized/customized for the
> >>>>> deployment characteristics.
> >>>>> In the IR DAG optimization phase, the DAG
> >>>>> is modified through a series of compiler “passes” which reshape the
> >>>>> DAG or annotate it with expressions of the underlying runtime
> >>>>> behaviors. The IR DAG is then submitted as an execution plan to the
> >>>>> Onyx runtime. The runtime consists of a backbone holding the parts of
> >>>>> data processing that stay unchanged, transparently integrated with
> >>>>> configurable components exposed for further extension.
> >>>>>
> >>>>> == Rationale ==
> >>>>> Onyx’s vision lies in flexibly supporting a wide variety of job
> >>>>> execution scenarios for users while, at the same time, making it easy
> >>>>> for system developers to extend the execution framework with various
> >>>>> functionalities. The capabilities of the system can be extended as it
> >>>>> grows to meet a wider variety of execution scenarios. We need input
> >>>>> from users and developers in diverse domains to make it a more
> >>>>> thriving and useful project. The Apache Software Foundation provides
> >>>>> the best tools and community to support this vision.
> >>>>>
> >>>>> == Initial Goals ==
> >>>>> Our initial goals are to move the existing codebase to Apache and
> >>>>> integrate with the Apache development process. We further plan to
> >>>>> develop the system to meet the needs of more execution scenarios
> >>>>> across a wider variety of deployment characteristics.
> >>>>>
> >>>>> == Current Status ==
> >>>>> The Onyx codebase is currently hosted in a repository at github.com.
> >>>>> The current version has been developed by system developers at Seoul
> >>>>> National University, Viva Republica, Samsung, and LG.
> >>>>>
> >>>>> == Meritocracy ==
> >>>>> We plan to strongly support meritocracy. We will discuss the
> >>>>> requirements in an open forum, and those who continuously contribute
> >>>>> to Onyx with the passion to strengthen the system will be invited to
> >>>>> become committers.
> >>>>> Contributors who enrich Onyx by providing various use
> >>>>> cases or implementations of the configurable components, including
> >>>>> ideas for optimization techniques, will be especially welcome.
> >>>>> Committers with a deep understanding of the system’s technical aspects
> >>>>> as a whole and of its philosophy will be voted into the PMC. We will
> >>>>> monitor community participation so that privileges can be extended to
> >>>>> those who contribute.
> >>>>>
> >>>>> == Community ==
> >>>>> We hope to expand our contributor community by becoming an Apache
> >>>>> incubator project. Contributions will come from both users and system
> >>>>> developers interested in the flexibility and extensibility of job
> >>>>> execution that Onyx supports. We expect users to contribute mainly by
> >>>>> diversifying the use cases and deployment characteristics, and
> >>>>> developers to contribute by implementing them.
> >>>>>
> >>>>> == Alignment ==
> >>>>> Apache Spark is one of many popular data processing frameworks. It is
> >>>>> designed to optimize jobs using in-memory RDDs and many other
> >>>>> optimizations built tightly into the framework. In contrast to Spark,
> >>>>> Onyx aims to make job execution more flexible and easy to customize.
> >>>>>
> >>>>> Apache Tez enables developers to build complex task DAGs with control
> >>>>> over the control plane of job execution. In Onyx, a program written in
> >>>>> a high-level programming layer (e.g., Apache Beam) is automatically
> >>>>> converted to a basic IR DAG, which can then be converted to any IR DAG
> >>>>> through a series of easy-to-write user passes that can both reshape
> >>>>> the DAG and modify its annotations (of execution properties).
> >>>>> Moreover, Onyx leaves more parts of the job execution configurable,
> >>>>> such as the scheduler and the data plane.
> >>>>> Rather than providing a fixed set of
> >>>>> properties for optimization, Onyx’s configurable parts can be easily
> >>>>> extended and explored by implementing the pre-defined interfaces. For
> >>>>> example, an arbitrary intermediate data store can be added.
> >>>>>
> >>>>> Onyx currently supports Apache Beam programs, and we are working on
> >>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> >>>>> REEF for container management, which allows Onyx to run in Apache
> >>>>> Hadoop YARN and Apache Mesos clusters. If necessary, we plan to
> >>>>> contribute to and collaborate with these other Apache projects for the
> >>>>> benefit of all. We plan to extend such integrations to more Apache
> >>>>> software. The Apache Software Foundation already hosts many major
> >>>>> big-data systems, and we expect to help the big-data community grow
> >>>>> further by having Onyx within the foundation.
> >>>>>
> >>>>> == Known Risks ==
> >>>>> === Orphaned Products ===
> >>>>> The risk of the Onyx project being orphaned is minimal. There is
> >>>>> already plenty of work that painstakingly supports different
> >>>>> deployment characteristics, and we propose a general way to implement
> >>>>> them with flexible and extensible configuration knobs. The domain of
> >>>>> data processing is already of high interest, and it is expected to
> >>>>> keep evolving toward other goals, such as resource disaggregation and
> >>>>> using transient resources for better datacenter resource utilization.
> >>>>>
> >>>>> === Inexperience with Open Source ===
> >>>>> The initial committers include PMC members and committers of other
> >>>>> Apache projects. They have experience with open source projects, from
> >>>>> incubation through to top-level status. They have been involved in
> >>>>> the open source development process and are familiar with releasing
> >>>>> code under an open source license.
> >>>>>
> >>>>> === Homogeneous Developers ===
> >>>>> The initial set of committers is from a limited set of organizations,
> >>>>> but we expect to attract new contributors from diverse organizations
> >>>>> and thus to grow organically once approved for incubation. Our prior
> >>>>> experience with other open source projects will help various
> >>>>> contributors participate actively in our project.
> >>>>>
> >>>>> === Reliance on Salaried Developers ===
> >>>>> Many developers are from Seoul National University, so this is not
> >>>>> applicable.
> >>>>>
> >>>>> === Relationships with Other Apache Products ===
> >>>>> Onyx positions itself among multiple Apache products. It runs on
> >>>>> Apache REEF for container management. It also utilizes many useful
> >>>>> development tools, including Apache Maven, Apache Log4j, and multiple
> >>>>> Apache Commons components. Onyx supports the Apache Beam programming
> >>>>> model for user applications, and we are currently working on
> >>>>> supporting the Apache Spark programming APIs as well.
> >>>>>
> >>>>> === An Excessive Fascination with the Apache Brand ===
> >>>>> We hope to make Onyx a powerful system for data processing that meets
> >>>>> various needs for different deployment characteristics in a wider
> >>>>> variety of environments. We see the limitations of simply putting code
> >>>>> on GitHub, and we believe the Apache community will help Onyx grow
> >>>>> into impactful and innovative open source software. We believe Onyx is
> >>>>> a great fit for the Apache Software Foundation because of the
> >>>>> collaboration it aims to foster within the big data processing
> >>>>> community.
> >>>>>
> >>>>> == Documentation ==
> >>>>> The current documentation for Onyx is at
> >>>>> https://snuspl.github.io/onyx/.
> >>>>>
> >>>>> == Initial Source ==
> >>>>> The Onyx codebase is currently hosted at
> >>>>> https://github.com/snuspl/onyx.
> >>>>>
> >>>>> == External Dependencies ==
> >>>>> To the best of our knowledge, all Onyx dependencies are distributed
> >>>>> under Apache-compatible licenses. Upon acceptance into the incubator,
> >>>>> we would begin a thorough analysis of all transitive dependencies to
> >>>>> verify this and further introduce license checking into the build and
> >>>>> release process.
> >>>>>
> >>>>> == Cryptography ==
> >>>>> Not applicable.
> >>>>>
> >>>>> == Required Resources ==
> >>>>> === Mailing Lists ===
> >>>>> We will operate two mailing lists as follows:
> >>>>> * Onyx PMC discussions: priv...@onyx.incubator.apache.org
> >>>>> * Onyx developers: d...@onyx.incubator.apache.org
> >>>>>
> >>>>> === Git Repositories ===
> >>>>> Upon incubation: https://github.com/apache/incubator-onyx.
> >>>>> After the incubation, we would like to move the existing repo
> >>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
> >>>>>
> >>>>> === Issue Tracking ===
> >>>>> Onyx currently tracks its issues using the GitHub issue tracker:
> >>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> >>>>> JIRA.
> >>>>>
> >>>>> == Initial Committers ==
> >>>>> * Byung-Gon Chun
> >>>>> * Jeongyoon Eo
> >>>>> * Geon-Woo Kim
> >>>>> * Joo Yeon Kim
> >>>>> * Gyewon Lee
> >>>>> * Jung-Gil Lee
> >>>>> * Sanha Lee
> >>>>> * Wooyeon Lee
> >>>>> * Yunseong Lee
> >>>>> * JangHo Seo
> >>>>> * Won Wook Song
> >>>>> * Taegeon Um
> >>>>> * Youngseok Yang
> >>>>>
> >>>>> == Affiliations ==
> >>>>> * SNU (Seoul National University)
> >>>>>   * Byung-Gon Chun
> >>>>>   * Jeongyoon Eo
> >>>>>   * Geon-Woo Kim
> >>>>>   * Gyewon Lee
> >>>>>   * Sanha Lee
> >>>>>   * Wooyeon Lee
> >>>>>   * Yunseong Lee
> >>>>>   * JangHo Seo
> >>>>>   * Won Wook Song
> >>>>>   * Taegeon Um
> >>>>>   * Youngseok Yang
> >>>>>
> >>>>> * LG
> >>>>>   * Jung-Gil Lee
> >>>>>
> >>>>> * Samsung
> >>>>>   * Joo Yeon Kim
> >>>>>
> >>>>> * Viva Republica
> >>>>>   * Geon-Woo Kim
> >>>>>
> >>>>> == Sponsors ==
> >>>>> === Champions ===
> >>>>> Byung-Gon Chun
> >>>>>
> >>>>> === Mentors ===
> >>>>> * Hyunsik Choi
> >>>>> * Byung-Gon Chun
> >>>>> * Markus Weimer
> >>>>> * Reynold Xin
> >>>>>
> >>>>> === Sponsoring Entity ===
> >>>>> The Apache Incubator
> >>>>>
> >>>>
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> jbono...@apache.org
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >>>> For additional commands, e-mail: general-h...@incubator.apache.org

--
Byung-Gon Chun
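The compile-time flow described in the quoted proposal (a user program is translated into an IR DAG, a series of compiler passes reshape the DAG or annotate it with execution properties, and the result becomes the execution plan for the runtime) can be illustrated with a minimal Python sketch. This is not Onyx/Coral code: the classes, the property keys (`Shuffle`, `Kind`, `Placement`) and both passes are hypothetical, chosen only to show one reshaping pass and one annotating pass applied in sequence.

```python
"""Minimal sketch of an IR DAG rewritten by compiler passes.

All names here are hypothetical illustrations, not the Onyx/Coral API.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Vertex:
    name: str
    # Execution properties attached to the vertex (hypothetical keys).
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (src, dst)

    def add_vertex(self, name: str, **props: str) -> Vertex:
        vertex = Vertex(name, dict(props))
        self.vertices[name] = vertex
        return vertex

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))


# A pass takes an IR DAG and returns the (possibly rewritten) IR DAG.
Pass = Callable[[IRDag], IRDag]


def metric_collection_pass(dag: IRDag) -> IRDag:
    """Reshaping pass (hypothetical): route every edge that feeds a vertex
    marked Shuffle=true through a new metric-collection vertex, e.g. so a
    runtime could observe data skew before the shuffle."""
    new_edges: List[Tuple[str, str]] = []
    for src, dst in dag.edges:
        if dag.vertices[dst].properties.get("Shuffle") == "true":
            metric = f"metric({src}->{dst})"
            dag.add_vertex(metric, Kind="MetricCollection")
            new_edges += [(src, metric), (metric, dst)]
        else:
            new_edges.append((src, dst))
    dag.edges = new_edges
    return dag


def placement_pass(dag: IRDag) -> IRDag:
    """Annotating pass (hypothetical): vertices without inputs run on
    reserved resources; all others are allowed on transient resources."""
    has_input = {dst for _, dst in dag.edges}
    for name, vertex in dag.vertices.items():
        vertex.properties["Placement"] = (
            "transient" if name in has_input else "reserved"
        )
    return dag


def compile_plan(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order; the result stands in for the execution plan."""
    for apply_pass in passes:
        dag = apply_pass(dag)
    return dag


if __name__ == "__main__":
    # Roughly a Beam-style Read -> Map -> GroupByKey pipeline as an IR DAG.
    dag = IRDag()
    dag.add_vertex("Read")
    dag.add_vertex("Map")
    dag.add_vertex("GroupByKey", Shuffle="true")
    dag.add_edge("Read", "Map")
    dag.add_edge("Map", "GroupByKey")

    plan = compile_plan(dag, [metric_collection_pass, placement_pass])
    for name, vertex in plan.vertices.items():
        print(name, vertex.properties)
    print(plan.edges)
```

In this sketch the reshaping pass routes the Map -> GroupByKey edge through a new metric vertex, and the annotating pass then marks only the source vertex (Read) as reserved. Keeping each pass a plain DAG-to-DAG function is what lets the list passed to `compile_plan` be reordered or extended without touching the other passes.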