On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi,
>
> Coral is a good name!

Thanks!

> Does the code belong to Seoul National University? In that case, in
> addition to your ICLA, we would need an SGA (it's not a blocker for the
> project bootstrapping or code donation, but we will at least need it
> later for graduation). On the other hand, if the committers are all part
> of the university, you can also sign a CCLA.

I will figure this out.

> Happy to be a mentor on the project if you want me! ;)

Thanks! I will add you to the mentor list.

-Gon

> Thanks,
> Regards
> JB
>
> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
> > Thanks for the comments, JB!
> > My replies are inlined below.
> >
> > On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> Hi,
> >>
> >> sorry to be a little bit late on this.
> >>
> >> It's a very interesting proposal. It sounds pretty close to the
> >> portability layer we want to add in Apache Beam. I would love to see
> >> interaction between the two communities.
> >>
> >> I have two minor questions:
> >>
> >> 1. about the name: Onyx sounds very generic and the name is used in
> >> other technologies. Maybe another unique name would be more accurate.
> >
> > We proposed Coral instead. How does this sound?
> >
> >> 2. the Onyx code is on github right now, under the Apache 2.0 license.
> >> Does this code have any affiliation with companies? Meaning that we
> >> would need an SGA for the code donation.
> >
> > It does not. The developers are affiliated with Seoul National
> > University. In this case, do we still need an SGA?
> >
> >> If you need any help for the incubation, I would be more than happy to
> >> help!
> >
> > Thanks for the offer. Would you be interested in being a mentor of the
> > project?
> >
> > Thanks.
> > -Gon
> >
> >> Regards
> >> JB
> >>
> >> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> >>> Dear Apache Incubator Community,
> >>>
> >>> Please accept the following proposal for presentation and discussion:
> >>> https://wiki.apache.org/incubator/OnyxProposal
> >>>
> >>> Onyx is a data processing system that aims to flexibly control the
> >>> runtime behaviors of a job to adapt to varying deployment
> >>> characteristics (e.g., harnessing transient resources in datacenters,
> >>> cross-datacenter deployment, changing runtime based on job
> >>> characteristics, etc.). Onyx provides ways to extend the system’s
> >>> capabilities and incorporate the extensions into the flexible job
> >>> execution.
> >>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
> >>> an Intermediate Representation (IR) DAG, which Onyx optimizes and
> >>> deploys based on a deployment policy.
> >>>
> >>> I've attached the proposal below.
> >>>
> >>> Best regards,
> >>> Byung-Gon Chun
> >>>
> >>> = OnyxProposal =
> >>>
> >>> == Abstract ==
> >>> Onyx is a data processing system for flexible employment with
> >>> different execution scenarios for various deployment characteristics
> >>> on clusters.
> >>>
> >>> == Proposal ==
> >>> Today, there is a wide variety of data processing systems with
> >>> different designs for better performance and datacenter efficiency.
> >>> They include processing data on specific resource environments and
> >>> running jobs with specific attributes. Although each system
> >>> successfully solves the problems it targets, most systems are designed
> >>> in such a way that runtime behaviors are built tightly inside the
> >>> system core to hide the complexity of distributed computing. This
> >>> makes it hard for a single system to support different deployment
> >>> characteristics with different runtime behaviors without substantial
> >>> effort.
> >>>
> >>> Onyx is a data processing system that aims to flexibly control the
> >>> runtime behaviors of a job to adapt to varying deployment
> >>> characteristics. Moreover, it provides a means of extending the
> >>> system’s capabilities and incorporating the extensions into the
> >>> flexible job execution.
> >>>
> >>> In order to be able to easily modify runtime behaviors to adapt to
> >>> varying deployment characteristics, Onyx exposes runtime behaviors to
> >>> be flexibly configured and modified at both compile-time and runtime
> >>> through a set of high-level graph pass interfaces.
> >>>
> >>> We hope to contribute to the big data processing community by enabling
> >>> more flexibility and extensibility in job executions. Furthermore, we
> >>> can all benefit more when we work together as a community to mature
> >>> the system with more use cases and understanding of diverse deployment
> >>> characteristics. The Apache Software Foundation is the perfect place
> >>> to achieve these aspirations.
> >>>
> >>> == Background ==
> >>> Many data processing systems have distinctive runtime behaviors
> >>> optimized and configured for specific deployment characteristics like
> >>> different resource environments and for handling special job
> >>> attributes.
> >>>
> >>> For example, much research has been conducted to overcome the
> >>> challenge of running data processing jobs on cheap, unreliable
> >>> transient resources. Likewise, techniques for disaggregating different
> >>> types of resources, like memory, CPU and GPU, are being actively
> >>> developed to use datacenter resources more efficiently. Many
> >>> researchers are also working to run data processing jobs in even more
> >>> diverse environments, such as across distant datacenters.
> >>> Similarly, for special job attributes, many works take different
> >>> approaches, such as runtime optimization, to solve problems like data
> >>> skew, and to optimize systems for data processing jobs with
> >>> small-scale input data.
> >>>
> >>> Although each of the systems performs well with the jobs and in the
> >>> environments they target, they perform poorly with unconsidered cases,
> >>> and do not consider supporting multiple deployment characteristics on
> >>> a single system in their designs.
> >>>
> >>> Optimizing an application to perform well on a certain system engraved
> >>> with its underlying behaviors requires a deep understanding of the
> >>> system itself, an overhead that often costs a lot of time and effort.
> >>> Moreover, modifying such system behaviors requires modifying the
> >>> system core, which demands an even deeper understanding of the system
> >>> itself.
> >>>
> >>> With this background, Onyx is designed to represent all of its jobs as
> >>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> >>> applications from various programming models (e.g., Apache Beam) are
> >>> submitted, transformed to an IR DAG, and optimized/customized for the
> >>> deployment characteristics. In the IR DAG optimization phase, the DAG
> >>> is modified through a series of compiler “passes” which reshape or
> >>> annotate the DAG with an expression of the underlying runtime
> >>> behaviors. The IR DAG is then submitted as an execution plan for the
> >>> Onyx runtime. The runtime includes the unmodified parts of data
> >>> processing in the backbone which is transparently integrated with
> >>> configurable components exposed for further extension.
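The compiler flow described in the Background section above (a user program becomes an IR DAG, which a series of passes then annotates or reshapes) can be sketched roughly as follows. This is a minimal illustration in Python, not Onyx's actual API: the names `IRDag`, `Vertex`, `annotate_placement`, `insert_checkpoint`, and `optimize`, as well as the "transient"/"reserved" placement property, are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Vertex:
    """One operator in the IR DAG, with free-form execution properties."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    """A tiny IR DAG: vertices keyed by name plus directed edges."""
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_vertex(self, name: str) -> None:
        self.vertices[name] = Vertex(name)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))


# A pass takes a DAG and returns the (possibly modified) DAG.
Pass = Callable[[IRDag], IRDag]


def annotate_placement(dag: IRDag) -> IRDag:
    """Annotating pass (hypothetical policy): vertices without incoming
    edges run on transient resources, all others on reserved ones."""
    has_input = {dst for _, dst in dag.edges}
    for v in dag.vertices.values():
        v.properties["placement"] = (
            "reserved" if v.name in has_input else "transient"
        )
    return dag


def insert_checkpoint(dag: IRDag) -> IRDag:
    """Reshaping pass (hypothetical): splice a checkpoint vertex into
    every edge that goes from a transient to a reserved vertex."""
    new_edges: List[Tuple[str, str]] = []
    for src, dst in dag.edges:
        src_place = dag.vertices[src].properties.get("placement")
        dst_place = dag.vertices[dst].properties.get("placement")
        if src_place == "transient" and dst_place == "reserved":
            ckpt = f"ckpt_{src}_{dst}"
            dag.vertices[ckpt] = Vertex(ckpt, {"placement": "reserved"})
            new_edges += [(src, ckpt), (ckpt, dst)]
        else:
            new_edges.append((src, dst))
    dag.edges = new_edges
    return dag


def optimize(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order, as a deployment policy would."""
    for p in passes:
        dag = p(dag)
    return dag


def build_example() -> IRDag:
    """A three-stage job: read -> map -> reduce."""
    dag = IRDag()
    for name in ("read", "map", "reduce"):
        dag.add_vertex(name)
    dag.add_edge("read", "map")
    dag.add_edge("map", "reduce")
    return dag
```

Calling `optimize(build_example(), [annotate_placement, insert_checkpoint])` first tags each vertex and then rewrites the graph based on those tags, which is why the order of passes matters in this sketch. Swapping in a different list of passes would stand in for a different deployment policy.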
> >>>
> >>> == Rationale ==
> >>> Onyx’s vision lies in providing means for flexibly supporting a wide
> >>> variety of job execution scenarios for users while enabling system
> >>> developers to extend the execution framework with various
> >>> functionalities at the same time. The capabilities of the system can
> >>> be extended as it grows to meet a wider variety of execution
> >>> scenarios. We require inputs from users and developers from diverse
> >>> domains in order to make it a more thriving and useful project. The
> >>> Apache Software Foundation provides the best tools and community to
> >>> support this vision.
> >>>
> >>> == Initial Goals ==
> >>> Initial goals will be to move the existing codebase to Apache and
> >>> integrate with the Apache development process. We further plan to
> >>> develop our system to meet the needs of more execution scenarios for
> >>> a wider variety of deployment characteristics.
> >>>
> >>> == Current Status ==
> >>> The Onyx codebase is currently hosted in a repository at github.com.
> >>> The current version has been developed by system developers at Seoul
> >>> National University, Viva Republica, Samsung, and LG.
> >>>
> >>> == Meritocracy ==
> >>> We plan to strongly support meritocracy. We will discuss the
> >>> requirements in an open forum, and those who continuously contribute
> >>> to Onyx with the passion to strengthen the system will be invited as
> >>> committers. Contributors who enrich Onyx by providing various use
> >>> cases and various implementations of the configurable components,
> >>> including ideas for optimization techniques, will be especially
> >>> welcome. Committers with a deep understanding of the system’s
> >>> technical aspects as a whole and its philosophy will definitely be
> >>> voted into the PMC. We will monitor community participation so that
> >>> privileges can be extended to those who contribute.
> >>>
> >>> == Community ==
> >>> We hope to expand our contribution community by becoming an Apache
> >>> incubator project. The contributions will come from both users and
> >>> system developers interested in the flexibility and extensibility of
> >>> job executions that Onyx can support. We expect users to contribute
> >>> mainly by diversifying the use cases and deployment characteristics,
> >>> and developers to contribute by implementing them.
> >>>
> >>> == Alignment ==
> >>> Apache Spark is one of many popular data processing frameworks. The
> >>> system is designed towards optimizing jobs using RDDs in memory and
> >>> many other optimizations built tightly within the framework. In
> >>> contrast to Spark, Onyx aims to provide more flexibility for job
> >>> execution in an easy manner.
> >>>
> >>> Apache Tez enables developers to build complex task DAGs with control
> >>> over the control plane of job execution. In Onyx, a high-level
> >>> programming layer (e.g., Apache Beam) is automatically converted to a
> >>> basic IR DAG and can be converted to any IR DAG through a series of
> >>> easy, user-writable passes that can both reshape the DAG and modify
> >>> its annotations (of execution properties). Moreover, Onyx leaves
> >>> more parts of the job execution configurable, such as the scheduler
> >>> and the data plane. As opposed to providing a set of properties for
> >>> solid optimization, Onyx’s configurable parts can be easily extended
> >>> and explored by implementing the pre-defined interfaces. For example,
> >>> an arbitrary intermediate data store can be added.
> >>>
> >>> Onyx currently supports Apache Beam programs and we are working on
> >>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> >>> REEF for container management, which allows Onyx to run in Apache YARN
> >>> and Apache Mesos clusters. If necessary, we plan to contribute to and
> >>> collaborate with these other Apache projects for the benefit of all.
> >>> We plan to extend such integrations with more Apache software. The
> >>> Apache Software Foundation already hosts many major big-data systems,
> >>> and we expect to help further growth of the big-data community by
> >>> having Onyx within the Apache foundation.
> >>>
> >>> == Known Risks ==
> >>> === Orphaned Products ===
> >>> The risk of the Onyx project being orphaned is minimal. There is
> >>> already plenty of work that arduously supports different deployment
> >>> characteristics, and we propose a general way to implement them with
> >>> flexible and extensible configuration knobs. The domain of data
> >>> processing is already of high interest, and this domain is expected to
> >>> evolve continuously with various other purposes, such as resource
> >>> disaggregation and using transient resources for better datacenter
> >>> resource utilization.
> >>>
> >>> === Inexperience with Open Source ===
> >>> The initial committers include PMC members and committers of other
> >>> Apache projects. They have experience with open source projects,
> >>> from incubation through to top-level status. They have been
> >>> involved in the open source development process, and are familiar with
> >>> releasing code under an open source license.
> >>>
> >>> === Homogeneous Developers ===
> >>> The initial set of committers is from a limited set of organizations,
> >>> but we expect to attract new contributors from diverse organizations
> >>> and will thus grow organically once approved for incubation. Our prior
> >>> experience with other open source projects will help various
> >>> contributors to actively participate in our project.
> >>>
> >>> === Reliance on Salaried Developers ===
> >>> Many developers are from Seoul National University. This is not
> >>> applicable.
> >>>
> >>> === Relationships with Other Apache Products ===
> >>> Onyx positions itself among multiple Apache products. It runs on
> >>> Apache REEF for container management.
> >>> It also utilizes many useful development tools including Apache
> >>> Maven, Apache Log4J, and multiple Apache Commons components. Onyx
> >>> supports the Apache Beam programming model for user applications. We
> >>> are currently working on supporting the Apache Spark programming APIs
> >>> as well.
> >>>
> >>> === An Excessive Fascination with the Apache Brand ===
> >>> We hope to make Onyx a powerful system for data processing, meeting
> >>> various needs for different deployment characteristics, under a wider
> >>> variety of environments. We see the limitations of simply putting code
> >>> on GitHub, and we believe the Apache community will help the growth of
> >>> Onyx for the project to become an impactful and innovative
> >>> open source project. We believe Onyx is a great fit for the Apache
> >>> Software Foundation due to the collaboration it aims to achieve from
> >>> the big data processing community.
> >>>
> >>> == Documentation ==
> >>> The current documentation for Onyx is at
> >>> https://snuspl.github.io/onyx/.
> >>>
> >>> == Initial Source ==
> >>> The Onyx codebase is currently hosted at
> >>> https://github.com/snuspl/onyx.
> >>>
> >>> == External Dependencies ==
> >>> To the best of our knowledge, all Onyx dependencies are distributed
> >>> under Apache compatible licenses. Upon acceptance to the incubator, we
> >>> would begin a thorough analysis of all transitive dependencies to
> >>> verify this fact and further introduce license checking into the build
> >>> and release process.
> >>>
> >>> == Cryptography ==
> >>> Not applicable.
> >>>
> >>> == Required Resources ==
> >>> === Mailing Lists ===
> >>> We will operate two mailing lists as follows:
> >>> * Onyx PMC discussions: priv...@onyx.incubator.apache.org
> >>> * Onyx developers: d...@onyx.incubator.apache.org
> >>>
> >>> === Git Repositories ===
> >>> Upon incubation: https://github.com/apache/incubator-onyx.
> >>> After the incubation, we would like to move the existing repo
> >>> https://github.com/snuspl/onyx to the Apache infrastructure.
> >>>
> >>> === Issue Tracking ===
> >>> Onyx currently tracks its issues using the GitHub issue tracker:
> >>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> >>> JIRA.
> >>>
> >>> == Initial Committers ==
> >>> * Byung-Gon Chun
> >>> * Jeongyoon Eo
> >>> * Geon-Woo Kim
> >>> * Joo Yeon Kim
> >>> * Gyewon Lee
> >>> * Jung-Gil Lee
> >>> * Sanha Lee
> >>> * Wooyeon Lee
> >>> * Yunseong Lee
> >>> * JangHo Seo
> >>> * Won Wook Song
> >>> * Taegeon Um
> >>> * Youngseok Yang
> >>>
> >>> == Affiliations ==
> >>> * SNU (Seoul National University)
> >>>   * Byung-Gon Chun
> >>>   * Jeongyoon Eo
> >>>   * Geon-Woo Kim
> >>>   * Gyewon Lee
> >>>   * Sanha Lee
> >>>   * Wooyeon Lee
> >>>   * Yunseong Lee
> >>>   * JangHo Seo
> >>>   * Won Wook Song
> >>>   * Taegeon Um
> >>>   * Youngseok Yang
> >>>
> >>> * LG
> >>>   * Jung-Gil Lee
> >>>
> >>> * Samsung
> >>>   * Joo Yeon Kim
> >>>
> >>> * Viva Republica
> >>>   * Geon-Woo Kim
> >>>
> >>> == Sponsors ==
> >>> === Champions ===
> >>> Byung-Gon Chun
> >>>
> >>> === Mentors ===
> >>> * Hyunsik Choi
> >>> * Byung-Gon Chun
> >>> * Markus Weimer
> >>> * Reynold Xin
> >>>
> >>> === Sponsoring Entity ===
> >>> The Apache Incubator
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org

--
Byung-Gon Chun