Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun Wed, 31 Jan 2018 14:02:51 -0800

Thanks for the information, John!


On Wed, Jan 31, 2018 at 9:50 PM, John D. Ament <johndam...@apache.org>
wrote:

> Sorry for mid-posting.
>
> This isn't the list to determine if a project name is suitable. There's a
> JIRA project dedicated to that, and if you need a quick answer, it is better
> to email trademarks@ to get a more precise one.
>
> The real question is whether "Apache Onyx" is going to be easily confused
> with something else.
>
> John
>
> On Sun, Jan 28, 2018 at 4:50 AM Byung-Gon Chun <bgc...@gmail.com> wrote:
>
> > Thank you for all the information! It looks like Surf doesn't work.
> >
> > If possible, we'd like to keep Onyx.
> > Another name we came up with is Coral.
> >
> > Thanks!
> > -Gon
> >
> >
> > On Sun, Jan 28, 2018 at 4:21 AM, Leif Hedstrom <zw...@apache.org> wrote:
> >
> > > Did we rule out Onyx for sure? Just because some other project might use
> > > it on, say, GitHub doesn’t necessarily exclude us from having an Apache
> > > Onyx?
> > >
> > > FWIW, I agree that Surf is too similar in pronunciation to Apache Serf. :)
> > >
> > > Cheers,
> > >
> > > — Leif
> > >
> > > > On Jan 27, 2018, at 07:31, Dave Fisher <dave2w...@comcast.net> wrote:
> > > >
> > > > Checking “Serf Software” which sounds the same.
> > > >
> > > > (1) there is already Apache Serf
> > > > (2) Serf is a product from HashiCorp at https://www.serf.io/. This
> > > > would definitely cause confusion, as it is apparently comparable to
> > > > ZooKeeper.
> > > >
> > > > Regards,
> > > > Dave
> > > >
> > > >
> > > >> On Jan 27, 2018, at 3:12 AM, sebb <seb...@gmail.com> wrote:
> > > >>
> > > >> A brief search for 'Surf Software' shows quite a few hits.
> > > >> I have not looked to see if they would be likely to be confused with
> > > >> this project or cause problems for others.
> > > >>
> > > >> But it looks as though there might be a problem:
> > > >> Surfer -  Golden Software
> > > >> surf @ sourceforge
> > > >> Surf Software company
> > > >>
> > > >>
> > > >>> On 27 January 2018 at 08:03, Byung-Gon Chun <bgc...@gmail.com> wrote:
> > > >>> Since we cannot use the name Onyx, we would like to change the project
> > > >>> name to Surf.
> > > >>> I hope that this name works.
> > > >>>
> > > >>> -Gon
> > > >>>
> > > >>> ---
> > > >>> Byung-Gon Chun
> > > >>>
> > > >>>
> > > >>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <bgc...@gmail.com> wrote:
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <da...@apache.org> wrote:
> > > >>>>>
> > > >>>>> Great work -- I think this technology has a lot of promise, and I'd
> > > >>>>> love to see its evolution inside the Foundation.
> > > >>>>>
> > > >>>>>
> > > >>>> Thanks, Davor!
> > > >>>>
> > > >>>>
> > > >>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap
> > > >>>>> with the work-in-progress inside the Apache Beam project
> > > >>>>> ("portability"). We'd love to work together on this -- would you be
> > > >>>>> open to such collaboration? If so, it may not be necessary to start
> > > >>>>> from scratch; we could leverage the work already done.
> > > >>>>>
> > > >>>>>
> > > >>>> Sure. We're open to collaboration.
> > > >>>>
> > > >>>>
> > > >>>>> Regarding the name, Onyx would likely have to be renamed, due to a
> > > >>>>> conflict with a related technology [2].
> > > >>>>>
> > > >>>>>
> > > >>>> Thanks for pointing it out. It's difficult to come up with a good
> > > >>>> short name. :)
> > > >>>> Do you have any suggestions?
> > > >>>>
> > > >>>> Thanks!
> > > >>>> -Gon
> > > >>>>
> > > >>>> ---
> > > >>>> Byung-Gon Chun
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> Davor
> > > >>>>>
> > > >>>>> [1] https://snuspl.github.io/onyx/docs/ir/
> > > >>>>> [2] http://www.onyxplatform.org/
> > > >>>>>
> > > >>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <bgc...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>> Dear Apache Incubator Community,
> > > >>>>>>
> > > >>>>>> Please accept the following proposal for presentation and discussion:
> > > >>>>>> https://wiki.apache.org/incubator/OnyxProposal
> > > >>>>>>
> > > >>>>>> Onyx is a data processing system that aims to flexibly control the
> > > >>>>>> runtime behaviors of a job to adapt to varying deployment
> > > >>>>>> characteristics (e.g., harnessing transient resources in
> > > >>>>>> datacenters, cross-datacenter deployment, changing runtime based on
> > > >>>>>> job characteristics). Onyx provides ways to extend the system’s
> > > >>>>>> capabilities and incorporate the extensions into flexible job
> > > >>>>>> execution.
> > > >>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark)
> > > >>>>>> into an Intermediate Representation (IR) DAG, which Onyx optimizes
> > > >>>>>> and deploys based on a deployment policy.
> > > >>>>>>
> > > >>>>>> I've attached the proposal below.
> > > >>>>>>
> > > >>>>>> Best regards,
> > > >>>>>> Byung-Gon Chun
> > > >>>>>>
> > > >>>>>> = OnyxProposal =
> > > >>>>>>
> > > >>>>>> == Abstract ==
> > > >>>>>> Onyx is a data processing system that can be flexibly employed
> > > >>>>>> across different execution scenarios and deployment characteristics
> > > >>>>>> on clusters.
> > > >>>>>>
> > > >>>>>> == Proposal ==
> > > >>>>>> Today, there is a wide variety of data processing systems with
> > > >>>>>> different designs for better performance and datacenter efficiency.
> > > >>>>>> These designs target processing data on specific resource
> > > >>>>>> environments and running jobs with specific attributes. Although
> > > >>>>>> each system successfully solves the problems it targets, most
> > > >>>>>> systems are designed in such a way that runtime behaviors are built
> > > >>>>>> tightly into the system core to hide the complexity of distributed
> > > >>>>>> computing. This makes it hard for a single system to support
> > > >>>>>> different deployment characteristics with different runtime
> > > >>>>>> behaviors without substantial effort.
> > > >>>>>>
> > > >>>>>> Onyx is a data processing system that aims to flexibly control the
> > > >>>>>> runtime behaviors of a job to adapt to varying deployment
> > > >>>>>> characteristics. Moreover, it provides a means of extending the
> > > >>>>>> system’s capabilities and incorporating the extensions into
> > > >>>>>> flexible job execution.
> > > >>>>>>
> > > >>>>>> To make runtime behaviors easy to adapt to varying deployment
> > > >>>>>> characteristics, Onyx exposes them to be flexibly configured and
> > > >>>>>> modified at both compile time and runtime through a set of
> > > >>>>>> high-level graph pass interfaces.
> > > >>>>>>
> > > >>>>>> We hope to contribute to the big data processing community by
> > > >>>>>> enabling more flexibility and extensibility in job executions.
> > > >>>>>> Furthermore, we can benefit more as a community when we work
> > > >>>>>> together to mature the system with more use cases and a better
> > > >>>>>> understanding of diverse deployment characteristics. The Apache
> > > >>>>>> Software Foundation is the perfect place to achieve these
> > > >>>>>> aspirations.
> > > >>>>>>
> > > >>>>>> == Background ==
> > > >>>>>> Many data processing systems have distinctive runtime behaviors
> > > >>>>>> optimized and configured for specific deployment characteristics,
> > > >>>>>> such as particular resource environments, or for handling special
> > > >>>>>> job attributes.
> > > >>>>>>
> > > >>>>>> For example, much research has been conducted to overcome the
> > > >>>>>> challenge of running data processing jobs on cheap, unreliable
> > > >>>>>> transient resources. Likewise, techniques for disaggregating
> > > >>>>>> different types of resources, like memory, CPU and GPU, are being
> > > >>>>>> actively developed to use datacenter resources more efficiently.
> > > >>>>>> Many researchers are also working to run data processing jobs in
> > > >>>>>> even more diverse environments, such as across distant datacenters.
> > > >>>>>> Similarly, for special job attributes, many works take different
> > > >>>>>> approaches, such as runtime optimization, to solve problems like
> > > >>>>>> data skew, and to optimize systems for data processing jobs with
> > > >>>>>> small-scale input data.
> > > >>>>>>
> > > >>>>>> Although each of these systems performs well with the jobs and in
> > > >>>>>> the environments it targets, it performs poorly in cases it was not
> > > >>>>>> designed for, and its design does not consider supporting multiple
> > > >>>>>> deployment characteristics on a single system.
> > > >>>>>>
> > > >>>>>> For an application writer, optimizing an application to perform
> > > >>>>>> well on a system with such built-in behaviors requires a deep
> > > >>>>>> understanding of the system itself, an overhead that often costs a
> > > >>>>>> lot of time and effort. Moreover, for a developer, modifying such
> > > >>>>>> system behaviors means modifying the system core, which requires an
> > > >>>>>> even deeper understanding of the system.
> > > >>>>>>
> > > >>>>>> With this background, Onyx is designed to represent all of its jobs
> > > >>>>>> as an Intermediate Representation (IR) DAG. In the Onyx compiler,
> > > >>>>>> user applications from various programming models (e.g., Apache
> > > >>>>>> Beam) are submitted, transformed to an IR DAG, and
> > > >>>>>> optimized/customized for the deployment characteristics. In the IR
> > > >>>>>> DAG optimization phase, the DAG is modified through a series of
> > > >>>>>> compiler “passes” which reshape or annotate the DAG with an
> > > >>>>>> expression of the underlying runtime behaviors. The IR DAG is then
> > > >>>>>> submitted as an execution plan for the Onyx runtime. The runtime
> > > >>>>>> includes the unmodified parts of data processing in the backbone,
> > > >>>>>> which is transparently integrated with configurable components
> > > >>>>>> exposed for further extension.
> > > >>>>>>
> > > >>>>>> == Rationale ==
> > > >>>>>> Onyx’s vision lies in providing means for flexibly supporting a
> > > >>>>>> wide variety of job execution scenarios for users while, at the
> > > >>>>>> same time, making it easy for system developers to extend the
> > > >>>>>> execution framework with various functionalities. The capabilities
> > > >>>>>> of the system can be extended as it grows to meet a wider variety
> > > >>>>>> of execution scenarios. We need input from users and developers
> > > >>>>>> from diverse domains in order to make it a more thriving and useful
> > > >>>>>> project. The Apache Software Foundation provides the best tools and
> > > >>>>>> community to support this vision.
> > > >>>>>>
> > > >>>>>> == Initial Goals ==
> > > >>>>>> Initial goals will be to move the existing codebase to Apache and
> > > >>>>>> integrate with the Apache development process. We further plan to
> > > >>>>>> develop our system to meet the needs of more execution scenarios
> > > >>>>>> across a wider variety of deployment characteristics.
> > > >>>>>>
> > > >>>>>> == Current Status ==
> > > >>>>>> The Onyx codebase is currently hosted in a repository at
> > > >>>>>> github.com. The current version has been developed by system
> > > >>>>>> developers at Seoul National University, Viva Republica, Samsung,
> > > >>>>>> and LG.
> > > >>>>>>
> > > >>>>>> == Meritocracy ==
> > > >>>>>> We plan to strongly support meritocracy. We will discuss the
> > > >>>>>> requirements in an open forum, and those who continuously
> > > >>>>>> contribute to Onyx with the passion to strengthen the system will
> > > >>>>>> be invited as committers. Contributors who enrich Onyx by providing
> > > >>>>>> various use cases and various implementations of the configurable
> > > >>>>>> components, including ideas for optimization techniques, will be
> > > >>>>>> especially welcome. Committers with a deep understanding of the
> > > >>>>>> system’s technical aspects as a whole and its philosophy will
> > > >>>>>> definitely be voted into the PMC. We will monitor community
> > > >>>>>> participation so that privileges can be extended to those who
> > > >>>>>> contribute.
> > > >>>>>>
> > > >>>>>> == Community ==
> > > >>>>>> We hope to expand our contributor community by becoming an Apache
> > > >>>>>> incubator project. The contributions will come from both users and
> > > >>>>>> system developers interested in the flexibility and extensibility
> > > >>>>>> of job execution that Onyx can support. We expect users to
> > > >>>>>> contribute mainly by diversifying the use cases and deployment
> > > >>>>>> characteristics, and developers to contribute by implementing them.
> > > >>>>>>
> > > >>>>>> == Alignment ==
> > > >>>>>> Apache Spark is one of many popular data processing frameworks. The
> > > >>>>>> system is designed around optimizing jobs using in-memory RDDs,
> > > >>>>>> with many other optimizations built tightly into the framework. In
> > > >>>>>> contrast to Spark, Onyx aims to provide more flexibility for job
> > > >>>>>> execution in an easy manner.
> > > >>>>>>
> > > >>>>>> Apache Tez enables developers to build complex task DAGs with
> > > >>>>>> control over the control plane of job execution. In Onyx, a
> > > >>>>>> high-level programming layer (e.g., Apache Beam) is automatically
> > > >>>>>> converted to a basic IR DAG and can be converted to any IR DAG
> > > >>>>>> through a series of easy, user-writable passes that can both
> > > >>>>>> reshape the DAG and modify its annotations (execution properties).
> > > >>>>>> Moreover, Onyx leaves more parts of the job execution configurable,
> > > >>>>>> such as the scheduler and the data plane. Rather than providing a
> > > >>>>>> fixed set of properties for optimization, Onyx’s configurable parts
> > > >>>>>> can be easily extended and explored by implementing the pre-defined
> > > >>>>>> interfaces. For example, an arbitrary intermediate data store can
> > > >>>>>> be added.
> > > >>>>>>
> > > >>>>>> Onyx currently supports Apache Beam programs and we are working on
> > > >>>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> > > >>>>>> REEF for container management, which allows Onyx to run in Apache
> > > >>>>>> YARN and Apache Mesos clusters. If necessary, we plan to contribute
> > > >>>>>> to and collaborate with these other Apache projects for the benefit
> > > >>>>>> of all. We plan to extend such integrations to more Apache
> > > >>>>>> software. The Apache Software Foundation already hosts many major
> > > >>>>>> big-data systems, and we expect to help further growth of the
> > > >>>>>> big-data community by having Onyx within the Apache Software
> > > >>>>>> Foundation.
> > > >>>>>>
> > > >>>>>> == Known Risks ==
> > > >>>>>> === Orphaned Products ===
> > > >>>>>> The risk of the Onyx project being orphaned is minimal. There is
> > > >>>>>> already plenty of work that arduously supports different deployment
> > > >>>>>> characteristics, and we propose a general way to implement them
> > > >>>>>> with flexible and extensible configuration knobs. The domain of
> > > >>>>>> data processing is already of high interest, and this domain is
> > > >>>>>> expected to evolve continuously with various other purposes, such
> > > >>>>>> as resource disaggregation and using transient resources for better
> > > >>>>>> datacenter resource utilization.
> > > >>>>>>
> > > >>>>>> === Inexperience with Open Source ===
> > > >>>>>> The initial committers include PMC members and committers of other
> > > >>>>>> Apache projects. They have experience with open source projects,
> > > >>>>>> from incubation to top-level status. They have been involved in the
> > > >>>>>> open source development process, and are familiar with releasing
> > > >>>>>> code under an open source license.
> > > >>>>>>
> > > >>>>>> === Homogeneous Developers ===
> > > >>>>>> The initial set of committers is from a limited set of
> > > >>>>>> organizations, but we expect to attract new contributors from
> > > >>>>>> diverse organizations and will thus grow organically once approved
> > > >>>>>> for incubation. Our prior experience with other open source
> > > >>>>>> projects will help various contributors to actively participate in
> > > >>>>>> our project.
> > > >>>>>>
> > > >>>>>> === Reliance on Salaried Developers ===
> > > >>>>>> Many developers are from Seoul National University. This is not
> > > >>>>>> applicable.
> > > >>>>>>
> > > >>>>>> === Relationships with Other Apache Products ===
> > > >>>>>> Onyx positions itself among multiple Apache products. It runs on
> > > >>>>>> Apache REEF for container management. It also utilizes many useful
> > > >>>>>> development tools including Apache Maven, Apache Log4J, and
> > > >>>>>> multiple Apache Commons components. Onyx supports the Apache Beam
> > > >>>>>> programming model for user applications. We are currently working
> > > >>>>>> on supporting the Apache Spark programming APIs as well.
> > > >>>>>>
> > > >>>>>> === An Excessive Fascination with the Apache Brand ===
> > > >>>>>> We hope to make Onyx a powerful system for data processing, meeting
> > > >>>>>> various needs for different deployment characteristics, under a
> > > >>>>>> wider variety of environments. We see the limitations of simply
> > > >>>>>> putting code on GitHub, and we believe the Apache community will
> > > >>>>>> help Onyx grow into positively impactful and innovative open source
> > > >>>>>> software. We believe Onyx is a great fit for the Apache Software
> > > >>>>>> Foundation due to the collaboration it aims to achieve from the big
> > > >>>>>> data processing community.
> > > >>>>>>
> > > >>>>>> == Documentation ==
> > > >>>>>> The current documentation for Onyx is at
> > > >>>>>> https://snuspl.github.io/onyx/ .
> > > >>>>>>
> > > >>>>>> == Initial Source ==
> > > >>>>>> The Onyx codebase is currently hosted at
> > > >>>>>> https://github.com/snuspl/onyx .
> > > >>>>>>
> > > >>>>>> == External Dependencies ==
> > > >>>>>> To the best of our knowledge, all Onyx dependencies are distributed
> > > >>>>>> under Apache-compatible licenses. Upon acceptance to the incubator,
> > > >>>>>> we would begin a thorough analysis of all transitive dependencies
> > > >>>>>> to verify this fact and further introduce license checking into the
> > > >>>>>> build and release process.
> > > >>>>>>
> > > >>>>>> == Cryptography ==
> > > >>>>>> Not applicable.
> > > >>>>>>
> > > >>>>>> == Required Resources ==
> > > >>>>>> === Mailing Lists ===
> > > >>>>>> We will operate two mailing lists as follows:
> > > >>>>>>  * Onyx PMC discussions: priv...@onyx.incubator.apache.org
> > > >>>>>>  * Onyx developers: d...@onyx.incubator.apache.org
> > > >>>>>>
> > > >>>>>> === Git Repositories ===
> > > >>>>>> Upon incubation: https://github.com/apache/incubator-onyx.
> > > >>>>>> After the incubation, we would like to move the existing repo
> > > >>>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
> > > >>>>>>
> > > >>>>>> === Issue Tracking ===
> > > >>>>>> Onyx currently tracks its issues using the GitHub issue tracker:
> > > >>>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> > > >>>>>> JIRA.
> > > >>>>>>
> > > >>>>>> == Initial Committers ==
> > > >>>>>> * Byung-Gon Chun
> > > >>>>>> * Jeongyoon Eo
> > > >>>>>> * Geon-Woo Kim
> > > >>>>>> * Joo Yeon Kim
> > > >>>>>> * Gyewon Lee
> > > >>>>>> * Jung-Gil Lee
> > > >>>>>> * Sanha Lee
> > > >>>>>> * Wooyeon Lee
> > > >>>>>> * Yunseong Lee
> > > >>>>>> * JangHo Seo
> > > >>>>>> * Won Wook Song
> > > >>>>>> * Taegeon Um
> > > >>>>>> * Youngseok Yang
> > > >>>>>>
> > > >>>>>> == Affiliations ==
> > > >>>>>> * SNU (Seoul National University)
> > > >>>>>>   * Byung-Gon Chun
> > > >>>>>>   * Jeongyoon Eo
> > > >>>>>>   * Geon-Woo Kim
> > > >>>>>>   * Gyewon Lee
> > > >>>>>>   * Sanha Lee
> > > >>>>>>   * Wooyeon Lee
> > > >>>>>>   * Yunseong Lee
> > > >>>>>>   * JangHo Seo
> > > >>>>>>   * Won Wook Song
> > > >>>>>>   * Taegeon Um
> > > >>>>>>   * Youngseok Yang
> > > >>>>>>
> > > >>>>>> * LG
> > > >>>>>>   * Jung-Gil Lee
> > > >>>>>>
> > > >>>>>> * Samsung
> > > >>>>>>   * Joo Yeon Kim
> > > >>>>>>
> > > >>>>>> * Viva Republica
> > > >>>>>>   * Geon-Woo Kim
> > > >>>>>>
> > > >>>>>> == Sponsors ==
> > > >>>>>> === Champions ===
> > > >>>>>> Byung-Gon Chun
> > > >>>>>>
> > > >>>>>> === Mentors ===
> > > >>>>>> * Hyunsik Choi
> > > >>>>>> * Byung-Gon Chun
> > > >>>>>> * Markus Weimer
> > > >>>>>> * Reynold Xin
> > > >>>>>>
> > > >>>>>> === Sponsoring Entity ===
> > > >>>>>> The Apache Incubator
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Byung-Gon Chun
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Byung-Gon Chun
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Byung-Gon Chun
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > >> For additional commands, e-mail: general-h...@incubator.apache.org
> > > >>
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Byung-Gon Chun
> >
>



-- 
Byung-Gon Chun
