Awesome! Please sign me up as a committer - I've been working with Mu on MXNet 
(Amazon) and would love to get more involved with the project!

GitHub ID: jspisak 

Sent from Joe's iPhone

On 2017-01-06 08:44 (-0800), Henri Yandell wrote: 
> Understood. I saw that Greg had recently approved another podling to do
> this. Though, assuming approved, there will still need to be some infra
> headscratching on the 3,000 issues currently on the main dmlc/mxnet repo
> and how imports are best done :) The simplest would be to transfer the
> current repo as is over at GitHub - not sure if that's been done before.
>
> On Thu, Jan 5, 2017 at 11:32 PM, Henry Saputra wrote:
>
> > This is great news and I am looking forward to it =)
> >
> > According to the proposal, does the community want to stick with GitHub
> > issues for tracking issues and bugs?
> > I suppose this needs a nod from Greg Stein, as rep for Apache Infra, to
> > confirm that this is OK for incubation, and how it would impact
> > graduation.
> >
> > - Henry
> >
> > On Thu, Jan 5, 2017 at 9:12 PM, Henri Yandell wrote:
> >
> > > Hello Incubator,
> > >
> > > I'd like to propose a new Apache MXNet incubator podling.
> > >
> > > The existing MXNet project (http://mxnet.io - 1.5 years old, 15
> > > committers, 200 contributors) is very interested in joining Apache.
> > > MXNet is an open-source deep learning framework that allows you to
> > > define, train, and deploy deep neural networks on a wide array of
> > > devices, from cloud infrastructure to mobile devices.
> > >
> > > The wiki proposal page is located here:
> > >
> > > https://wiki.apache.org/incubator/MXNetProposal
> > >
> > > I've included the text below in case anyone wants to focus on parts of
> > > it in a reply.
> > >
> > >
> > > Looking forward to your thoughts, and to lots of interested Apache
> > > members volunteering to mentor the project in addition to Sebastian
> > > and myself.
> > >
> > > Currently the list of committers is based on the current active
> > > coders, so we're also very interested in hearing from anyone else who
> > > is interested in working on the project, be they current or future
> > > contributors!
> > >
> > > Thanks,
> > >
> > > Hen
> > > On behalf of the MXNet project
> > >
> > > ---------
> > >
> > > = MXNet: Apache Incubator Proposal =
> > >
> > > == Abstract ==
> > >
> > > MXNet is a flexible and efficient library for deep learning.
> > >
> > > == Proposal ==
> > >
> > > MXNet is an open-source deep learning framework that allows you to
> > > define, train, and deploy deep neural networks on a wide array of
> > > devices, from cloud infrastructure to mobile devices. It is highly
> > > scalable, allowing for fast model training, and supports a flexible
> > > programming model and multiple languages. MXNet allows you to mix
> > > symbolic and imperative programming flavors to maximize both
> > > efficiency and productivity. MXNet is built on a dynamic dependency
> > > scheduler that automatically parallelizes both symbolic and imperative
> > > operations on the fly. A graph optimization layer on top of that makes
> > > symbolic execution fast and memory efficient. The MXNet library is
> > > portable and lightweight, and it scales to multiple GPUs and multiple
> > > machines.
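> > >
> > > As a rough illustration of mixing the two styles, here is a minimal
> > > Python sketch against the current mx.nd / mx.sym APIs (the executor
> > > calls reflect the pre-1.0 interface, and exact names may shift as the
> > > refactor proceeds):
> > >
> > >   import mxnet as mx
> > >
> > >   # Imperative: NDArray operations execute eagerly, NumPy-style
> > >   a = mx.nd.ones((2, 3))
> > >   b = a * 2 + 1                      # computed immediately
> > >
> > >   # Symbolic: declare a graph first, bind data, then execute
> > >   x = mx.sym.Variable('x')
> > >   net = mx.sym.FullyConnected(data=x, name='fc1', num_hidden=4)
> > >   net = mx.sym.Activation(data=net, act_type='relu')
> > >
> > >   ex = net.simple_bind(ctx=mx.cpu(), x=(2, 3))
> > >   for arr in ex.arg_dict.values():   # crude init so the sketch runs
> > >       arr[:] = 1.0
> > >   ex.arg_dict['x'][:] = b            # feed the eagerly computed NDArray
> > >   ex.forward()
> > >   print(ex.outputs[0].asnumpy())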
> > >
> > > == Background ==
> > >
> > > Deep learning is a subset of machine learning and refers to a class of
> > > algorithms that use a hierarchical approach with non-linearities to
> > > discover and learn representations within data. Deep learning has
> > > recently become very popular due to its applicability to, and
> > > advancement of, domains such as Computer Vision, Speech Recognition,
> > > Natural Language Understanding and Recommender Systems. With pervasive
> > > and cost-effective cloud computing, large labeled datasets and
> > > continued algorithmic innovation, deep learning has become one of the
> > > most popular classes of algorithms for machine learning practitioners
> > > in recent years.
> > >
> > > == Rationale ==
> > >
> > > The adoption of deep learning is quickly expanding from the initial
> > > deep domain experts rooted in academia to data scientists and
> > > developers working to deploy intelligent services and products. Deep
> > > learning, however, has many challenges. These include model training
> > > time (which can take days to weeks), programmability (not everyone
> > > writes Python or C++, or likes symbolic programming) and balancing
> > > production readiness (support for things like failover) with
> > > development flexibility (the ability to program in different ways,
> > > support for new operators and model types) and speed of execution
> > > (fast and scalable model training). Other frameworks excel on some but
> > > not all of these aspects.
> > >
> > > == Initial Goals ==
> > >
> > > MXNet is a fairly established project on GitHub, with its first code
> > > contribution in April 2015 and roughly 200 contributors. It is used by
> > > several large companies and some of the top research institutions on
> > > the planet. Initial goals would be the following:
> > >
> > > 1. Move the existing codebase(s) to Apache
> > > 1. Integrate with the Apache development process and sign CLAs
> > > 1. Ensure all dependencies are compliant with Apache License version 2.0
> > > 1. Incremental development and releases per Apache guidelines
> > > 1. Establish engineering discipline and a predictable cadence of
> > > high-quality releases
> > > 1. Expand the community beyond the current base of expert-level users
> > > 1. Improve usability and the overall developer/user experience
> > > 1. Add additional functionality to address newer problem types and
> > > algorithms
> > >
> > > == Current Status ==
> > >
> > > === Meritocracy ===
> > >
> > > The MXNet project already operates on meritocratic principles. Today,
> > > MXNet has developers worldwide and has accepted multiple major patches
> > > from a diverse set of contributors in both industry and academia. We
> > > would like to follow ASF meritocratic principles to encourage more
> > > developers to contribute to this project. We know that only active and
> > > committed developers from a diverse set of backgrounds can make MXNet
> > > a successful project. We are also improving the documentation and code
> > > to help new developers get started quickly.
> > >
> > > === Community ===
> > >
> > > Acceptance into the Apache foundation would bolster the growing user
> > > and developer community around MXNet. That community includes around
> > > 200 contributors from academia and industry. The core developers of
> > > our project are listed in our contributors below and are also
> > > represented by logos on the mxnet.io site, including Amazon, Baidu,
> > > Carnegie Mellon University, Turi, Intel, NYU, Nvidia, MIT, Microsoft,
> > > TuSimple, University of Alberta, University of Washington and Wolfram.
> > >
> > > === Core Developers ===
> > >
> > > (with GitHub logins)
> > >
> > > * Tianqi Chen (@tqchen)
> > > * Mu Li (@mli)
> > > * Junyuan Xie (@piiswrong)
> > > * Bing Xu (@antinucleon)
> > > * Chiyuan Zhang (@pluskid)
> > > * Minjie Wang (@jermainewang)
> > > * Naiyan Wang (@winstywang)
> > > * Yizhi Liu (@javelinjs)
> > > * Tong He (@hetong007)
> > > * Qiang Kou (@thirdwing)
> > > * Xingjian Shi (@sxjscience)
> > >
> > > === Alignment ===
> > >
> > > ASF is already the home of many distributed platforms, e.g., Hadoop,
> > > Spark and Mahout, each of which targets a different application
> > > domain. MXNet, being a distributed platform for large-scale deep
> > > learning, focuses on another important domain, one that still lacks a
> > > scalable, programmable, flexible and very fast open-source platform.
> > > The recent success of deep learning models, especially for vision and
> > > speech recognition tasks, has generated interest both in applying
> > > existing deep learning models and in developing new ones. Thus, an
> > > open-source platform for deep learning backed by some of the top
> > > industry and academic players will be able to attract a large
> > > community of users and developers. MXNet is a complex system needing
> > > many iterations of design, implementation and testing. Apache's
> > > collaboration framework, which encourages active contribution from
> > > developers, will inevitably help improve the quality of the system, as
> > > shown by the success of Hadoop, Spark, etc. Equally important is the
> > > community of users, which helps identify real-life applications of
> > > deep learning and helps evaluate the system's performance and ease of
> > > use. We hope to leverage ASF to coordinate and promote both
> > > communities, and in return to benefit those communities with another
> > > useful tool.
> > >
> > > == Known Risks ==
> > >
> > > === Orphaned products ===
> > >
> > > Given the current level of investment in MXNet and the stakeholders
> > > using it, the risk of the project being abandoned is minimal. Amazon,
> > > for example, is actively working to use MXNet in many of its services,
> > > and many large corporations use it in their production applications.
> > >
> > > === Inexperience with Open Source ===
> > >
> > > MXNet has existed as a healthy open source project for more than a
> > > year. During that time, the project has attracted 200+ contributors.
> > >
> > > === Homogenous Developers ===
> > >
> > > The initial list of committers and contributors includes developers
> > > from several institutions and industry participants (see above).
> > >
> > > === Reliance on Salaried Developers ===
> > >
> > > Like most open source projects, MXNet receives substantial support
> > > from salaried developers. A large fraction of MXNet development is
> > > supported by graduate students at various universities in the course
> > > of research degrees - this is more a "volunteer" relationship, since
> > > in most cases students contribute vastly more than is necessary to
> > > immediately support their research. In addition, those working from
> > > within corporations are devoting significant time and effort to the
> > > project - and these contributors come from several organizations.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > >
> > > We are not choosing Apache for publicity; we have two purposes. First,
> > > we hope that Apache's known best practices for managing a mature open
> > > source project can help guide us. For example, we are feeling the
> > > growing pains of a successful open source project as we attempt a
> > > major refactor of the internals while customers are using the system
> > > in production. We seek guidance in communicating breaking API changes
> > > and version revisions. Also, as our involvement from major
> > > corporations increases, we want to assure our users that MXNet will
> > > stay open and not favor any particular platform or environment. These
> > > are some examples of the know-how and discipline we're hoping Apache
> > > can bring to our project.
> > >
> > > Second, we want to leverage Apache's reputation to recruit more
> > > developers and create a diverse community.
> > >
> > > === Relationship with Other Apache Products ===
> > >
> > > Apache Mahout and Apache Spark's MLlib are general machine learning
> > > systems. Deep learning algorithms can thus be implemented on these two
> > > platforms as well. However, in practice, the overlap will be minimal.
> > > Deep learning is so computationally intensive that it often requires
> > > specialized GPU hardware to accomplish tasks of meaningful size.
> > > Making efficient use of GPU hardware is complex because the hardware
> > > is so fast that the supporting systems around it must be carefully
> > > optimized to keep the GPU cores busy. Extending this capability to
> > > distributed multi-GPU and multi-host environments requires great care.
> > > This is a critical differentiator between MXNet and existing Apache
> > > machine learning systems.
> > >
> > > Mahout and Spark MLlib follow models where their nodes run
> > > synchronously. This is the fundamental difference from MXNet, which
> > > follows the parameter server framework: MXNet can run synchronously or
> > > asynchronously. In addition, MXNet has optimizations for training a
> > > wide range of deep learning models using a variety of approaches
> > > (e.g., model parallelism and data parallelism), which makes MXNet much
> > > more efficient (near-linear speedup on state-of-the-art models). MXNet
> > > also supports both imperative and symbolic approaches, providing ease
> > > of programming for deep learning algorithms.
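> > >
> > > To make the sync/async point concrete, here is a minimal sketch using
> > > MXNet's KVStore API. The 'dist_sync' and 'dist_async' store types are
> > > how a training job selects between the two parameter-server modes;
> > > running distributed also requires launching scheduler and server
> > > processes, which this sketch omits:
> > >
> > >   import mxnet as mx
> > >
> > >   # Single-machine store; updates are applied locally
> > >   kv = mx.kvstore.create('local')
> > >
> > >   kv.init(0, mx.nd.zeros((2, 2)))   # key 0 holds a 2x2 parameter
> > >   kv.push(0, mx.nd.ones((2, 2)))    # send an update (e.g., a gradient)
> > >   out = mx.nd.empty((2, 2))
> > >   kv.pull(0, out=out)               # fetch the aggregated value
> > >
> > >   # In a distributed job, the same code would instead create:
> > >   #   mx.kvstore.create('dist_sync')   # workers wait on each update
> > >   #   mx.kvstore.create('dist_async')  # workers proceed independently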
> > >
> > > Other Apache projects are potentially complementary:
> > >
> > > Apache Arrow - reading data in Apache Arrow's internal format from
> > > MXNet would allow users to run ETL/preprocessing in Spark, save the
> > > results in Arrow's format, and then run DL algorithms on them.
> > >
> > > Apache Singa - MXNet and Singa are both deep learning projects, and
> > > both can benefit from a larger deep learning community at Apache.
> > >
> > > == Documentation ==
> > >
> > > Documentation has recently migrated to http://mxnet.io. We continue to
> > > refine and improve the documentation.
> > >
> > > == Initial Source ==
> > >
> > > We currently use GitHub to maintain our source code:
> > > https://github.com/dmlc/mxnet
> > >
> > > == Source and Intellectual Property Submission Plan ==
> > >
> > > The MXNet code is available under the Apache License, Version 2.0. We
> > > will work with the committers to get CLAs signed and review previous
> > > contributions.
> > >
> > > == External Dependencies ==
> > >
> > > * required by the core code base: GCC or Clang, any BLAS library
> > > (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which requires
> > > lib-zeromq), TBB
> > > * required for GPU usage: CUDA, cuDNN
> > > * required for Python usage: Python 2/3
> > > * required for the R module: R, Rcpp (GPLv2 licensing)
> > > * optional for image preparation and preprocessing: OpenCV
> > > * optional dependencies for additional features: torch7, numba, cython
> > > (in the NNVM branch)
> > >
> > > Rcpp and lib-zeromq are expected to need licensing discussions.
> > >
> > > == Cryptography ==
> > >
> > > Not applicable.
> > >
> > > == Required Resources ==
> > >
> > > === Mailing Lists ===
> > >
> > > There is currently no mailing list.
> > >
> > > === Issue Tracking ===
> > >
> > > The project currently uses GitHub to track issues, and would like to
> > > continue to do so.
> > >
> > > == Committers and Affiliations ==
> > >
> > > * Tianqi Chen (UW)
> > > * Mu Li (AWS)
> > > * Junyuan Xie (AWS)
> > > * Bing Xu (Apple)
> > > * Chiyuan Zhang (MIT)
> > > * Minjie Wang (NYU)
> > > * Naiyan Wang (TuSimple)
> > > * Yizhi Liu (Mediav)
> > > * Tong He (Simon Fraser University)
> > > * Qiang Kou (Indiana University)
> > > * Xingjian Shi (HKUST)
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > >
> > > Henri Yandell (bayard at apache.org)
> > >
> > > === Nominated Mentors ===
> > >
> > > Sebastian Schelter (s...@apache.org)
> > >
> > > === Sponsoring Entity ===
> > >
> > > We are requesting that the Incubator sponsor this project.
> > >
> >
>

