Hi Henri,

I am Larry Tang, working with Minjie Wang (@jermainewang) on the imperative programming part of MXNet. Please add me to the list of committers for the MXNet project. I will work intensively on merging a NumPy interface into MXNet as its imperative subsystem over the next few months.
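As a rough illustration of the direction (a minimal sketch only: mx.nd is the existing NDArray API, which already mirrors part of NumPy; the fuller NumPy-compatible layer is the future work described above):

    import numpy as np
    import mxnet as mx

    a_np = np.ones((2, 3))        # NumPy: eager and CPU-bound
    a_mx = mx.nd.ones((2, 3))     # MXNet: also eager, but device-aware

    b = a_mx * 2 + 1              # dispatched to MXNet's async dependency engine
    print(b.asnumpy())            # asnumpy() waits for and copies out the result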
My GitHub ID is: lryta
Affiliation: University of Michigan

Best,
Larry

On 2017-01-06 00:12 (-0500), Henri Yandell <b...@apache.org> wrote:

Hello Incubator,

I'd like to propose a new incubator Apache MXNet podling.

The existing MXNet project (http://mxnet.io - 1.5 years old, 15 committers, 200 contributors) is very interested in joining Apache. MXNet is an open-source deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices.

The wiki proposal page is located here:

https://wiki.apache.org/incubator/MXNetProposal

I've included the text below in case anyone wants to focus on parts of it in a reply.

Looking forward to your thoughts, and to lots of interested Apache members volunteering to mentor the project in addition to Sebastian and myself.

Currently the list of committers is based on the current active coders, so we're also very interested in hearing from anyone else who is interested in working on the project, be they current or future contributors!

Thanks,

Hen
On behalf of the MXNet project

---------

= MXNet: Apache Incubator Proposal =

== Abstract ==

MXNet is a Flexible and Efficient Library for Deep Learning.

== Proposal ==

MXNet is an open-source deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices. It is highly scalable, allowing for fast model training, and supports a flexible programming model and multiple languages. MXNet allows you to mix symbolic and imperative programming flavors to maximize both efficiency and productivity. MXNet is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The MXNet library is portable and lightweight, and it scales to multiple GPUs and multiple machines.
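To make the mixed-flavor point concrete, a minimal sketch against the current Python API (mx.nd for imperative, mx.sym for symbolic; the toy fc1 network and its constant initialization are illustrative only):

    import mxnet as mx

    # Imperative flavor: NDArray operations execute eagerly.
    x = mx.nd.ones((2, 2))
    y = x + 1                            # scheduled immediately by the engine

    # Symbolic flavor: declare a graph first, then bind and execute it.
    data = mx.sym.Variable('data')
    net = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=4)
    net = mx.sym.Activation(data=net, act_type='relu')

    exe = net.simple_bind(ctx=mx.cpu(), data=(2, 2))
    exe.arg_dict['fc1_weight'][:] = 0.1  # naive init, just for the demo
    exe.arg_dict['fc1_bias'][:] = 0.0
    exe.forward(data=y)                  # feed the imperative result into the graph
    print(exe.outputs[0].asnumpy())

Both flavors run through the same dependency engine, which is what lets the scheduler parallelize them together.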
== Background ==

Deep learning is a subset of machine learning and refers to a class of algorithms that use a hierarchical approach with non-linearities to discover and learn representations within data. Deep learning has recently become very popular due to its applicability to, and advancement of, domains such as computer vision, speech recognition, natural language understanding and recommender systems. With pervasive and cost-effective cloud computing, large labeled datasets and continued algorithmic innovation, deep learning has become one of the most popular classes of algorithms for machine learning practitioners in recent years.

== Rationale ==

The adoption of deep learning is quickly expanding from the initial deep domain experts rooted in academia to data scientists and developers working to deploy intelligent services and products. Deep learning, however, has many challenges. These include model training time (which can take days to weeks), programmability (not everyone writes Python or C++, or likes symbolic programming) and balancing production readiness (support for things like failover) with development flexibility (the ability to program in different ways, support for new operators and model types) and speed of execution (fast and scalable model training). Other frameworks excel on some but not all of these aspects.

== Initial Goals ==

MXNet is a fairly established project on GitHub, with its first code contribution in April 2015 and roughly 200 contributors. It is used by several large companies and some of the top research institutions on the planet. Initial goals would be the following:

1. Move the existing codebase(s) to Apache
1. Integrate with the Apache development process/sign CLAs
1. Ensure all dependencies are compliant with Apache License version 2.0
1. Incremental development and releases per Apache guidelines
1. Establish engineering discipline and a predictable release cadence of high quality releases
1. Expand the community beyond the current base of expert level users
1. Improve usability and the overall developer/user experience
1. Add additional functionality to address newer problem types and algorithms

== Current Status ==

=== Meritocracy ===

The MXNet project already operates on meritocratic principles. Today, MXNet has developers worldwide and has accepted multiple major patches from a diverse set of contributors within both industry and academia. We would like to follow ASF meritocratic principles to encourage more developers to contribute to this project. We know that only active and committed developers from a diverse set of backgrounds can make MXNet a successful project. We are also improving the documentation and code to help new developers get started quickly.

=== Community ===

Acceptance into the Apache foundation would bolster the growing user and developer community around MXNet. That community includes around 200 contributors from academia and industry. The core developers of our project are listed below and are also represented by logos on the mxnet.io site, including Amazon, Baidu, Carnegie Mellon University, Turi, Intel, NYU, Nvidia, MIT, Microsoft, TuSimple, University of Alberta, University of Washington and Wolfram.

=== Core Developers ===

(with GitHub logins)

* Tianqi Chen (@tqchen)
* Mu Li (@mli)
* Junyuan Xie (@piiswrong)
* Bing Xu (@antinucleon)
* Chiyuan Zhang (@pluskid)
* Minjie Wang (@jermainewang)
* Naiyan Wang (@winstywang)
* Yizhi Liu (@javelinjs)
* Tong He (@hetong007)
* Qiang Kou (@thirdwing)
* Xingjian Shi (@sxjscience)

=== Alignment ===

The ASF is already the home of many distributed platforms, e.g., Hadoop, Spark and Mahout, each of which targets a different application domain. MXNet, being a distributed platform for large-scale deep learning, focuses on another important domain which still lacks a scalable, programmable, flexible and fast open-source platform. The recent success of deep learning models, especially for vision and speech recognition tasks, has generated interest both in applying existing deep learning models and in developing new ones. Thus, an open-source platform for deep learning backed by some of the top industry and academic players will be able to attract a large community of users and developers. MXNet is a complex system needing many iterations of design, implementation and testing. Apache's collaboration framework, which encourages active contribution from developers, will inevitably help improve the quality of the system, as shown in the success of Hadoop, Spark, etc.
Equally important is the community of users, which helps identify real-life applications of deep learning and helps evaluate the system's performance and ease of use. We hope to leverage the ASF to coordinate and promote both communities, and in return to benefit those communities with another useful tool.

== Known Risks ==

=== Orphaned products ===

Given the current level of investment in MXNet and the stakeholders using it, the risk of the project being abandoned is minimal. Amazon, for example, is actively working to use MXNet in many of its services, and many large corporations use it in their production applications.

=== Inexperience with Open Source ===

MXNet has existed as a healthy open source project for more than a year. During that time, the project has attracted 200+ contributors.

=== Homogenous Developers ===

The initial list of committers and contributors includes developers from several institutions and industry participants (see above).

=== Reliance on Salaried Developers ===

Like most open source projects, MXNet receives substantial support from salaried developers. A large fraction of MXNet development is supported by graduate students at various universities in the course of research degrees - this is more a "volunteer" relationship, since in most cases students contribute vastly more than is necessary to immediately support their research. In addition, those working from within corporations are devoting significant time and effort to the project - and these come from several organizations.

=== An Excessive Fascination with the Apache Brand ===

We did not choose Apache for publicity. We have two purposes. First, we hope that Apache's known best practices for managing a mature open source project can help guide us. For example, we are feeling the growing pains of a successful open source project as we attempt a major refactor of the internals while customers are using the system in production. We seek guidance in communicating breaking API changes and version revisions. Also, as our involvement from major corporations increases, we want to assure our users that MXNet will stay open and not favor any particular platform or environment. These are some examples of the know-how and discipline we're hoping Apache can bring to our project.

Second, we want to leverage Apache's reputation to recruit more developers to create a diverse community.

=== Relationship with Other Apache Products ===

Apache Mahout and Apache Spark's MLlib are general machine learning systems. Deep learning algorithms can thus be implemented on these two platforms as well. However, in practice, the overlap will be minimal. Deep learning is so computationally intensive that it often requires specialized GPU hardware to accomplish tasks of meaningful size. Making efficient use of GPU hardware is complex because the hardware is so fast that the supporting systems around it must be carefully optimized to keep the GPU cores busy. Extending this capability to distributed multi-GPU and multi-host environments requires great care. This is a critical differentiator between MXNet and existing Apache machine learning systems.

Mahout and Spark MLlib follow models in which nodes run synchronously. This is a fundamental difference from MXNet, which follows the parameter server framework. MXNet can run synchronously or asynchronously. In addition, MXNet has optimizations for training a wide range of deep learning models using a variety of approaches (e.g., model parallelism and data parallelism), which makes MXNet much more efficient (near-linear speedup on state-of-the-art models). MXNet also supports both imperative and symbolic approaches, providing ease of programming for deep learning algorithms.
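As a rough sketch of the parameter server interface (the Python KVStore API with a single-process 'local' store; on a cluster the same code would select 'dist_sync' or 'dist_async' to choose between the two consistency models):

    import mxnet as mx

    # 'local' runs in-process; 'dist_sync' / 'dist_async' pick the
    # consistency model when running on a cluster.
    kv = mx.kvstore.create('local')

    shape = (2, 3)
    kv.init(0, mx.nd.ones(shape))       # key 0 holds a shared parameter array

    kv.push(0, mx.nd.ones(shape) * 8)   # workers push updates for key 0
    out = mx.nd.zeros(shape)
    kv.pull(0, out=out)                 # and pull the current value back
    print(out.asnumpy())                # all 8s with the default updater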
Other Apache projects that are potentially complementary:

Apache Arrow - reading data in Apache Arrow's internal format from MXNet would allow users to run ETL/preprocessing in Spark, save the results in Arrow's format, and then run DL algorithms on them.

Apache Singa - MXNet and Singa are both deep learning projects, and both can benefit from a larger deep learning community at Apache.

== Documentation ==

Documentation has recently migrated to http://mxnet.io. We continue to refine and improve the documentation.

== Initial Source ==

We currently use GitHub to maintain our source code: https://github.com/MXNet

== Source and Intellectual Property Submission Plan ==

The MXNet code is available under the Apache License, Version 2.0. We will work with the committers to get CLAs signed and to review previous contributions.

== External Dependencies ==

* required by the core code base: GCC or Clang, any BLAS library (ATLAS, OpenBLAS, MKL), dmlc-core, mshadow, ps-lite (which requires lib-zeromq), TBB
* required for GPU usage: CUDA, cuDNN
* required for Python usage: Python 2/3
* required for the R module: R, Rcpp (GPLv2 licensing)
* optional for image preparation and preprocessing: OpenCV
* optional dependencies for additional features: torch7, numba, cython (in the NNVM branch)

Rcpp and lib-zeromq are expected to require licensing discussions.

== Cryptography ==

Not applicable.

== Required Resources ==

=== Mailing Lists ===

There is currently no mailing list.

=== Issue Tracking ===

The project currently uses GitHub to track issues and would like to continue to do so.

== Committers and Affiliations ==

* Tianqi Chen (UW)
* Mu Li (AWS)
* Junyuan Xie (AWS)
* Bing Xu (Apple)
* Chiyuan Zhang (MIT)
* Minjie Wang (NYU)
* Naiyan Wang (TuSimple)
* Yizhi Liu (Mediav)
* Tong He (Simon Fraser University)
* Qiang Kou (Indiana U)
* Xingjian Shi (HKUST)

== Sponsors ==

=== Champion ===

Henri Yandell (bayard at apache.org)

=== Nominated Mentors ===

Sebastian Schelter (s...@apache.org)

=== Sponsoring Entity ===

We are requesting the Incubator to sponsor this project.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org