I would like to raise another option to get back on the topic of changing
the operator graph structure. The page discussing Relay IR [1] mainly
covers the difference between a dataflow graph, like we use now, and
A-normal form [2], which is used in some functional compilers. Is there a
reason we do not want to use a structure based on Static Single Assignment
(SSA) form (Wikipedia explanation [3], lecture note explanation [4])? It
is used almost universally in the compiler community, including in LLVM
(Clang), GCC, the Oracle JVM, PyPy, Go, WebKit, and Swift [5]. The major
reason behind its pervasiveness is that it has proven very effective for
analysis and transformations when dealing with control flow.
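As a tiny illustration of what SSA means in practice (just a sketch in
Python with made-up helper names, not anything from MXNet, NNVM, or
Relay): straight-line code becomes single-assignment by versioning each
variable, so every use points at exactly one definition. Control-flow
joins would additionally need phi nodes, which I omit here.

```python
# Minimal sketch of SSA renaming for straight-line code (no control
# flow, so no phi nodes needed). Each assignment to a variable creates
# a fresh version; each use refers to the latest version. Illustrative
# only - these names are not an MXNet/Relay/NNVM API.

def to_ssa(assignments):
    """assignments: list of (target, (op, arg1, arg2)) tuples.
    Returns the same program with every name given a unique version."""
    version = {}          # variable -> latest version number
    out = []

    def use(name):
        # Never-assigned inputs pass through unchanged.
        if name not in version:
            return name
        return f"{name}_{version[name]}"

    for target, (op, a, b) in assignments:
        expr = (op, use(a), use(b))       # resolve uses first
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}_{version[target]}", expr))
    return out

# x = a + b; x = x * c; y = x + a   becomes
# x_1 = a + b; x_2 = x_1 * c; y_1 = x_2 + a
prog = [("x", ("add", "a", "b")),
        ("x", ("mul", "x", "c")),
        ("y", ("add", "x", "a"))]
for tgt, expr in to_ssa(prog):
    print(tgt, "=", expr)
```

The point is that after renaming, def-use chains are explicit, which is
what makes analyses and transformations over control flow tractable.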

One possible concern is that it might make automatic differentiation more
difficult [6]. While it certainly is more complicated than a pure
functional approach, the functional approach requires users to write
functional code. Especially with the languages we support now, that
doesn't seem like a reasonable assumption. Given that users are already
introducing the complexity inherent in imperative programming, we have to
deal with that increased complexity regardless. I think it might be easier
to have the tools to deal with it rather than attempting to coerce users
into a different programming paradigm or convert code between paradigms.
Furthermore, this may become more important if users increasingly make
use of control flow, as Junru said.
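For what it's worth, here is a hedged sketch (hypothetical names, not an
MXNet API) of reverse-mode autodiff over a straight-line SSA trace. The
single-assignment property is exactly what makes the backward walk
straightforward: each name has one definition, so adjoints can simply be
accumulated in reverse order.

```python
# Sketch: reverse-mode autodiff over a straight-line SSA trace.
# Because each SSA name is assigned exactly once, walking the trace
# backwards and accumulating adjoints suffices. Illustrative names only,
# not an MXNet/Relay API; only "add" and "mul" ops are handled.

def grad(trace, inputs, seed_var):
    """trace: list of (target, op, lhs, rhs) in SSA form.
    Returns d(seed_var)/d(input) for each input name."""
    env = dict(inputs)
    for tgt, op, a, b in trace:            # forward pass: evaluate
        x, y = env[a], env[b]
        env[tgt] = x + y if op == "add" else x * y

    adj = {name: 0.0 for name in env}      # adjoint of each SSA name
    adj[seed_var] = 1.0
    for tgt, op, a, b in reversed(trace):  # backward pass
        g = adj[tgt]
        if op == "add":                    # d(x+y)/dx = d(x+y)/dy = 1
            adj[a] += g
            adj[b] += g
        else:                              # d(x*y)/dx = y, d(x*y)/dy = x
            adj[a] += g * env[b]
            adj[b] += g * env[a]
    return {k: adj[k] for k in dict(inputs)}

# y = (a + b) * a at a=3, b=2: dy/da = (a+b) + a = 8, dy/db = a = 3
trace = [("t1", "add", "a", "b"),
         ("y",  "mul", "t1", "a")]
print(grad(trace, {"a": 3.0, "b": 2.0}, "y"))  # {'a': 8.0, 'b': 3.0}
```

Handling real control flow would of course require more machinery
(adjoints across phi nodes and loops), which is exactly the added
complexity discussed in [6], but the basic mechanism stays the same.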

Zach


[1] - https://docs.tvm.ai/dev/relay_intro.html
[2] - https://en.wikipedia.org/wiki/A-normal_form
[3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
[4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
[5] - https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
[6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2

On Wed, May 15, 2019 at 11:51 AM Naveen Swamy <[email protected]> wrote:

> Being dismissive and condescending has been exactly what is plaguing this
> project.
>
> I agree the last paragraph sounds very condescending and very dismissive,
> and it breaks many of the codes of conduct listed.
>
> On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> [email protected]>
> wrote:
>
> > Hi Junru,
> >
> > Overall, I appreciate the points you made about the proposal.
> >
> > Having said that, I would like to remind you of the Apache Code of Conduct:
> > https://www.apache.org/foundation/policies/conduct.
> > "Be empathetic, welcoming, friendly and patient".
> >
> > I find your tone condescending. Clearly you understand what he meant from
> > the context, whether you prefer to call it IR in compilers or dataflow in
> > distributed systems. You could very well say "let's use this terminology
> > to have a common understanding" instead of saying "go learn the basic
> > concepts."
> > Before building a cool brand, it's important to build a healthy community.
> >
> > Anirudh
> >
> >
> > On Wed, May 15, 2019 at 12:03 AM Junru Shao <[email protected]>
> > wrote:
> >
> > > Hi Pedro,
> > >
> > > I really appreciate that a diligent and talented engineer eagerly wants
> > to
> > > improve our system, and am very thankful that you have done so much for
> > our
> > > community. However, I do want to raise some points that I believe I
> > > should mention.
> > >
> > > While I agree with Tianqi that every design has its pros and cons, I
> > would
> > > love to emphasize that a *good taste* of system design is to optimize
> the
> > > bottleneck, enhance expressiveness (and usability), i.e. to do what
> needs
> > > doing, rather than *trivial nits* that are irrelevant to either
> > performance
> > > or expressiveness. Generally speaking, typed or untyped, shared_ptr or
> > > unique_ptr, won't affect the overall performance when it comes to deep
> > > learning workloads, especially when we have an async scheduler that does
> > > good latency hiding in MXNet - to me, these are not major issues that
> > > are worth re-designing our entire system.
> > >
> > > To benefit users - real-world ML practitioners - the main thing I would
> > > love to mention is that dataflow graph-based representations are
> > > increasingly incapable of expressing modern neural networks, because of
> > > increasingly common structures like arbitrary control flow (w/ continue,
> > > break, etc.), recursion, type conjunction and disjunction, etc.
> > > Addressing these issues will be our priority, and Relay addresses all
> > > of these pain points.
> > >
> > > Another minor thing I would love to humbly mention is that, for sake of
> > our
> > > brand, it is our responsibility to be professional about terminologies
> > when
> > > writing an official proposal on Confluence. As one of the numerous
> > > examples, the title of the proposal really shocks me for a while,
> > something
> > > like "operators graph" blah blah so weird. Educate me if I were wrong,
> > but
> > > compiler community would prefer the term "intermediate representation",
> > and
> > > distributed system community would prefer "dataflow graph". If you
> don't
> > > have knowledge in these fields, a better way for efficient
> communication
> > is
> > > to get yourself first familiarize the most basic concepts and then do
> > > discussion. This is a way to save your own valuable time as well.
> > >
> > > Again, thank you so much for your hard work, and hope that we could
> work
> > > together to win customers in the future :-)
> > >
> > > Thanks,
> > > Junru
> > >
> > >
> > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <[email protected]>
> > > wrote:
> > >
> > > > The core part of the proposal is to move the graph to be much more
> > > strongly
> > > > typed template class.
> > > > I think this is mainly a point of engineering taste, and both sides
> > have
> > > > pros and cons, let me list them before I share my thoughts on this
> > issue:
> > > >
> > > > - Typed fields certainly enjoy more compile-time type checking; on the
> > > > other hand, it is hard to expose templates with an explosive number of
> > > > possibilities to frontend languages.
> > > > - More type-erased fields provide runtime flexibility to store
> > > polymorphic
> > > > types as well as extensible attributes for graph optimization
> > > >   - It is hard to use a virtual class to expose every possible
> > attribute
> > > > that an operator might have, such as inlining, storage pattern,
> > gradient
> > > > etc..
> > > > - The nature of supporting a growing set of operator attributes
> > > > requires a type-erased attrs field.
> > > > - In contrast to your argument (typing is a blocker to features),
> > > > type-erased and typed code can both reach the same features, except
> > > > that typed code gets more compile-time errors while type-erased code
> > > > gets some of them at runtime.
> > > > - Templatized data structures will likely introduce additional mental
> > > > burdens to developers and are not really suitable as a core data
> > > > structure
> > > >    - Because they imply an explosive number of possible data
> > > > structures, while the core data structure should be a single one.
> > > >
> > > > Now my view(as an MXNet PMC member) on typed vs type-erased style: If
> > > MXNet
> > > > is a pure C++ project, I might take more of the typed approach.
> > > > However, MXNet itself is a project that takes python/scala/clojure
> and
> > > > other frontend languages.
> > > > The introduction of more typing may not align with the original goal
> as
> > > the
> > > > tradeoffs I listed above.
> > > >
> > > > This proposal is really a drastic change from what NNVM does, as well
> > > > as the optimization passes, and given the scope it is, in your analogy,
> > > > "a new vehicle to solve all the problems" rather than a minor patch.
> > > > It will take a lot of engineering effort to bring in new features and
> > > > adapt the existing ones. Because of that, it does merit a discussion
> > > > about how we should think about the future MXNet 2.0.
> > > >
> > > > Technically Relay is a serious candidate. Of course relay, as well as
> > its
> > > > core, is in C++ but maintains the multi-language first principle,
> that
> > is
> > > > why the example code was in python.
> > > > See more related discussion comparing NNVMv1 and relay:
> > > > https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
> > > >
> > > > I think the ideal graph data structure candidate for MXNet2.0 should
> > have
> > > > natural support for:
> > > > - Native support of function, module, and recursions
> > > > - Control flows
> > > > - The ability to interoperate with multi-language frontends, e.g.
> > > > being able to prototype graph optimizations in python/scala/clojure
> > > > if needed.
> > > >
> > > > Adding this support needs significant engineering effort, and I do
> > > > hope we only have to do it once. While I don't want to force any
> > > > conclusion here, I do think Relay is one such candidate.
> > > >
> > > > Tianqi
> > > >
> > > >
> > > > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <
> > > [email protected]
> > > > >
> > > > wrote:
> > > >
> > > > > Hi Tianqi
> > > > >
> > > > > Thanks for the quick response.
> > > > >
> > > > > Could you point to examples where graph.h is being exposed which
> > > > > would not be possible with what I propose? I don't think my proposal
> > > > > has any impact on language bindings, and the way I describe it
> > > > > doesn't affect having or not having higher-level language bindings.
> > > > > Please elaborate so I can understand your concern. Maybe code
> > > > > examples where the graph attributes are being changed from Python?
> > > > > I don't think we have this in MXNet. This is such a core foundation
> > > > > for MXNet that I don't think we should compromise on it because
> > > > > other projects not directly related to MXNet might want to expose
> > > > > some untyped graph and Node attributes. The current status makes
> > > > > maintaining the code very painful and is also preventing desired
> > > > > features such as higher order gradients from being developed. I have
> > > > > heard from you many times how speed is critical for us to innovate
> > > > > in this quickly changing field.
> > > > >
> > > > > My proposal is limited to the graph and wouldn't change the way
> > > > > operators are registered and arguments are processed for operators
> > for
> > > > > example.
> > > > >
> > > > >
> > > > > Regarding the second point, the documentation about Relay in the
> web
> > > > > which I found for example:
> > > > >
> > > > > https://docs.tvm.ai/dev/relay_add_op.html#
> > > > >
> > > > > Is somebody working on making Imperative::Backward use this API?
> > > > > This would be a big change which I'm not aware of. And using an IR
> > > > > is of a much bigger scope than the change I'm proposing here, for
> > > > > example.
> > > > >
> > > > > I think I'm having difficulty understanding what the arguments here
> > > > > are. I'm saying I need to change one piece of my car, and what you
> > > > > are selling me is a new vehicle? Or is your suggestion that we use
> > > > > Relay for the graph passes in MXNet?
> > > > >
> > > > > I would like to see C++ code examples, Python examples are not
> > > > > sufficient when we talk about the core MXNet.
> > > > >
> > > > > Pedro.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <
> > [email protected]>
> > > > > wrote:
> > > > > >
> > > > > > Thanks for the proposal. Let me share some of my thoughts:
> > > > > >
> > > > > > Specific comments on the proposal
> > > > > > -----------------------------------------------
> > > > > > The heavy use of generics in the Graph type is a huge departure
> > > > > > from the type-erased data structure presented in the previous
> > > > > > design. While we understand the advantages of typed languages
> > > > > > (more compile-time checking) and type-erased types (more
> > > > > > dynamism), the heavy use of templates will actually make the
> > > > > > project solely C++ focused, making it hard to expose intermediate
> > > > > > (templatized) data structures to other languages like
> > > > > > python/scala/clojure.
> > > > > >
> > > > > > While I fully understand some of the lessons taught in programming
> > > > > > C++ (reduce shared_ptr, more typing, etc.), we need to think about
> > > > > > the context of the MXNet project and **the need to support
> > > > > > multi-language as a first-class concern**. Some of the type-erased
> > > > > > types are design trade-offs made to support these features, and we
> > > > > > need to think more carefully instead of just applying "rules for
> > > > > > C++", which may bring problems.
> > > > > >
> > > > > > Future of NNVM
> > > > > > ----------------------
> > > > > > Given that this thread touched upon what we should do for better
> > > > > > computational graph handling, I would recommend also taking a look
> > > > > > at NNVMv2 -- Relay.
> > > > > >
> > > > > > Relay already addresses many of the wish-list items in the
> > > > > > proposal, such as operator fusion, higher order gradients, offload
> > > > > > to hardware, isolated compilation, and deployment on edge devices
> > > > > > and accelerators. Relay also addresses problems not yet mentioned
> > > > > > in the proposal, including control flow, dynamic runtime,
> > > > > > automatic layout optimization, etc.
> > > > > >
> > > > > > Tianqi
> > > > > >
> > > > > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <[email protected]>
> > > wrote:
> > > > > >
> > > > > > > Hi Pedro,
> > > > > > >
> > > > > > > Thanks for taking the initiative. Skimming through the design
> > > > > > > doc, I didn't see a comparison with existing solutions such as
> > > > > > > Relay in TVM, which is already a dependency of MXNet. Could you
> > > > > > > elaborate on the comparison with existing solutions in the
> > > > > > > design doc too?
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > On 2019/05/14 23:49:30, Pedro Larroy <
> > [email protected]
> > > >
> > > > > > > wrote:
> > > > > > > > Hi dev@
> > > > > > > >
> > > > > > > > As a result of my deep dives on the graph machinery I have
> > > created
> > > > a
> > > > > > > > new proposal to improve the operator graph in MXNet.
> > > > > > > >
> > > > > > > > This would mean superseding the use of NNVM Graph in MXNet
> and
> > > > having
> > > > > > > > a new implementation that we can use to simplify a lot of
> code
> > > and
> > > > do
> > > > > > > > powerful graph manipulation and passes such as operator
> fusion
> > > and
> > > > > > > > other optimizations.
> > > > > > > >
> > > > > > > > As it would be a change with big impact and ramifications,
> > > > > > > > your thoughts and feedback on the document would be highly
> > > > > > > > appreciated so we can take potential future interesting use
> > > > > > > > cases into account:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > > > > >
> > > > > > > > Pedro.
