Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Tianqi Chen Fri, 12 Apr 2019 19:36:53 -0700

+1.

While I personally like Slack, I don't think we should treat it as a
public archive: "everything that happens (also) happens in dev@".


Tianqi



On Fri, Apr 12, 2019 at 1:19 AM Marco de Abreu <[email protected]>
wrote:

> I'd prefer if we keep discussions on the dev-list instead of slack - feel
> free to open another thread.
>
> -Marco
>
> Pedro Larroy <[email protected]> wrote on Fri, Apr 12, 2019,
> 02:24:
>
> > I will respond in slack, so we don't derail the original thread's
> > topic with my points.
> >
> > Looking forward to your proposal.
> >
> > On Thu, Apr 11, 2019 at 1:00 PM Junru Shao <[email protected]>
> > wrote:
> > >
> > > > I don't have any ideas about the following issues:
> > > >
> > > > 1) Reducing the abuse of inlined code by moving more logic to
> > > > implementation files and improving scoping, which will also speed up
> > > > compilation
> > > > 2) Reducing the runtime of some unit tests
> > > > 3) Improving MXNet startup time
> > >
> > > Will be super interested to hear about your ideas :-)
> > >
> > >
> > > On Thu, Apr 11, 2019 at 12:52 PM Junru Shao <[email protected]>
> > wrote:
> > >
> > > > > We have a systematic solution that avoids the ABI headache. I am
> > > > > struggling with some errands, and will share our proposal here as
> > > > > soon as I can. This will be a very interesting topic to discuss.
> > > > > Let's work hard together and make it perfect :-)
> > > >
> > > > On Thu, Apr 11, 2019 at 12:43 PM Pedro Larroy <
> > > > [email protected]> wrote:
> > > >
> > > >> Thanks Marco for raising this issue. I think we can certainly make
> > > >> some improvements in modularization and build. At the same time,
> > > >> Tianqi's point of view is important to consider and on point. I see a
> > > >> high risk of overengineering in such an endeavor.
> > > >>
> > > >> I also see increased complexity, difficulty debugging, C++ ABI
> > > >> headaches, API compatibility, crashes inside a binary module, etc.
> > > >> which I don't want to deal with as a developer or even as an MXNet
> > > >> user. Does somebody have answers to these problems?
> > > >>
> > > >> If somebody thinks they have a good solution, by all means propose a
> > > >> design in the wiki; I think we are all open. Personally, I see several
> > > >> other lower-hanging fruits which need our attention:
> > > >>  * Simplifying our build logic
> > > >>  * CUDA selection in CMake
> > > >>  * Reducing the abuse of inlined code by moving more logic to
> > > >> implementation files and improving scoping, which will also speed up
> > > >> compilation (some units take more than 5 minutes and lots of RAM to
> > > >> build on a top-of-the-line CPU core)
> > > >>  * Reducing the runtime of some unit tests
> > > >>  * Improving MXNet startup time
> > > >>  * Thread safety
> > > >> And other improvements in our codebase that would bring immediate
> > > >> benefits without the risks of overengineering a plugin system. I
> > > >> also question our bandwidth for such an endeavor.
> > > >>
> > > >> I would say, let's apply the KISS principle: let's make the project
> > > >> fast to build, easy to work on, well documented and easy to
> > > >> contribute to before building the next Netscape browser. Otherwise we
> > > >> could save ourselves this exercise and switch to Rust directly.
> > > >>
> > > >> Pedro.
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen <
> [email protected]>
> > > >> wrote:
> > > >> >
> > > >> > Just to clarify: I am not questioning the usefulness of the
> > > >> > separation. I just want to highlight the technical challenges here
> > > >> > based on our past experiences.
> > > >> >
> > > >> > Crossing DLL boundaries in C++ can create quite a lot of problems,
> > > >> > especially when some of the dependencies use a different version of
> > > >> > the compiler or follow static packaging, or simply because of the
> > > >> > dynamic linking differences on Windows. These problems could make
> > > >> > this direction less appealing compared to focusing effort on other
> > > >> > things.
> > > >> >
> > > >> > Technically, as a first step, it is possible to make dependency
> > > >> > changes go through registration rather than through the global
> > > >> > header files, so that changing a certain component won't trigger a
> > > >> > global recompile in CMake. This is also a required step toward some
> > > >> > modularity.
> > > >> >
> > > >> > For plugins, solutions that use the C ABI can be used for certain
> > > >> > plugin modules.
> > > >> >
> > > >> > Some of the discussion has been tied to what the interface should
> > > >> > look like. I think we should use different threads for these and put
> > > >> > more thought into them.
> > > >> >
> > > >> > Tianqi
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
> > > >> > [email protected]> wrote:
> > > >> >
> > > >> > > I think we can make some incremental progress. My thoughts were
> > > >> > > along the lines of plugins (thinking about what happens with the
> > > >> > > VLC project). At process launch time we could gather some
> > > >> > > information about our execution environment (either through
> > > >> > > configuration, or by convention, looking at our folder structure
> > > >> > > and the libraries available). We could then later load the
> > > >> > > components we need after understanding whether we're using a CUDA
> > > >> > > backend and what operators or subgraph components we would need.
> > > >> > > Advantages would be that we would move a lot of the current
> > > >> > > conditional compile logic to runtime, and automate a lot of it. It
> > > >> > > would also make packaging binaries for targeted environments a
> > > >> > > little easier. As an example, we could compile once, then remove
> > > >> > > CUDA-focused libraries for systems that are going to run on CPUs.
> > > >> > >
> > > >> > > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen <
> > [email protected]>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > While I personally like the idea, this can be something that is
> > > >> > > > fairly technically challenging, and I would caution against this
> > > >> > > > idea vs. pushing for good features and just allowing runtime
> > > >> > > > configuration.
> > > >> > > >
> > > >> > > > The main problem here is due to the C++ ABI. There is no standard
> > > >> > > > C++ ABI across compilers, which means resorting to runtime DLLs
> > > >> > > > and dynamic loading brings all sorts of technical problems,
> > > >> > > > especially when multiple modules depend on the same third-party
> > > >> > > > dependency (CUDA runtime).
> > > >> > > > No good go-to solution can be made here, especially given the
> > > >> > > > explosion of backend variants and dependencies in C++.
> > > >> > > > A partial solution could be achieved through the sole use of the
> > > >> > > > C ABI. Combining this with code generation can result in some
> > > >> > > > simplifications and enable some runtime-loadable modules. TVM
> > > >> > > > does this, and perhaps MXNet could reuse some of that component
> > > >> > > > for operator libraries. Similarly, having a customizable operator
> > > >> > > > library that is loadable via the C ABI might be possible.
> > > >> > > >
> > > >> > > > So to summarize, while I really like the idea of dynamically
> > > >> > > > loadable modules, my past experience suggests that this will
> > > >> > > > bring a lot of additional engineering burden and technical debt
> > > >> > > > without significant benefit. I would suggest starting by
> > > >> > > > supporting something simple like a plugin module, before moving
> > > >> > > > toward the general direction.
> > > >> > > >
> > > >> > > > Tianqi
> > > >> > > >
> > > >> > > > On Sun, Apr 7, 2019 at 1:31 PM kellen sunderland <
> > > >> > > > [email protected]> wrote:
> > > >> > > >
> > > >> > > > > Strongly support the idea of runtime loadable components in
> > > >> > > > > MXNet. There's no reason (other than perhaps engineering
> > > >> > > > > effort) we can't have a single compilation of MXNet that finds
> > > >> > > > > dependencies and chooses execution paths intelligently (or
> > > >> > > > > based on configuration) at runtime.
> > > >> > > > >
> > > >> > > > > On Thu, Apr 4, 2019 at 12:29 PM Marco de Abreu <
> > > >> [email protected]>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hello,
> > > >> > > > > >
> > > >> > > > > > I'd like to start a discussion about something that I've
> > > >> > > > > > noticed being troublesome to maintain in the current
> > > >> > > > > > version: backend choices being made at compile time.
> > > >> > > > > >
> > > >> > > > > > Right now, the different backends and accelerators (CPU,
> > > >> > > > > > CUDA, MKL, AWS Elastic Inference, (future) AMD, OpenBLAS,
> > > >> > > > > > TVM, etc.) are all scattered across the different layers
> > > >> > > > > > of MXNet. On one hand, we have compile-time flags that
> > > >> > > > > > decide which backends are being compiled into the binary,
> > > >> > > > > > while at the same time choices can be made in the
> > > >> > > > > > frontend during runtime.
> > > >> > > > > >
> > > >> > > > > > At the moment, we have a lot of conditional build logic
> > > >> > > > > > that picks different parts. With the addition of MKLML
> > > >> > > > > > and later MKLDNN, the clear separation of CPU and GPU got
> > > >> > > > > > kind of broken up. While we have some places where each
> > > >> > > > > > code lives, in the end we resort to some files containing
> > > >> > > > > > a lot of conditional logic for the different backends
> > > >> > > > > > (sorry I can't provide links right now since I'm on
> > > >> > > > > > mobile). To me this seems like a residue of the fast
> > > >> > > > > > development style from the early days (more preprocessor
> > > >> > > > > > statements and less object orientation) while also having
> > > >> > > > > > organic growth with new accelerators. When I saw how much
> > > >> > > > > > AMD had to hack to fit in their implementation, it seemed
> > > >> > > > > > like we have to make this part more developer friendly.
> > > >> > > > > >
> > > >> > > > > > At the moment, every new flavour of MXNet has to be
> > > >> > > > > > entirely recompiled. This makes it hard for users to
> > > >> > > > > > figure out which options to use, while it makes it harder
> > > >> > > > > > for us to test since the overhead to test every single
> > > >> > > > > > combination of compile parameters would be overwhelming.
> > > >> > > > > >
> > > >> > > > > > I'd propose to have a clear class-hierarchy-based
> > > >> > > > > > structure for accelerators, operators and memory
> > > >> > > > > > management. This structure can then be implemented by the
> > > >> > > > > > different backends. To reduce the compile burden, we
> > > >> > > > > > would introduce dynamic loading and split the different
> > > >> > > > > > backends into modules. These could then be developed,
> > > >> > > > > > maintained and compiled on their own and then placed in a
> > > >> > > > > > "module" folder to be loaded at runtime. Adding a new
> > > >> > > > > > accelerator would be a matter of placing the precompiled
> > > >> > > > > > binary into the folder. The detailed configuration of
> > > >> > > > > > that backend would then be done at runtime: the user
> > > >> > > > > > shouldn't have to worry, at the point of downloading
> > > >> > > > > > MXNet, whether they want MKL, MKLDNN, OpenBLAS, ATLAS,
> > > >> > > > > > TVM, CUDA or whatever else there is. I have an idea how
> > > >> > > > > > we could help the user choose, but that's outside the
> > > >> > > > > > scope of this proposal.
> > > >> > > > > >
> > > >> > > > > > This would allow us to have a "core" MXNet that takes
> > > >> > > > > > care of the engine, scheduling, communication and all the
> > > >> > > > > > other crucial parts. On the other hand, we could make
> > > >> > > > > > MXNet less of a monolith and have clear interfaces. This
> > > >> > > > > > would also act as a forcing function because the
> > > >> > > > > > different parts wouldn't be intermingled but would have
> > > >> > > > > > to follow the common interface.
> > > >> > > > > >
> > > >> > > > > > Of course this comes with the question of what these
> > > >> > > > > > interfaces would look like. For operators, I'd like to
> > > >> > > > > > propose taking inspiration from (or fully adopting) ONNX.
> > > >> > > > > > For memory management and other backend-specific things,
> > > >> > > > > > we could look at the current implementations and find
> > > >> > > > > > common ground.
> > > >> > > > > >
> > > >> > > > > > Back when I had a community-driven project, we heavily
> > > >> > > > > > used this modularity and it brought great benefits,
> > > >> > > > > > besides the fact that our core was closed source. It
> > > >> > > > > > allowed community developers to act entirely
> > > >> > > > > > independently from other parts and even allowed them to
> > > >> > > > > > add their own logic without having to touch the core.
> > > >> > > > > > Thinking about companies that implement their own
> > > >> > > > > > backends or have specially tweaked operators without
> > > >> > > > > > wanting to disclose them, this structure would avoid them
> > > >> > > > > > having to fork the project and then spend a lot of effort
> > > >> > > > > > porting the changes to the latest source release
> > > >> > > > > > versions. Instead, they would maintain their module and
> > > >> > > > > > we as the MXNet community would only have to maintain
> > > >> > > > > > these interfaces.
> > > >> > > > > >
> > > >> > > > > > Right now this is a lot of prose and basically a brain
> > > >> > > > > > dump of my thoughts. I'd be happy to follow up with
> > > >> > > > > > details, but first I'd be curious what the community
> > > >> > > > > > thinks about this design.
> > > >> > > > > >
> > > >> > > > > > Best regards,
> > > >> > > > > > Marco
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >>
> > > >
> >
>
