I don't have any ideas about the following issues:

1) Reducing the abuse of inlined code by moving more logic to implementation
files and improving scoping, which will also speed up compilation
2) Reducing the runtime of some unit tests
3) Improving MXNet startup time

I'd be super interested to hear your ideas :-)


On Thu, Apr 11, 2019 at 12:52 PM Junru Shao <[email protected]> wrote:

> We have a systematic solution that avoids the ABI headache. I am busy
> with some errands, and will share our proposal here as soon as I can.
> This will be a very interesting topic to discuss. Let's work hard together
> and make it perfect :-)
>
> On Thu, Apr 11, 2019 at 12:43 PM Pedro Larroy <
> [email protected]> wrote:
>
>> Thanks Marco for raising this issue. I think we can certainly make some
>> improvements in modularization and build. At the same time Tianqi's
>> point of view is important to consider and on point. I see a high risk
>> of overengineering in such an endeavor.
>>
>> I also see increased complexity, difficulty debugging, C++ ABI
>> headaches, API compatibility, crashes inside a binary module, etc.
>> which I don't want to deal with as a developer or even as an MXNet
>> user. Does somebody have answers to these problems?
>>
>> If somebody thinks they have a good solution, by all means propose a
>> design in the wiki, I think we are all open. Personally I see several
>> other lower hanging fruits which need our attention:
>>  * Simplifying our build logic,
>>  * CUDA selection in CMake,
>>  * Reducing the abuse of inlined code by moving more logic to
>> implementation files and improving scoping, which will also speed up
>> compilation (some units take more than 5 minutes to build and lots of
>> RAM on a top-of-the-line CPU core),
>>  * Reducing the runtime of some unit tests,
>>  * Improving MXNet startup time,
>>  * Thread safety,
>> and other improvements in our codebase that would bring immediate
>> benefits without the risks of overengineering a plugin system. I
>> also question our bandwidth for such an endeavor.
>>
>> I would say, let's apply the KISS principle, let's make the project
>> fast to build, easy to work on, well documented and easy to contribute
>> to before building the next Netscape browser. Otherwise we could save
>> ourselves this exercise and switch to Rust directly.
>>
>> Pedro.
>>
>>
>>
>> On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen <[email protected]>
>> wrote:
>> >
>> > Just to clarify. I am not questioning the usefulness of the separation.
>> > Just want to highlight the technical challenges here based on our past
>> > experiences.
>> >
>> > Crossing DLL boundaries in C++ can create quite a lot of problems,
>> > especially when some of the dependencies use a different version of the
>> > compiler or follow static packaging, or simply because of the dynamic
>> > linking differences on Windows. These problems could make this direction
>> > less appealing compared to focusing effort on other things.
>> >
>> > Technically, as a first step, it is possible to make dependency changes
>> > not touch the global header files, going via registration instead, so
>> > that changing a certain component won't trigger a global recompile in
>> > CMake. This is also a required step toward some modularity.
>> >
>> > For plugins, solutions that use C ABI can be used for certain plugin
>> > modules.
>> >
>> > Some of the discussion has been tied to what the interface should look
>> > like. I think we should use different threads for these and put more
>> > thought into them.
>> >
>> > Tianqi
>> >
>> >
>> >
>> > On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
>> > [email protected]> wrote:
>> >
>> > > I think we can make some incremental progress.  My thoughts were
>> > > along the lines of plugins (thinking about what happens with the VLC
>> > > project).  At process launch time we could gather some information
>> > > about our execution environment (either through configuration, or by
>> > > convention looking at our folder structure and libraries available).
>> > > We could then later load the components we need after understanding
>> > > if we're using a CUDA backend and what operators or subgraph
>> > > components we would need.  Advantages would be that we would move a
>> > > lot of the current conditional compile logic to runtime, and automate
>> > > a lot of it.  It would also make packaging binaries for targeted
>> > > environments a little easier.  As an example we could compile once,
>> > > then remove CUDA-focused libraries for systems that are going to run
>> > > on CPUs.
>> > >
>> > > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen <[email protected]>
>> > > wrote:
>> > >
>> > > > While I personally like the idea, this can be something that is
>> > > > fairly technically challenging, and I would caution against this
>> > > > idea vs. pushing for good features and just allowing runtime
>> > > > configuration.
>> > > >
>> > > > The main problem here is due to the C++ ABI. There is no standard
>> > > > C++ ABI across compilers, which means resorting to runtime DLLs and
>> > > > dynamic loading brings all sorts of technical problems, especially
>> > > > when multiple modules depend on the same third-party dependency
>> > > > (CUDA runtime).
>> > > > There is no ready-made solution here, especially given the
>> > > > explosion of backend variants and dependencies in C++.
>> > > > A partial solution could be achieved through the sole use of a C ABI.
>> > > > Combining this with code generation can result in some
>> > > > simplifications and enable some runtime-loadable modules. TVM does
>> > > > this, and perhaps MXNet could reuse some of those components for
>> > > > operator libraries. Similarly, having a customizable operator
>> > > > library that is loadable via a C ABI might be possible.
>> > > >
>> > > > So to summarize, while I really like the idea of dynamically
>> > > > loadable modules, my past experience suggests that this will bring
>> > > > a lot of additional engineering burden and technical debt without
>> > > > significant benefit. I would suggest starting by supporting
>> > > > something simple like a plugin module, before moving toward the
>> > > > general direction.
>> > > >
>> > > > Tianqi
>> > > >
>> > > > On Sun, Apr 7, 2019 at 1:31 PM kellen sunderland <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > Strongly support the idea of runtime loadable components in MXNet.
>> > > > > There's no reason (other than perhaps engineering effort) we can't
>> > > > > have a single compilation of MXNet that finds dependencies and
>> > > > > chooses execution paths intelligently (or based on configuration)
>> > > > > at runtime.
>> > > > >
>> > > > > On Thu, Apr 4, 2019 at 12:29 PM Marco de Abreu <
>> [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hello,
>> > > > > >
>> > > > > > I'd like to start a discussion about something that I've noticed
>> > > > > > being troublesome to maintain in the current version: backend
>> > > > > > choices being made at compile time.
>> > > > > >
>> > > > > > Right now, the different backends and accelerators (CPU, CUDA,
>> > > > > > MKL, AWS Elastic Inference, (future) AMD, OpenBLAS, TVM, etc.)
>> > > > > > are all scattered across the different layers of MXNet. On one
>> > > > > > hand, we have compile-time flags that decide which backends are
>> > > > > > being compiled into the binary, while at the same time choices
>> > > > > > can be made in the frontend during runtime.
>> > > > > >
>> > > > > > At the moment, we have a lot of conditional build logic that
>> > > > > > picks different parts. With the addition of MKLML and later
>> > > > > > MKLDNN, the clear separation of CPU and GPU got kind of broken
>> > > > > > up. While we have some places where each piece of code lives, in
>> > > > > > the end we resort to some files containing a lot of conditional
>> > > > > > logic for the different backends (sorry I can't provide links
>> > > > > > right now since I'm on mobile). To me this seems like a residue
>> > > > > > of the fast development style from the early days (more
>> > > > > > preprocessor statements and less object orientation) while also
>> > > > > > having organic growth with new accelerators. When I saw how much
>> > > > > > AMD had to hack to fit in their implementation, it seemed like
>> > > > > > we have to make this part more developer friendly.
>> > > > > >
>> > > > > > At the moment, every new flavour of MXNet has to be entirely
>> > > > > > recompiled. This makes it hard for users to figure out which
>> > > > > > options to use, while it makes it harder for us to test since
>> > > > > > the overhead to test every single combination of compile
>> > > > > > parameters would be overwhelming.
>> > > > > >
>> > > > > > I'd propose to have a clear class-hierarchy-based structure for
>> > > > > > accelerators, operators and memory management. This structure
>> > > > > > can then be implemented by the different backends. To reduce the
>> > > > > > compile burden, we would introduce dynamic loading and split the
>> > > > > > different backends into modules. These could then be developed,
>> > > > > > maintained and compiled on their own and then placed in a
>> > > > > > "module" folder to be loaded at runtime. Adding a new
>> > > > > > accelerator would be a matter of placing the precompiled binary
>> > > > > > into the folder. The detailed configuration of that backend
>> > > > > > would then be done at runtime - the user shouldn't have to worry
>> > > > > > at the point of downloading MXNet whether they want MKL, MKLDNN,
>> > > > > > OpenBLAS, ATLAS, TVM, CUDA or whatever else there is. I have an
>> > > > > > idea of how we could help the user choose, but that's outside
>> > > > > > the scope of this proposal.
>> > > > > >
>> > > > > > This would allow us to have a "core" MXNet that takes care of
>> > > > > > the engine, scheduling, communication and all the other crucial
>> > > > > > parts. On the other hand we could make MXNet less of a monolith
>> > > > > > and have clear interfaces. This would also act as a forcing
>> > > > > > function because the different parts wouldn't be intermingled
>> > > > > > but have to follow the common interface.
>> > > > > >
>> > > > > > Of course this comes with the question of what these interfaces
>> > > > > > would look like. For operators, I'd like to propose taking
>> > > > > > inspiration from (or fully adopting) ONNX. For memory management
>> > > > > > and other backend-specific things we could look at the current
>> > > > > > implementations and find common ground.
>> > > > > >
>> > > > > > Back when I had a community-driven project, we heavily used this
>> > > > > > modularity and it brought great benefits - besides the fact that
>> > > > > > our core was closed source. It allowed community developers to
>> > > > > > act entirely independently from other parts and even allowed
>> > > > > > them to add their own logic without having to touch the core.
>> > > > > > Thinking about companies that implement their own backends or
>> > > > > > have specially tweaked operators without wanting to disclose
>> > > > > > them, this structure would save them from having to fork the
>> > > > > > project and then spend a lot of effort porting the changes to
>> > > > > > the latest source release versions. Instead, they would maintain
>> > > > > > their module and we as the MXNet community would only have to
>> > > > > > maintain these interfaces.
>> > > > > >
>> > > > > > Right now this is a lot of prose and basically a brain dump of
>> > > > > > my thoughts. I'd be happy to follow up with details, but first
>> > > > > > I'd be curious what the community thinks about this design.
>> > > > > >
>> > > > > > Best regards,
>> > > > > > Marco
>> > > > > >
>> > > > >
>> > > >
>> > >
>>
>
