Hello,

I'd like to start a discussion about something that I've found
troublesome to maintain in the current version: backend choices being made
at compile time.

Right now, the different backends and accelerators (CPU, CUDA, MKL, AWS
Elastic Inference, (future) AMD, OpenBLAS, TVM, etc.) are scattered
across the different layers of MXNet. On the one hand, we have compile-time
flags that decide which backends are compiled into the binary, while
at the same time choices can be made in the frontend at runtime.

At the moment, we have a lot of conditional build logic that picks
different parts. With the addition of MKLML and later MKLDNN, the clear
separation of CPU and GPU got kind of broken up. While there are some places
where each backend's code lives, in the end we resort to some files
containing a lot of conditional logic for the different backends (sorry I
can't provide links right now since I'm on mobile). To me this seems like a
residue of the fast development style from the early days (more preprocessor
statements and less object orientation) combined with organic growth as new
accelerators were added. When I saw how much AMD had to hack to fit in their
implementation, it seemed to me that we have to make this part more
developer-friendly.

At the moment, every new flavour of MXNet has to be entirely recompiled.
This makes it hard for users to figure out which options to use, and it
makes testing harder for us, since the overhead of testing every single
combination of compile parameters would be overwhelming.

I'd propose a clear, class-hierarchy-based structure for
accelerators, operators and memory management. This structure can then be
implemented by the different backends. To reduce the compile burden, we
would introduce dynamic loading and split the different backends into
modules. These could then be developed, maintained and compiled on their
own and then placed in a "module" folder to be loaded at runtime. Adding a
new accelerator would be a matter of placing the precompiled binary into
the folder. The detailed configuration of that backend would then be done
at runtime - users shouldn't have to worry, at the point of downloading
MXNet, whether they want MKL, MKLDNN, OpenBLAS, ATLAS, TVM, CUDA or whatever
else there is. I have an idea how we could help the user choose, but
that's outside the scope of this proposal.
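
To make the module idea a bit more concrete, here is a minimal sketch in
Python (chosen only for brevity). Everything in it is an assumption made for
illustration: the `Backend` base class, the `create_backend` entry point and
the `reference_cpu` module are hypothetical names, not existing MXNet APIs.
In MXNet itself the modules would more likely be shared libraries loaded by
the C++ core, but the flow would be similar: scan a folder, load each
module, and keep the backends that report themselves as usable.

```python
import importlib.util
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Backend(ABC):
    """Hypothetical common interface that every backend module implements."""

    name: str

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this backend can run on the current machine."""

    @abstractmethod
    def add(self, a, b):
        """Element-wise addition, standing in for a real operator set."""


def load_backends(module_dir):
    """Import every *.py file in module_dir and collect usable backends."""
    backends = {}
    for path in sorted(Path(module_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Assumed convention: a module exposes create_backend(base_class).
        factory = getattr(module, "create_backend", None)
        if factory is None:
            continue
        backend = factory(Backend)
        if backend.is_available():
            backends[backend.name] = backend
    return backends


# Source of a stand-in module, as a third party might ship it.
PLUGIN_SOURCE = '''
def create_backend(base):
    class ReferenceCpu(base):
        name = "reference_cpu"

        def is_available(self):
            return True

        def add(self, a, b):
            return [x + y for x, y in zip(a, b)]

    return ReferenceCpu()
'''


def demo():
    """Drop the stand-in module into a temporary folder and load it."""
    with tempfile.TemporaryDirectory() as module_dir:
        (Path(module_dir) / "reference_cpu.py").write_text(PLUGIN_SOURCE)
        backends = load_backends(module_dir)
        return sorted(backends), backends["reference_cpu"].add([1, 2], [3, 4])
```

The point of the sketch is that the core only knows the interface and the
folder; which backends exist is decided by what has been placed there, not
by compile-time flags.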

This would allow us to have a "core" MXNet that takes care of the engine,
scheduling, communication and all the other crucial parts. At the same
time, we could make MXNet less of a monolith and have clear interfaces. This
would also act as a forcing function because the different parts couldn't
be intermingled but would have to follow the common interface.

Of course this raises the question of what these interfaces would look
like. For operators, I'd like to propose taking inspiration from (or fully
adopting) ONNX. For memory management and other backend-specific things we
could look at the current implementations and find common ground.
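
As a rough illustration of what an ONNX-inspired operator interface could
mean, the sketch below keys kernels by operator name ("Add" and "Relu" are
borrowed from ONNX's operator naming) and lets the core pick the first
preferred backend that implements the requested operator. The
`OperatorTable` class, its methods and the "cpu" and "fast" backend names
are all hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Callable, Dict, List, Sequence

Tensor = List[float]
Kernel = Callable[..., Tensor]


class OperatorTable:
    """Maps backend name and operator name to a kernel (hypothetical API)."""

    def __init__(self) -> None:
        self._kernels: Dict[str, Dict[str, Kernel]] = {}

    def register(self, backend: str, op: str, kernel: Kernel) -> None:
        """Record that `backend` provides an implementation of `op`."""
        self._kernels.setdefault(backend, {})[op] = kernel

    def dispatch(self, op: str, preferred: Sequence[str], *inputs: Tensor) -> Tensor:
        """Run `op` on the first backend in `preferred` that implements it."""
        for backend in preferred:
            kernel = self._kernels.get(backend, {}).get(op)
            if kernel is not None:
                return kernel(*inputs)
        raise NotImplementedError(
            f"no backend in {list(preferred)} implements {op!r}"
        )


table = OperatorTable()
# A baseline backend that covers both operators.
table.register("cpu", "Add", lambda a, b: [x + y for x, y in zip(a, b)])
table.register("cpu", "Relu", lambda a: [max(0.0, x) for x in a])
# A second backend that only implements a subset.
table.register("fast", "Add", lambda a, b: [x + y for x, y in zip(a, b)])
```

With this table, `table.dispatch("Relu", ["fast", "cpu"], [-1.0, 2.0])`
falls back to the "cpu" kernel because "fast" does not provide "Relu". A
shared operator vocabulary is what makes that kind of fallback possible
without the core knowing anything backend-specific.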

Back when I ran a community-driven project, we relied heavily on this kind
of modularity and it brought great benefits - even though our core was
closed source. It allowed community developers to work entirely
independently of other parts and even allowed them to add their own logic
without having to touch the core. Thinking about companies that implement
their own backends or have specially tweaked operators that they don't want
to disclose, this structure would save them from having to fork the project
and then spend a lot of effort porting their changes to the latest source
releases. Instead, they would maintain their module and we as the MXNet
community would only have to maintain these interfaces.

Right now this is a lot of prose and basically a brain dump of my thoughts.
I'd be happy to follow up with details, but first I'd be curious what the
community thinks about this design.

Best regards,
Marco
