Hello, I'd like to start a discussion about something that I've found troublesome to maintain in the current version: backend choices being made at compile time.
Right now, the different backends and accelerators (CPU, CUDA, MKL, AWS Elastic Inference, (future) AMD, OpenBLAS, TVM, etc.) are scattered across the different layers of MXNet. On one hand, compile-time flags decide which backends are compiled into the binary; on the other hand, choices can also be made in the frontend at runtime. At the moment, we have a lot of conditional build logic that picks different parts. With the addition of MKLML and later MKLDNN, the clear separation of CPU and GPU got somewhat broken up. While there are some places where each backend's code lives, in the end we resort to a few files containing a lot of conditional logic for the different backends (sorry I can't provide links right now since I'm on mobile).
To me this seems like a residue of the fast development style from the early days (more preprocessor statements and less object orientation), combined with organic growth as new accelerators were added. When I saw how much AMD had to hack to fit in their implementation, it seemed to me that we have to make this part more developer friendly.
At the moment, every new flavour of MXNet has to be entirely recompiled. This makes it hard for users to figure out which options to use, and it makes testing harder for us, since the overhead of testing every single combination of compile parameters would be overwhelming.
I'd propose a clear, class-hierarchy-based structure for accelerators, operators and memory management. This structure can then be implemented by the different backends. To reduce the compile burden, we would introduce dynamic loading and split the different backends into modules. These could then be developed, maintained and compiled on their own and then placed in a "module" folder to be loaded at runtime. Adding a new accelerator would then be a matter of placing the precompiled binary into that folder.
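To make the proposed structure a bit more concrete, here is a rough C++ sketch of what a common backend interface plus a runtime registry could look like. It is only an illustration under assumptions, not a finished design: all names (`Backend`, `BackendRegistry`, `CpuBackend`, `RegisterCpuModule`, `AddOnBackend`) are hypothetical and do not exist in MXNet today, a single elementwise add stands in for the whole operator set, and in-process registration stands in for loading a precompiled module from the "module" folder (which on POSIX systems would typically go through something like `dlopen`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: none of these names exist in MXNet today.
class Backend {
 public:
  virtual ~Backend() = default;
  virtual std::string Name() const = 0;
  // Memory management is owned by the backend.
  virtual void* Alloc(std::size_t bytes) = 0;
  virtual void Free(void* ptr) = 0;
  // One example operator (elementwise add) stands in for the operator set.
  virtual void Add(const float* a, const float* b, float* out,
                   std::size_t n) = 0;
};

// The calling code only talks to this registry, never to a concrete backend.
class BackendRegistry {
 public:
  using Factory = std::function<std::unique_ptr<Backend>()>;

  static BackendRegistry& Get() {
    static BackendRegistry instance;
    return instance;
  }

  void Register(const std::string& name, Factory factory) {
    factories_[name] = std::move(factory);
  }

  // Names of all backends registered so far, in sorted order.
  std::vector<std::string> Available() const {
    std::vector<std::string> names;
    for (const auto& entry : factories_) names.push_back(entry.first);
    return names;
  }

  std::unique_ptr<Backend> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) {
      throw std::runtime_error("unknown backend: " + name);
    }
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// A plain CPU backend, as it could live in its own separately built module.
class CpuBackend : public Backend {
 public:
  std::string Name() const override { return "cpu"; }
  void* Alloc(std::size_t bytes) override { return std::malloc(bytes); }
  void Free(void* ptr) override { std::free(ptr); }
  void Add(const float* a, const float* b, float* out,
           std::size_t n) override {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
  }
};

// Stand-in for the entry point a module would export. With real dynamic
// loading, this would be called once the shared library has been loaded.
void RegisterCpuModule() {
  BackendRegistry::Get().Register(
      "cpu", [] { return std::unique_ptr<Backend>(new CpuBackend()); });
}

// Caller-side usage: pick a backend by name at runtime and run an operator,
// using only the abstract interface (including the backend's own allocator).
std::vector<float> AddOnBackend(const std::string& name,
                                const std::vector<float>& a,
                                const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::unique_ptr<Backend> backend = BackendRegistry::Get().Create(name);
  const std::size_t n = a.size();
  float* buf = static_cast<float*>(backend->Alloc(n * sizeof(float)));
  backend->Add(a.data(), b.data(), buf, n);
  std::vector<float> out(buf, buf + n);
  backend->Free(buf);
  return out;
}
```

In this sketch, calling `RegisterCpuModule()` followed by `AddOnBackend("cpu", a, b)` runs the add through the registered CPU backend, while asking for a name that was never registered throws instead of silently falling back. The point of the design is that the caller is compiled without any knowledge of `CpuBackend`; everything backend specific sits behind the interface.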
The detailed configuration of a backend would then be done at runtime. The user shouldn't have to worry, at the point of downloading MXNet, whether they want MKL, MKLDNN, OpenBLAS, ATLAS, TVM, CUDA or whatever else there is. I have an idea how we could help the user choose, but that's outside the scope of this proposal.
This would allow us to have a "core" MXNet that takes care of the engine, scheduling, communication and all the other crucial parts. At the same time, we could make MXNet less of a monolith and give it clear interfaces. This would also act as a forcing function, because the different parts could no longer be intermingled but would have to follow the common interface.
Of course this raises the question of what these interfaces would look like. For operators, I'd like to propose taking inspiration from (or fully adopting) ONNX. For memory management and other backend-specific things, we could look at the current implementations and find common ground.
Back when I had a community-driven project, we heavily used this kind of modularity and it brought great benefits, apart from the fact that our core was closed source. It allowed community developers to act entirely independently of other parts and even allowed them to add their own logic without having to touch the core. Thinking about companies that implement their own backends or have specially tweaked operators that they don't want to disclose, this structure would save them from having to fork the project and then spend a lot of effort porting their changes to each new source release. Instead, they would maintain their module, and we as the MXNet community would only have to maintain these interfaces.
Right now this is a lot of prose and basically a brain dump of my thoughts. I'd be happy to follow up with details, but first I'd be curious what the community thinks about this design.
Best regards,
Marco
