Strongly support the idea of runtime loadable components in MXNet. There's no reason (other than perhaps engineering effort) we can't have a single compilation of MXNet that finds dependencies and chooses execution paths intelligently (or based on configuration) at runtime.
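
To illustrate what I mean by finding dependencies at runtime, here is a rough sketch in Python (not MXNet code). The candidate list, the library names and the `choose_backend` helper are all made up for illustration; the only real API used is `ctypes.util.find_library`, which looks up a shared library on the current system.

```python
from ctypes.util import find_library

# Hypothetical preference order and library names; this is NOT how MXNet
# currently locates its dependencies.
CANDIDATES = [
    ("cuda", "cudart"),
    ("mkldnn", "mkldnn"),
    ("openblas", "openblas"),
]


def choose_backend(configured=None, probe=find_library):
    """Pick a backend at runtime instead of at compile time.

    `configured` is an explicit user choice (for example from a config file);
    `probe` returns a path if a shared library can be found, else None.
    """
    available = [name for name, lib in CANDIDATES if probe(lib) is not None]
    if configured is not None:
        # An explicit configuration wins, but only if it can be satisfied.
        if configured != "cpu" and configured not in available:
            raise RuntimeError(f"backend {configured!r} requested but not found")
        return configured
    # Otherwise take the first available candidate, with plain CPU as fallback.
    return available[0] if available else "cpu"
```

Calling `choose_backend()` with no arguments probes the machine it runs on, while `choose_backend("openblas")` forces a specific path and fails loudly if the library is missing.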

On Thu, Apr 4, 2019 at 12:29 PM Marco de Abreu <[email protected]> wrote:
> Hello,
>
> I'd like to start a discussion about something that I've noticed being
> troublesome to maintain in the current version: backend choices being made
> at compile time.
>
> Right now, the different backends and accelerators (CPU, CUDA, MKL, AWS
> Elastic Inference, (future) AMD, OpenBLAS, TVM, etc.) are all scattered
> across the different layers of MXNet. On one hand, we have compile-time
> flags that decide which backends are being compiled into the binary, while
> at the same time choices can be made in the frontend during runtime.
>
> At the moment, we have a lot of conditional build logic that picks
> different parts. With the addition of MKLML and later MKLDNN the clear
> separation of CPU and GPU got kind of broken up. While we have some places
> where each code lives, in the end we resort to some files containing a lot
> of conditional logic for the different backends (sorry I can't provide
> links right now since I'm on mobile). To me this seems like a residue of
> the fast development style from the early days (more preprocessor statements
> and less object orientation) while also having organic growth with new
> accelerators. When I see how much AMD had to hack to fit in their
> implementation, it seemed like we have to make this part more developer
> friendly.
>
> At the moment, every new flavour of MXNet has to be entirely recompiled.
> This makes it hard for users to figure out which options to use, while it
> makes it harder for us to test since the overhead to test every single
> combination of compile parameters would be overwhelming.
>
> I'd propose to have a clear class hierarchy based structure for
> accelerators, operators and memory management. This structure can then be
> implemented by the different backends. To reduce the compile burden, we
> would introduce dynamic loading and split the different backends into
> modules.
> These could then be developed, maintained and compiled on their
> own and then placed in a "module" folder to be loaded at runtime. Adding a
> new accelerator would be a matter of placing the precompiled binary into
> the folder. The detailed configuration of that backend would then be done
> at runtime - the user shouldn't worry at the point of downloading MXNet
> whether they want MKL, MKLDNN, OpenBLAS, ATLAS, TVM, CUDA or whatever
> else there is. I have an idea how we could help the user choose, but
> that's outside the scope of this proposal.
>
> This would allow us to have a "core" MXNet that takes care of the engine,
> scheduling, communication and all the other crucial parts. On the other
> hand we could make MXNet less of a monolith and have clear interfaces. This
> would also act as a forcing function because the different parts wouldn't
> be intermingled but have to follow the common interface.
>
> Of course this comes with the question what these interfaces would look
> like. For operators, I'd like to propose getting inspired by (or fully
> adopting) ONNX. For memory management and other backend-specific things we
> could look at the current implementations and find a common ground.
>
> Back when I had a community driven project, we heavily used this modularity
> and it brought great benefits - besides the fact that our core was closed
> source. It allowed community developers to act entirely independently from
> other parts and even allowed them to add their own logic without having to
> touch the core. Thinking about companies that implement their own backends
> or have specially tweaked operators without wanting to disclose them, this
> structure would avoid them having to fork the project and then spend a lot
> of effort porting the changes to the latest source release versions.
> Instead, they would maintain their module and we as MXNet community would
> only have to maintain these interfaces.
>
> Right now this is a lot of prose and basically a brain dump of my thoughts.
> I'd be happy to follow up with details, but first I'd be curious what the
> community thinks about this design.
>
> Best regards,
> Marco
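
For the "module" folder idea in the quoted proposal, a toy version of the loading side might look like the sketch below. It is only an illustration under several assumptions: Python files stand in for precompiled shared libraries (a C++ core would presumably use something like `dlopen` instead), and the `Backend` interface, the `register` hook and `load_backends` are hypothetical names, not existing MXNet APIs.

```python
import importlib.util
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class Backend(Protocol):
    """Hypothetical common interface that every backend module must follow."""

    name: str

    def is_available(self) -> bool: ...

    def run(self, op: str, *args): ...


def load_backends(module_dir):
    """Load every module found in `module_dir` and collect usable backends.

    Each module is expected to expose a `register(registry)` function that
    adds its backend objects to the given dict (an assumed convention).
    """
    registry = {}
    for path in sorted(Path(module_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        register = getattr(module, "register", None)
        if callable(register):
            register(registry)
    # Keep only objects that follow the interface and report being usable.
    return {
        name: backend
        for name, backend in registry.items()
        if isinstance(backend, Backend) and backend.is_available()
    }
```

With this shape, adding a backend means dropping one more file into the folder; the core only knows the interface and the `register` convention, which is the forcing function described in the proposal.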
