Some feedback from MXNet Zhihu topic

Qing Lan Wed, 19 Sep 2018 11:05:24 -0700

Hi all,

There was a trending topic<https://www.zhihu.com/question/293996867> on Zhihu
(a famous Chinese site, roughly Stack Overflow plus Quora) recently, asking
about the status of MXNet in 2018. Mu replied to the thread and received more
than 300 likes. However, a few concerns were raised in the comments of that
thread. I have done a rough translation from Chinese to English:


1. Documentation! Even now, the online docs still contain:
                1. Deprecated docs that were never updated
                2. Wrong documentation with poor descriptions
                3. Documentation for alpha-stage features, e.g. you must
install with `pip install --pre` in order to run them.

2. Examples! For Gluon specifically, many examples still mix Gluon and MXNet
APIs. The mixture of mx.sym, mx.nd and mx.gluon confuses users about which one
is the right choice to get their model to work. For example, although Gluon
makes data encapsulation possible, there are still examples using
mx.io.ImageRecordIter with tens of params (it feels like the Gluon examples
are simply copies of the old Python examples).

3. Examples again! Compared to PyTorch, there are a few things I don't like
about the Gluon examples:
                1. Some run, but the code structure is still very
complicated, such as example/image-classification/cifar10.py. It reads like
consecutive pieces of code concatenated together. In fact, it is just a series
of layers mixed with model.fit. This makes it very hard for users to modify or
extend the model.
                2. Some only run with certain settings. If users change the
model even a little, crashes happen. For example, in the multi-GPU example on
the Gluon website, MXNet hides the logic that uses the batch size to change
the learning rate in an optimizer. A lot of newcomers don't know this, and
they only find that the model stops converging when the batch size changes.
                3. The worst scenario is that the model simply doesn't work.
Maintainers in the MXNet community didn't run the model (there is not even an
integration test) and merged the code directly. The script then cannot run
until somebody raises an issue and fixes it.
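
A minimal sketch of the batch-size point in 3.2, written in plain Python
rather than MXNet. It assumes the hidden logic is a 1/batch_size rescaling of
the gradient summed over the batch, followed by a plain SGD update (as far as
I know this is similar in spirit to what Gluon's `Trainer.step(batch_size)`
does, but treat that as an assumption). The function names and numbers are
hypothetical and not from the Zhihu thread.

```python
def sgd_step(weight, summed_grad, lr, batch_size):
    """Plain SGD update with the summed gradient rescaled by 1/batch_size."""
    return weight - lr * (summed_grad / batch_size)


def effective_lr(lr, actual_batch, declared_batch):
    """Step size relative to the mean per-sample gradient."""
    return lr * actual_batch / declared_batch


# Hypothetical values, chosen only for illustration.
w0, lr, per_sample_grad = 1.0, 0.1, 0.5

# The data batch holds 64 samples and 64 is what the update is told.
matched = sgd_step(w0, per_sample_grad * 64, lr, batch_size=64)

# The data batch is doubled to 128, but the update is still told 64.
stale = sgd_step(w0, per_sample_grad * 128, lr, batch_size=64)

print("matched batch size:", matched)
print("stale batch size:  ", stale)
print("effective lr with stale batch size:", effective_lr(lr, 128, 64))
```

With the stale batch size the step is twice as large as intended, which is
one way a model can stop converging after only the batch size was changed.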

4. The community problem. The core advantage of MXNet is its scalability and
efficiency. However, the documentation of some tools is confusing. Here are
two examples:

                1. im2rec comes in 2 versions, C++ (binary) and Python. But
nobody would have thought that the argument parsing in these tools is
different (meanwhile, there are no appropriate examples to compare against, so
users can only guess at the usage).

                2. How do we combine the MXNet distributed platform with
supercomputing tools such as Slurm? How do we do profiling, and how do we
debug? A couple of companies I know thought of using MXNet for distributed
training. Due to the lack of examples and poor support from the community,
they had to move their models to TensorFlow and Horovod.

5. The heavy code base. Most of the MXNet examples, source code, documentation
and language bindings are in a single repo. A git clone operation costs tens
of MB. New feature PRs take longer than expected. The poor review response and
rules keep new contributors away from the community. I remember there was a
call for documentation improvements last year. It took one user 3 months in
total to get a change merged into master. That is almost equal to a PyTorch
release interval.

6. To developers. Very few people in the community discuss the improvements we
could make so that MXNet is more user-friendly. It is very easy to trigger
tens of stack issues while coding. Again, is it a requirement for MXNet users
to be familiar with C++? The connection between Python and C lacks IDE lint
support (maybe MXNet assumes every developer is a Vim master). The API and
underlying implementation change frequently. People have to release their code
with an archived version of MXNet (such as TuSimple and MSRA). Take a look at
PyTorch: even an API for moving a tensor to a device would prompt a thorough
discussion.

I will translate more comments into English and keep this thread updated…
Thanks,
Qing
