The key complaint here is mainly about the clarity of the documents
themselves. Maybe it is time to focus on a single flavor of API that is
useful (Gluon) and highlight all the docs around that.

Tianqi


On Wed, Sep 19, 2018 at 11:04 AM Qing Lan <[email protected]> wrote:

> Hi all,
>
> There was a trending topic <https://www.zhihu.com/question/293996867> on
> Zhihu (a famous Chinese StackOverflow+Quora) recently, asking about the
> status of MXNet in 2018. Mu replied to the thread and received more than
> 300 `like`s.
> However, there are a few concerns raised in the comments of that thread;
> I have done some simple translation from Chinese to English:
>
> 1. Documentation! To this day, the online docs still contain:
>                 1. Deprecated but never-updated docs
>                 2. Wrong documentation with poor descriptions
>                 3. Docs stuck in the Alpha stage, e.g. you must install
> with `pip --pre` in order to run.
>
> 2. Examples! For Gluon specifically, many examples still mix the Gluon and
> legacy MXNet APIs. The mixture of mx.sym, mx.nd and mx.gluon confuses users
> about which one is the right choice to get their model to work. As an
> example, although Gluon makes clean data encapsulation possible, there are
> still examples using mx.io.ImageRecordIter with tens of params (it feels
> like the Gluon examples are simply copies of the old Python examples).
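>
> For reference, loading data the Gluon-native way needs only a Dataset and
> a DataLoader. A minimal sketch against the 1.x Gluon API (the dataset
> choice and batch size here are just placeholders):
>
>     from mxnet import gluon
>
>     # Built-in dataset plus a transform; no record files or dozens of flags.
>     dataset = gluon.data.vision.CIFAR10(train=True).transform_first(
>         gluon.data.vision.transforms.ToTensor())
>     loader = gluon.data.DataLoader(dataset, batch_size=128, shuffle=True)
>     for data, label in loader:
>         pass  # forward/backward would go here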
>
> 3. Examples again! Compared to PyTorch, there are a few things I don't
> like about the Gluon examples:
>                 1. They run, but the code structure is still very
> complicated, e.g. example/image-classification/cifar10.py. It reads like
> one long concatenation of code; in fact it is just a series of layers
> mixed with model.fit. That makes it very hard for users to modify or
> extend the model.
>                 2. They only run under certain settings. If users change
> the model even a little, it crashes. For example, in the multi-GPU example
> on the Gluon website, MXNet hides the logic that rescales the gradients
> (and hence the effective learning rate) by the batch size inside the
> optimizer step; see the sketch after this list. Many newcomers don't know
> this, and they only discover that the model stops converging when the
> batch size changes.
>                 3. The worst case is when the model itself simply doesn't
> work. Maintainers in the MXNet community merged the code directly without
> running the model (not even an integration test), leaving the script
> broken until somebody raised an issue and fixed it.
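>
> On 3.2, the coupling is easy to demonstrate: in Gluon, trainer.step() takes
> the batch size and normalizes the accumulated gradients by it, so the
> effective learning rate silently depends on that value. A minimal sketch
> (the network and data are placeholders):
>
>     from mxnet import autograd, gluon, nd
>
>     net = gluon.nn.Dense(10)
>     net.initialize()
>     trainer = gluon.Trainer(net.collect_params(), 'sgd',
>                             {'learning_rate': 0.1})
>     loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
>
>     data = nd.random.uniform(shape=(128, 20))
>     label = nd.zeros((128,))
>     with autograd.record():
>         loss = loss_fn(net(data), label)
>     loss.backward()
>
>     # step(batch_size) divides the accumulated gradients by batch_size;
>     # passing anything else silently changes the effective learning rate,
>     # which is why changing the batch size can stop a model from converging.
>     trainer.step(data.shape[0])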
>
> 4. The community problem. The core advantages of MXNet are its scalability
> and efficiency. However, the documentation for some of its tools is
> confusing. Here are two examples:
>
>                 1. im2rec comes in two versions, C++ (binary) and Python,
> but nobody would guess that the command-line arguments of the two tools
> differ (and there are no suitable examples to compare against, so users
> can only work out the usage by guessing).
>
>                 2. How do you combine MXNet's distributed training with a
> supercomputing tool such as Slurm? How do you profile, and how do you
> debug? (A sketch of the configuration involved follows below.) A couple of
> companies I know considered MXNet for distributed training; due to the
> lack of examples and poor support from the community, they had to move
> their models to TensorFlow and Horovod.
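>
> To illustrate the Slurm question: MXNet's parameter-server mode is
> configured entirely through environment variables, which a Slurm job
> script would have to derive from its own variables. The values below are
> placeholders, and the SLURM_PROCID-to-role mapping is left out:
>
>     import os
>     import mxnet as mx
>
>     # Each process declares its role: one scheduler, plus N workers and
>     # M servers, all pointing at the scheduler's address.
>     os.environ.setdefault('DMLC_ROLE', 'worker')  # or 'scheduler'/'server'
>     os.environ.setdefault('DMLC_PS_ROOT_URI', '127.0.0.1')  # scheduler host
>     os.environ.setdefault('DMLC_PS_ROOT_PORT', '9000')
>     os.environ.setdefault('DMLC_NUM_WORKER', '2')
>     os.environ.setdefault('DMLC_NUM_SERVER', '2')
>
>     kv = mx.kv.create('dist_sync')  # synchronous parameter-server kvstore
>     print('worker %d of %d' % (kv.rank, kv.num_workers))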
>
> 5. The heavy code base. Most of the MXNet examples, source code,
> documentation and language bindings live in a single repo, so a git clone
> costs tens of MB. New-feature PRs take longer than expected, and the slow
> review responses and strict rules keep new contributors away from the
> community. I remember a call for documentation improvements last year: the
> whole process cost one user 3 months to get merged into master, which is
> almost a full PyTorch release interval.
>
> 6. To developers. Very few people in the community discuss what we could
> improve to make MXNet more developer-friendly. It is far too easy to
> trigger dozens of stack-trace issues while coding. Again, is familiarity
> with C++ a requirement for MXNet users? The boundary between Python and C
> lacks IDE lint support (maybe MXNet assumes every developer is a Vim
> master). The APIs and the underlying implementation change frequently, so
> people have to release their code pinned to an archived version of MXNet
> (as TuSimple and MSRA do). Compare PyTorch, where even an API for moving a
> tensor to a device gets a thorough discussion.
>
> I will translate more comments to English and keep this thread updated…
> Thanks,
> Qing
>
