+1 to make MKL-DNN default. I'm tracking https://github.com/apache/incubator-mxnet/issues/13369 as open issue to be addressed for 1.4.0 I do agree that we should move to a model to include released dependencies instead of just taking bleeding edge snapshots. However, speed of development is important as well. As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN development team provide us with a well tested tag/commit id to include in 1.4.0 release? Steffen
On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <[email protected]> wrote: > Thanks for the information, Kellen and Naveen. > > Better than onnx-tensorrt, MKL-DNN has already provided versioning and > release tags. My concern is that as MKL-DNN is still under intensive > development, if it has a new feature or bug fix on its master branch, do we > really want to wait for next release to get it supported in MXNet? > > Take the LSTM regression as an example, probably MKL-DNN will give a fix > or improvement on its master branch soon, do we need to wait for 0.18 > release to get it fixed for mxnet user? AFAIK, tensorflow is also using > normal commit id, not release, as the dependency for MKL-DNN. > > Regarding the LSTM regression, we are using internal JIRA tickets rather > than github issues to track the defects of MKL-DNN. But I agree with you, > we need update the progress of it in Alex's issue. > > Thanks, > -tao > > -----Original Message----- > From: kellen sunderland [mailto:[email protected]] > Sent: Thursday, November 22, 2018 10:55 AM > To: [email protected] > Subject: Re: Include MKLDNN into default mxnet pip package > > Agree with your point about other repos also not being based on versioning > Tao. I would point out that I've given some that I've worked with similar > feedback: https://github.com/onnx/onnx-tensorrt/issues/68 > > On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <[email protected]> wrote: > > > Tao, > > > > You are right there are many submodules in 3rd party. We have to start > > somewhere and I believe this one is a good candidate to start with. > > This is not to cater to release of MXNet or to tie them with the > > releases of the submodules but instead to pick only stable releases > > and not to pick up bleeding edge commits from the tip of the master, > > this gives us confidence in the submodule that MXNet users are > > depending on that especially if we make MKLDNN the default. > > > > Good to know it is known already as a regression.Alex has created this > > issue https://github.com/apache/incubator-mxnet/issues/13369, please > > add details and link the corresponding issue in MKLDNN(I couldn't find). > > > > -Naveen > > > > On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <[email protected]> wrote: > > > > > Here are my answers for the questions from Kellen and Naveen about > > > MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN > > > default here. > > > > > > @Kellen, > > > > > > FYI, here is a list for those platforms which are officially > > > supported by MKL-DNN. > > > https://github.com/intel/mkl-dnn#system-requirements > > > > > > Most of computation intensive kernels in MKL-DNN are JITed. So they > > > are supposed to generate code according to the platform during > > > runtime. For non-JIT code in MKL-DNN, same as other code in MXNet, > > > it will generate instructions according to the options/flags of > > > compiler. We can set -DARCH_OPT_FLAGS when build MKL-DNN to avoid > > > optimization for compiling machine. That's exactly what we are doing > for MKL-DNN build in MXNet. > > Even > > > without MKL-DNN, I noticed there were issues about illegal > > > instructions > > of > > > MXNet when users import the pip package on a lower end machine which > > > probably only supports SSE. > > > > > > @Naveen, > > > > > > The LSTM issue has already been identified as a regression from the > > recent > > > version of MKL-DNN. Hopefully it will be fixed soon with a new > > > update of MKL-DNN. > > > > > > MXNet has many submodule dependencies under the 3rd party folder. > > > Seems > > we > > > don't require release versions for most of these dependencies. The > > release > > > period of MKL-DNN and MXNet are not matched very well. I think it > > > would > > be > > > a risk for MXNet release if it hardly depends on the release of a > > > submodule, no need to say depends on the releases of all submodules. > > > > > > -tao > > > > > > -----Original Message----- > > > From: Naveen Swamy [mailto:[email protected]] > > > Sent: Thursday, November 22, 2018 9:08 AM > > > To: [email protected] > > > Cc: [email protected] > > > Subject: Re: Include MKLDNN into default mxnet pip package > > > > > > Hi Alex, > > > > > > Thanks for promptly running the numbers on AMD and reporting here. > > > > > > Can you please update the AMD numbers here for posterity > > > > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL > > -DNN+-+Performance+Benchmarking > > > ? > > > > > > are there any outstanding issues when MKLDNN is enabled? from my > > > offline conversation I am briefly aware performance issues with > > > LSTM, is there an GitHub issue for it? > > > > > > MKLDNN is a submodule dependency, are we pulling the latest commit > > > or releases ? If not we should move to releases before we make it a > > default. > > > Ideally we should use platform specific distributions (-dev > > > packages) at least we should rely on well tested releases. > > > > > > > > > Thanks, Naveen > > > > > > On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander > > <[email protected] > > > > > > > wrote: > > > > > > > AMD benchmarks have been published. We are seeing a x15.8 speedup > > > > with > > > > Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a > > > > smaller network (Mobilenet - batch size 32) the speedup is more > > > > significant at x38.7. Let's have a vote to see if the PR to have > > > > MKLDNN enabled by default > > > > (https://github.com/apache/incubator-mxnet/pull/12591) can be > > > > merged before 1.4.0 release. > > > > > > > > On 10/19/18, 9:17 AM, "Pedro Larroy" > > > > <[email protected]> > > > > wrote: > > > > > > > > I did pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen > > > > 1950X and unit > > > > tests are passing. > > > > > > > > Is this build using AVX512? in /proc/cpuinfo I see only "avx" > > flag. > > > > There's no "avx2" like on recent intel cpus. > > > > > > > > Pedro. > > > > > > > > On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko > > > > <[email protected]> > > > > wrote: > > > > > > > > > Awesome collaborative effort across many contributors and > > > companies! > > > > > > > > > > The boost is impressive and for MXNet users to get this > > > > boost "out of the > > > > > box" is a great benefit and makes MXNet an even better choice. > > > > > > > > > > Alex - can you clarify whether there are any down sides with > > > > regards to > > > > > noon AVX-512 architectures, AMD CPUs, etc? Will it > > > > gracefully fallback? > > > > > > > > > > Hagay > > > > > > > > > > > > > > > On Fri, Oct 19, 2018, 15:46 Sergio Fernández > > > > <[email protected]> > > > > wrote: > > > > > > > > > > > If there is no downside on platforms not supporting AVX512 > > > > instructions, > > > > > > then +1 > > > > > > > > > > > > > > > > > > On Wed, Oct 17, 2018, 14:10 Alex Zai <[email protected]> > wrote: > > > > > > > > > > > > > Hey all, > > > > > > > We have been working hard these past few months to > > > > integrate > > > and > > > > > > stabilize > > > > > > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet > > > > and have made > > > > > > > incredible progress. On CPUs with AVX512 instructions > > > > (such as > > > > c5.18x) > > > > > we > > > > > > > have seen performance increase up to 12x and on other > > > > platforms (Macs, > > > > > > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks > > > > can be found > > > > > > here > > > > > > > ( > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650 > > 764 > > > > > > > and https://github.com/apache/incubator-mxnet/pull/12591 > ). > > > > > > > > > > > > > > Currently, using this accelerator requires the developer > > > > to either pip > > > > > > > install the mxnet-mkl version of mxnet or to build it > > > > themselves from > > > > > > > source. Given that we should try to provide the best > > > > performance "out > > > > > of > > > > > > > the box” with mxnet we should include this in the > > > > default > > > build. > > > > The > > > > > > mkldnn > > > > > > > library is included with in the pip package build so it > > > > does > > > not > > > > > require > > > > > > an > > > > > > > external dependency. > > > > > > > > > > > > > > There were concerns that MKLDNN could cause regressions > > > > on certain > > > > > > > platforms (as it did with the tensorflow version a while > > > > back); but we > > > > > > > added a env flag (MXNET_MKLDNN_ENABLED) that allows > > > > users to turn of > > > > > this > > > > > > > feature during runtime. Please bring up any other > > > > concerns you may have > > > > > > and > > > > > > > your thoughts on including this accelerator in the > > > > default > > > build. > > > > > > > > > > > > > > Best, > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
