3x less overhead*

On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <[email protected]> wrote:
> someone is doing something unhealthy when they fork, which is causing an
> assertion in the openmp library. it is the same assertion that would fire
> in mkl, which is linked to libiomp5 (exact same omp library). this is new
> behavior and most likely due to an error or suboptimal approach in the
> forking logic in mxnet.
>
> in order to circumvent the assert, the CI team is proposing to remove the
> library completely, which is equivalent to cutting off your leg to make the
> pain from stubbing your toe go away.
>
> we get a lot of performance gain from OMP. it has about 1/3 less
> overhead for entering omp regions and also supports omp regions after a
> fork, which libgomp does not.
>
> in many months, no investigation has occurred as to WHY the assertion is
> failing.
>
> the pr is vetoed until such a time that the actual root cause of the
> problem is known.
>
>
> thanks,
>
> -Chris.
>
>
> On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <[email protected]> wrote:
>
>> Dear MXNet community,
>>
>> I would like to draw attention to an important issue that is present in
>> the MXNet CMake build: the use of the bundled llvm OpenMP library.
>>
>> I have opened a PR to remove it:
>> https://github.com/apache/incubator-mxnet/pull/12160
>>
>> The PR was closed, but I am strong in my opinion that it's the right
>> thing to do.
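Chris's objection hinges on OpenMP regions after a fork: per his message, libgomp does not support them, while the bundled llvm runtime does. For Python code that drives MXNet from worker processes, one general way to sidestep the question is to start workers with the `spawn` start method, so a child never inherits an already-initialized OpenMP runtime. This is a minimal sketch, assuming the caller knows whether the parent has already entered an OpenMP region; `make_worker_context` and `openmp_initialized` are hypothetical names, and this is a common workaround, not what MXNet itself does.

```python
import multiprocessing as mp


def make_worker_context(openmp_initialized):
    """Return a multiprocessing context for worker processes.

    Hypothetical helper. If the parent has already entered OpenMP parallel
    regions, avoid "fork": fork() copies only the calling thread, so the
    child gets the runtime's bookkeeping without its worker threads, which
    is the situation libgomp does not support. "spawn" starts a fresh
    interpreter, so the child initializes its own OpenMP runtime.
    """
    if openmp_initialized or "fork" not in mp.get_all_start_methods():
        return mp.get_context("spawn")
    return mp.get_context("fork")
```

Workers would then be created through the returned context, e.g. `ctx.Pool(processes=4)`. With `spawn`, each child re-imports the main module, so the program's entry point needs an `if __name__ == "__main__":` guard; the price is a slower worker start-up compared with `fork`.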
>>
>> *Background*
>> If you want to use OpenMP pragmas in your code for parallelization, you
>> would supply a special flag to the compiler:
>>
>> - Clang / -fopenmp
>> https://openmp.llvm.org/
>>
>> - GCC / -fopenmp
>> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
>>
>> - Intel / [Q]openmp
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>
>> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
>> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
>>
>> Each of these compilers enables the '#pragma omp' directive during C/C++
>> compilation and arranges for automatic linking of the OpenMP runtime
>> library supplied by that compiler.
>>
>> Thus, to use the advantages of a particular OpenMP implementation, one
>> has to compile the code with the corresponding compiler.
>>
>> Currently, the MXNet CMake build scripts use a bundled version of llvm
>> OpenMP ([1] and [2]) to replace the OpenMP library supplied by the
>> compiler.
>>
>> I will quote here the README of MKL-DNN (Intel(R) Math Kernel Library
>> for Deep Neural Networks):
>>
>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
>> library to work. As different OpenMP runtimes may not be binary compatible
>> it's important to ensure that only one OpenMP runtime is used throughout
>> the application. Having more than one OpenMP runtime initialized may lead
>> to undefined behavior resulting in incorrect results or crashes." [3]
>>
>> And:
>>
>> "Using GNU compiler with -fopenmp and -liomp5 options will link the
>> application with both Intel and GNU OpenMP runtime libraries. This will
>> lead to undefined behavior of the application."
>> [4]
>>
>> As can be seen from ldd for MXNet:
>>
>> $ ldd build/tests/mxnet_unit_tests | grep omp
>> libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f697bc55000)
>> libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f69660cd000)
>>
>> *Performance*
>>
>> The only performance data related to OpenMP in MXNet I was able to find is
>> here:
>>
>> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
>>
>> As I understand it, this tests the impact of different environment
>> variables on the same setup (using the same bundled OpenMP library).
>>
>> The libraries may differ in implementation, and the Thread Affinity
>> Interface [5] may have a significant impact on performance.
>>
>> All compilers support it:
>>
>> - Clang / KMP_AFFINITY
>> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
>>
>> - GCC / GOMP_CPU_AFFINITY
>> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
>>
>> - Intel / KMP_AFFINITY
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>
>> - Visual Studio / SetThreadAffinityMask
>> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
>>
>> *Issues*
>>
>> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>> https://github.com/apache/incubator-mxnet/issues/10856
>>
>> libomp.so dependency (need REAL fix)
>> https://github.com/apache/incubator-mxnet/issues/11417
>>
>> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
>> https://github.com/apache/incubator-mxnet/issues/8532
>>
>> Performance regression when OMP_NUM_THREADS environment variable is not set
>> https://github.com/apache/incubator-mxnet/issues/9744
>>
>> Poor concat CPU performance on CUDA builds
>> https://github.com/apache/incubator-mxnet/issues/11905
>>
>> I would appreciate hearing your thoughts.
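The `ldd` output in Anton's message is exactly the situation the MKL-DNN README warns about: two different OpenMP runtimes (`libomp.so` and `libgomp.so.1`) linked into one binary. A build check could catch this automatically. This is a minimal sketch, assuming Linux `ldd` output and that the runtimes of interest are `libgomp`, `libomp` and `libiomp5`; `openmp_runtimes_from_ldd` and `check_binary` are hypothetical helpers, not an existing MXNet tool.

```python
import re
import subprocess

# OpenMP runtime library base names mapped to their vendor.
# Assumed list for illustration; extend it if other runtimes matter.
OPENMP_RUNTIMES = {
    "libgomp": "GNU",
    "libomp": "LLVM",
    "libiomp5": "Intel",
}

# An ldd line looks like: "libgomp.so.1 => /usr/lib/.../libgomp.so.1 (0x...)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>")


def openmp_runtimes_from_ldd(ldd_output):
    """Return {soname: vendor} for every OpenMP runtime listed in ldd output."""
    found = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if not match:
            continue  # e.g. linux-vdso.so.1 has no "=>" part
        soname = match.group(1)
        base = soname.split(".so", 1)[0]  # "libgomp.so.1" -> "libgomp"
        if base in OPENMP_RUNTIMES:
            found[soname] = OPENMP_RUNTIMES[base]
    return found


def check_binary(path):
    """Run ldd on a binary and fail if it links more than one OpenMP runtime."""
    output = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    ).stdout
    runtimes = openmp_runtimes_from_ldd(output)
    if len(set(runtimes.values())) > 1:
        raise RuntimeError(f"multiple OpenMP runtimes linked into {path}: {runtimes}")
    return runtimes
```

Run against the build shown above, `check_binary("build/tests/mxnet_unit_tests")` would raise, since the binary links both the LLVM and the GNU runtime. Comparing vendors rather than sonames means two names for the same runtime are not reported as a conflict.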
>>
>>
>> Best
>> Anton
>>
>> [1] https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
>> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
>> [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>> [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
>> [5] https://software.intel.com/en-us/node/522691
>>
>
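Anton notes that the only performance data available compares environment variables on one setup, and that each runtime reads its own affinity variable (KMP_AFFINITY for the LLVM and Intel runtimes, GOMP_CPU_AFFINITY for GCC's libgomp). A fair comparison between runtimes would pin threads the same way under each. This is a minimal sketch of such a harness, assuming one thread per listed CPU; `affinity_env` and `time_command` are hypothetical helpers, and the benchmark command and CPU list are placeholders. Visual Studio is left out because it uses the SetThreadAffinityMask API rather than an environment variable.

```python
import os
import statistics
import subprocess
import time


def affinity_env(runtime, cpus):
    """Environment variables pinning one OpenMP thread to each CPU in cpus.

    runtime is "gnu" (libgomp) or "llvm"/"intel" (libomp / libiomp5).
    """
    env = {"OMP_NUM_THREADS": str(len(cpus))}
    cpu_list = ",".join(str(c) for c in cpus)
    if runtime == "gnu":
        # libgomp accepts a comma- or space-separated list of CPU numbers.
        env["GOMP_CPU_AFFINITY"] = cpu_list
    elif runtime in ("llvm", "intel"):
        # libomp/libiomp5: explicit processor list at logical-CPU granularity.
        env["KMP_AFFINITY"] = f"granularity=fine,proclist=[{cpu_list}],explicit"
    else:
        raise ValueError(f"unknown OpenMP runtime: {runtime!r}")
    return env


def time_command(cmd, env_overrides, repeats=3):
    """Median wall-clock seconds of running cmd with extra environment variables."""
    env = dict(os.environ)
    env.update(env_overrides)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

A comparison would then call, for example, `time_command(["./my_benchmark"], affinity_env("gnu", [0, 1, 2, 3]))` against a libgomp build and the same call with `"llvm"` against a build using the bundled runtime, where `./my_benchmark` is a placeholder. Taking the median of several runs dampens one-off scheduling noise.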
