I don't have necessary rights to reopen this PR. пн, 20 мая 2019 г. в 08:00, Pedro Larroy <[email protected]>:
> Hi Anton, Stas. > > Can we reopen this PR and get it merged as per the data collected by Stas? > > https://github.com/apache/incubator-mxnet/pull/12160 > > > https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations > > There are multiple issues that will be fixed by solving this problem. > > > Pedro > > On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov <[email protected]> wrote: > > > > I would like to propose a possible alternative solution for > consideration. > > > > If keeping llvm OpenMP as a submodule is inevitable one could make > > following adjustments: > > > > Since compilers try to find their own OpenMP library implicitly, MXNet > > needs to ensure that only the bundled version is found. Therefore during > > the build and also during deployment this library has to provide symlinks > > for each possible compiler that would link to the built artifact ie. > > > > libiomp.so -> libgomp.so -> libomp.so > > > > The MKLML iomp would need to be hidden and removed as well. > > > > On Windows it would be a different story, but as can be seen [1] bundled > > OpenMP was not included in the Windows build anyway. > > > > Alternatively: always use iomp (with same symlinking trick though) > provided > > by MKLML distribution [2]. This potentially could work on Windows as > well. > > > > Best > > Anton > > > > [1] > > > https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410 > > [2] https://github.com/intel/mkl-dnn/releases > > > > вт, 12 февр. 2019 г. в 11:22, Anton Chernov <[email protected]>: > > > > > Recent benchmarking results have been published here [1]. Experiments > > > compare different OpenMP implementations as well as binaries compiled > with > > > different compilers including GCC, Clang and ICC. > > > > > > During experimentation another issues with mixing up libraries was > > > identified and described here [2]. > > > > > > Best > > > Anton > > > > > > [1] https://cwiki.apache.org/confluence/x/2wclBg > > > [2] > > > > https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041 > > > > > > > > > вс, 9 дек. 2018 г. в 16:28, Anton Chernov <[email protected]>: > > > > > >> Hi Chris, > > >> > > >> Following up on the issue, are all things resolved in the discussion? > > >> > > >> If yes, I kindly ask you to reopen this PR and remove ‘requesting > > >> changes’ status: > > >> https://github.com/apache/incubator-mxnet/pull/12160 > > >> > > >> Thank you. > > >> > > >> > > >> Best > > >> Anton > > >> > > >> > > >> вт, 27 нояб. 2018 г. в 17:15, Anton Chernov <[email protected]>: > > >> > > >>> Another thing to take into consideration: > > >>> > > >>> All python artefacts that are created (PyPi) are built with make and > are > > >>> not using the bundled OpenMP library. > > >>> > > >>> One step for the switch to CMake to happen is the approval and > merging > > >>> of the mentioned PR: > > >>> > > >>> https://github.com/apache/incubator-mxnet/pull/12160 > > >>> > > >>> If there are no other objections I kindly ask Chris Olivier to remove > > >>> his 'requesting changes' veto on it to unblock the CMake overhaul > work. > > >>> > > >>> Thank you. > > >>> > > >>> Best > > >>> Anton > > >>> > > >>> чт, 22 нояб. 2018 г. в 17:11, Anton Chernov <[email protected]>: > > >>> > > >>>> > > >>>> Thank you for you answer, Chris. > > >>>> > > >>>> > The whole “mixing omp libraries” is something that occurs in > > >>>> production > > >>>> every day and certainly in everything that uses mkl. > > >>>> > > >>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures > that > > >>>> this mixture is not happening: > > >>>> > > >>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP > > >>>> runtime library to work. As different OpenMP runtimes may not be > binary > > >>>> compatible it's important to ensure that only one OpenMP runtime is > used > > >>>> throughout the application. Having more than one OpenMP runtime > initialized > > >>>> may lead to undefined behavior resulting in incorrect results or > crashes." > > >>>> [1] > > >>>> > > >>>> That is why 2 different MKLML libraries are provided: > > >>>> > > >>>> lib/libmklml_gnu.so | Intel MKL small library for GNU* OpenMP > runtime > > >>>> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP > > >>>> runtime > > >>>> > > >>>> > is the suggestion that libiomp be removed from mkl? > > >>>> > > >>>> That is certainly not my suggestion. > > >>>> > > >>>> > have you spoken with intel? have you consulted Intel at all? > > >>>> > > >>>> Yes, I have asked for comments on the issue. > > >>>> > > >>>> > “hard to debug random crash”. you’re seeing an assertion which is > > >>>> probably ... > > >>>> > > >>>> I'm seeing the result of undefined behaviour. And I want to put > > >>>> emphasis on the following statement: > > >>>> > > >>>> I disregards of whether there is a particular reason for the assert > - > > >>>> it is a result of behaviour that should not happen. There are valid > ways > > >>>> how to use llvm OpenMP in MXNet and the current way is not one of > them. > > >>>> > > >>>> > The lack of root-causing the problem and knee-jerk solution here > > >>>> makes me > > >>>> uncomfortable. > > >>>> > > >>>> I hope that my efforts highlighting the problems reach you to > mitigate > > >>>> your uncomfort. > > >>>> > > >>>> > if you want to see performance differences there’s an environment > > >>>> variable > > >>>> you can set in the mxnet omp tuning code that will print overhead > and > > >>>> execution times for the current omp library. > > >>>> > > >>>> I don't want to see performance differences in the current OpenMP > > >>>> library. I want to remove the current OpenMP library and use the one > > >>>> provided by the compiler. > > >>>> > > >>>> > > >>>> > > >>>> Best > > >>>> Anton > > >>>> > > >>>> [1] > https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265 > > >>>> > > >>>> чт, 22 нояб. 2018 г. в 16:50, Chris Olivier <[email protected] > >: > > >>>> > > >>>>> Do you not work on CI mostly? My apologies for thinking that was > some > > >>>>> sort > > >>>>> of team effort between you and a few others that were passionate > about > > >>>>> CI > > >>>>> keeping the CI system running smoothly. > > >>>>> > > >>>>> You have source code, you have the line the assertion is on. If you > > >>>>> can’t > > >>>>> describe what’s going wrong that causes the assertion, then I don’t > > >>>>> really > > >>>>> have anything more to add to this conversation beyond what’s below: > > >>>>> > > >>>>> The whole “mixing omp libraries” is something that occurs in > production > > >>>>> every day and certainly in everything that uses mkl. It may > > >>>>> occasionally > > >>>>> cause problems for some edge cases when there is super-complex > linking > > >>>>> strategies and dynamic loading. But this is not one of those edge > > >>>>> cases. > > >>>>> Mostly blaming this is a red herring for other thread-related > problems > > >>>>> and > > >>>>> people switch omp library and the timing of their code changes and > they > > >>>>> stop seeing the problem. I’ve spent my entire career doing heavily > > >>>>> multiphreaded c++ development and i’ve seen that a million times. > is > > >>>>> the > > >>>>> suggestion that libiomp be removed from mkl? have you spoken with > > >>>>> intel? > > >>>>> have you consulted Intel at all? > > >>>>> > > >>>>> and what you are seeing isn’t some “hard to debug random crash”. > you’re > > >>>>> seeing an assertion which is probably related to omp trying to > create a > > >>>>> thread pool after a fork and something was done in the mxnet code > to > > >>>>> make > > >>>>> that sketchy to do. I’d suggest filing an issue with the llvm > openmp > > >>>>> just > > >>>>> like you’d file with any other not-well-understood behavior in > mxnet. > > >>>>> > > >>>>> The lack of root-causing the problem and knee-jerk solution here > makes > > >>>>> me > > >>>>> uncomfortable. > > >>>>> > > >>>>> if you want to see performance differences there’s an environment > > >>>>> variable > > >>>>> you can set in the mxnet omp tuning code that will print overhead > and > > >>>>> execution times for the current omp library. > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <[email protected] > > > > >>>>> wrote: > > >>>>> > > >>>>> > Hi Chris, > > >>>>> > > > >>>>> > Thank you for your answer. If you have noticed the initial email > > >>>>> comes from > > >>>>> > me, Anton Chernov (@lebeg on Github) and thus the proposal is not > > >>>>> from any > > >>>>> > 'Ci' team that you've mentioned, but from me personally. > > >>>>> > > > >>>>> > You are writing: > > >>>>> > > > >>>>> > > someone is doing something unhealthy when they fork ... > > >>>>> > > > >>>>> > I'm missing any context to understand what you mean. > > >>>>> > > > >>>>> > > we get a lot of performance gain from OMP ... > > >>>>> > > > >>>>> > There is no data that would prove this statement and therefore > it is > > >>>>> a > > >>>>> > random guess. > > >>>>> > > > >>>>> > > in many months, no investigation has occurred as to WHY the > > >>>>> assertion is > > >>>>> > failing. > > >>>>> > > > >>>>> > The investigation has concluded that this is happening due to > > >>>>> undefined > > >>>>> > behaviour which is, in my opinion, a suffient answer that does > not > > >>>>> require > > >>>>> > to go any deeper. > > >>>>> > > > >>>>> > > the pr is vetoed until such a time that the actual root cause > of > > >>>>> the > > >>>>> > problem is known. > > >>>>> > > > >>>>> > And considering the statements above there is no valid reason to > > >>>>> veto the > > >>>>> > PR. > > >>>>> > > > >>>>> > > > >>>>> > Best > > >>>>> > Anton > > >>>>> > > > >>>>> > чт, 22 нояб. 2018 г. в 15:38, Chris Olivier < > [email protected]>: > > >>>>> > > > >>>>> > > 3x less overhead* > > >>>>> > > > > >>>>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier < > > >>>>> [email protected]> > > >>>>> > > wrote: > > >>>>> > > > > >>>>> > > > someone is doing something unhealthy when they fork, which is > > >>>>> causing > > >>>>> > an > > >>>>> > > > assertion in the openmp library. the same assertion that > would > > >>>>> fire in > > >>>>> > > mkl, > > >>>>> > > > which is linked to libiomp5 (exact same omp library). this > is new > > >>>>> > > behavior > > >>>>> > > > and most likely due to an error or suboptimal approach in the > > >>>>> forking > > >>>>> > > logic > > >>>>> > > > in mxnet. > > >>>>> > > > > > >>>>> > > > in order to circumvent the assert, the Ci team is proposing > to > > >>>>> remove > > >>>>> > the > > >>>>> > > > library completely which is equivalent to cutting off your > leg > > >>>>> to make > > >>>>> > > the > > >>>>> > > > pain from stubbing your toe go away. > > >>>>> > > > > > >>>>> > > > we get a lot of performance gain from OMP. is has about a 1/3 > > >>>>> less > > >>>>> > > > overhead for entering omp regions and also supports omp > regions > > >>>>> after a > > >>>>> > > > fork, which libgomp does not. > > >>>>> > > > > > >>>>> > > > in many months, no investigation has occurred as to WHY the > > >>>>> assertion > > >>>>> > is > > >>>>> > > > failing. > > >>>>> > > > > > >>>>> > > > the pr is vetoed until such a time that the actual root > cause of > > >>>>> the > > >>>>> > > > problem is known. > > >>>>> > > > > > >>>>> > > > > > >>>>> > > > thanks, > > >>>>> > > > > > >>>>> > > > -Chris. > > >>>>> > > > > > >>>>> > > > > > >>>>> > > > > > >>>>> > > > > > >>>>> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov < > > >>>>> [email protected]> > > >>>>> > > wrote: > > >>>>> > > > > > >>>>> > > >> Dear MXNet community, > > >>>>> > > >> > > >>>>> > > >> I would like to drive attention to an important issue that > is > > >>>>> present > > >>>>> > in > > >>>>> > > >> the MXNet CMake build: usage of bundled llvm OpenMP library. > > >>>>> > > >> > > >>>>> > > >> I have opened a PR to remove it: > > >>>>> > > >> https://github.com/apache/incubator-mxnet/pull/12160 > > >>>>> > > >> > > >>>>> > > >> The issue was closed, but I am strong in my oppinion that > it's > > >>>>> the > > >>>>> > right > > >>>>> > > >> thing to do. > > >>>>> > > >> > > >>>>> > > >> *Background* > > >>>>> > > >> If you want to use OpenMP pragmas in your code for > > >>>>> parallelization you > > >>>>> > > >> would supply a special flag to the compiler: > > >>>>> > > >> > > >>>>> > > >> - Clang / -fopenmp > > >>>>> > > >> https://openmp.llvm.org/ > > >>>>> > > >> > > >>>>> > > >> - GCC / -fopenmp > > >>>>> > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html > > >>>>> > > >> > > >>>>> > > >> - Intel / [Q]openmp > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > > > >>>>> > > > >>>>> > https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 > > >>>>> > > >> > > >>>>> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support) > > >>>>> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx > > >>>>> > > >> > > >>>>> > > >> Each of the compilers would enable the '#pragma omp' > directive > > >>>>> during > > >>>>> > > >> C/C++ > > >>>>> > > >> compilation and arrange for automatic linking of the OpenMP > > >>>>> runtime > > >>>>> > > >> library > > >>>>> > > >> supplied by each complier separately. > > >>>>> > > >> > > >>>>> > > >> Thus, to use the advantages of an OpenMP implementation one > has > > >>>>> to > > >>>>> > > compile > > >>>>> > > >> the code with the corresponding compiler. > > >>>>> > > >> > > >>>>> > > >> Currently, in MXNet CMake build scripts a bundled version of > > >>>>> llvm > > >>>>> > OpenMP > > >>>>> > > >> is > > >>>>> > > >> used ([1] and [2]) to replace the OpenMP library supplied > by the > > >>>>> > > compiler. > > >>>>> > > >> > > >>>>> > > >> I will quote here the README from the MKL-DNN (Intel(R) Math > > >>>>> Kernel > > >>>>> > > >> Library > > >>>>> > > >> for Deep Neural Networks): > > >>>>> > > >> > > >>>>> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an > > >>>>> OpenMP > > >>>>> > > runtime > > >>>>> > > >> library to work. As different OpenMP runtimes may not be > binary > > >>>>> > > compatible > > >>>>> > > >> it's important to ensure that only one OpenMP runtime is > used > > >>>>> > throughout > > >>>>> > > >> the application. Having more than one OpenMP runtime > > >>>>> initialized may > > >>>>> > > lead > > >>>>> > > >> to undefined behavior resulting in incorrect results or > > >>>>> crashes." [3] > > >>>>> > > >> > > >>>>> > > >> And: > > >>>>> > > >> > > >>>>> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will > link > > >>>>> the > > >>>>> > > >> application with both Intel and GNU OpenMP runtime > libraries. > > >>>>> This > > >>>>> > will > > >>>>> > > >> lead to undefined behavior of the application." [4] > > >>>>> > > >> > > >>>>> > > >> As can be seen from ldd for MXNet: > > >>>>> > > >> > > >>>>> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp > > >>>>> > > >> libomp.so => > > >>>>> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so > > >>>>> > > >> (0x00007f697bc55000) > > >>>>> > > >> libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 > > >>>>> > > >> (0x00007f69660cd000) > > >>>>> > > >> > > >>>>> > > >> *Performance* > > >>>>> > > >> > > >>>>> > > >> The only performance data related to OpenMP in MXNet I was > able > > >>>>> to > > >>>>> > find > > >>>>> > > is > > >>>>> > > >> here: > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > > > >>>>> > > > >>>>> > https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172 > > >>>>> > > >> > > >>>>> > > >> Which in my understanding is testing imact of different > > >>>>> environment > > >>>>> > > >> variables for the same setup (using same bundled OpenMP > > >>>>> library). > > >>>>> > > >> > > >>>>> > > >> The libraries may differ in implementation and the Thread > > >>>>> Affinity > > >>>>> > > >> Interface [5] may have significant impact on performance. > > >>>>> > > >> > > >>>>> > > >> All compliers support it: > > >>>>> > > >> > > >>>>> > > >> - Clang / KMP_AFFINITY > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > > > >>>>> > > > >>>>> > https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp > > >>>>> > > >> > > >>>>> > > >> - GCC / GOMP_CPU_AFFINITY > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > > > >>>>> > > > >>>>> > https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html > > >>>>> > > >> > > >>>>> > > >> - Intel / KMP_AFFINITY > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > > > >>>>> > > > >>>>> > https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1 > > >>>>> > > >> > > >>>>> > > >> - Visual Studio / SetThreadAffinityMask > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > > > >>>>> > > > >>>>> > https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask > > >>>>> > > >> > > >>>>> > > >> *Issues* > > >>>>> > > >> > > >>>>> > > >> Failed OpenMP assertion when loading MXNet compiled with > DEBUG=1 > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/10856 > > >>>>> > > >> > > >>>>> > > >> libomp.so dependency (need REAL fix) > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11417 > > >>>>> > > >> > > >>>>> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy > > >>>>> with MKL > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/8532 > > >>>>> > > >> > > >>>>> > > >> Performance regression when OMP_NUM_THREADS environment > > >>>>> variable is > > >>>>> > not > > >>>>> > > >> set > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/9744 > > >>>>> > > >> > > >>>>> > > >> Poor concat CPU performance on CUDA builds > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11905 > > >>>>> > > >> > > >>>>> > > >> I would appreciate hearing your thoughts. > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > >> Best > > >>>>> > > >> Anton > > >>>>> > > >> > > >>>>> > > >> [1] > > >>>>> > > >> > > >>>>> > > >> > > >>>>> > > > > >>>>> > > > >>>>> > https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405 > > >>>>> > > >> [2] > > >>>>> https://github.com/apache/incubator-mxnet/tree/master/3rdparty > > >>>>> > > >> [3] > > >>>>> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265 > > >>>>> > > >> [4] > > >>>>> https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280 > > >>>>> > > >> [5] https://software.intel.com/en-us/node/522691 > > >>>>> > > >> > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>> >
