if it is really a problem, then it would be prioritized. all the necessary info is in that issue (and i already mentioned just yesterday or today on that ticket) what it was again and it’s like i was talking to no one, as it has been, simply an immediate revert to “remove the library”. in the time wasted on all this, it could have been resolved 100 times over.
I can remove just about every bug from mxnet by turning off ALL of the features in CMakeLists.txt. no features, no bugs. This is roughly equivalent to the approach that has been taken so far for 1.5 years, which is not good engineering practice, ad a suggestion that I am surprised to see championed by a committer. Here’s another example: Not too long ago (maybe 8 months?) there was a crash at shutdown in debug mode in tcmalloc (gperf version of malloc, which is similar to jemalloc) with an error message about bad pointer to free() or something like that. At the time, I didn’t know what caused it and so I did not block it’s removal. fast-forward to about two months ago, where I saw the same error in a different code base. Since it was happening to me, I was in a position to debug it, so I did and found that a the same small static library was linked into two different shared objects, and occasionally (depending upon link order, I presume), a global string variable was created and destroyed twice, because when linking, both shared object c-runtime init functions had the same name, so mapped to the same startup routine and global data address, so when both shared objects initialized, they called the same address. This caused both a memory leak because the first startup string memory allocation was discarded by the second call to the constructor and at shutdown, an assert in tcmalloc because the same second memory pointer allocated was freed twice. When tcmalloc was removed, the assert went away but the bug, to the best of my knowledge, is still there. If I knew then what I know now, I would have asked the bug to be fixed rather than remove tcmalloc. Not because of a love for tcmalloc, but because there is something telling you there is a bug and the bug should be fixed, because if you just hide the bug (comment out the assert) then it’s likely to cause other (harder to track down) problems later. So now that bug is probably still there causing who-knows-what random crashes or undefined behavior. This is the kind of root causing that should be done and not effectively commenting out the assert. I believe we should insist on the highest standards. I understand if a person does CI all day and if they find something they can do via CI (ie turn off a feature) which makes the problem go away, then they might feel compelled to champion that option. Like the saying goes, “When you have a hammer in your hand, everything looks like a nail”. But this is not always the best solution for the project. There is a bug, and it should be fixed because commenting out the assert just hides the bug from plain view, but the bug remains. Or sufficient evidence otherwise. -Chris On Sat, Dec 7, 2019 at 8:06 AM Leonard Lausen <[email protected]> wrote: > It appears to me that this issue only occurs when having multiple openmp > libraries at runtime. I don't understand why we need to support this > use-case. MKL-DNN works with whatever openmp runtime is provided by the > compiler [1 > <https://github.com/intel/mkl-dnn/blob/433e086bf5d9e5ccfc9ec0b70322f931b6b1921d/doc/build/build_options.md#openmp>]. > If you think this use-case is important, please give some more reasoning. > If you convince me I'm happy to help to root-cause it. > > Otherwise I suggest to follow the simplistic approach of using the compile > openmp runtime. If any specific openmp runtime is needed, then we can > compile with the associated compiler (GCC, LLVM, Intel Compiler). > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <https://github.com/apache/incubator-mxnet/issues/10856?email_source=notifications&email_token=ACVWZ7MIEQL7BQDDEKJT7G3QXPCZXA5CNFSM4E66F4P2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGGJ5MA#issuecomment-562863792>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/ACVWZ7MC2MIHMWK72IDVX4TQXPCZXANCNFSM4E66F4PQ> > . >
