http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48750
--- Comment #7 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-04-24 21:51:51 UTC ---

I had a quick look at this code and something really bad is going on: for
example, memory is deallocated before the destructor is called for the object
constructed in it via placement new. And much more. The patchlet below, which
I'd ask the submitters to test, gives me something much saner, under Valgrind
too, but Johannes should really have a look asap.

(Note that the patch is against mainline first, as usual, and thus doesn't
apply cleanly as-is to 4.4 for various trivial reasons, like uglified names.)

/////////////////

Index: par_loop.h
===================================================================
--- par_loop.h  (revision 172920)
+++ par_loop.h  (working copy)
@@ -91,8 +91,7 @@
         _ThreadIndex __iam = omp_get_thread_num();
 
         // Neutral element.
-        _Result* __reduct = static_cast<_Result*>
-          (::operator new(sizeof(_Result)));
+        _Result* __reduct;
 
         _DifferenceType
           __start = equally_split_point(__length, __num_threads, __iam),
@@ -100,7 +99,7 @@
 
         if (__start < __stop)
           {
-            new(__reduct) _Result(__f(__o, __begin + __start));
+            __reduct = new _Result(__f(__o, __begin + __start));
             ++__start;
             __constructed[__iam] = true;
           }
@@ -110,18 +109,26 @@
         for (; __start < __stop; ++__start)
           *__reduct = __r(*__reduct, __f(__o, __begin + __start));
 
-        __thread_results[__iam] = *__reduct;
+        if (__constructed[__iam])
+          {
+            ::new(&__thread_results[__iam]) _Result(*__reduct);
+            delete __reduct;
+          }
       } //parallel
 
     for (_ThreadIndex __i = 0; __i < __num_threads; ++__i)
       if (__constructed[__i])
-        __output = __r(__output, __thread_results[__i]);
+        {
+          __output = __r(__output, __thread_results[__i]);
+          (&__thread_results[__i])->~_Result();
+        }
 
     // Points to last element processed (needed as return value for
     // some algorithms like transform).
     __f._M_finish_iterator = __begin + __length;
 
-    delete[] __thread_results;
+    ::operator delete(__thread_results);
+    delete[] __constructed;
 
     return __o;
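
For readers who want to try the lifetime rule outside of libstdc++, here is a minimal standalone sketch of the pattern the patch relies on: raw storage from ::operator new, objects created in it with placement new only where a result exists, explicit destructor calls for exactly those objects, and only then ::operator delete. The names Tracked and sum_slots are hypothetical and are not part of par_loop.h. Negative inputs are an assumption of this sketch, standing in for threads that were handed an empty range and therefore never construct a result.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical result type that counts live instances, so a leak or a
// missed destructor call shows up in the counter.
struct Tracked
{
  static int live;
  int value;
  explicit Tracked(int v) : value(v) { ++live; }
  Tracked(const Tracked& other) : value(other.value) { ++live; }
  ~Tracked() { --live; }
};

int Tracked::live = 0;

// Sums the non-negative entries of partial[0..n). A negative entry models
// a slot that is never constructed (assumption of this sketch).
int sum_slots(const int* partial, std::size_t n)
{
  // Raw storage only: no Tracked object lives in it yet.
  Tracked* results =
    static_cast<Tracked*>(::operator new(n * sizeof(Tracked)));
  bool* constructed = new bool[n];

  for (std::size_t i = 0; i < n; ++i)
    {
      constructed[i] = partial[i] >= 0;
      if (constructed[i])
        // Start the object's lifetime inside the raw storage.
        ::new (static_cast<void*>(&results[i])) Tracked(partial[i]);
    }

  int output = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (constructed[i])
      {
        output += results[i].value;
        // End the lifetime explicitly, while the storage is still valid.
        results[i].~Tracked();
      }

  // Release the storage only after every constructed object is destroyed.
  ::operator delete(results);
  delete[] constructed;
  return output;
}
```

Calling sum_slots on the array {3, -1, 4} constructs two objects, skips the middle slot, and leaves Tracked::live at zero afterwards. Using delete[] on the raw buffer instead would run destructors on slots that were never constructed, which is the kind of mismatch the patch removes.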