http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58437
--- Comment #12 from Jeffrey M. Birnbaum <jmbnyc at gmail dot com> ---

Tammy,

We tested gcc 4.7.2, 4.6.2, and 4.4.3/4.4.5 (the bug is not present in either 4.4.3 or 4.4.5). I have gcc 4.8.1 on my laptop but have not tried it yet.

I confirmed the issue by compiling my test (almost identical to the one you submitted, but using 500M elements) on 4.4.5 and then moving the executable over to a box with 4.7.2 installed. The program compiled natively with 4.7.2 performed poorly compared to the one compiled with 4.4.5 (this also ruled out chip issues, i.e. Haswell vs. Sandy Bridge).

I knew something was wrong when my own single-threaded merge sort, which produces a gradient instead of sorting the data in place, was outperforming std::sort built with -D_GLIBCXX_PARALLEL, i.e. a parallel sort of 500M entries.

Best,
/JMB