https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63706
Bug ID: 63706
Summary: stl_heap.h: make_heap()'s worst-case time complexity doesn't conform to the C++ standard
Product: gcc
Version: 4.9.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: tomytsai at gmail dot com

Hi,

In the current libstdc++ source code (4.9.0 and "the latest collection"), the make_heap() function has a worst-case time complexity of O(n log n), while the C++ standard requires "This algorithm makes at most 3 * (finish - start) comparisons.", i.e. O(n) time complexity. This topic has been discussed in some related links:

http://stackoverflow.com/questions/24475056/is-libcs-implementation-of-stdmake-heap-nonconformant
http://llvm.org/bugs/show_bug.cgi?id=20161

As a result, llvm 3.4 added a new function, __sift_down(), so that make_heap() runs in O(n) worst-case time. Their implementation can be found at http://llvm.org/svn/llvm-project/libcxx/trunk/include/algorithm (search for make_heap()).

Currently, the make_heap() implementation in libstdc++ usually works well on practical data. However, we can construct pathological inputs that force it to perform O(n log n) comparisons, as follows. Split the data into 4 equal parts: [ A | B | C | D ] and put the largest n/4 elements in part B, from largest to smallest. The order of the remaining data does not matter. The data then looks like:

[ A | B: [largest] [2nd largest] [3rd] ... [n/4-th largest] | C | D ]

If we call make_heap() on this data, the current implementation walks backwards from the end of part B to the start of part A. Since the elements in part B are the top n/4 elements of all the data, each of them will be

1. temporarily placed in a slot in the [ C | D ] area, and then
2. pushed up all the way to the first element (since it is the maximum element encountered so far).

Step 2 takes O(log n) steps for each element.
Hence, for the pathological data above, make_heap() already spends O(n/4 * log n) comparisons and moves processing part B alone. This doesn't conform to the C++ standard, which requires O(n) time complexity.

One possible solution is, as llvm 3.4 does above, to provide a sift_down() function to achieve the O(n) worst-case time complexity. (In addition, as in llvm 3.4, the sift_down() function is also useful for implementing pop_heap(). Although it has the same asymptotic complexity as the current __adjust_heap() in gcc's libstdc++, it does a single one-pass top-down search, and so has better cache performance.)

Thanks!
Tomy
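For reference, here is a minimal sketch of the sift-down approach being suggested (Floyd's bottom-up heap construction). This is not libc++'s actual __sift_down(), just an illustration of why the technique is O(n): each internal node is sifted down once, in a single top-down pass, and the total work summed over all levels is linear.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative sketch, not libc++'s code: restore the max-heap property
// at index `start` by moving it down in one top-down pass.
void sift_down(std::vector<int>& v, std::size_t start, std::size_t len) {
    std::size_t parent = start;
    while (2 * parent + 1 < len) {
        std::size_t child = 2 * parent + 1;          // left child
        if (child + 1 < len && v[child] < v[child + 1])
            ++child;                                  // pick the larger child
        if (v[parent] >= v[child])
            break;                                    // heap property holds
        std::swap(v[parent], v[child]);
        parent = child;
    }
}

// Floyd's O(n) heap construction: sift down every internal node,
// deepest-first. Most nodes sit near the leaves and sift only O(1) levels,
// so the total cost is O(n) rather than O(n log n).
void make_heap_linear(std::vector<int>& v) {
    if (v.size() < 2) return;
    for (std::size_t i = v.size() / 2; i-- > 0; )
        sift_down(v, i, v.size());
}
```

The same sift_down() primitive serves pop_heap(): move the last element to the root and sift it down once, which is the one-pass top-down search mentioned above.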