https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63706

            Bug ID: 63706
           Summary: stl_heap.h:make_heap()'s worst time complexity doesn't
                    conform with C++ standard
           Product: gcc
           Version: 4.9.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tomytsai at gmail dot com

Hi,

In the current libstdc++ sources (4.9.0 and trunk), the
make_heap() function has a worst-case time complexity of O(n log n), while
the C++ standard requires "This algorithm makes at most 3 * (finish - start)
comparisons.", i.e. O(n) time complexity.

This topic has been discussed in some related links:
http://stackoverflow.com/questions/24475056/is-libcs-implementation-of-stdmake-heap-nonconformant

http://llvm.org/bugs/show_bug.cgi?id=20161

As a result, libc++ in LLVM 3.4 added a new function, __sift_down(), so that
make_heap() achieves worst-case O(n) time complexity.

Their implementation can be found at 
http://llvm.org/svn/llvm-project/libcxx/trunk/include/algorithm
(search for make_heap())

Currently, the make_heap() implementation in libstdc++ usually works well
on practical data. However, we can construct pathological
cases that force it to perform O(n lg n) comparisons, as below.

If we split data into 4 equal parts:
[  A  |  B  |  C  |  D  ]

and put the largest n/4 elements in part B, in descending order. The order
of the remaining data doesn't matter. The data then looks like:

[  A  | B: [largest] [2nd largest] [3rd] .... [n/4-th largest] | C | D ]

If we call make_heap() on this data, the current implementation will process
parents backwards from the end of part B to the start of part A.

What happens then: since the elements in part B are the top n/4
elements of the whole input, each of them is 1. temporarily moved to a slot
in the [ C | D ] area, and then 2. pushed up all the way to the first
element (since it is the maximum element encountered so far). Step 2
takes O(log n) steps per element.

Hence, for the pathological data above, make_heap() already performs
O(n/4 * log n) = O(n log n) comparisons and moves just to process part B.
This does not conform to the C++ standard, which requires O(n) time complexity.

One possible solution is, as llvm 3.4 does above, to provide a sift_down()
function that achieves worst-case O(n) time complexity.

(In addition, as in llvm 3.4, the sift_down() function is also useful for
implementing pop_heap(). Although it has the same asymptotic complexity
as the current __adjust_heap() in gcc's libstdc++, it does a single
one-pass top-down search and so has better cache performance.)

Thanks!

Tomy
