On 11/06/20 08:32 +0200, François Dumont via Libstdc++ wrote:
As we are on patching algos we still have this old one.
   From the original patch I only kept the memory optimization part
as the new performance test was not showing good result for the other
part to change pivot value. I also kept the small change in
get_temporary_buffer even if I don't have strong feeling about it, it
just make sure that we'll try to allocate 1 element as a last chance
allocation.
   Note that there is still place for an improvement. If we miss
memory on the heap we then use a recursive implementation which then
rely on stack memory. I would be surprise that a system which miss
heap memory would have no problem to allocate about the same on the
stack so we will surely end up in a stack overflow. I still have this
on my todo even if I already made several tries with no satisfying
result in terms of performance.
   Tested under Linux x86_64.
Commit message:
   libstdc++: Limit memory allocation in stable_sort/inplace_merge
(PR 83938)
   Reduce memory consumption in stable_sort/inplace_merge to what is used.
   Co-authored-by: François Dumont <fdum...@gcc.gnu.org>
   libstdc++-v3/ChangeLog:
   2020-06-11 John Chang <john.ch...@samba.tv>
               François Dumont <fdum...@gcc.gnu.org>
           PR libstdc++/83938
           * include/bits/stl_tempbuf.h (get_temporary_buffer):
Change __len
           computation in the loop.
           * include/bits/stl_algo.h:
           (__inplace_merge): Take temporary buffer length from
smallest range.
           (__stable_sort): Limit temporary buffer length.
           * testsuite/25_algorithms/inplace_merge/1.cc (test03):
Test different
           pivot positions.
           * testsuite/performance/25_algorithms/stable_sort.cc: Test
stable_sort
           under different heap memory conditions.
           * testsuite/performance/25_algorithms/inplace_merge.cc:
New.
Ok to commit ?
Oops, I replied to the old thread about this patch. See comments at:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559583.html
P.S. it looks to me like the "else" branch in __merge_adaptive will
rarely get used. In most cases, the temporary buffer is going to
succeed on the first allocation and have enough space.
So I think we should split out the "else" branch of __merge_adaptive
to a separate function, so that __merge_adaptive is a much smaller
function and the cold branch doesn't have to be loaded into cache.
Something like this, but I haven't benchmarked it.
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 6efc99035b7d..428c22cf5d43 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -2400,6 +2400,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return std::rotate(__first, __middle, __last);
}
+ // This is a helper function for the cold branch of __merge_adaptive.
+ template<typename _BidirectionalIterator, typename _Distance,
+ typename _Pointer, typename _Compare>
+ void
+ __merge_adaptive_resize(_BidirectionalIterator __first,
+ _BidirectionalIterator __middle,
+ _BidirectionalIterator __last,
+ _Distance __len1, _Distance __len2,
+ _Pointer __buffer, _Distance __buffer_size,
+ _Compare __comp);
+
/// This is a helper function for the merge routines.
template<typename _BidirectionalIterator, typename _Distance,
typename _Pointer, typename _Compare>
@@ -2425,42 +2436,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
else
{
- _BidirectionalIterator __first_cut = __first;
- _BidirectionalIterator __second_cut = __middle;
- _Distance __len11 = 0;
- _Distance __len22 = 0;
- if (__len1 > __len2)
- {
- __len11 = __len1 / 2;
- std::advance(__first_cut, __len11);
- __second_cut
- = std::__lower_bound(__middle, __last, *__first_cut,
- __gnu_cxx::__ops::__iter_comp_val(__comp));
- __len22 = std::distance(__middle, __second_cut);
- }
- else
- {
- __len22 = __len2 / 2;
- std::advance(__second_cut, __len22);
- __first_cut
- = std::__upper_bound(__first, __middle, *__second_cut,
- __gnu_cxx::__ops::__val_comp_iter(__comp));
- __len11 = std::distance(__first, __first_cut);
- }
-
- _BidirectionalIterator __new_middle
- = std::__rotate_adaptive(__first_cut, __middle, __second_cut,
- __len1 - __len11, __len22, __buffer,
- __buffer_size);
- std::__merge_adaptive(__first, __first_cut, __new_middle, __len11,
- __len22, __buffer, __buffer_size, __comp);
- std::__merge_adaptive(__new_middle, __second_cut, __last,
- __len1 - __len11,
- __len2 - __len22, __buffer,
- __buffer_size, __comp);
+ std::__merge_adaptive_resize(__first, __middle, __last,
+ __len1, __len2, __buffer, __buffer_size,
+ __comp);
}
}
+ // This is a helper function for the cold branch of __merge_adaptive.
+ template<typename _BidirectionalIterator, typename _Distance,
+ typename _Pointer, typename _Compare>
+ void
+ __merge_adaptive_resize(_BidirectionalIterator __first,
+ _BidirectionalIterator __middle,
+ _BidirectionalIterator __last,
+ _Distance __len1, _Distance __len2,
+ _Pointer __buffer, _Distance __buffer_size,
+ _Compare __comp)
+ {
+ _BidirectionalIterator __first_cut = __first;
+ _BidirectionalIterator __second_cut = __middle;
+ _Distance __len11 = 0;
+ _Distance __len22 = 0;
+ if (__len1 > __len2)
+ {
+ __len11 = __len1 / 2;
+ std::advance(__first_cut, __len11);
+ __second_cut
+ = std::__lower_bound(__middle, __last, *__first_cut,
+ __gnu_cxx::__ops::__iter_comp_val(__comp));
+ __len22 = std::distance(__middle, __second_cut);
+ }
+ else
+ {
+ __len22 = __len2 / 2;
+ std::advance(__second_cut, __len22);
+ __first_cut
+ = std::__upper_bound(__first, __middle, *__second_cut,
+ __gnu_cxx::__ops::__val_comp_iter(__comp));
+ __len11 = std::distance(__first, __first_cut);
+ }
+
+ _BidirectionalIterator __new_middle
+ = std::__rotate_adaptive(__first_cut, __middle, __second_cut,
+ __len1 - __len11, __len22, __buffer,
+ __buffer_size);
+ std::__merge_adaptive(__first, __first_cut, __new_middle, __len11,
+ __len22, __buffer, __buffer_size, __comp);
+ std::__merge_adaptive(__new_middle, __second_cut, __last,
+ __len1 - __len11,
+ __len2 - __len22, __buffer,
+ __buffer_size, __comp);
+ }
+
/// This is a helper function for the merge routines.
template<typename _BidirectionalIterator, typename _Distance,
typename _Compare>