On 11/06/20 08:32 +0200, François Dumont via Libstdc++ wrote:
As we are on patching algos we still have this old one.

    From the original patch I only kept the memory optimization part as the new performance test was not showing good result for the other part to change pivot value. I also kept the small change in get_temporary_buffer even if I don't have strong feeling about it, it just make sure that we'll try to allocate 1 element as a last chance allocation.

    Note that there is still place for an improvement. If we miss memory on the heap we then use a recursive implementation which then rely on stack memory. I would be surprise that a system which miss heap memory would have no problem to allocate about the same on the stack so we will surely end up in a stack overflow. I still have this on my todo even if I already made several tries with no satisfying result in terms of performance.

    Tested under Linux x86_64.

Commit message:

    libstdc++: Limit memory allocation in stable_sort/inplace_merge (PR 83938)

    Reduce memory consumption in stable_sort/inplace_merge to what is used.

    Co-authored-by: François Dumont  <fdum...@gcc.gnu.org>

    libstdc++-v3/ChangeLog:

    2020-06-11  John Chang  <john.ch...@samba.tv>
                François Dumont  <fdum...@gcc.gnu.org>

            PR libstdc++/83938
            * include/bits/stl_tempbuf.h (get_temporary_buffer): Change __len
            computation in the loop.
            * include/bits/stl_algo.h:
            (__inplace_merge): Take temporary buffer length from smallest range.
            (__stable_sort): Limit temporary buffer length.
            * testsuite/25_algorithms/inplace_merge/1.cc (test03): Test different
            pivot positions.
            * testsuite/performance/25_algorithms/stable_sort.cc: Test stable_sort
            under different heap memory conditions.
            * testsuite/performance/25_algorithms/inplace_merge.cc: 
New.

Ok to commit ?

Oops, I replied to the old thread about this patch. See comments at:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559583.html

P.S. it looks to me like the "else" branch in __merge_adaptive will
rarely get used. In most cases, the temporary buffer is going to
succeed on the first allocation and have enough space.

So I think we should split out the "else" branch of __merge_adaptive
to a separate function, so that __merge_adaptive is a much smaller
function and the cold branch doesn't have to be loaded into cache.

Something like this, but I haven't benchmarked it.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 6efc99035b7d..428c22cf5d43 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -2400,6 +2400,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return std::rotate(__first, __middle, __last);
     }
 
+  // This is a helper function for the cold branch of __merge_adaptive.
+  template<typename _BidirectionalIterator, typename _Distance,
+	   typename _Pointer, typename _Compare>
+    void
+    __merge_adaptive_resize(_BidirectionalIterator __first,
+			    _BidirectionalIterator __middle,
+			    _BidirectionalIterator __last,
+			    _Distance __len1, _Distance __len2,
+			    _Pointer __buffer, _Distance __buffer_size,
+			    _Compare __comp);
+
   /// This is a helper function for the merge routines.
   template<typename _BidirectionalIterator, typename _Distance, 
 	   typename _Pointer, typename _Compare>
@@ -2425,42 +2436,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
       else
 	{
-	  _BidirectionalIterator __first_cut = __first;
-	  _BidirectionalIterator __second_cut = __middle;
-	  _Distance __len11 = 0;
-	  _Distance __len22 = 0;
-	  if (__len1 > __len2)
-	    {
-	      __len11 = __len1 / 2;
-	      std::advance(__first_cut, __len11);
-	      __second_cut
-		= std::__lower_bound(__middle, __last, *__first_cut,
-				     __gnu_cxx::__ops::__iter_comp_val(__comp));
-	      __len22 = std::distance(__middle, __second_cut);
-	    }
-	  else
-	    {
-	      __len22 = __len2 / 2;
-	      std::advance(__second_cut, __len22);
-	      __first_cut
-		= std::__upper_bound(__first, __middle, *__second_cut,
-				     __gnu_cxx::__ops::__val_comp_iter(__comp));
-	      __len11 = std::distance(__first, __first_cut);
-	    }
-
-	  _BidirectionalIterator __new_middle
-	    = std::__rotate_adaptive(__first_cut, __middle, __second_cut,
-				     __len1 - __len11, __len22, __buffer,
-				     __buffer_size);
-	  std::__merge_adaptive(__first, __first_cut, __new_middle, __len11,
-				__len22, __buffer, __buffer_size, __comp);
-	  std::__merge_adaptive(__new_middle, __second_cut, __last,
-				__len1 - __len11,
-				__len2 - __len22, __buffer,
-				__buffer_size, __comp);
+	  std::__merge_adaptive_resize(__first, __middle, __last,
+				       __len1, __len2, __buffer, __buffer_size,
+				       __comp);
 	}
     }
 
+  // This is a helper function for the cold branch of __merge_adaptive.
+  template<typename _BidirectionalIterator, typename _Distance,
+	   typename _Pointer, typename _Compare>
+    void
+    __merge_adaptive_resize(_BidirectionalIterator __first,
+			    _BidirectionalIterator __middle,
+			    _BidirectionalIterator __last,
+			    _Distance __len1, _Distance __len2,
+			    _Pointer __buffer, _Distance __buffer_size,
+			    _Compare __comp)
+    {
+      _BidirectionalIterator __first_cut = __first;
+      _BidirectionalIterator __second_cut = __middle;
+      _Distance __len11 = 0;
+      _Distance __len22 = 0;
+      if (__len1 > __len2)
+	{
+	  __len11 = __len1 / 2;
+	  std::advance(__first_cut, __len11);
+	  __second_cut
+	    = std::__lower_bound(__middle, __last, *__first_cut,
+				 __gnu_cxx::__ops::__iter_comp_val(__comp));
+	  __len22 = std::distance(__middle, __second_cut);
+	}
+      else
+	{
+	  __len22 = __len2 / 2;
+	  std::advance(__second_cut, __len22);
+	  __first_cut
+	    = std::__upper_bound(__first, __middle, *__second_cut,
+				 __gnu_cxx::__ops::__val_comp_iter(__comp));
+	  __len11 = std::distance(__first, __first_cut);
+	}
+
+      _BidirectionalIterator __new_middle
+	= std::__rotate_adaptive(__first_cut, __middle, __second_cut,
+				 __len1 - __len11, __len22, __buffer,
+				 __buffer_size);
+      std::__merge_adaptive(__first, __first_cut, __new_middle, __len11,
+			    __len22, __buffer, __buffer_size, __comp);
+      std::__merge_adaptive(__new_middle, __second_cut, __last,
+			    __len1 - __len11,
+			    __len2 - __len22, __buffer,
+			    __buffer_size, __comp);
+    }
+
   /// This is a helper function for the merge routines.
   template<typename _BidirectionalIterator, typename _Distance,
 	   typename _Compare>

Reply via email to