jpountz commented on a change in pull request #430:
URL: https://github.com/apache/lucene/pull/430#discussion_r750279415



##########
File path: lucene/core/src/java/org/apache/lucene/util/MathUtil.java
##########
@@ -24,13 +24,36 @@
   // No instance:
   private MathUtil() {}
 
+  /**
+   * Returns {@code x <= 0 ? 0 : Math.floor(Math.log(x) / Math.log(base))}
+   *
+   * @param base must be {@code > 1}
+   */
+  public static int log(int x, int base) {
+    if (base == 2) {
+      // This specialized method is 30x faster.
+      return x <= 0 ? 0 : 31 - Integer.numberOfLeadingZeros(x);
+    } else if (base <= 1) {
+      throw new IllegalArgumentException("base must be > 1");
+    }
+    int ret = 0;
+    while (x >= base) {
+      x /= base;
+      ret++;
+    }
+    return ret;
+  }
+

Review comment:
       It looks like this specialization isn't bringing any significant 
performance, should we only keep the function taht takes a `long`?

##########
File path: lucene/core/src/java/org/apache/lucene/util/IntroSelector.java
##########
@@ -17,173 +17,202 @@
 package org.apache.lucene.util;
 
 import java.util.Comparator;
+import java.util.SplittableRandom;
 
 /**
- * Implementation of the quick select algorithm.
+ * Adaptive selection algorithm based on the introspective quick select 
algorithm. The quick select
+ * algorithm uses an interpolation variant of Tukey's ninther 
median-of-medians for pivot, and
+ * Bentley-McIlroy 3-way partitioning. For the introspective protection, it 
shuffles the sub-range
+ * if the max recursive depth is exceeded.
  *
- * <p>It uses the median of the first, middle and last values as a pivot and 
falls back to a median
- * of medians when the number of recursion levels exceeds {@code 2 lg(n)}, as 
a consequence it runs
- * in linear time on average.
+ * <p>This selection algorithm is fast on most data shapes, especially on 
nearly sorted data, or
+ * when k is close to the boundaries. It runs in linear time on average.
  *
  * @lucene.internal
  */
 public abstract class IntroSelector extends Selector {
 
+  // This selector is used repeatedly by the radix selector for sub-ranges of 
less than
+  // 100 entries. This means this selector is also optimized to be fast on 
small ranges.
+  // It uses the variant of medians-of-medians and 3-way partitioning, and 
finishes the
+  // last tiny range (3 entries or less) with a very specialized sort.
+
+  private SplittableRandom random;
+
   @Override
   public final void select(int from, int to, int k) {
     checkArgs(from, to, k);
-    final int maxDepth = 2 * MathUtil.log(to - from, 2);
-    quickSelect(from, to, k, maxDepth);
+    select(from, to, k, 2 * MathUtil.log(to - from, 2));
   }
 
-  int slowSelect(int from, int to, int k) {
-    return medianOfMediansSelect(from, to - 1, k);
-  }
+  // Visible for testing.
+  void select(int from, int to, int k, int maxDepth) {
+    // This code is inspired from IntroSorter#sort, adapted to loop on a 
single partition.
+
+    // For efficiency, we must enter the loop with at least 4 entries to be 
able to skip
+    // some boundary tests during the 3-way partitioning.
+    int size;
+    while ((size = to - from) > 3) {
 
-  int medianOfMediansSelect(int left, int right, int k) {
-    do {
-      // Defensive check, this is also checked in the calling
-      // method. Including here so this method can be used
-      // as a self contained quickSelect implementation.
-      if (left == right) {
-        return left;
+      if (--maxDepth == -1) {
+        // Max recursion depth exceeded: shuffle (only once) and continue.
+        shuffle(from, to);
       }
-      int pivotIndex = pivot(left, right);
-      pivotIndex = partition(left, right, k, pivotIndex);
-      if (k == pivotIndex) {
-        return k;
-      } else if (k < pivotIndex) {
-        right = pivotIndex - 1;
+
+      // Pivot selection based on medians.
+      int last = to - 1;
+      int mid = (from + last) >>> 1;
+      int bottomDistance = k - from;
+      int topDistance = to - k;

Review comment:
       maybe move these two computations under the `else` block, since they are 
not used in the `if` block?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to