jpountz commented on a change in pull request #430: URL: https://github.com/apache/lucene/pull/430#discussion_r750279415
########## File path: lucene/core/src/java/org/apache/lucene/util/MathUtil.java ########## @@ -24,13 +24,36 @@ // No instance: private MathUtil() {} + /** + * Returns {@code x <= 0 ? 0 : Math.floor(Math.log(x) / Math.log(base))} + * + * @param base must be {@code > 1} + */ + public static int log(int x, int base) { + if (base == 2) { + // This specialized method is 30x faster. + return x <= 0 ? 0 : 31 - Integer.numberOfLeadingZeros(x); + } else if (base <= 1) { + throw new IllegalArgumentException("base must be > 1"); + } + int ret = 0; + while (x >= base) { + x /= base; + ret++; + } + return ret; + } + Review comment: It looks like this specialization isn't bringing any significant performance, should we only keep the function taht takes a `long`? ########## File path: lucene/core/src/java/org/apache/lucene/util/IntroSelector.java ########## @@ -17,173 +17,202 @@ package org.apache.lucene.util; import java.util.Comparator; +import java.util.SplittableRandom; /** - * Implementation of the quick select algorithm. + * Adaptive selection algorithm based on the introspective quick select algorithm. The quick select + * algorithm uses an interpolation variant of Tukey's ninther median-of-medians for pivot, and + * Bentley-McIlroy 3-way partitioning. For the introspective protection, it shuffles the sub-range + * if the max recursive depth is exceeded. * - * <p>It uses the median of the first, middle and last values as a pivot and falls back to a median - * of medians when the number of recursion levels exceeds {@code 2 lg(n)}, as a consequence it runs - * in linear time on average. + * <p>This selection algorithm is fast on most data shapes, especially on nearly sorted data, or + * when k is close to the boundaries. It runs in linear time on average. * * @lucene.internal */ public abstract class IntroSelector extends Selector { + // This selector is used repeatedly by the radix selector for sub-ranges of less than + // 100 entries. This means this selector is also optimized to be fast on small ranges. + // It uses the variant of medians-of-medians and 3-way partitioning, and finishes the + // last tiny range (3 entries or less) with a very specialized sort. + + private SplittableRandom random; + @Override public final void select(int from, int to, int k) { checkArgs(from, to, k); - final int maxDepth = 2 * MathUtil.log(to - from, 2); - quickSelect(from, to, k, maxDepth); + select(from, to, k, 2 * MathUtil.log(to - from, 2)); } - int slowSelect(int from, int to, int k) { - return medianOfMediansSelect(from, to - 1, k); - } + // Visible for testing. + void select(int from, int to, int k, int maxDepth) { + // This code is inspired from IntroSorter#sort, adapted to loop on a single partition. + + // For efficiency, we must enter the loop with at least 4 entries to be able to skip + // some boundary tests during the 3-way partitioning. + int size; + while ((size = to - from) > 3) { - int medianOfMediansSelect(int left, int right, int k) { - do { - // Defensive check, this is also checked in the calling - // method. Including here so this method can be used - // as a self contained quickSelect implementation. - if (left == right) { - return left; + if (--maxDepth == -1) { + // Max recursion depth exceeded: shuffle (only once) and continue. + shuffle(from, to); } - int pivotIndex = pivot(left, right); - pivotIndex = partition(left, right, k, pivotIndex); - if (k == pivotIndex) { - return k; - } else if (k < pivotIndex) { - right = pivotIndex - 1; + + // Pivot selection based on medians. + int last = to - 1; + int mid = (from + last) >>> 1; + int bottomDistance = k - from; + int topDistance = to - k; Review comment: maybe move these two computations under the `else` block, since they are not used in the `if` block? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org