stefanvodita commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1817835241
########## lucene/core/src/java/org/apache/lucene/util/WeightedSelector.java: ########## @@ -0,0 +1,407 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.util.Arrays; +import java.util.Comparator; +import java.util.SplittableRandom; + +/** + * Adaptive selection algorithm based on the introspective quick select algorithm. The quick select + * algorithm uses an interpolation variant of Tukey's ninther median-of-medians for pivot, and + * Bentley-McIlroy 3-way partitioning. For the introspective protection, it shuffles the sub-range + * if the max recursive depth is exceeded. + * + * <p>This selection algorithm is fast on most data shapes, especially on nearly sorted data, or + * when k is close to the boundaries. It runs in linear time on average. + * + * @lucene.internal + */ +public abstract class WeightedSelector { + + // This selector is used repeatedly by the radix selector for sub-ranges of less than + // 100 entries. This means this selector is also optimized to be fast on small ranges. + // It uses the variant of medians-of-medians and 3-way partitioning, and finishes the + // last tiny range (3 entries or less) with a very specialized sort. + + private SplittableRandom random; + + protected abstract long getWeight(int i); + + protected abstract long getValue(int i); + + public final WeightRangeInfo[] select( + int from, + int to, + long rangeTotalValue, + long beforeTotalValue, + long rangeWeight, + long beforeWeight, + double[] kWeights) { Review Comment: Does it make sense to replace `k` with `quantile` maybe? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org