HoustonPutman commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1953282099
########## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ########## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used to compute the equi-weight per bin. */ public static List<DynamicRangeInfo> computeDynamicNumericRanges( - long[] values, long[] weights, int len, long totalWeight, int topN) { + long[] values, long[] weights, int len, long totalValue, long totalWeight, int topN) { assert values.length == weights.length && len <= values.length && len >= 0; assert topN >= 0; List<DynamicRangeInfo> dynamicRangeResult = new ArrayList<>(); if (len == 0 || topN == 0) { return dynamicRangeResult; } - new InPlaceMergeSorter() { - @Override - protected int compare(int index1, int index2) { - int cmp = Long.compare(values[index1], values[index2]); - if (cmp == 0) { - // If the values are equal, sort based on the weights. - // Any weight order is correct as long as it's deterministic. - return Long.compare(weights[index1], weights[index2]); - } - return cmp; - } + double rangeWeightTarget = (double) totalWeight / topN; + double[] kWeights = new double[topN]; + for (int i = 0; i < topN; i++) { + kWeights[i] = (i == 0 ? 0 : kWeights[i - 1]) + rangeWeightTarget; + } - @Override - protected void swap(int index1, int index2) { - long tmp = values[index1]; - values[index1] = values[index2]; - values[index2] = tmp; - tmp = weights[index1]; - weights[index1] = weights[index2]; - weights[index2] = tmp; - } - }.sort(0, len); + WeightedSelector.WeightRangeInfo[] kIndexResults = + new WeightedSelector() { + private long pivotValue; + private long pivotWeight; - long accuWeight = 0; - long valueSum = 0; - int count = 0; - int minIdx = 0; + @Override + protected long getWeight(int i) { + return weights[i]; + } - double rangeWeightTarget = (double) totalWeight / Math.min(topN, len); + @Override + protected long getValue(int i) { + return values[i]; + } - for (int i = 0; i < len; i++) { - accuWeight += weights[i]; - valueSum += values[i]; - count++; + @Override + protected void swap(int index1, int index2) { + long tmp = values[index1]; + values[index1] = values[index2]; + values[index2] = tmp; + tmp = weights[index1]; + weights[index1] = weights[index2]; + weights[index2] = tmp; + } - if (accuWeight >= rangeWeightTarget) { + @Override + protected void setPivot(int i) { + pivotValue = values[i]; + pivotWeight = weights[i]; + } + + @Override + protected int comparePivot(int j) { + int cmp = Long.compare(pivotValue, values[j]); + if (cmp == 0) { + // If the values are equal, sort based on the weights. + // Any weight order is correct as long as it's deterministic. + return Long.compare(pivotWeight, weights[j]); + } + return cmp; + } + }.select(0, len, totalValue, 0, totalWeight, 0, kWeights); + + int lastIdx = -1; + long lastTotalValue = 0; + long lastTotalWeight = 0; + for (int kIdx = 0; kIdx < topN; kIdx++) { Review Comment: So it's actually simpler than I thought. Basically we already have the next minimum value if the quantile is found in the last value of the `bottom` group, because it is our pivot value. Same for the `top` group, if the last value of it is our quantile maximum, then either it is the last value in the list (at which point there is no next-minimum to find), or it is under some other pivot value we have found (so no need to find it, the pivot is already sorted correctly). The only time we need to do something is when the last pivot value is the end of a quantile, then we need to find the bottom of the `top` group. So we pass to `select()` that we want to select the minimum in that range. In the `select()` we only need to find the minimum for the `bottom` group after pivoting. If this `bottom` group contains a quantile-end, then just recursively tell it to find the minimum. If the `bottom` group does not contain a quantile-end, then we won't be touching these values again. So go through them and find the minimum, swapping it into the `from` position. So just a little more work for the algorithm, adding maybe `O(log(n) * k)` time. That's just a guess though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org