houserjohn commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2658101119
Hey @HoustonPutman, I just published [GH#14238](https://github.com/apache/lucene/pull/14238) which contains all of the unit tests that I've created so far. Note that there was a slight API change between the main branch and this PR, so I included some unit tests that work for this PR below. While some of these tests are not considering the caveat (the change in behavior) you mentioned, I believe there are a few unit tests that capture a few existing issues. For instance: ``` public void testComputeDynamicNumericRangesWithLargeTopN() { List<DynamicRangeUtil.DynamicRangeInfo> expectedRangeInfoList = new ArrayList<>(); long[] values = new long[] {487, 439, 794, 277}; long[] weights = new long[] {59, 508, 736, 560}; expectedRangeInfoList.add(new DynamicRangeUtil.DynamicRangeInfo(1, 560L, 277L, 277L, 277D)); expectedRangeInfoList.add(new DynamicRangeUtil.DynamicRangeInfo(1, 508L, 439L, 439L, 439D)); expectedRangeInfoList.add(new DynamicRangeUtil.DynamicRangeInfo(2, 795L, 487L, 794L, 640.5D)); assertDynamicNumericRangeResults(values, weights, 42, 1997L, 1863L, expectedRangeInfoList); } ``` Gives the exception: > java.lang.IllegalArgumentException: All kWeights must be < beforeWeight + rangeWeight at __randomizedtesting.SeedInfo.seed([913AAD1D60B9263B:FFD4EE9EA025DBF]:0) at org.apache.lucene.core@11.0.0-SNAPSHOT/org.apache.lucene.util.WeightedSelector.checkArgs(WeightedSelector.java:82) at org.apache.lucene.core@11.0.0-SNAPSHOT/org.apache.lucene.util.WeightedSelector.select(WeightedSelector.java:57) at org.apache.lucene.facet.range.DynamicRangeUtil.computeDynamicNumericRanges(DynamicRangeUtil.java:266) at org.apache.lucene.facet.range.TestDynamicRangeUtil.testComputeDynamicNumericRangesWithLargeTopN(TestDynamicRangeUtil.java:169) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) I've tried to track down this bug, and I haven't quite fixed it, but I believe the fix is related to these lines: ``` --- a/lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java +++ b/lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java @@ -216,7 +216,7 @@ public final class DynamicRangeUtil { return dynamicRangeResult; } - double rangeWeightTarget = (double) totalWeight / topN; + double rangeWeightTarget = (double) totalWeight / Math.min(topN, len); double[] kWeights = new double[topN]; for (int i = 0; i < topN; i++) { kWeights[i] = (i == 0 ? 0 : kWeights[i - 1]) + rangeWeightTarget; -- ``` Additionally: ``` public void testComputeDynamicNumericRangesWithSameWeights() { List<DynamicRangeUtil.DynamicRangeInfo> expectedRangeInfoList = new ArrayList<>(); long totalValue = 0; long[] values = new long[100]; long[] weights = new long[100]; for (int i = 0; i < 100; i++) { values[i] = i; weights[i] = 50; totalValue += i; } expectedRangeInfoList.add(new DynamicRangeUtil.DynamicRangeInfo(25, 1250L, 0L, 24L, 12.0D)); expectedRangeInfoList.add(new DynamicRangeUtil.DynamicRangeInfo(25, 1250L, 25L, 49L, 37.0D)); expectedRangeInfoList.add(new DynamicRangeUtil.DynamicRangeInfo(25, 1250L, 50L, 74L, 62.0D)); expectedRangeInfoList.add(new DynamicRangeUtil.DynamicRangeInfo(25, 1250L, 75L, 99L, 87.0D)); assertDynamicNumericRangeResults(values, weights, 4, totalValue, 5000L, expectedRangeInfoList); } ``` Gives (Important values marked with **): ``` java.lang.AssertionError: expected:<[DynamicRangeInfo[count=**25**, weight=1250, min=0, max=24, centroid=12.0], DynamicRangeInfo[count=25, weight=1250, min=25, max=49, centroid=37.0], DynamicRangeInfo[count=25, weight=1250, min=50, max=74, centroid=62.0], DynamicRangeInfo[count=25, weight=1250, min=75, max=99, centroid=87.0]]> but was:<[DynamicRangeInfo[count=**26**, weight=1300, min=0, max=25, centroid=12.5], DynamicRangeInfo[count=25, weight=1250, min=26, max=50, centroid=38.0], DynamicRangeInfo[count=25, weight=1250, min=51, max=75, centroid=63.0], DynamicRangeInfo[count=24, weight=1200, min=76, max=99, centroid=87.5]]> at __randomizedtesting.SeedInfo.seed([DA6EB4C0C0CA4022:DBA0A4538AB03899]:0) at junit@4.13.1/org.junit.Assert.fail(Assert.java:89) at junit@4.13.1/org.junit.Assert.failNotEquals(Assert.java:835) at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:120) at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:146) at org.apache.lucene.facet.range.TestDynamicRangeUtil.compareDynamicRangeResult(TestDynamicRangeUtil.java:361) at org.apache.lucene.facet.range.TestDynamicRangeUtil.assertDynamicNumericRangeResults(TestDynamicRangeUtil.java:351) at org.apache.lucene.facet.range.TestDynamicRangeUtil.testComputeDynamicNumericRangesWithSameWeights(TestDynamicRangeUtil.java:156) ``` I know you mentioned there is a change in behavior in the caveat, but I do believe that this example should probably return ranges with equal counts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org