jainankitk commented on code in PR #14421: URL: https://github.com/apache/lucene/pull/14421#discussion_r2019647007
########## lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/plain/histograms/TestHistogramCollectorManager.java: ########## @@ -137,6 +140,23 @@ private void doTestSkipIndex(IndexWriterConfig cfg) throws IOException { IllegalStateException.class, () -> searcher.search(new MatchAllDocsQuery(), new HistogramCollectorManager("f", 4, 1))); + // Create a query so that bucket "1" (values from 4 to 8), which is in the middle of the range, + // doesn't match any docs. HistogramCollector should not add an entry with a count of 0 in this + // case. + Query query = + new BooleanQuery.Builder() + .add(NumericDocValuesField.newSlowRangeQuery("f", Long.MIN_VALUE, 2), Occur.SHOULD) + .add(NumericDocValuesField.newSlowRangeQuery("f", 10, Long.MAX_VALUE), Occur.SHOULD) + .build(); + actualCounts = searcher.search(query, new HistogramCollectorManager("f", 4)); Review Comment: Initially, I was wondering if `checkMaxBuckets(collectorCounts.size(), maxBuckets)` might cause some of the tests earlier expecting exception to fail. But, all the tests with `counts[i] == 0`, use `1024` as `maxBuckets` which is comfortably over `collectorCounts.size()` Maybe, we can specify 3 as `maxBuckets` for even stronger condition: ``` searcher.search(query, new HistogramCollectorManager("f", 4, 3)); ``` ########## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java: ########## @@ -279,7 +279,9 @@ public void collect(DocIdStream stream) throws IOException { public void finish() throws IOException { // Put counts that we computed in the int[] back into the hash map. for (int i = 0; i < counts.length; ++i) { - collectorCounts.addTo(leafMinBucket + i, counts[i]); + if (counts[i] != 0) { + collectorCounts.addTo(leafMinBucket + i, counts[i]); + } } checkMaxBuckets(collectorCounts.size(), maxBuckets); Review Comment: While unrelated to this change, I am wondering if we should check eagerly to prevent unnecessary iterations: ``` if (counts[i] != 0) { collectorCounts.addTo(leafMinBucket + i, counts[i]); checkMaxBuckets(collectorCounts.size(), maxBuckets); } ``` We are doing similar validation in other places: ``` if (bucket != prevBucket) { counts.addTo(bucket, 1); checkMaxBuckets(counts.size(), maxBuckets); prevBucket = bucket; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org