jainankitk commented on code in PR #14421:
URL: https://github.com/apache/lucene/pull/14421#discussion_r2019647007


##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/plain/histograms/TestHistogramCollectorManager.java:
##########
@@ -137,6 +140,23 @@ private void doTestSkipIndex(IndexWriterConfig cfg) throws 
IOException {
         IllegalStateException.class,
         () -> searcher.search(new MatchAllDocsQuery(), new 
HistogramCollectorManager("f", 4, 1)));
 
+    // Create a query so that bucket "1" (values from 4 to 8), which is in the 
middle of the range,
+    // doesn't match any docs. HistogramCollector should not add an entry with 
a count of 0 in this
+    // case.
+    Query query =
+        new BooleanQuery.Builder()
+            .add(NumericDocValuesField.newSlowRangeQuery("f", Long.MIN_VALUE, 
2), Occur.SHOULD)
+            .add(NumericDocValuesField.newSlowRangeQuery("f", 10, 
Long.MAX_VALUE), Occur.SHOULD)
+            .build();
+    actualCounts = searcher.search(query, new HistogramCollectorManager("f", 
4));

Review Comment:
   Initially, I was wondering if `checkMaxBuckets(collectorCounts.size(), 
maxBuckets)` might cause some of the tests earlier expecting exception to fail. 
But, all the tests with `counts[i] == 0`, use `1024` as `maxBuckets` which is 
comfortably over `collectorCounts.size()`
   
   Maybe, we can specify 3 as `maxBuckets` for even stronger condition:
   
   ```
   searcher.search(query, new HistogramCollectorManager("f", 4, 3));
   ```



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##########
@@ -279,7 +279,9 @@ public void collect(DocIdStream stream) throws IOException {
     public void finish() throws IOException {
       // Put counts that we computed in the int[] back into the hash map.
       for (int i = 0; i < counts.length; ++i) {
-        collectorCounts.addTo(leafMinBucket + i, counts[i]);
+        if (counts[i] != 0) {
+          collectorCounts.addTo(leafMinBucket + i, counts[i]);
+        }
       }
       checkMaxBuckets(collectorCounts.size(), maxBuckets);

Review Comment:
   While unrelated to this change, I am wondering if we should check eagerly to 
prevent unnecessary iterations:
   
   ```
   if (counts[i] != 0) {
     collectorCounts.addTo(leafMinBucket + i, counts[i]);
     checkMaxBuckets(collectorCounts.size(), maxBuckets);
   }
   ```
   
   We are doing similar validation in other places:
   
   ```
   if (bucket != prevBucket) {
     counts.addTo(bucket, 1);
     checkMaxBuckets(counts.size(), maxBuckets);
     prevBucket = bucket;
   }
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to