jtibshirani commented on code in PR #910:
URL: https://github.com/apache/lucene/pull/910#discussion_r878568768


##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java:
##########
@@ -589,4 +589,52 @@ public SimScorer scorer(
       return new BM25Similarity().scorer(boost, collectionStats, termStats);
     }
   }
+
+  public void testDistributedCollectionStatistics() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig();
+    iwc.setSimilarity(randomCompatibleSimilarity());
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir, iwc);
+
+    String queryString = "foo";
+
+    Document doc0 = new Document();
+    doc0.add(new TextField("f", "foo", Store.NO));
+    doc0.add(new TextField("g", "foo baz", Store.NO));
+    w.addDocument(doc0);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher =
+        new IndexSearcher(reader) {
+          @Override
+          public CollectionStatistics collectionStatistics(String field) throws IOException {
+            CollectionStatistics shardStatistics = super.collectionStatistics(field);
+            int extraMaxDoc = randomIntBetween(0, 10);
+            int extraDocCount = randomIntBetween(0, extraMaxDoc);
+            int extraSumDocFreq = extraDocCount + randomIntBetween(0, 10);
+            int extraSumTotalTermFreq = extraSumDocFreq + randomIntBetween(0, 10);
+            CollectionStatistics globalStatistics =
+                new CollectionStatistics(
+                    field,
+                    shardStatistics.maxDoc() + extraMaxDoc,
+                    shardStatistics.docCount() + extraDocCount,
+                    shardStatistics.sumTotalTermFreq() + extraSumTotalTermFreq,
+                    shardStatistics.sumDocFreq() + extraSumDocFreq);
+            return globalStatistics;
+          }
+        };
+    searcher.setSimilarity(new BM25Similarity());
+    CombinedFieldQuery query =
+        new CombinedFieldQuery.Builder()
+            .addField("f")
+            .addField("g")
+            .addTerm(new BytesRef(queryString))
+            .build();
+    // just check that search does not fail
+    searcher.search(query, 10);

Review Comment:
   It'd be nice to assert something stronger here, to check that
`CombinedFieldQuery` still works as expected when collection stats are
overridden. Maybe we could compare the output of two query strategies, as we
do in `testCopyField`.
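
To make the "two query strategies" idea concrete, here is a toy, Lucene-free sketch of what such a comparison would check. Everything in it is hypothetical: the class and method names are invented, the scoring is textbook BM25 rather than Lucene's `BM25Similarity`, and it assumes that combined statistics take the max `docCount` and the sum of `sumTotalTermFreq` across fields. It is a rough model of the invariant, not a proposal for the actual test code.

```java
/** Toy model of the two scoring strategies; none of this is Lucene code. */
final class OverriddenStatsSketch {

  /** Textbook BM25 term score with k1 = 1.2 and b = 0.75. */
  static double bm25(long docFreq, double termFreq, double docLength,
                     long docCount, long sumTotalTermFreq) {
    double k1 = 1.2;
    double b = 0.75;
    double idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    double avgLength = (double) sumTotalTermFreq / docCount;
    double norm = k1 * (1 - b + b * docLength / avgLength);
    return idf * termFreq / (termFreq + norm);
  }

  /**
   * Strategy 1: merge per-field statistics, then score the summed frequencies.
   * Each stats row is {docCount, sumTotalTermFreq}. ASSUMPTION: merging takes
   * the max docCount and the sum of sumTotalTermFreq.
   */
  static double combinedScore(long docFreq, double[] termFreqs, double[] lengths,
                              long[][] fieldStats) {
    long docCount = 0;
    long sumTotalTermFreq = 0;
    double termFreq = 0;
    double length = 0;
    for (int i = 0; i < fieldStats.length; i++) {
      docCount = Math.max(docCount, fieldStats[i][0]);
      sumTotalTermFreq += fieldStats[i][1];
      termFreq += termFreqs[i];
      length += lengths[i];
    }
    return bm25(docFreq, termFreq, length, docCount, sumTotalTermFreq);
  }

  public static void main(String[] args) {
    // doc0 from the test: f = "foo", g = "foo baz"; the query term is "foo".
    double[] termFreqs = {1, 1};
    double[] lengths = {1, 2};
    // Overridden ("global") stats: shard values plus fixed extras per field.
    long[][] overridden = {{1 + 3, 1 + 5}, {1 + 3, 2 + 7}};
    double combined = combinedScore(1, termFreqs, lengths, overridden);
    // Strategy 2: a hypothetical copy field "fg" = "foo foo baz" whose override
    // is consistent with the per-field extras (docCount +3, sumTotalTermFreq +12).
    double copyField = bm25(1, 2, 3, 1 + 3, 3 + 12);
    System.out.println("combined=" + combined + " copyField=" + copyField);
  }
}
```

In this model the two strategies agree only when the copy field's overridden statistics are consistent with the per-field ones. If that carries over to the real test, the extras would probably need to be fixed per field rather than redrawn on every `collectionStatistics` call.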



##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java:
##########
@@ -589,4 +589,52 @@ public SimScorer scorer(
       return new BM25Similarity().scorer(boost, collectionStats, termStats);
     }
   }
+
+  public void testDistributedCollectionStatistics() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig();
+    iwc.setSimilarity(randomCompatibleSimilarity());
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir, iwc);
+
+    String queryString = "foo";
+
+    Document doc0 = new Document();
+    doc0.add(new TextField("f", "foo", Store.NO));
+    doc0.add(new TextField("g", "foo baz", Store.NO));
+    w.addDocument(doc0);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher =
+        new IndexSearcher(reader) {
+          @Override
+          public CollectionStatistics collectionStatistics(String field) throws IOException {
+            CollectionStatistics shardStatistics = super.collectionStatistics(field);
+            int extraMaxDoc = randomIntBetween(0, 10);
+            int extraDocCount = randomIntBetween(0, extraMaxDoc);
+            int extraSumDocFreq = extraDocCount + randomIntBetween(0, 10);
+            int extraSumTotalTermFreq = extraSumDocFreq + randomIntBetween(0, 10);
+            CollectionStatistics globalStatistics =
+                new CollectionStatistics(
+                    field,
+                    shardStatistics.maxDoc() + extraMaxDoc,
+                    shardStatistics.docCount() + extraDocCount,
+                    shardStatistics.sumTotalTermFreq() + extraSumTotalTermFreq,
+                    shardStatistics.sumDocFreq() + extraSumDocFreq);
+            return globalStatistics;
+          }
+        };
+    searcher.setSimilarity(new BM25Similarity());

Review Comment:
   It's unusual to search with a different similarity than the one used during
indexing (`randomCompatibleSimilarity()` above), so I think we could remove
this line.



##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java:
##########
@@ -589,4 +589,52 @@ public SimScorer scorer(
       return new BM25Similarity().scorer(boost, collectionStats, termStats);
     }
   }
+
+  public void testDistributedCollectionStatistics() throws IOException {

Review Comment:
   Small comment: maybe we could call this `testOverrideCollectionStatistics`?
Lucene doesn't really have a native concept of "distributed collection
statistics" (as far as I'm aware), and this test doesn't really use that
concept anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

