jtibshirani commented on code in PR #910: URL: https://github.com/apache/lucene/pull/910#discussion_r878568768

##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java:
##########
@@ -589,4 +589,52 @@ public SimScorer scorer(
       return new BM25Similarity().scorer(boost, collectionStats, termStats);
     }
   }
+
+  public void testDistributedCollectionStatistics() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig();
+    iwc.setSimilarity(randomCompatibleSimilarity());
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir, iwc);
+
+    String queryString = "foo";
+
+    Document doc0 = new Document();
+    doc0.add(new TextField("f", "foo", Store.NO));
+    doc0.add(new TextField("g", "foo baz", Store.NO));
+    w.addDocument(doc0);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher =
+        new IndexSearcher(reader) {
+          @Override
+          public CollectionStatistics collectionStatistics(String field) throws IOException {
+            CollectionStatistics shardStatistics = super.collectionStatistics(field);
+            int extraMaxDoc = randomIntBetween(0, 10);
+            int extraDocCount = randomIntBetween(0, extraMaxDoc);
+            int extraSumDocFreq = extraDocCount + randomIntBetween(0, 10);
+            int extraSumTotalTermFreq = extraSumDocFreq + randomIntBetween(0, 10);
+            CollectionStatistics globalStatistics =
+                new CollectionStatistics(
+                    field,
+                    shardStatistics.maxDoc() + extraMaxDoc,
+                    shardStatistics.docCount() + extraDocCount,
+                    shardStatistics.sumTotalTermFreq() + extraSumTotalTermFreq,
+                    shardStatistics.sumDocFreq() + extraSumDocFreq);
+            return globalStatistics;
+          }
+        };
+    searcher.setSimilarity(new BM25Similarity());
+    CombinedFieldQuery query =
+        new CombinedFieldQuery.Builder()
+            .addField("f")
+            .addField("g")
+            .addTerm(new BytesRef(queryString))
+            .build();
+    // just check that search does not fail
+    searcher.search(query, 10);

Review Comment:
It'd be nice to assert something stronger here, to check that `CombinedFieldQuery` still works as expected when collection stats are overridden.
Maybe we could compare the output of two query strategies like we do in `testCopyField`.

##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java:
##########
@@ -589,4 +589,52 @@ public SimScorer scorer(
       return new BM25Similarity().scorer(boost, collectionStats, termStats);
     }
   }
+
+  public void testDistributedCollectionStatistics() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig();
+    iwc.setSimilarity(randomCompatibleSimilarity());
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir, iwc);
+
+    String queryString = "foo";
+
+    Document doc0 = new Document();
+    doc0.add(new TextField("f", "foo", Store.NO));
+    doc0.add(new TextField("g", "foo baz", Store.NO));
+    w.addDocument(doc0);
+
+    IndexReader reader = w.getReader();
+    IndexSearcher searcher =
+        new IndexSearcher(reader) {
+          @Override
+          public CollectionStatistics collectionStatistics(String field) throws IOException {
+            CollectionStatistics shardStatistics = super.collectionStatistics(field);
+            int extraMaxDoc = randomIntBetween(0, 10);
+            int extraDocCount = randomIntBetween(0, extraMaxDoc);
+            int extraSumDocFreq = extraDocCount + randomIntBetween(0, 10);
+            int extraSumTotalTermFreq = extraSumDocFreq + randomIntBetween(0, 10);
+            CollectionStatistics globalStatistics =
+                new CollectionStatistics(
+                    field,
+                    shardStatistics.maxDoc() + extraMaxDoc,
+                    shardStatistics.docCount() + extraDocCount,
+                    shardStatistics.sumTotalTermFreq() + extraSumTotalTermFreq,
+                    shardStatistics.sumDocFreq() + extraSumDocFreq);
+            return globalStatistics;
+          }
+        };
+    searcher.setSimilarity(new BM25Similarity());

Review Comment:
It's unusual to search with a different similarity than was used during indexing -- I think we could remove this line.
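
For readers following the overridden statistics in the hunk above: the random "extra" values are not independent, each one is built on top of the previous one. A minimal sketch of why that ordering matters is below. It uses only the JDK; `OverriddenStatsSketch`, `Stats`, and `withRandomExtras` are hypothetical names made up for illustration and are not part of Lucene or of this PR. The checks in `Stats` mirror, as far as I'm aware, the orderings the real `CollectionStatistics` constructor expects (`docCount <= maxDoc`, `sumDocFreq >= docCount`, `sumTotalTermFreq >= sumDocFreq`), and the shard values for field `g` are an assumption derived from the single document `"foo baz"`.

```java
import java.util.Random;

// Hypothetical stand-in for illustration only; not Lucene code.
final class OverriddenStatsSketch {

  /** Holds the same five values the test passes to CollectionStatistics. */
  static final class Stats {
    final String field;
    final long maxDoc;
    final long docCount;
    final long sumTotalTermFreq;
    final long sumDocFreq;

    Stats(String field, long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
      // Assumed orderings between the statistics; a violation is rejected up front.
      if (docCount <= 0 || docCount > maxDoc) {
        throw new IllegalArgumentException("need 0 < docCount <= maxDoc");
      }
      if (sumDocFreq < docCount) {
        throw new IllegalArgumentException("need sumDocFreq >= docCount");
      }
      if (sumTotalTermFreq < sumDocFreq) {
        throw new IllegalArgumentException("need sumTotalTermFreq >= sumDocFreq");
      }
      this.field = field;
      this.maxDoc = maxDoc;
      this.docCount = docCount;
      this.sumTotalTermFreq = sumTotalTermFreq;
      this.sumDocFreq = sumDocFreq;
    }
  }

  /** Adds random extras in the same chained way as the test in the hunk above. */
  static Stats withRandomExtras(Stats shard, Random random) {
    int extraMaxDoc = random.nextInt(11);                              // 0..10
    int extraDocCount = random.nextInt(extraMaxDoc + 1);               // 0..extraMaxDoc
    int extraSumDocFreq = extraDocCount + random.nextInt(11);          // >= extraDocCount
    int extraSumTotalTermFreq = extraSumDocFreq + random.nextInt(11);  // >= extraSumDocFreq
    return new Stats(
        shard.field,
        shard.maxDoc + extraMaxDoc,
        shard.docCount + extraDocCount,
        shard.sumTotalTermFreq + extraSumTotalTermFreq,
        shard.sumDocFreq + extraSumDocFreq);
  }

  public static void main(String[] args) {
    // Assumed shard-level values for field "g" with one document "foo baz":
    // one document, two distinct terms, each occurring once.
    Stats shard = new Stats("g", 1, 1, 2, 2);
    Stats global = withRandomExtras(shard, new Random(42));
    System.out.println(
        global.maxDoc + " " + global.docCount + " "
            + global.sumTotalTermFreq + " " + global.sumDocFreq);
  }
}
```

Because every extra is at least as large as the one it is derived from, adding them to valid shard statistics keeps the combined values valid for any seed. This only shows that the override produces consistent input; it does not replace the stronger assertion on query results requested in the first comment.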

##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java:
##########
@@ -589,4 +589,52 @@ public SimScorer scorer(
       return new BM25Similarity().scorer(boost, collectionStats, termStats);
     }
   }
+
+  public void testDistributedCollectionStatistics() throws IOException {

Review Comment:
Small comment, maybe we could call this `testOverrideCollectionStatistics`? Lucene doesn't really have a native concept of "distributed collection statistics" (as far as I'm aware) and this test doesn't really use that concept anyway?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org