jtibshirani commented on code in PR #910:
URL: https://github.com/apache/lucene/pull/910#discussion_r881050412

##########
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestCombinedFieldQuery.java:
##########
@@ -589,4 +589,97 @@ public SimScorer scorer(
       return new BM25Similarity().scorer(boost, collectionStats, termStats);
     }
   }
+
+  public void testOverrideCollectionStatistics() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriterConfig iwc = new IndexWriterConfig();
+    Similarity similarity = randomCompatibleSimilarity();
+    iwc.setSimilarity(similarity);
+    RandomIndexWriter w = new RandomIndexWriter(random(), dir, iwc);
+
+    int numMatch = atLeast(10);
+    for (int i = 0; i < numMatch; i++) {
+      Document doc = new Document();
+      if (random().nextBoolean()) {
+        doc.add(new TextField("a", "baz", Store.NO));
+        doc.add(new TextField("b", "baz", Store.NO));
+        for (int k = 0; k < 2; k++) {
+          doc.add(new TextField("ab", "baz", Store.NO));
+        }
+        w.addDocument(doc);
+        doc.clear();
+      }
+      int freqA = random().nextInt(5) + 1;
+      for (int j = 0; j < freqA; j++) {
+        doc.add(new TextField("a", "foo", Store.NO));
+      }
+      int freqB = random().nextInt(5) + 1;
+      for (int j = 0; j < freqB; j++) {
+        doc.add(new TextField("b", "foo", Store.NO));
+      }
+      int freqAB = freqA + freqB;
+      for (int j = 0; j < freqAB; j++) {
+        doc.add(new TextField("ab", "foo", Store.NO));
+      }
+      w.addDocument(doc);
+    }
+
+    IndexReader reader = w.getReader();
+
+    int extraMaxDoc = randomIntBetween(0, 10);
+    int extraDocCount = randomIntBetween(0, extraMaxDoc);
+
+    int extraSumDocFreqA = extraDocCount + randomIntBetween(0, 10);

Review Comment:
I think it'd make more sense to have a single `sumDocFreq` here. This represents the number of unique term-document pairs, and we can't just add the values across different fields. In fact `CombinedFieldQuery` chooses to take a maximum of the `sumDocFreq`.
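
To make the point about unique term-document pairs concrete, here is a minimal sketch in plain Java that does not use Lucene at all. The class `SumDocFreqSketch` and its helpers `postings`, `sumDocFreq`, and `combine` are hypothetical names invented for this illustration, and the toy postings map is an assumption about how to model a field, not how Lucene stores it. It only shows that a term appearing in the same document in both `a` and `b` is counted twice when the per-field values are added, whereas a combined field would count that pair once.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class SumDocFreqSketch {

  // Builds toy postings for one field: term -> set of doc IDs containing it.
  // docs[i] holds the terms of document i in that field.
  static Map<String, Set<Integer>> postings(String[][] docs) {
    Map<String, Set<Integer>> result = new HashMap<>();
    for (int docId = 0; docId < docs.length; docId++) {
      for (String term : docs[docId]) {
        result.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
      }
    }
    return result;
  }

  // sumDocFreq: the number of unique (term, document) pairs in the field.
  static long sumDocFreq(Map<String, Set<Integer>> postings) {
    long sum = 0;
    for (Set<Integer> docIds : postings.values()) {
      sum += docIds.size();
    }
    return sum;
  }

  // Postings of an imagined combined field: per-term union of doc ID sets.
  static Map<String, Set<Integer>> combine(
      Map<String, Set<Integer>> a, Map<String, Set<Integer>> b) {
    Map<String, Set<Integer>> merged = new HashMap<>();
    for (Map<String, Set<Integer>> field : List.of(a, b)) {
      for (Map.Entry<String, Set<Integer>> e : field.entrySet()) {
        merged.computeIfAbsent(e.getKey(), t -> new HashSet<>()).addAll(e.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // doc 0: a = "foo foo", b = "foo"; doc 1: a = "foo", b = "baz"
    Map<String, Set<Integer>> a = postings(new String[][] {{"foo", "foo"}, {"foo"}});
    Map<String, Set<Integer>> b = postings(new String[][] {{"foo"}, {"baz"}});

    long sdfA = sumDocFreq(a); // (foo,0), (foo,1)
    long sdfB = sumDocFreq(b); // (foo,0), (baz,1)
    long combined = sumDocFreq(combine(a, b)); // (foo,0), (foo,1), (baz,1)

    // Adding double-counts (foo,0); the maximum never exceeds the combined value.
    System.out.println("sum=" + (sdfA + sdfB)
        + " max=" + Math.max(sdfA, sdfB)
        + " combined=" + combined);
    // prints "sum=4 max=2 combined=3"
  }
}
```

In this toy model the sum overshoots the combined field's value while the maximum stays at or below it, which is consistent with preferring a single `sumDocFreq` in the test rather than one per field.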