rmuir commented on code in PR #12055: URL: https://github.com/apache/lucene/pull/12055#discussion_r1059807197
##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
           }
           Query q = new ConstantScoreQuery(bq.build());
           final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-          return new WeightOrDocIdSet(weight);
+          return new WeightOrDocIdSetIterator(weight);
         }

         // Too many terms: go back to the terms we already collected and start building the bit set
-        DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+        PriorityQueue<PostingsEnum> highFrequencyTerms =
+            new PriorityQueue<PostingsEnum>(collectedTerms.size()) {
+              @Override
+              protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+                return a.cost() < b.cost();
+              }
+            };
+        DocIdSetBuilder otherTerms = new DocIdSetBuilder(context.reader().maxDoc(), terms);
         if (collectedTerms.isEmpty() == false) {
           TermsEnum termsEnum2 = terms.iterator();
           for (TermAndState t : collectedTerms) {
             termsEnum2.seekExact(t.term, t.state);
-            docs = termsEnum2.postings(docs, PostingsEnum.NONE);
-            builder.add(docs);
+            PostingsEnum postings = termsEnum2.postings(null, PostingsEnum.NONE);
+            highFrequencyTerms.add(postings);

Review Comment:
Rather than blindly adding every term to the PQ, should we have a constant minimum `cost` threshold (e.g. 256, 1024, whatever) for a term to even be considered? Anything below it would go directly to `otherTerms`. The skipping stuff isn't going to be useful for the long tail of low-cost terms (the majority, if we are thinking Zipf). Ideally we wouldn't waste our time on a term unless it has skip data? And we want to be careful about the performance of these queries when there are jazillions of jazillions of matching low-frequency terms.
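To make the suggestion concrete, here is a minimal sketch of the cost cutoff, not the PR's code. It is plain Java with no Lucene dependency: `TermPostings` is a hypothetical stand-in for a `PostingsEnum` that carries only its `cost()`, `java.util.PriorityQueue` stands in for Lucene's own `PriorityQueue`, and a plain list stands in for the `DocIdSetBuilder` behind `otherTerms`. The name `MIN_COST_FOR_PQ` and the value 256 are assumptions; 256 is just one of the example values floated above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class CostThresholdSketch {
  // Hypothetical cutoff; the right value (256, 1024, ...) would need benchmarking.
  static final long MIN_COST_FOR_PQ = 256;

  /** Hypothetical stand-in for a PostingsEnum: only the cost estimate matters here. */
  static final class TermPostings {
    final String term;
    final long cost;

    TermPostings(String term, long cost) {
      this.term = term;
      this.cost = cost;
    }
  }

  /** Outcome of the split: PQ candidates versus the long tail. */
  static final class Partition {
    final PriorityQueue<TermPostings> highFrequencyTerms;
    final List<TermPostings> otherTerms;

    Partition(PriorityQueue<TermPostings> highFrequencyTerms, List<TermPostings> otherTerms) {
      this.highFrequencyTerms = highFrequencyTerms;
      this.otherTerms = otherTerms;
    }
  }

  static Partition partition(List<TermPostings> collectedTerms, long minCost) {
    // Min-heap on cost, mirroring lessThan(a, b) = a.cost() < b.cost() in the diff.
    PriorityQueue<TermPostings> pq =
        new PriorityQueue<>(Comparator.comparingLong((TermPostings t) -> t.cost));
    List<TermPostings> other = new ArrayList<>();
    for (TermPostings t : collectedTerms) {
      if (t.cost >= minCost) {
        pq.add(t); // costly enough that skipping might pay off
      } else {
        other.add(t); // long tail: bypass the PQ entirely
      }
    }
    return new Partition(pq, other);
  }

  public static void main(String[] args) {
    List<TermPostings> terms = new ArrayList<>();
    terms.add(new TermPostings("rare1", 5));
    terms.add(new TermPostings("common1", 300));
    terms.add(new TermPostings("rare2", 12));
    terms.add(new TermPostings("common2", 4096));
    Partition p = partition(terms, MIN_COST_FOR_PQ);
    System.out.println("PQ candidates: " + p.highFrequencyTerms.size());
    System.out.println("Sent straight to otherTerms: " + p.otherTerms.size());
  }
}
```

With a cutoff like this, the PQ work is bounded by the number of terms at or above the threshold rather than by the total number of matching terms, which is what matters in the many-low-frequency-terms case. Whether `cost()` is a good enough proxy for "has skip data" is left open here.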