gsmiller commented on code in PR #12055:
URL: https://github.com/apache/lucene/pull/12055#discussion_r1060869958
##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
       }
       Query q = new ConstantScoreQuery(bq.build());
       final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-      return new WeightOrDocIdSet(weight);
+      return new WeightOrDocIdSetIterator(weight);
     }

     // Too many terms: go back to the terms we already collected and start building the bit set
-    DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+    PriorityQueue<PostingsEnum> highFrequencyTerms =
+        new PriorityQueue<PostingsEnum>(collectedTerms.size()) {
+          @Override
+          protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+            return a.cost() < b.cost();
+          }
+        };
+    DocIdSetBuilder otherTerms = new DocIdSetBuilder(context.reader().maxDoc(), terms);

Review Comment:
minor: Could we define `otherTerms` closer to where it first gets used? (e.g., L:207)

##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -211,32 +218,39 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
               new ConstantScoreQuery(
                   new TermQuery(new Term(query.field, termsEnum.term()), termStates));
           Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-          return new WeightOrDocIdSet(weight);
+          return new WeightOrDocIdSetIterator(weight);
         }
-        builder.add(docs);
+        PostingsEnum dropped = highFrequencyTerms.insertWithOverflow(postings);
+        otherTerms.add(dropped);
+        postings = dropped;
       } while (termsEnum.next() != null);
-      return new WeightOrDocIdSet(builder.build());
+      List<DocIdSetIterator> disis = new ArrayList<>(highFrequencyTerms.size() + 1);
+      for (PostingsEnum pe : highFrequencyTerms) {
+        disis.add(pe);
+      }
+      disis.add(otherTerms.build().iterator());
+      DisiPriorityQueue subs = new DisiPriorityQueue(disis.size());
+      for (DocIdSetIterator disi : disis) {
+        subs.add(new DisiWrapper(disi));
+      }

Review Comment:
Maybe I'm overlooking something silly, but can't we just do one pass like this?
```suggestion
      DisiPriorityQueue subs = new DisiPriorityQueue(highFrequencyTerms.size() + 1);
      for (DocIdSetIterator disi : highFrequencyTerms) {
        subs.add(new DisiWrapper(disi));
      }
      subs.add(new DisiWrapper(otherTerms.build().iterator()));
```

##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
       }
       Query q = new ConstantScoreQuery(bq.build());
       final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-      return new WeightOrDocIdSet(weight);
+      return new WeightOrDocIdSetIterator(weight);
     }

     // Too many terms: go back to the terms we already collected and start building the bit set

Review Comment:
Can we update the comments to more accurately reflect the new logic? We don't really start building the bit set here.

##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -211,32 +218,39 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
               new ConstantScoreQuery(
                   new TermQuery(new Term(query.field, termsEnum.term()), termStates));
           Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-          return new WeightOrDocIdSet(weight);
+          return new WeightOrDocIdSetIterator(weight);
         }
-        builder.add(docs);
+        PostingsEnum dropped = highFrequencyTerms.insertWithOverflow(postings);
+        otherTerms.add(dropped);
+        postings = dropped;
       } while (termsEnum.next() != null);
-      return new WeightOrDocIdSet(builder.build());
+      List<DocIdSetIterator> disis = new ArrayList<>(highFrequencyTerms.size() + 1);
+      for (PostingsEnum pe : highFrequencyTerms) {
+        disis.add(pe);
+      }
+      disis.add(otherTerms.build().iterator());
+      DisiPriorityQueue subs = new DisiPriorityQueue(disis.size());
+      for (DocIdSetIterator disi : disis) {
+        subs.add(new DisiWrapper(disi));
+      }

Review Comment:
Also, it would be nice if we could get direct access to the underlying array backing `highFrequencyTerms`, then we could leverage `DisiPriorityQueue#addAll` to heapify everything at once.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
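The bounded-heap pattern discussed in this thread (keep only the highest-cost postings in a min-heap ordered by cost, and route everything evicted by `insertWithOverflow` into an overflow bucket) can be sketched standalone with `java.util.PriorityQueue`. This is an illustrative sketch, not Lucene's implementation: `Posting`, the capacity of 2, and the hand-rolled `insertWithOverflow` helper are all made up for the example; the real code uses `org.apache.lucene.util.PriorityQueue` and feeds overflow into a `DocIdSetBuilder`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Standalone sketch of the pattern in the diff above: a bounded min-heap
// ordered by cost keeps the k most expensive entries; cheaper entries
// overflow into a separate bucket (the DocIdSetBuilder's role in the PR).
public class BoundedHeapSketch {
  record Posting(String term, long cost) {}

  // Mimics insertWithOverflow: add the entry, then evict and return the
  // cheapest one if the heap has exceeded its capacity; null otherwise.
  static Posting insertWithOverflow(PriorityQueue<Posting> heap, int capacity, Posting p) {
    heap.add(p);
    return heap.size() > capacity ? heap.poll() : null;
  }

  public static void main(String[] args) {
    int capacity = 2; // stand-in for collectedTerms.size()
    // Min-heap by cost, analogous to the lessThan override in the diff.
    PriorityQueue<Posting> highFrequencyTerms =
        new PriorityQueue<>(Comparator.comparingLong(Posting::cost));
    List<Posting> otherTerms = new ArrayList<>(); // overflow bucket

    for (Posting p : List.of(new Posting("a", 10), new Posting("b", 500),
                             new Posting("c", 3), new Posting("d", 250))) {
      Posting dropped = insertWithOverflow(highFrequencyTerms, capacity, p);
      if (dropped != null) {
        otherTerms.add(dropped);
      }
    }

    // The two highest-cost postings ("b", "d") survive; "a" and "c" overflow.
    System.out.println("kept=" + highFrequencyTerms.size()
        + " overflowed=" + otherTerms.size());
  }
}
```

Note that `java.util.PriorityQueue`'s bulk constructor heapifies an entire collection in one pass, which is the same idea behind the `DisiPriorityQueue#addAll` suggestion above: building the heap once from an array is cheaper than pushing elements one at a time.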