[GitHub] [lucene] rmuir commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

GitBox Wed, 07 Sep 2022 19:39:07 -0700


rmuir commented on code in PR #11738:
URL: https://github.com/apache/lucene/pull/11738#discussion_r965449782



##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
 
         PostingsEnum docs = null;
 
-        final List<TermAndState> collectedTerms = new ArrayList<>();
-        if (collectTerms(context, termsEnum, collectedTerms)) {
-          // build a boolean query
+        // We will first try to collect up to 'threshold' terms into 'matchingTerms'
+        // if there are too many terms, we will fall back to building the 'builder'

Review Comment:
   By the way, I think we could make this change more safely (performance-wise) 
by keeping the existing code structure, where we call collectTerms() and so 
on. That path has been optimized over the years.
   
   To be more conservative, could we just add a simple check instead?
   ```
           if (collectTerms(context, termsEnum, collectedTerms)) {
             // build a boolean query
             BooleanQuery.Builder bq = new BooleanQuery.Builder();
             for (TermAndState t : collectedTerms) {
   +          // optimize terms that match all documents
   +          if (t.docFreq == reader.maxDoc()) {
   +            return new WeightOrDocIdSet(DocIdSet.all(reader.maxDoc()));
   +          }
               final TermStates termStates = new TermStates(searcher.getTopReaderContext());
               termStates.register(t.state, context.ord, t.docFreq, t.totalTermFreq);
               bq.add(new TermQuery(new Term(query.field, t.term), termStates), Occur.SHOULD);
             }
             Query q = new ConstantScoreQuery(bq.build());
             final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
             return new WeightOrDocIdSet(weight);
           }
   ```
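
For readers without the Lucene sources at hand, the shape of the suggested check can be sketched in isolation. This is a minimal, dependency-free illustration and not the actual Lucene code: `MatchAllShortCircuit`, `TermStat`, `Plan`, and `plan` are hypothetical names, and the sketch only models the decision (short-circuit to "match all" as soon as one collected term has `docFreq == maxDoc`, otherwise fall through to the boolean query path).

```java
import java.util.Arrays;
import java.util.List;

public class MatchAllShortCircuit {

  /** Hypothetical stand-in for a collected term and its per-segment document frequency. */
  public static final class TermStat {
    final String term;
    final int docFreq;

    public TermStat(String term, int docFreq) {
      this.term = term;
      this.docFreq = docFreq;
    }
  }

  /** Outcomes of the sketch; these are illustrative labels, not Lucene types. */
  public enum Plan { MATCH_ALL, BOOLEAN_QUERY }

  /**
   * Mirrors the suggested check: while walking the already collected terms,
   * return early if any single term occurs in every document of the segment.
   */
  public static Plan plan(List<TermStat> collectedTerms, int maxDoc) {
    for (TermStat t : collectedTerms) {
      if (t.docFreq == maxDoc) {
        // One term covers the whole segment, so the disjunction matches all docs.
        return Plan.MATCH_ALL;
      }
      // In the real code, this is where a TermQuery clause would be added.
    }
    return Plan.BOOLEAN_QUERY;
  }

  public static void main(String[] args) {
    List<TermStat> withFullTerm =
        Arrays.asList(new TermStat("a", 3), new TermStat("b", 10));
    List<TermStat> withoutFullTerm =
        Arrays.asList(new TermStat("a", 3), new TermStat("b", 9));

    System.out.println(plan(withFullTerm, 10));
    System.out.println(plan(withoutFullTerm, 10));
  }
}
```

The point of the design is that the check adds only an integer comparison per collected term, so the existing collectTerms() path is otherwise left as it is.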



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

