[GitHub] [lucene] gsmiller commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

GitBox Thu, 08 Sep 2022 07:51:35 -0700


gsmiller commented on code in PR #11738:
URL: https://github.com/apache/lucene/pull/11738#discussion_r966070400



##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext 
context) throws IOException {
 
         PostingsEnum docs = null;
 
-        final List<TermAndState> collectedTerms = new ArrayList<>();
-        if (collectTerms(context, termsEnum, collectedTerms)) {
-          // build a boolean query
+        // We will first try to collect up to 'threshold' terms into 
'matchingTerms'
+        // if there are too many terms, we will fall back to building the 
'builder'

Review Comment:
   Thanks for the suggestion @rmuir. Let me see if I can use the existing code 
structure a bit more in this change. The reason I didn't want to just call 
`collectTerms` as-is is that we could unnecessarily seek and load term states 
when we've already found a term covering all docs. For example, if the first 
term we visit covers all terms, we can just stop there.
   
   I'm also not sure I'm following your point about reallocating 
`collectedTerms` as part of this change? That's certainly not my intention with 
this code, but maybe I'm staring at a bug and not realizing it? As soon as we 
hit the size threshold, we should be nulling out `collectTerms`, initializing a 
building and just using that for the remaining term iteration. Apologies if I'm 
overlooking something though. Entirely possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

Reply via email to