rmuir commented on code in PR #11738:
URL: https://github.com/apache/lucene/pull/11738#discussion_r965442751


##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
 
         PostingsEnum docs = null;
 
-        final List<TermAndState> collectedTerms = new ArrayList<>();
-        if (collectTerms(context, termsEnum, collectedTerms)) {
-          // build a boolean query
+        // We will first try to collect up to 'threshold' terms into 'matchingTerms'
+        // if there are too many terms, we will fall back to building the 'builder'

Review Comment:
   But this comment isn't really what happens. Instead, if we hit the threshold (16), we just null out our ArrayList of `collectedTerms`, then continue iterating through the same loop and allocate another ArrayList of `collectedTerms` for the next 16 terms, over and over again.
   
   So if there are millions of terms, this optimization adds a lot of extra allocations and GC pressure. I would prefer that we "really bail" once we hit the limit, so the "huge numbers of terms" case stays as fast as before. Seems like handling the `if (builder != null)` case in the if-then-else decision tree here might do the trick?
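   
   To make the suggested control flow concrete, here is a minimal standalone sketch, not the actual Lucene code: the class, interface, and method names below are hypothetical stand-ins for the `TermsEnum`/`DocIdSetBuilder` machinery. The point is that once the threshold is crossed, the collected list is drained into the builder exactly once, and every remaining term goes straight to the builder with no further list allocations.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   class TermCollectionSketch {
     static final int THRESHOLD = 16;
   
     /** Hypothetical stand-in for a DocIdSetBuilder-style sink. */
     interface Builder {
       void add(String term);
     }
   
     /** Collect terms; switch to the builder permanently once THRESHOLD is exceeded. */
     static List<String> collect(Iterable<String> termsEnum, Builder docIdSetBuilder) {
       List<String> collectedTerms = new ArrayList<>();
       Builder builder = null; // stays null while we are under the threshold
   
       for (String term : termsEnum) {
         if (builder != null) {
           // Already over the threshold: feed the builder directly and never
           // allocate another ArrayList ("really bail").
           builder.add(term);
         } else if (collectedTerms.size() < THRESHOLD) {
           collectedTerms.add(term);
         } else {
           // Threshold just exceeded: drain the collected terms into the builder once,
           // then stay on the builder-only path for every remaining term.
           builder = docIdSetBuilder;
           for (String t : collectedTerms) {
             builder.add(t);
           }
           collectedTerms = null; // let the list be GC'd; it is never rebuilt
           builder.add(term);
         }
       }
       // Caller's decision tree: non-null return -> few terms, build a BooleanQuery;
       // null return -> the builder path was taken.
       return collectedTerms;
     }
   }
   ```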



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
