gsmiller commented on code in PR #11738:
URL: https://github.com/apache/lucene/pull/11738#discussion_r966070400
##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext
context) throws IOException {
PostingsEnum docs = null;
- final List<TermAndState> collectedTerms = new ArrayList<>();
- if (collectTerms(context, termsEnum, collectedTerms)) {
- // build a boolean query
+ // We will first try to collect up to 'threshold' terms into
'matchingTerms'
+ // if there are too many terms, we will fall back to building the
'builder'
Review Comment:
Thanks for the suggestion @rmuir. Let me see if I can use the existing code
structure a bit more in this change. The reason I didn't want to just call
`collectTerms` as-is is that we could unnecessarily seek and load term states
when we've already found a term covering all docs. For example, if the first
term we visit covers all terms, we can just stop there.
I'm also not sure I'm following your point about reallocating
`collectedTerms` as part of this change? That's certainly not my intention with
this code, but maybe I'm staring at a bug and not realizing it? As soon as we
hit the size threshold, we should be nulling out `collectTerms`, initializing a
building and just using that for the remaining term iteration. Apologies if I'm
overlooking something though. Entirely possible.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]