jpountz opened a new issue, #11915:
URL: https://github.com/apache/lucene/issues/11915

   ### Description
   
   Lucene's abstractions are good at dealing with long runs of documents that 
do not match a query, but much less good at dealing with long runs of documents 
that match a query. In such cases, Lucene still needs to linearly scan these 
matches and do some amount of work for each of them.
   
   Is it actually common to have long runs of matches? For full-text indexes, 
maybe not so much: only stop words are likely to have runs of adjacent matches. 
For string fields, this may happen if the field has a default value that is 
shared by most documents in the collection. Users can also use index sorting to 
cluster similar documents together, which increases the likelihood of long runs 
of adjacent matches.
   
   One idea would be to augment the `DocIdSetIterator` API with a new 
`peekNextNonMatchingDocID` method, which would return the next doc ID that may 
not be a match. The default implementation could return `docID() + 1`. We'd 
need to implement this API in postings, doc-value iterators and a few other 
important `DocIdSetIterator`s like `BitSetIterator` and `DocIdSetIterator#all`, 
and propagate this information through disjunctions and conjunctions. Then we 
could leverage this API in a number of places:
    - `ReqExclScorer` could ask the prohibited clause for the next doc ID that 
is worth evaluating on the required clause.
    - Conjunctions could ignore non-scoring clauses over ranges of doc IDs that 
all match.
    - `FixedBitSet#or` could set ranges of doc IDs at once, which would in turn 
speed up `DocIdSetBuilder`.
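
To make the proposal concrete, here is a minimal sketch of how the new method and a range-aware `or` might fit together. It does not use Lucene: `SketchIterator`, `RangeIterator`, `ArrayIterator` and `RunSkippingDemo` are hypothetical stand-ins invented for illustration, and `java.util.BitSet` stands in for `FixedBitSet`. Only the method name `peekNextNonMatchingDocID` and its `docID() + 1` default come from the description above; the contract assumed here is that every doc ID from `docID()` up to (but excluding) the returned value matches.

```java
import java.util.BitSet;

/** Hypothetical stand-in for DocIdSetIterator; not the real Lucene class. */
abstract class SketchIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  abstract int docID();

  abstract int nextDoc();

  /** Moves to the first matching doc ID >= target (target must be > docID()). */
  abstract int advance(int target);

  /**
   * Proposed method: the next doc ID that may not be a match. Assumed contract:
   * all doc IDs in [docID(), returned value) match. Only meaningful while the
   * iterator is positioned on a doc. The default is the conservative answer.
   */
  int peekNextNonMatchingDocID() {
    return docID() + 1;
  }
}

/** Matches every doc ID in [min, max), so it can report the whole run at once. */
final class RangeIterator extends SketchIterator {
  private final int min;
  private final int max;
  private int doc = -1;

  RangeIterator(int min, int max) {
    this.min = min;
    this.max = max;
  }

  @Override int docID() { return doc; }

  @Override int nextDoc() { return advance(doc + 1); }

  @Override int advance(int target) {
    doc = target >= max ? NO_MORE_DOCS : Math.max(target, min);
    return doc;
  }

  @Override int peekNextNonMatchingDocID() { return max; }
}

/** Sparse iterator over a sorted array; keeps the default one-doc answer. */
final class ArrayIterator extends SketchIterator {
  private final int[] docs;
  private int index = -1;
  private int doc = -1;

  ArrayIterator(int[] sortedDocs) { this.docs = sortedDocs; }

  @Override int docID() { return doc; }

  @Override int nextDoc() {
    index++;
    doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
    return doc;
  }

  @Override int advance(int target) {
    int d;
    do { d = nextDoc(); } while (d < target);
    return d;
  }
}

final class RunSkippingDemo {
  /** ORs the iterator into bits, setting a whole run of matches per step. */
  static void or(BitSet bits, SketchIterator it) {
    int doc = it.nextDoc();
    while (doc != SketchIterator.NO_MORE_DOCS) {
      int end = it.peekNextNonMatchingDocID(); // exclusive end of the run
      bits.set(doc, end);                      // sets [doc, end) in one call
      doc = it.advance(end);
    }
  }

  public static void main(String[] args) {
    BitSet bits = new BitSet();
    or(bits, new RangeIterator(5, 1000));
    or(bits, new ArrayIterator(new int[] {1, 3, 2000}));
    System.out.println("bits set: " + bits.cardinality());
  }
}
```

With the dense iterator the loop body runs once for the whole run, while the sparse iterator falls back to one step per doc, which is the behavior the default implementation is meant to preserve. How real postings or doc-value iterators would compute the run end cheaply is left open here.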


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

