[PR] Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. [lucene]

via GitHub Thu, 22 May 2025 04:47:41 -0700


jpountz opened a new pull request, #14701:
URL: https://github.com/apache/lucene/pull/14701


   `MaxScoreBulkScorer` and `BlockMaxConjunctionBulkScorer` currently evaluate 
hits in a doc-at-a-time (DAAT) fashion, meaning that they look at all their 
clauses to find the next doc, and so forth until all docs from the window are 
evaluated. This change makes evaluation run in a more term-at-a-time (TAAT) 
fashion within scoring windows, meaning that each clause is fully evaluated 
within the window before moving on to the next clause.
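
   To make the difference concrete, here is a minimal, self-contained sketch 
of the two evaluation orders over one scoring window of a disjunction. It is 
not the code from this PR and uses no Lucene classes: the class name, the 
method names, the sorted `int[]` postings and the constant per-clause weights 
are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Toy illustration only; hypothetical names, not Lucene code. */
final class WindowedTaatSketch {

    /** Index of the first entry in sorted {@code docs} that is >= {@code target}. */
    static int lowerBound(int[] docs, int target) {
        int idx = Arrays.binarySearch(docs, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** DAAT: consult every clause to find the next doc, score it, repeat. */
    static float[] scoreDaat(int[][] postings, float[] weights, int windowMin, int windowMax) {
        float[] scores = new float[windowMax - windowMin];
        int[] cursors = new int[postings.length];
        for (int c = 0; c < postings.length; c++) {
            cursors[c] = lowerBound(postings[c], windowMin);
        }
        while (true) {
            // Find the smallest doc ID that any clause is positioned on.
            int doc = Integer.MAX_VALUE;
            for (int c = 0; c < postings.length; c++) {
                if (cursors[c] < postings[c].length) {
                    doc = Math.min(doc, postings[c][cursors[c]]);
                }
            }
            if (doc >= windowMax) {
                break; // window exhausted
            }
            // Sum the contribution of every clause that matches this doc.
            for (int c = 0; c < postings.length; c++) {
                if (cursors[c] < postings[c].length && postings[c][cursors[c]] == doc) {
                    scores[doc - windowMin] += weights[c];
                    cursors[c]++;
                }
            }
        }
        return scores;
    }

    /** Windowed TAAT: drain each clause over the whole window before the next clause. */
    static float[] scoreTaat(int[][] postings, float[] weights, int windowMin, int windowMax) {
        float[] scores = new float[windowMax - windowMin];
        for (int c = 0; c < postings.length; c++) {
            int[] docs = postings[c];
            for (int i = lowerBound(docs, windowMin); i < docs.length && docs[i] < windowMax; i++) {
                scores[docs[i] - windowMin] += weights[c];
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 3, 5, 9}, {3, 4, 9, 12}, {0, 5, 9}};
        float[] weights = {1.0f, 2.0f, 0.5f};
        System.out.println(Arrays.toString(scoreDaat(postings, weights, 2, 10)));
        System.out.println(Arrays.toString(scoreTaat(postings, weights, 2, 10)));
    }
}
```

   Both methods fill the same per-window score array; only the loop nesting 
differs. In the TAAT variant the inner loop is a tight scan over a single 
clause, which is the shape that lends itself to batch-oriented APIs.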
   
   Note that this isn't completely new: `BooleanScorer` has been doing this to 
exhaustively evaluate disjunctive queries, by loading their matches into a bit 
set, one clause at a time. Also note that this is a bit different from 
traditional TAAT, as it is scoped to small-ish windows of doc IDs rather than 
the entire doc ID space.
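
   The bit-set approach described above can be sketched roughly as follows. 
This is a simplified illustration under stated assumptions, not 
`BooleanScorer` itself: it uses `java.util.BitSet` rather than Lucene's own 
data structures, the class and method names are hypothetical, and the window 
size of 8 is an arbitrary toy value.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy illustration only; hypothetical names, not Lucene code. */
final class WindowedBitSetSketch {

    static final int WINDOW_SIZE = 8; // arbitrary toy value, chosen for readability

    /**
     * Collects the union of all clauses in doc ID order. Each postings array
     * must be sorted and contain only doc IDs in [0, maxDoc).
     */
    static List<Integer> collectMatches(int[][] postings, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int[] cursors = new int[postings.length];
        BitSet window = new BitSet(WINDOW_SIZE);
        for (int windowMin = 0; windowMin < maxDoc; windowMin += WINDOW_SIZE) {
            int windowMax = Math.min(windowMin + WINDOW_SIZE, maxDoc);
            window.clear();
            // One clause at a time: load all of its matches for this window.
            for (int c = 0; c < postings.length; c++) {
                int[] docs = postings[c];
                while (cursors[c] < docs.length && docs[cursors[c]] < windowMax) {
                    window.set(docs[cursors[c]] - windowMin);
                    cursors[c]++;
                }
            }
            // Replay the window's matches in increasing doc ID order.
            for (int bit = window.nextSetBit(0); bit >= 0; bit = window.nextSetBit(bit + 1)) {
                hits.add(windowMin + bit);
            }
        }
        return hits;
    }
}
```

   Because the bit set only ever covers one window, its size is fixed and 
independent of the number of documents in the index, which is what separates 
this scheme from TAAT over the entire doc ID space.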
   
   This in turn allows these scorers to take advantage of the new 
`Scorer#nextDocsAndScores` API, and provides a good speedup. A downside is that 
we may need to perform more memory copying in some cases, and evaluate a few 
more documents, but the change still looks like a win in general.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

