jpountz opened a new pull request, #14052:
URL: https://github.com/apache/lucene/pull/14052

   Currently, the disjunction iterator puts all clauses in a heap in order to 
be able to merge doc IDs in a streaming fashion. This is a good approach for 
exhaustive evaluation, when only one clause moves to a different doc ID on 
average and the per-iteration cost is in the order of O(log(N)) where N is the 
number of clauses.
   
   However, if a selective filter is applied, this could cause many clauses to 
move to a different doc ID. In the worst-case scenario, all clauses could move 
to a different doc ID and the cost of maintaiting heap invariants could grow to 
O(N * log(N)) (every clause introduces a O(log(N)) cost). With many clauses, 
this is much higher than the cost of checking all clauses sequentially: O(N).
   
   To protect from this reordering overhead, DisjunctionDISIApproximation now 
only puts the cheapest clauses in a heap in a way that tries to achieve up to 
1.5 clauses moving to a different doc ID on average. More expensive clauses are 
checked linearly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to