Re: [I] Performance regression in BooleanQuery with multiple MUST_NOT clauses on interleaved posting lists [lucene]

via GitHub Tue, 12 May 2026 18:11:40 -0700


sgup432 commented on issue #16049:
URL: https://github.com/apache/lucene/issues/16049#issuecomment-4436200721


   In a nutshell, 
   lucene 9 the worst case time complexity for above case is O(N * logK) where 
N is the number of matching documents and K is the number of iterator.
   
   For lucene 10, worst case can be O(N * K) for a case like above where you 
cannot exclude docs in batches and need to go one by one.
   
   I guess the current approach would still work if the posting lists are not 
interleaved (e.g., clustered — type_a docs are all in one contiguous block, 
then type_b docs, etc.), then docIDRunEnd() or the current approach would 
actually help as we could exclude a lot of docs in batches, and will be faster.
   
   But with higher number of sub-iterators(like 11 in this case), the 
probability of interleaving is higher. So it may not be a bad idea to disable 
the current batch approach in case we have > 2 sub-iterators like we did above 
in the benchmark.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Performance regression in BooleanQuery with multiple MUST_NOT clauses on interleaved posting lists [lucene]

Reply via email to