sgup432 opened a new issue, #16049:
URL: https://github.com/apache/lucene/issues/16049
### Description
For one of the users in OpenSearch, we observed that they had a sample query
like below(redacted the whole boolean query and just keeping the must_not part
for reference):
```
{"bool":{"must_not":[{"terms":{"category":["type_a","type_b","type_c","type_d","type_e","type_f","type_g","type_h","type_i","type_j","type_k"]}}]}}
```
They have an Opensearch index of ~2B docs across many shards, and after
upgrading to lucene 10.x from lucene 9.x, we observed a significant performance
regression on this query pattern.
I see there were major changes which were done in `ReqExclBulkScorer`(which
handles the exclusion part) where we added `docIdRunEnd()` optimization,
basically excluding a batch of matching docs efficiently rather than going over
each matching docs one by one. When a term matches a dense contiguous block of
documents, this lets the scorer jump over the entire block in a single step
instead of iterating through each excluded doc individually.
### How Lucene 9 handled it:
In Lucene 9, multiple MUST_NOT terms were merged using a min-heap
(`DisiPriorityQueue`). The heap always surfaces the smallest excluded doc ID
across all term iterators. Each excluded doc costs one heap operation ie O(log
K) where K is the number of excluded terms. Simple, predictable, and scales
well regardless of how the excluded docs are distributed.
### What changed in Lucene 10:
In Lucene 10, `ReqExclBulkScorer` now calls `docIDRunEnd()` on every
excluded doc hit to try to skip an entire run at once. For a single MUST_NOT
term with a dense posting list, this is a genuine win. The scorer can skip
thousands of docs in one step. However, when there are many MUST_NOT terms
(like 11 in this case) whose values are interleaved across the index,
`docIDRunEnd()` triggers `DisjunctionDISIApproximation.computeTopList()`, which
linearly scans all K iterators on every excluded doc hit.
In this case or generally, where we have 11 interleaved terms, and the lead
iterator cannot claim a run longer than 1(or a very small value) as the next
excluded doc always belongs to a different term at a different position. Yet
`computeTopList()` still has to linearly scan all K iterators on every hit just
to confirm this, paying O(K) cost for a result that's no better than the old
upTo += 1. This turns the per-doc cost from O(log K) in Lucene 9 to O(K) in
Lucene 10 for this pattern.
Looks like this optimization works well with few MUST_NOT terms (1-2). It
regresses with many MUST_NOT terms (like 11) whose posting lists are
interleaved, which is a common pattern when filtering by multiple category
values on a keyword field.
### Sample hot thread dump we saw:
```
70.1% (350.2ms out of 500ms) cpu usage by thread
'opensearch[...][search][T#42]'
3/10 snapshots sharing following 34 elements
app//org.apache.lucene.search.DisjunctionDISIApproximation.computeTopList(DisjunctionDISIApproximation.java:179)
app//org.apache.lucene.search.DisjunctionDISIApproximation.topList(DisjunctionDISIApproximation.java:169)
app//org.apache.lucene.search.DisjunctionDISIApproximation.docIDRunEnd(DisjunctionDISIApproximation.java:194)
app//org.apache.lucene.search.ReqExclBulkScorer.score(ReqExclBulkScorer.java:62)
app//org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:367)
app//org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
```
### Suggested fix
Maybe skip this approach when the exclusion is a disjunction of many
terms(>2), falling back to simpler one-doc at-a-time approach like in lucene 9,
which seems more efficient for this pattern.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]