[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345575#comment-17345575 ]
Zach Chen commented on LUCENE-9335:
-----------------------------------
I see now why Fuzzy1 & Fuzzy2 did not trigger the BMM scorer / bulkScorer.
Those queries are rewritten into boolean queries with boosting (BoostQuery),
but the BMM eligibility check I wrote tests for TermQuery directly
[https://github.com/apache/lucene/pull/113/files#diff-d500c30048128831b0fe3c53d9bb74eed7d8063e81d33737b26dcd00bc7f1fd2R337],
so the BMM scorer / bulkScorer was never invoked for them.
The loop in that check also likely hurt performance for both
implementations: fuzzy queries can expand into queries with many subqueries
(one instance I saw had 50), and the current logic iterates over all of
them.
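
To make the mismatch concrete, here is a minimal sketch of an eligibility
check that looks through boost wrappers before testing for a term query. It
is not the code from the PR: Query, TermQuery, BoostQuery and OtherQuery
below are simplified stand-ins rather than the real org.apache.lucene.search
classes, and unwrapBoost, isEligible and the maxClauses cap are hypothetical
names introduced only for illustration. The cap is just one possible way to
bound the cost of the check when a query expands into many clauses.

```java
import java.util.List;

/** Sketch only: simplified stand-ins, NOT the real Lucene query classes. */
final class BmmEligibilitySketch {

    /** Hypothetical stand-in for a query node. */
    interface Query {}

    /** Stand-in for a query on a single term. */
    static final class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
    }

    /** Stand-in for a wrapper that applies a boost to another query. */
    static final class BoostQuery implements Query {
        final Query inner;
        final float boost;
        BoostQuery(Query inner, float boost) {
            this.inner = inner;
            this.boost = boost;
        }
    }

    /** Stand-in for any other kind of query. */
    static final class OtherQuery implements Query {}

    /** Peels off any number of boost wrappers and returns the innermost query. */
    static Query unwrapBoost(Query q) {
        while (q instanceof BoostQuery) {
            q = ((BoostQuery) q).inner;
        }
        return q;
    }

    /**
     * Returns true if every clause is a term query, possibly wrapped in boosts.
     * maxClauses is an assumed cap: larger disjunctions are rejected up front
     * so the check never has to walk a long clause list.
     */
    static boolean isEligible(List<Query> clauses, int maxClauses) {
        if (clauses.isEmpty() || clauses.size() > maxClauses) {
            return false;
        }
        for (Query clause : clauses) {
            if (!(unwrapBoost(clause) instanceof TermQuery)) {
                return false; // stop at the first ineligible clause
            }
        }
        return true;
    }
}
```

With these stand-ins, a clause list made of BoostQuery-wrapped TermQuery
instances passes isEligible, whereas a check that tests each clause with
instanceof TermQuery alone would reject it, which matches the behavior
described above for Fuzzy1 & Fuzzy2.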
> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
> Key: LUCENE-9335
> URL: https://issues.apache.org/jira/browse/LUCENE-9335
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: wikimedium.10M.nostopwords.tasks,
> wikimedium.10M.nostopwords.tasks.5OrMeds
>
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and
> PISA at [https://tantivy-search.github.io/bench/] or against research
> prototypes in Table 1 of
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for
> benchmarking, it would be nice to optimize this case a bit more. I suspect
> that we could make fewer per-document decisions by implementing a BulkScorer
> instead of a Scorer.