[jira] [Commented] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning
[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345013#comment-17345013 ]

Adrien Grand commented on LUCENE-9335:

bq. For some reasons it hurt Fuzzy1 & Fuzzy2 performance consistently by around 8%-13%, even though it wasn't used for those queries

Are you sure? I believe that fuzzy queries rewrite to boolean queries, so they would use your new block-max maxscore under the hood?

> Add a bulk scorer for disjunctions that does dynamic pruning
>
> Key: LUCENE-9335
> URL: https://issues.apache.org/jira/browse/LUCENE-9335
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: wikimedium.10M.nostopwords.tasks, wikimedium.10M.nostopwords.tasks.5OrMeds
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and
> PISA at [https://tantivy-search.github.io/bench/] or against research
> prototypes in Table 1 of
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for
> benchmarking, it would be nice to optimize this case a bit more, I suspect
> that we could make fewer per-document decisions by implementing a BulkScorer
> instead of a Scorer.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning
[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345564#comment-17345564 ]

Zach Chen commented on LUCENE-9335:

> Are you sure? I believe that fuzzy queries rewrite to boolean queries, so
> they would use your new block-max maxscore under the hood?

Hmm, I verified that by throwing a runtime exception in the BMM BulkScorer's constructor and running only the Fuzzy1 & Fuzzy2 queries in the benchmark, which completed successfully. I suspect the slowdown comes from the checks that decide whether BMM is applicable. Let me take a further look there.
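The verification described above (an exception in a constructor used as a tripwire) can be sketched roughly as follows. This is a minimal, self-contained illustration and not code from the patch: `PruningBulkScorer`, `DefaultBulkScorer`, `chooseScorer` and `reachesPruningPath` are hypothetical stand-ins, and the real benchmark run simply fails or completes instead of catching the exception.

```java
public class TripwireSketch {
    // Hypothetical stand-in for the scorer under suspicion. The constructor
    // throws, so any code path that builds it fails loudly.
    static class PruningBulkScorer {
        PruningBulkScorer() {
            throw new IllegalStateException("tripwire: pruning scorer was constructed");
        }
    }

    // Hypothetical stand-in for the scorer used when pruning does not apply.
    static class DefaultBulkScorer {
    }

    // Hypothetical selection logic: only eligible queries get the pruning scorer.
    static Object chooseScorer(boolean eligible) {
        return eligible ? new PruningBulkScorer() : new DefaultBulkScorer();
    }

    // Returns true when building a scorer for this query hits the tripwire.
    static boolean reachesPruningPath(boolean eligible) {
        try {
            chooseScorer(eligible);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A run that completes without the exception never built the scorer.
        System.out.println("ineligible query hits tripwire: " + reachesPruningPath(false));
        System.out.println("eligible query hits tripwire: " + reachesPruningPath(true));
    }
}
```

A clean run only shows that the constructor was never reached. It says nothing about the cost of the code that decides whether to reach it, which is where the comment above suspects the slowdown lies.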
[jira] [Comment Edited] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning
[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345564#comment-17345564 ]

Zach Chen edited comment on LUCENE-9335 at 5/15/21, 6:57 PM:

{quote}Are you sure? I believe that fuzzy queries rewrite to boolean queries, so they would use your new block-max maxscore under the hood?
{quote}

Hmm, I verified that by throwing a runtime exception in the BMM BulkScorer's constructor and running only the Fuzzy1 & Fuzzy2 queries in the benchmark, which completed successfully. I suspect the slowdown comes from the checks that decide whether BMM is applicable. Let me take a further look there.
[jira] [Commented] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning
[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345575#comment-17345575 ]

Zach Chen commented on LUCENE-9335:

I now see why Fuzzy1 & Fuzzy2 did not trigger the BMM scorer / bulkScorer. Those queries were rewritten into boolean queries with boosting (BoostQuery), but the BMM eligibility check tests for TermQuery directly [https://github.com/apache/lucene/pull/113/files#diff-d500c30048128831b0fe3c53d9bb74eed7d8063e81d33737b26dcd00bc7f1fd2R337], so the BMM scorer / bulkScorer were never invoked for them.

The loop in that check also likely hurt performance for both implementations: fuzzy queries can expand into queries with many subqueries (one instance I saw had 50), and the current logic goes through all of them.
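To make the mismatch concrete, here is a small self-contained sketch of why a direct type check rejects boosted term clauses. The classes below are hypothetical stand-ins that only borrow the names of Lucene's `TermQuery` and `BoostQuery`; they are not the Lucene classes. The "unwrap" variant is just one conceivable way to relax the check, not something the thread or the PR states as the fix.

```java
import java.util.Arrays;
import java.util.List;

public class BmmEligibilitySketch {
    // Hypothetical stand-ins for query types; NOT the Lucene classes.
    interface Query {
    }

    static final class TermQuery implements Query {
        final String term;

        TermQuery(String term) {
            this.term = term;
        }
    }

    static final class BoostQuery implements Query {
        final Query query;
        final float boost;

        BoostQuery(Query query, float boost) {
            this.query = query;
            this.boost = boost;
        }
    }

    // Strict check in the spirit of the comment above: a clause qualifies
    // only if it is itself a TermQuery, so a boosted term fails the test.
    static boolean allTermsStrict(List<? extends Query> clauses) {
        for (Query clause : clauses) {
            if (!(clause instanceof TermQuery)) {
                return false;
            }
        }
        return true;
    }

    // Assumption for illustration: peel off boost wrappers before checking.
    static Query unwrapBoost(Query query) {
        while (query instanceof BoostQuery) {
            query = ((BoostQuery) query).query;
        }
        return query;
    }

    static boolean allTermsUnwrapped(List<? extends Query> clauses) {
        for (Query clause : clauses) {
            if (!(unwrapBoost(clause) instanceof TermQuery)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Roughly the shape a rewritten fuzzy query takes: boosted term clauses.
        List<Query> rewrittenFuzzy = Arrays.asList(
                new BoostQuery(new TermQuery("lucene"), 1.0f),
                new BoostQuery(new TermQuery("lucine"), 0.8f));

        System.out.println("strict check: " + allTermsStrict(rewrittenFuzzy));
        System.out.println("unwrapped check: " + allTermsUnwrapped(rewrittenFuzzy));
    }
}
```

Either variant is linear in the number of clauses, so a fuzzy query that expands to dozens of subqueries pays for the scan whether or not it ends up eligible.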