[jira] [Commented] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning

Zach Chen (Jira) Sat, 01 May 2021 21:12:08 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337948#comment-17337948 ]


Zach Chen commented on LUCENE-9335:
-----------------------------------

I was trying to modify the _CreateQueries_ class in luceneutil to generate OR 
queries with 5 clauses, but ran into some issues running it. So I did a quick 
hack that combines the queries from OrHighHigh, OrHighMed and OrHighLow into a 
new OrHighHighMedHighLow task. I've attached the resulting 
_wikimedium.10M.nostopwords.tasks_ file to this ticket.

Here are the luceneutil results from 2 runs for each implementation:

Scorer [https://github.com/apache/lucene/pull/101]
{code:java}
                    Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
    OrHighHighMedHighLow           30.97      (6.2%)                     24.92      (4.4%)  -19.5% ( -28% -   -9%)   0.000
                PKLookup          223.53      (2.4%)                    228.10      (3.7%)    2.0% (  -3% -    8%)   0.037
{code}
{code:java}
                    Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
    OrHighHighMedHighLow           32.83      (3.4%)                     34.00      (5.1%)    3.6% (  -4% -   12%)   0.009
                PKLookup          217.86      (2.8%)                    228.14      (4.2%)    4.7% (  -2% -   12%)   0.000
{code}
BulkScorer [https://github.com/apache/lucene/pull/113]
{code:java}
                    Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                PKLookup          197.84      (4.1%)                    207.79      (4.2%)    5.0% (  -3% -   13%)   0.000
    OrHighHighMedHighLow           32.50     (16.7%)                     35.79      (9.9%)   10.1% ( -14% -   44%)   0.020
{code}
{code:java}
                    Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
    OrHighHighMedHighLow           28.61      (5.4%)                     22.28      (4.2%)  -22.1% ( -30% -  -13%)   0.000
                PKLookup          227.38      (2.6%)                    233.05      (2.7%)    2.5% (  -2% -    8%)   0.003
{code}
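The Pct diff column appears to be the relative change in mean QPS from the baseline to the modified version; this is inferred from the numbers above rather than taken from luceneutil's documentation. The small, hypothetical `PctDiff` class below recomputes it for the OrHighHighMedHighLow rows of the four tables, which also makes it easy to see that the two runs of each implementation disagree in sign.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes "Pct diff", assuming it is the relative change in mean QPS. */
final class PctDiff {
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // {baseline QPS, modified QPS} for OrHighHighMedHighLow, copied from the tables above.
        double[][] runs = {
            {30.97, 24.92}, {32.83, 34.00}, // Scorer (PR 101), runs 1 and 2
            {32.50, 35.79}, {28.61, 22.28}, // BulkScorer (PR 113), runs 1 and 2
        };
        for (double[] r : runs) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
            System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(r[0], r[1])));
        }
    }
}
```

It prints -19.5%, 3.6%, 10.1% and -22.1%, matching the reported values.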
 

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: wikimedium.10M.nostopwords.tasks
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more. I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.
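A BulkScorer is handed a whole range of doc IDs to score at once, which is what makes it possible to move pruning decisions from the per-document level to the per-range level. The Java sketch below is a minimal, Lucene-free illustration of that idea under stated assumptions: postings are in-memory arrays with precomputed per-document scores, and the class name, the 128-doc window and the MaxScore-style essential/non-essential split are hypothetical choices, not what PR 113 or Lucene actually implement. At the start of each window it reads the current minimum competitive score once, sets aside the lowest-impact clauses whose maximum scores sum to no more than that threshold, accumulates the remaining (essential) clauses into a window-sized score array, and lets the non-essential clauses add only to documents that an essential clause already matched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical, Lucene-free sketch: window-at-a-time top-k disjunction with MaxScore-style pruning. */
final class WindowedDisjunctionSketch {
    static final int WINDOW = 128; // assumed window size, not taken from any PR

    /** One clause: ascending doc IDs, a precomputed score per doc, and a cursor. */
    static final class Clause {
        final int[] docs;
        final float[] scores;
        final float maxScore; // upper bound on any score this clause contributes
        int pos = 0;
        Clause(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
            float max = 0f;
            for (float s : scores) max = Math.max(max, s);
            this.maxScore = max;
        }
    }

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Orders hits worst first: lower score, then (on ties) higher doc ID. */
    static final Comparator<Hit> WORST_FIRST = (a, b) ->
        a.score != b.score ? Float.compare(a.score, b.score) : Integer.compare(b.doc, a.doc);

    /** Top k hits, best first. Advances the clauses' cursors, so clauses are single-use. */
    static List<Hit> topK(List<Clause> input, int maxDoc, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        Clause[] clauses = input.toArray(new Clause[0]);
        // Ascending max score, so a prefix of low-impact clauses can be set aside.
        Arrays.sort(clauses, Comparator.comparingDouble(c -> c.maxScore));
        PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST); // head = worst kept hit
        float[] windowScores = new float[WINDOW];
        boolean[] matched = new boolean[WINDOW];
        for (int start = 0; start < maxDoc; start += WINDOW) {
            int end = Math.min(start + WINDOW, maxDoc);
            // One pruning decision per window instead of per doc. The threshold only rises,
            // so reading it once at the start of the window is safe.
            float minCompetitive = heap.size() == k ? heap.peek().score : Float.NEGATIVE_INFINITY;
            int firstEssential = 0;
            float sum = 0f; // float rounding in this bound is ignored in this sketch
            while (firstEssential < clauses.length
                    && sum + clauses[firstEssential].maxScore <= minCompetitive) {
                sum += clauses[firstEssential++].maxScore;
            }
            Arrays.fill(windowScores, 0f);
            Arrays.fill(matched, false);
            // Essential clauses decide which docs in this window are candidates.
            for (int i = firstEssential; i < clauses.length; i++) {
                Clause c = clauses[i];
                for (; c.pos < c.docs.length && c.docs[c.pos] < end; c.pos++) {
                    int slot = c.docs[c.pos] - start;
                    windowScores[slot] += c.scores[c.pos];
                    matched[slot] = true;
                }
            }
            // Non-essential clauses cannot make a doc competitive alone: add to candidates only.
            for (int i = 0; i < firstEssential; i++) {
                Clause c = clauses[i];
                for (; c.pos < c.docs.length && c.docs[c.pos] < end; c.pos++) {
                    int slot = c.docs[c.pos] - start;
                    if (matched[slot]) windowScores[slot] += c.scores[c.pos];
                }
            }
            // Collect in doc order; on equal scores the earlier doc already in the heap wins.
            for (int slot = 0; slot < end - start; slot++) {
                if (!matched[slot]) continue;
                if (heap.size() < k) {
                    heap.add(new Hit(start + slot, windowScores[slot]));
                } else if (windowScores[slot] > heap.peek().score) {
                    heap.poll();
                    heap.add(new Hit(start + slot, windowScores[slot]));
                }
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(WORST_FIRST.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
            new Clause(new int[] {1, 5, 300}, new float[] {1f, 2f, 1f}),
            new Clause(new int[] {5, 7, 300}, new float[] {4f, 3f, 5f}));
        for (Hit h : topK(clauses, 400, 2)) System.out.println(h.doc + " " + h.score);
    }
}
```

Running `main` prints `5 6.0` and then `300 6.0`: docs 5 and 300 tie at 6.0 and the smaller doc ID comes first. By the window that contains doc 300 the heap's threshold is already 3, so the clause with max score 2 is non-essential there and only adds to doc 300 after the other clause has matched it. The per-window threshold read is the "fewer per-document decisions" trade-off: it is slightly stale within a window, which is safe because the threshold can only increase.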



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
