[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347362#comment-17347362 ]

Zach Chen commented on LUCENE-9335:
-----------------------------------

{quote}The speedup for some of the slower queries looks great. I know Fuzzy1 
and Fuzzy2 are quite noisy, but have you tried running them using BMM? Maybe 
your change makes them faster?
{quote}
Ah, not sure why I didn't think of running them through BMM earlier! I just gave 
them a run and got the following results:

*BMM Scorer*

 
{code:java}
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                  Fuzzy1       30.46     (24.7%)       17.63     (11.6%)  -42.1% ( -62% -   -7%) 0.000
                  Fuzzy2       21.61     (16.4%)       16.28     (12.0%)  -24.7% ( -45% -    4%) 0.000
                PKLookup      216.72      (4.1%)      215.63      (3.0%)   -0.5% (  -7% -    6%) 0.654
{code}
 
{code:java}
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                  Fuzzy1       30.58      (9.1%)       22.12      (6.4%)  -27.7% ( -39% -  -13%) 0.000
                  Fuzzy2       36.07     (12.7%)       27.05     (10.8%)  -25.0% ( -42% -   -1%) 0.000
                PKLookup      215.26      (3.4%)      213.99      (2.5%)   -0.6% (  -6% -    5%) 0.530
{code}
 
*BMMBulkScorer without window (with the above scorer implementation)*

 
{code:java}
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                  Fuzzy2       16.32     (22.6%)       15.68     (16.3%)   -3.9% ( -34% -   45%) 0.527
                  Fuzzy1       48.11     (17.6%)       47.48     (13.6%)   -1.3% ( -27% -   36%) 0.791
                PKLookup      213.67      (3.2%)      212.52      (4.0%)   -0.5% (  -7% -    6%) 0.640
{code}
{code:java}
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                  Fuzzy2       26.99     (23.2%)       24.75     (13.6%)   -8.3% ( -36% -   37%) 0.169
                PKLookup      216.27      (4.3%)      216.43      (3.4%)    0.1% (  -7% -    8%) 0.951
                  Fuzzy1       19.01     (24.2%)       20.01     (14.2%)    5.3% ( -26% -   57%) 0.400
{code}
*BMMBulkScorer with window size 1024*
 
{code:java}
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                  Fuzzy2       23.56     (26.0%)       19.08     (13.9%)  -19.0% ( -46% -   28%) 0.004
                  Fuzzy1       30.97     (31.6%)       25.82     (16.9%)  -16.6% ( -49% -   46%) 0.038
                PKLookup      213.23      (2.5%)      211.63      (1.8%)   -0.7% (  -5% -    3%) 0.289
{code}
{code:java}
                    TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                  Fuzzy1       20.59     (12.1%)       20.59     (10.5%)   -0.0% ( -20% -   25%) 0.994
                PKLookup      205.21      (3.1%)      206.99      (3.7%)    0.9% (  -5% -    7%) 0.422
                  Fuzzy2       30.74     (22.7%)       32.71     (17.0%)    6.4% ( -27% -   59%) 0.311
{code}
 

These results look strange to me, actually, as I would expect the BulkScorer 
without window to perform similarly to the scorer, since it just uses the scorer 
implementation under the hood. I'll need to dig into it more to understand what 
contributed to these differences (their JFR CPU recordings look similar too).
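
To make the comparison concrete, here is a minimal sketch (my own illustration, 
not the actual BMMBulkScorer code) of a bulk scorer that just drives the wrapped 
scorer's iterator, similar in spirit to Lucene's Weight.DefaultBulkScorer. If 
the no-window bulk scorer effectively reduces to this, I'd expect its QPS to 
track the plain BMM scorer closely:
{code:java}
// Hedged sketch, not the patch's code: a bulk scorer that simply drives
// the wrapped scorer's iterator one document at a time.
import java.io.IOException;
import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.Bits;

final class DelegatingBulkScorer extends BulkScorer {
  private final Scorer scorer;

  DelegatingBulkScorer(Scorer scorer) {
    this.scorer = scorer;
  }

  @Override
  public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
      throws IOException {
    collector.setScorer(scorer);
    DocIdSetIterator it = scorer.iterator();
    int doc = it.docID();
    if (doc < min) {
      doc = it.advance(min);
    }
    // Score one doc at a time, exactly as the plain scorer would be driven.
    for (; doc < max; doc = it.nextDoc()) {
      if (acceptDocs == null || acceptDocs.get(doc)) {
        collector.collect(doc);
      }
    }
    return doc; // first candidate of the next range, or NO_MORE_DOCS
  }

  @Override
  public long cost() {
    return scorer.iterator().cost();
  }
}
{code}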

From the results I have so far, it seems BMM may not be ideal for handling 
queries with many terms. My high-level guess is that for these queries, which 
can be rewritten into boolean queries with ~50 terms, BMM may find itself 
spending lots of time computing upTo and updating maxScore, since the minimum 
of all scorers' block boundaries is used to update upTo each time. This would 
explain why the bulkScorer implementation with a fixed window size performs 
better than the scorer one, but it doesn't explain the difference above.
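
To illustrate the guess (a hedged sketch; the method names and the WINDOW_SIZE 
constant are mine, not from the patch): with ~50 scorers the minimum block 
boundary tends to move forward in small steps, so upTo and maxScore get 
recomputed very frequently, whereas a fixed window amortizes that work:
{code:java}
// Hedged sketch contrasting the two upTo strategies; names are illustrative.
import java.io.IOException;
import java.util.List;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;

final class UpToStrategies {
  static final int WINDOW_SIZE = 1024; // assumption, mirrors the benchmark run

  // BMM scorer style: upTo is the minimum block boundary across all scorers.
  // With ~50 terms this minimum advances in small steps, so maxScore must be
  // recomputed very often.
  static int advanceUpToBlockMin(List<Scorer> scorers, int doc) throws IOException {
    int upTo = DocIdSetIterator.NO_MORE_DOCS;
    for (Scorer s : scorers) {
      // advanceShallow returns a bound >= all docs in the scorer's current block
      upTo = Math.min(upTo, s.advanceShallow(doc));
    }
    return upTo;
  }

  // Windowed bulk scorer style: upTo advances by a fixed amount, so upTo and
  // maxScore updates are amortized over the whole window.
  static int advanceUpToWindow(int doc) {
    return doc + WINDOW_SIZE - 1;
  }
}
{code}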

 
{quote}I wanted to do some more tests so I played with the MSMARCO passages 
dataset, which has the interesting property of having queries that have several 
terms (often around 8-10). See the attached benchmark if you are interested, 
here are the outputs I'm getting for various scorers:

Contrary to my intuition, WAND seems to perform better despite the high number 
of terms. I wonder if there are some improvements we can still make to BMM?
{quote}
Thanks for running these additional tests! The results indeed look interesting. 
I took a look at the MSMarcoPassages.java code you attached, and wonder if it's 
also possible that, since the percentile numbers are computed after sorting the 
per-query times, BMM does much better at some low percentiles (P10, for example) 
but worse than BMW for the rest (at least 50% of them)?

I also noticed that BMM BulkScorer collects roughly 10X as many docs as BMM 
scorer, which in turn collects more than 10X as many docs as BMW. I feel this 
may also explain the unexpected slowdown? In general I would expect all of these 
scorers to collect the same number of top docs.
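
In case it helps to double-check those numbers, here is a quick sketch of how I 
would count collected hits, wrapping the top-docs collector with Lucene's 
FilterCollector / FilterLeafCollector (the counting wrapper itself is mine):
{code:java}
// Sketch of a wrapper that counts collect() calls, for verifying how many
// docs each scorer implementation actually pushes to the collector.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FilterCollector;
import org.apache.lucene.search.FilterLeafCollector;
import org.apache.lucene.search.LeafCollector;

final class CountingCollector extends FilterCollector {
  final AtomicLong collected = new AtomicLong();

  CountingCollector(Collector in) {
    super(in);
  }

  @Override
  public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
    return new FilterLeafCollector(super.getLeafCollector(context)) {
      @Override
      public void collect(int doc) throws IOException {
        collected.incrementAndGet(); // count every doc handed to the collector
        super.collect(doc);
      }
    };
  }
}
{code}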

Also, I'm interested in running these benchmark tests myself. Are the passages 
dataset and the queries you used available for download somewhere (I found the 
MS GitHub site, but I'm not sure it hosts the same version as the one you used)? 

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: MSMarcoPassages.java, wikimedium.10M.nostopwords.tasks, 
> wikimedium.10M.nostopwords.tasks.5OrMeds
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
>  Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more, I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.


