[jira] [Commented] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning

Zach Chen (Jira) Mon, 12 Apr 2021 23:08:39 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319923#comment-17319923
 ]


Zach Chen commented on LUCENE-9335:
-----------------------------------

I made some further changes to move some of the block-max logic from 
DisjunctionMaxScorer to DisjunctionScorer, so that DisjunctionSumScorer can 
inherit it. I've published a WIP PR [https://github.com/apache/lucene/pull/81] 
with those changes for ease of review. 
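
To illustrate the kind of refactoring described above, here is a minimal toy sketch, not the code in the PR. It assumes the shared logic amounts to deriving a block-level score upper bound from per-clause upper bounds, with each subclass supplying only its own score combination. The class and method names (DisjunctionBase, combine, block_max_score, can_skip_block) are hypothetical and are not Lucene APIs.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class DisjunctionBase(ABC):
    """Hypothetical stand-in for a shared disjunction base class."""

    def __init__(self, child_block_max: Sequence[float]):
        # Per-clause maximum score within the current block (toy input,
        # assumed non-empty and non-negative).
        self.child_block_max = list(child_block_max)

    @abstractmethod
    def combine(self, scores: Sequence[float]) -> float:
        """Combine per-clause scores into one document score."""

    def block_max_score(self) -> float:
        # Shared logic: combining per-clause upper bounds yields an upper
        # bound for the block, as long as combine() never decreases when
        # any input increases.
        return self.combine(self.child_block_max)

    def can_skip_block(self, min_competitive_score: float) -> bool:
        # A block can be skipped when even its best possible score is
        # below the score needed to enter the top hits.
        return self.block_max_score() < min_competitive_score


class SumDisjunction(DisjunctionBase):
    def combine(self, scores: Sequence[float]) -> float:
        return sum(scores)


class MaxDisjunction(DisjunctionBase):
    def __init__(self, child_block_max: Sequence[float], tie_breaker: float = 0.0):
        super().__init__(child_block_max)
        self.tie_breaker = tie_breaker

    def combine(self, scores: Sequence[float]) -> float:
        # Best clause score plus a tie-breaker fraction of the remaining ones.
        best = max(scores)
        return best + self.tie_breaker * (sum(scores) - best)
```

With this shape, both subclasses inherit the skipping decision and differ only in how they combine scores, which is one plausible reading of why the logic would move to the common parent.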

When I run luceneutil, I see further errors from the verifyScores section of 
the code, which may indicate bugs in my changes:

 
{code:java}
WARNING: cat=OrHighHigh: hit counts differ: 9870+ vs 2616+
Traceback (most recent call last):
  File "src/python/localrun.py", line 53, in <module>
    comp.benchmark("baseline_vs_patch")
  File "/Users/xichen/IdeaProjects/benchmarks/util/src/python/competition.py", line 455, in benchmark
    randomSeed = self.randomSeed)
  File "/Users/xichen/IdeaProjects/benchmarks/util/src/python/searchBench.py", line 196, in run
    raise RuntimeError('errors occurred: %s' % str(cmpDiffs))
RuntimeError: errors occurred: ([], ["query=body:second body:short filter=None sort=None groupField=None hitCount=9870+: hit 0 has wrong field/score value ([1444649], '5.0718417') vs ([5125], '4.224689')"], 1.0)
{code}
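
For readers unfamiliar with this kind of check, the failure above is a top-hit mismatch between the baseline and the patched run. The following is a simplified sketch of such a comparison, not luceneutil's actual compareHits implementation. The function name first_mismatch, the (doc id, score) hit shape, and the tolerance are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Simplified, assumed hit shape: (doc id, score).
Hit = Tuple[int, float]


def first_mismatch(base: List[Hit], cand: List[Hit], tol: float = 1e-5) -> Optional[str]:
    """Return a description of the first differing hit, or None if they agree."""
    for rank, (b, c) in enumerate(zip(base, cand)):
        # A hit differs if the doc id changes or the score moves beyond tol.
        if b[0] != c[0] or abs(b[1] - c[1]) > tol:
            return f"hit {rank} has wrong field/score value {b} vs {c}"
    if len(base) != len(cand):
        return f"hit list lengths differ: {len(base)} vs {len(cand)}"
    return None
```

Applied to the values in the log, first_mismatch([(1444649, 5.0718417)], [(5125, 4.224689)]) reports a difference at rank 0, which means the two runs disagree on the best-scoring document for the query.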
 

 

I then changed benchUtil.py to skip the score and hit-count verification, so 
that I could see what benchmark results it would produce:

 
{code:java}
diff --git a/src/python/benchUtil.py b/src/python/benchUtil.py
index fb50033..c2faffc 100644
--- a/src/python/benchUtil.py
+++ b/src/python/benchUtil.py
@@ -1203,7 +1203,7 @@ class RunAlgs:
     cmpRawResults, heapCmp = parseResults(cmpLogFiles)
 
     # make sure they got identical results
-    cmpDiffs = compareHits(baseRawResults, cmpRawResults, self.verifyScores, self.verifyCounts)
+    cmpDiffs = compareHits(baseRawResults, cmpRawResults, False, False)
 
     baseResults = collateResults(baseRawResults)
     cmpResults = collateResults(cmpRawResults){code}
 

 

I then got the following benchmark results from multiple runs:

 
{code:java}
                    Task    QPS baseline      StdDev    QPS my_modified_version      StdDev    Pct diff p-value
               OrHighMed      186.44      (2.8%)      160.50      (4.5%)  -13.9% ( -20% -   -6%) 0.000
               OrHighLow      735.70      (7.5%)      696.89      (4.3%)   -5.3% ( -15% -    6%) 0.006
                  Fuzzy1       75.85     (11.5%)       72.81     (14.0%)   -4.0% ( -26% -   24%) 0.323
              TermDTSort      237.49     (10.4%)      228.02     (10.6%)   -4.0% ( -22% -   18%) 0.230
       HighTermMonthSort      280.82      (9.8%)      274.90     (10.8%)   -2.1% ( -20% -   20%) 0.518
                  Fuzzy2       54.08     (12.5%)       53.04     (14.2%)   -1.9% ( -25% -   28%) 0.648
            OrNotHighMed      672.83      (2.7%)      661.16      (4.7%)   -1.7% (  -8% -    5%) 0.153
    HighTermTitleBDVSort      438.56     (14.4%)      431.81     (16.6%)   -1.5% ( -28% -   34%) 0.754
              AndHighLow      969.43      (5.2%)      957.49      (4.7%)   -1.2% ( -10% -    9%) 0.432
           OrNotHighHigh      704.98      (3.4%)      700.72      (3.9%)   -0.6% (  -7% -    7%) 0.605
             AndHighHigh      109.77      (4.2%)      109.31      (4.7%)   -0.4% (  -9% -    8%) 0.767
   BrowseMonthSSDVFacets       32.52      (2.1%)       32.40      (4.6%)   -0.4% (  -6% -    6%) 0.755
                PKLookup      219.90      (3.1%)      219.16      (3.2%)   -0.3% (  -6% -    6%) 0.734
                Wildcard      284.84      (1.9%)      284.18      (1.8%)   -0.2% (  -3% -    3%) 0.690
                 Prefix3      361.00      (2.1%)      360.24      (2.0%)   -0.2% (  -4% -    4%) 0.750
    HighIntervalsOrdered       28.68      (2.2%)       28.64      (1.7%)   -0.1% (  -3% -    3%) 0.819
   BrowseMonthTaxoFacets       13.60      (2.9%)       13.59      (2.7%)   -0.1% (  -5% -    5%) 0.947
BrowseDayOfYearSSDVFacets       28.67      (4.8%)       28.66      (4.8%)   -0.0% (  -9% -   10%) 0.979
            HighSpanNear       79.29      (2.4%)       79.29      (2.2%)    0.0% (  -4% -    4%) 0.997
           OrHighNotHigh      695.37      (5.5%)      696.65      (3.8%)    0.2% (  -8% -   10%) 0.903
                 MedTerm     1478.47      (3.6%)     1481.54      (3.0%)    0.2% (  -6% -    7%) 0.843
   HighTermDayOfYearSort      372.12     (14.1%)      373.08     (14.8%)    0.3% ( -25% -   33%) 0.955
                  IntNRQ      125.36      (1.3%)      125.72      (0.7%)    0.3% (  -1% -    2%) 0.391
             LowSpanNear       52.82      (1.7%)       52.98      (2.0%)    0.3% (  -3% -    4%) 0.611
BrowseDayOfYearTaxoFacets       11.28      (3.1%)       11.31      (3.1%)    0.3% (  -5% -    6%) 0.756
         LowSloppyPhrase      154.42      (2.9%)      154.91      (2.9%)    0.3% (  -5% -    6%) 0.731
               MedPhrase      143.27      (2.9%)      143.88      (2.5%)    0.4% (  -4% -    6%) 0.625
            OrHighNotMed      760.65      (6.8%)      763.93      (5.4%)    0.4% ( -10% -   13%) 0.824
                 Respell       86.71      (1.5%)       87.11      (2.1%)    0.5% (  -3% -    4%) 0.425
             MedSpanNear      210.43      (2.2%)      211.43      (1.4%)    0.5% (  -3% -    4%) 0.414
         MedSloppyPhrase      220.29      (2.6%)      221.35      (2.2%)    0.5% (  -4% -    5%) 0.528
    BrowseDateTaxoFacets       11.24      (3.0%)       11.30      (3.1%)    0.6% (  -5% -    6%) 0.529
               LowPhrase      174.98      (2.4%)      176.05      (2.0%)    0.6% (  -3% -    5%) 0.385
        HighSloppyPhrase      100.25      (3.2%)      100.88      (3.0%)    0.6% (  -5% -    7%) 0.524
            OrHighNotLow     1016.27      (7.7%)     1025.44      (6.9%)    0.9% ( -12% -   16%) 0.696
                 LowTerm     1634.20      (3.8%)     1649.20      (2.6%)    0.9% (  -5% -    7%) 0.376
              HighPhrase      415.55      (3.0%)      419.76      (2.9%)    1.0% (  -4% -    7%) 0.282
            OrNotHighLow      940.18      (5.4%)      952.61      (2.7%)    1.3% (  -6% -    9%) 0.328
                HighTerm     1163.01      (3.8%)     1178.47      (4.8%)    1.3% (  -6% -   10%) 0.329
              AndHighMed      365.15      (4.4%)      370.53      (3.1%)    1.5% (  -5% -    9%) 0.225
              OrHighHigh       80.20      (2.2%)      718.16    (158.9%)  795.4% ( 620% -  978%) 0.000
WARNING: cat=OrHighHigh: hit counts differ: 19022+ vs 1002+
WARNING: cat=OrHighMed: hit counts differ: 4321+ vs 4289+
{code}
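
As a rough reading aid for the "Pct diff" column, here is a small sketch that assumes the column is the relative change in mean QPS from the baseline to the patched run. That interpretation is an assumption based on the numbers shown, not something taken from luceneutil's source, and pct_diff is a hypothetical helper name.

```python
def pct_diff(base_qps: float, cand_qps: float) -> float:
    """Relative change from baseline to candidate, in percent (assumed definition)."""
    return 100.0 * (cand_qps - base_qps) / base_qps


# OrHighMed row from the table above: 186.44 QPS baseline vs 160.50 QPS patched.
or_high_med = round(pct_diff(186.44, 160.50), 1)
```

For the OrHighMed row this gives -13.9, matching the table. The printed QPS values are rounded to two decimals, so recomputing other rows this way may differ from the table in the last digit.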
 

 

 
{code:java}
                    Task    QPS baseline      StdDev    QPS my_modified_version      StdDev    Pct diff p-value
               OrHighLow      667.55      (4.3%)      649.15      (5.9%)   -2.8% ( -12% -    7%) 0.092
           OrHighNotHigh      866.11      (5.2%)      851.54      (5.2%)   -1.7% ( -11% -    9%) 0.305
   HighTermDayOfYearSort      293.23     (16.4%)      288.53     (15.0%)   -1.6% ( -28% -   35%) 0.747
                  Fuzzy1       58.36     (10.3%)       57.55     (10.2%)   -1.4% ( -19% -   21%) 0.666
            OrHighNotLow      709.78      (3.0%)      702.08      (3.9%)   -1.1% (  -7% -    5%) 0.322
                 LowTerm     1816.04      (4.4%)     1797.54      (4.8%)   -1.0% (  -9% -    8%) 0.488
                  Fuzzy2       50.30     (11.0%)       49.85     (11.7%)   -0.9% ( -21% -   24%) 0.802
              OrHighHigh       55.81      (2.4%)       55.35      (2.4%)   -0.8% (  -5% -    4%) 0.288
            HighSpanNear       15.16      (2.9%)       15.07      (3.0%)   -0.6% (  -6% -    5%) 0.547
             MedSpanNear       67.82      (3.1%)       67.47      (3.5%)   -0.5% (  -6% -    6%) 0.613
               OrHighMed      195.58      (2.7%)      194.60      (2.6%)   -0.5% (  -5% -    4%) 0.548
             LowSpanNear       36.88      (2.7%)       36.75      (2.9%)   -0.3% (  -5% -    5%) 0.690
   BrowseMonthTaxoFacets       13.05      (3.3%)       13.01      (3.5%)   -0.3% (  -6% -    6%) 0.749
    HighIntervalsOrdered       44.33      (1.3%)       44.18      (1.4%)   -0.3% (  -3% -    2%) 0.439
        HighSloppyPhrase       17.92      (4.4%)       17.87      (4.3%)   -0.3% (  -8% -    8%) 0.821
               MedPhrase       78.25      (2.5%)       78.12      (2.1%)   -0.2% (  -4% -    4%) 0.823
BrowseDayOfYearSSDVFacets       27.73      (2.4%)       27.68      (2.0%)   -0.2% (  -4% -    4%) 0.817
                PKLookup      213.40      (2.9%)      213.09      (2.4%)   -0.1% (  -5% -    5%) 0.862
             AndHighHigh      100.38      (2.9%)      100.25      (2.9%)   -0.1% (  -5% -    5%) 0.891
BrowseDayOfYearTaxoFacets       10.79      (3.4%)       10.78      (3.7%)   -0.1% (  -6% -    7%) 0.912
              AndHighLow      778.66      (3.4%)      778.10      (3.1%)   -0.1% (  -6% -    6%) 0.945
                Wildcard      141.04      (2.1%)      141.00      (2.4%)   -0.0% (  -4% -    4%) 0.970
    BrowseDateTaxoFacets       10.77      (3.3%)       10.77      (3.6%)   -0.0% (  -6% -    7%) 0.993
    HighTermTitleBDVSort      222.22     (12.5%)      222.20     (11.7%)   -0.0% ( -21% -   27%) 0.998
         LowSloppyPhrase       34.64      (3.6%)       34.64      (3.4%)   -0.0% (  -6% -    7%) 0.994
                  IntNRQ      143.05      (0.6%)      143.25      (0.8%)    0.1% (  -1% -    1%) 0.546
   BrowseMonthSSDVFacets       30.95      (5.2%)       31.00      (5.0%)    0.2% (  -9% -   10%) 0.922
                 Respell       66.20      (2.2%)       66.36      (1.8%)    0.2% (  -3% -    4%) 0.719
              AndHighMed      300.10      (2.5%)      300.83      (2.9%)    0.2% (  -5% -    5%) 0.775
               LowPhrase       54.92      (2.6%)       55.07      (2.1%)    0.3% (  -4% -    5%) 0.701
                 MedTerm     1522.58      (3.9%)     1529.48      (3.8%)    0.5% (  -7% -    8%) 0.711
            OrNotHighLow      965.50      (5.2%)      972.89      (2.8%)    0.8% (  -6% -    9%) 0.564
              HighPhrase      163.18      (2.5%)      164.66      (2.6%)    0.9% (  -4% -    6%) 0.259
                HighTerm     1400.74      (3.8%)     1417.05      (3.9%)    1.2% (  -6% -    9%) 0.335
         MedSloppyPhrase      146.41      (2.6%)      148.13      (2.8%)    1.2% (  -4% -    6%) 0.171
                 Prefix3      355.38      (3.1%)      359.62      (3.4%)    1.2% (  -5% -    7%) 0.242
           OrNotHighHigh      704.32      (4.2%)      713.26      (4.6%)    1.3% (  -7% -   10%) 0.363
              TermDTSort      227.11     (13.3%)      230.00     (13.5%)    1.3% ( -22% -   32%) 0.764
            OrHighNotMed      724.29      (4.2%)      736.47      (3.8%)    1.7% (  -6% -   10%) 0.183
       HighTermMonthSort      134.94     (10.6%)      137.37     (10.2%)    1.8% ( -17% -   25%) 0.583
            OrNotHighMed      764.10      (3.7%)      778.89      (3.4%)    1.9% (  -4% -    9%) 0.082
{code}
 

 

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more. I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.



