[GitHub] [lucene] jpountz commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

GitBox Thu, 23 Jun 2022 06:53:03 -0700


jpountz commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1164436120


   @zacharymorn FYI I played with a slightly different approach that implements 
BMM as a bulk scorer instead of a scorer, which I hoped would make the 
bookkeeping more lightweight: 
https://github.com/jpountz/lucene/tree/maxscore. It could be interesting to 
compare it with your implementation.
   
   One optimization it has that your scorer doesn't, and that seemed to help, 
is to check, for every non-essential scorer, whether the score obtained so far 
plus the sum of the max scores of the non-essential scorers that haven't been 
checked yet is still competitive.
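
   A minimal sketch of that check, for illustration only. It does not use 
Lucene's classes and is not the code from the linked branch: the class 
`NonEssentialPruning`, the method `scoreOrPrune`, the `PRUNED` marker and all 
parameter names are hypothetical, scores are assumed to be non-negative, and 
the suffix sums are recomputed on every call purely for readability.

```java
/** Illustrative only: not Lucene's API; every name here is hypothetical. */
final class NonEssentialPruning {

  /** Marker returned when a document is abandoned as non-competitive. */
  static final double PRUNED = -1.0;

  /**
   * @param essentialScore        score already accumulated from the essential clauses
   * @param nonEssentialScores    actual score of each non-essential clause on this
   *                              document (0 when the clause does not match)
   * @param nonEssentialMaxScores upper bound on the score of each non-essential clause
   * @param minCompetitiveScore   score a document must reach to be collected
   * @return the total score, or PRUNED if the document cannot be competitive
   */
  static double scoreOrPrune(
      double essentialScore,
      double[] nonEssentialScores,
      double[] nonEssentialMaxScores,
      double minCompetitiveScore) {
    int n = nonEssentialScores.length;

    // remainingMax[i] = sum of the max scores of clauses i..n-1,
    // i.e. the most that the not-yet-checked clauses could still add.
    double[] remainingMax = new double[n + 1];
    for (int i = n - 1; i >= 0; i--) {
      remainingMax[i] = remainingMax[i + 1] + nonEssentialMaxScores[i];
    }

    double score = essentialScore;
    for (int i = 0; i < n; i++) {
      // Score so far plus the best case for the unchecked clauses:
      // if even that is not competitive, skip the remaining clauses.
      if (score + remainingMax[i] < minCompetitiveScore) {
        return PRUNED;
      }
      score += nonEssentialScores[i];
    }
    return score >= minCompetitiveScore ? score : PRUNED;
  }
}
```

   The point of checking before each clause, rather than once up front, is 
that a clause that contributes less than its max score tightens the bound for 
the clauses that follow, so later clauses may not need to be evaluated at all.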
   
   I got the following results on one run on wikimedium10m:
   
   ```
                       Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
               OrHighNotLow       1493.13   (6.5%)                  1445.29   (5.1%)     -3.2% ( -13% -    8%)   0.083
               OrNotHighMed       1410.19   (3.8%)                  1373.37   (3.1%)     -2.6% (  -9% -    4%)   0.017
              OrNotHighHigh       1057.88   (5.1%)                  1031.19   (4.4%)     -2.5% ( -11% -    7%)   0.096
               OrHighNotMed       1525.10   (5.2%)                  1486.80   (4.4%)     -2.5% ( -11% -    7%)   0.098
              OrHighNotHigh       1250.31   (4.3%)                  1221.99   (3.4%)     -2.3% (  -9% -    5%)   0.062
                     IntNRQ        531.54   (2.9%)                   522.49   (2.7%)     -1.7% (  -7% -    3%)   0.053
                     Fuzzy1        111.13   (2.1%)                   109.80   (2.6%)     -1.2% (  -5% -    3%)   0.107
                 AndHighMed        386.29   (4.1%)                   381.84   (3.3%)     -1.2% (  -8% -    6%)   0.329
                AndHighHigh         78.96   (5.6%)                    78.18   (4.7%)     -1.0% ( -10% -    9%)   0.548
       BrowseDateSSDVFacets          4.51  (12.6%)                     4.47  (12.4%)     -0.8% ( -22% -   27%)   0.836
               OrNotHighLow       1316.24   (3.8%)                  1305.93   (3.1%)     -0.8% (  -7% -    6%)   0.476
     OrHighMedDayTaxoFacets         20.87   (5.1%)                    20.71   (4.2%)     -0.8% (  -9% -    9%)   0.609
      BrowseMonthSSDVFacets         23.54   (6.4%)                    23.42   (7.4%)     -0.5% ( -13% -   14%)   0.817
BrowseRandomLabelTaxoFacets         37.54   (1.7%)                    37.37   (1.9%)     -0.5% (  -4% -    3%)   0.432
                MedSpanNear         68.68   (1.7%)                    68.37   (2.2%)     -0.4% (  -4% -    3%)   0.474
   AndHighHighDayTaxoFacets         10.78   (5.9%)                    10.73   (4.7%)     -0.4% ( -10% -   10%)   0.794
      BrowseMonthTaxoFacets         28.39  (10.0%)                    28.29   (9.1%)     -0.3% ( -17% -   20%)   0.910
      HighTermDayOfYearSort        171.78  (13.7%)                   171.22  (13.2%)     -0.3% ( -23% -   30%)   0.939
                   PKLookup        245.27   (2.2%)                   244.52   (1.9%)     -0.3% (  -4% -    3%)   0.635
           HighSloppyPhrase         39.08   (2.9%)                    38.96   (4.3%)     -0.3% (  -7% -    7%)   0.795
          HighTermMonthSort        167.47  (15.1%)                   167.06  (14.7%)     -0.2% ( -26% -   34%)   0.959
                 HighPhrase        250.14   (2.8%)                   249.53   (2.3%)     -0.2% (  -5% -    5%)   0.767
                 TermDTSort        138.22  (14.0%)                   137.97  (13.4%)     -0.2% ( -24% -   31%)   0.967
                     Fuzzy2         55.22   (1.6%)                    55.17   (1.5%)     -0.1% (  -3% -    3%)   0.837
                    MedTerm       1844.25   (6.4%)                  1843.10   (4.9%)     -0.1% ( -10% -   11%)   0.972
            MedSloppyPhrase         15.34   (2.2%)                    15.33   (3.9%)     -0.1% (  -5% -    6%)   0.954
                    Prefix3        110.03   (2.6%)                   110.07   (1.8%)      0.0% (  -4% -    4%)   0.962
               HighSpanNear          7.95   (1.7%)                     7.97   (1.7%)      0.2% (  -3% -    3%)   0.772
  BrowseDayOfYearTaxoFacets         46.78   (1.9%)                    46.86   (2.1%)      0.2% (  -3% -    4%)   0.788
                 AndHighLow       1291.99   (2.6%)                  1294.28   (3.4%)      0.2% (  -5% -    6%)   0.854
                LowSpanNear         47.55   (1.5%)                    47.64   (1.4%)      0.2% (  -2% -    3%)   0.697
                   Wildcard        157.83   (1.5%)                   158.14   (1.3%)      0.2% (  -2% -    3%)   0.661
                  LowPhrase         83.20   (2.3%)                    83.37   (2.1%)      0.2% (  -4% -    4%)   0.773
                    Respell         95.18   (1.4%)                    95.47   (1.3%)      0.3% (  -2% -    3%)   0.492
    AndHighMedDayTaxoFacets         51.97   (1.8%)                    52.16   (2.1%)      0.4% (  -3% -    4%)   0.553
       BrowseDateTaxoFacets         45.77   (2.0%)                    45.98   (1.9%)      0.5% (  -3% -    4%)   0.452
       MedTermDayTaxoFacets         60.66   (5.9%)                    61.03   (5.0%)      0.6% (  -9% -   12%)   0.718
                  MedPhrase         57.67   (3.1%)                    58.06   (2.6%)      0.7% (  -4% -    6%)   0.452
  BrowseDayOfYearSSDVFacets         20.40   (6.0%)                    20.57   (4.2%)      0.8% (  -8% -   11%)   0.609
            LowSloppyPhrase         37.59   (4.0%)                    38.00   (3.6%)      1.1% (  -6% -    9%)   0.376
BrowseRandomLabelSSDVFacets         15.25   (5.2%)                    15.41   (6.9%)      1.1% ( -10% -   13%)   0.571
                   HighTerm       2001.23   (6.4%)                  2025.82   (4.9%)      1.2% (  -9% -   13%)   0.493
                    LowTerm       2092.97   (4.3%)                  2119.02   (5.5%)      1.2% (  -8% -   11%)   0.423
        MedIntervalsOrdered         56.91   (3.9%)                    57.92   (3.0%)      1.8% (  -4% -    9%)   0.107
       HighIntervalsOrdered         16.67   (6.2%)                    16.97   (4.6%)      1.8% (  -8% -   13%)   0.297
        LowIntervalsOrdered         20.18   (4.3%)                    20.57   (3.3%)      1.9% (  -5% -   10%)   0.113
       HighTermTitleBDVSort        182.32  (14.2%)                   186.92  (22.0%)      2.5% ( -29% -   45%)   0.667
                  OrHighLow       1235.23   (1.8%)                  1484.12   (4.8%)     20.1% (  13% -   27%)   0.000
                  OrHighMed        156.75   (4.7%)                   200.46   (4.7%)     27.9% (  17% -   39%)   0.000
                 OrHighHigh         25.07   (5.2%)                    48.30   (9.1%)     92.6% (  74% -  112%)   0.000
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


