[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

GitBox Tue, 21 Jun 2022 21:07:22 -0700


zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1162613846


   Hi @jpountz, I have adapted the original BMM PR 
https://github.com/apache/lucene/pull/101 to the latest codebase and run 
further experiments on using it for 2-clause disjunctions. The results look 
both encouraging and strange :D 
   
   When I ran `python3 src/python/localrun.py -source wikimedium10m` with only 
the `OrHighLow`, `OrHighHigh` and `OrHighMed` tasks from 
`tasks/wikimedium.10M.nostopwords.tasks` (after removing the other tasks), I 
got an impressive average speedup:
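   The task-removal step can also be scripted. A minimal sketch, assuming each task line in the `.tasks` file has the form `TaskName: query text` (possibly followed by a `# ...` comment); the helper `filter_tasks` and the output file name are hypothetical and not part of luceneutil:

```python
from pathlib import Path


def filter_tasks(lines, keep):
    """Hypothetical helper: keep only task lines whose category is in `keep`.

    Assumes the "TaskName: query text" line format; blank lines and
    lines starting with '#' are dropped.
    """
    selected = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # The category is whatever precedes the first ':' on the line.
        category, sep, _ = stripped.partition(":")
        if sep and category.strip() in keep:
            selected.append(line)
    return selected


OR_TASKS = {"OrHighLow", "OrHighHigh", "OrHighMed"}

if __name__ == "__main__":
    src = Path("tasks/wikimedium.10M.nostopwords.tasks")
    dst = Path("tasks/wikimedium.10M.or-only.tasks")  # hypothetical output path
    if src.exists():
        kept = filter_tasks(src.read_text().splitlines(keepends=True), OR_TASKS)
        dst.write_text("".join(kept))
        print(f"kept {len(kept)} task lines")
```

The sketch only writes a filtered copy; how the benchmark run is pointed at it depends on the local luceneutil setup.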
   
   ```
                               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                           PKLookup      173.31     (24.6%)      181.79     (26.8%)    4.9% ( -37% -   74%) 0.547
                          OrHighLow      166.70     (62.8%)      385.94    (101.5%)  131.5% ( -20% -  794%) 0.000
                         OrHighHigh        9.27     (48.9%)       23.44     (85.9%)  152.9% (  12% -  562%) 0.000
                          OrHighMed       18.45     (61.3%)       55.92    (137.3%)  203.0% (   2% - 1037%) 0.000
   ```
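   For reference, the `Pct diff` column is consistent with the relative change in mean QPS between the two versions. A small sketch that recomputes it from the rounded QPS values above, assuming the benchmark computes it from unrounded means (so the last digit can differ, e.g. 203.1% here vs. 203.0% in the table for `OrHighMed`); `pct_diff` is a hypothetical helper:

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Percent change of candidate QPS vs. baseline QPS; positive means faster."""
    return (candidate_qps - baseline_qps) / baseline_qps * 100.0


# Mean QPS (baseline, my_modified_version) copied from the OR-only run above.
rows = {
    "OrHighLow": (166.70, 385.94),
    "OrHighHigh": (9.27, 23.44),
    "OrHighMed": (18.45, 55.92),
}

for task, (base, cand) in rows.items():
    print(f"{task:>10}: {pct_diff(base, cand):+.1f}%")
```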
   
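   However, when I ran all the tasks, `OrHighLow`, `OrHighHigh` and `OrHighMed` 
showed only a moderate speedup on average, and sometimes even a slowdown. In the 
run below, `OrHighMed` improved by 41.9%, while `OrHighHigh` and `OrHighLow` 
regressed by 32.3% and 12.3% respectively: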
   
   ```
                               TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                         OrHighHigh       35.23      (7.2%)       23.86      (7.0%)  -32.3% ( -43% -  -19%) 0.000
                          OrHighLow      898.97      (4.4%)      788.65      (4.2%)  -12.3% ( -20% -   -3%) 0.000
               BrowseDateSSDVFacets        2.62     (27.0%)        2.43     (18.8%)   -7.4% ( -41% -   52%) 0.312
                       HighSpanNear       21.86      (6.4%)       21.00      (6.1%)   -4.0% ( -15% -    9%) 0.045
                             Fuzzy2       94.11     (12.4%)       90.59      (9.8%)   -3.7% ( -23% -   21%) 0.290
                    LowSloppyPhrase       65.63      (8.2%)       63.99      (8.6%)   -2.5% ( -17% -   15%) 0.347
                   HighSloppyPhrase       17.25      (5.3%)       16.84      (5.3%)   -2.4% ( -12% -    8%) 0.154
                         TermDTSort      160.18      (8.2%)      156.49      (9.9%)   -2.3% ( -18% -   17%) 0.423
              HighTermDayOfYearSort      164.86      (6.8%)      161.77     (10.1%)   -1.9% ( -17% -   16%) 0.490
             OrHighMedDayTaxoFacets       11.05      (7.1%)       10.86      (7.3%)   -1.7% ( -15% -   13%) 0.465
                         AndHighLow     1482.47      (4.0%)     1459.63     (10.6%)   -1.5% ( -15% -   13%) 0.544
                        MedSpanNear       27.77      (7.2%)       27.49      (6.1%)   -1.0% ( -13% -   13%) 0.628
               HighTermTitleBDVSort      197.53      (7.4%)      195.53      (6.3%)   -1.0% ( -13% -   13%) 0.640
            AndHighMedDayTaxoFacets       43.61      (8.7%)       43.19     (10.1%)   -1.0% ( -18% -   19%) 0.745
               HighIntervalsOrdered       17.38      (8.7%)       17.26      (7.5%)   -0.7% ( -15% -   16%) 0.782
                         HighPhrase      454.15      (5.0%)      451.67      (8.7%)   -0.5% ( -13% -   13%) 0.807
        BrowseRandomLabelSSDVFacets       15.40      (8.1%)       15.32      (7.3%)   -0.5% ( -14% -   16%) 0.837
           AndHighHighDayTaxoFacets       16.94      (7.0%)       16.87      (6.6%)   -0.5% ( -13% -   14%) 0.834
                        LowSpanNear        9.08      (4.8%)        9.05      (4.3%)   -0.3% (  -9% -    9%) 0.838
                           Wildcard       55.15     (11.3%)       55.01     (12.0%)   -0.2% ( -21% -   26%) 0.947
                          MedPhrase      976.56      (2.8%)      977.29      (3.3%)    0.1% (  -5% -    6%) 0.939
               MedTermDayTaxoFacets       77.21      (8.6%)       77.46      (8.7%)    0.3% ( -15% -   19%) 0.908
                       OrNotHighLow     1187.34      (5.1%)     1191.80      (5.3%)    0.4% (  -9% -   11%) 0.819
                      OrHighNotHigh     1556.42      (4.4%)     1566.26      (4.5%)    0.6% (  -7% -    9%) 0.654
                LowIntervalsOrdered      158.96      (6.4%)      160.03      (8.9%)    0.7% ( -13% -   17%) 0.785
                      OrNotHighHigh     1427.22      (3.8%)     1436.97      (5.0%)    0.7% (  -7% -    9%) 0.628
                             Fuzzy1      116.55     (11.4%)      117.41      (9.4%)    0.7% ( -18% -   24%) 0.823
                            LowTerm     3470.46      (5.9%)     3500.25      (5.9%)    0.9% ( -10% -   13%) 0.644
                  HighTermMonthSort      169.22     (10.4%)      170.68     (14.9%)    0.9% ( -22% -   29%) 0.832
                             IntNRQ      115.77     (22.6%)      116.95     (21.3%)    1.0% ( -34% -   57%) 0.883
                            MedTerm     3042.06      (4.5%)     3080.17      (5.4%)    1.3% (  -8% -   11%) 0.427
                           HighTerm     2407.19      (5.5%)     2440.56      (4.1%)    1.4% (  -7% -   11%) 0.369
                            Prefix3      396.92     (10.2%)      403.19      (8.6%)    1.6% ( -15% -   22%) 0.595
                       OrNotHighMed     1695.31      (3.6%)     1722.43      (5.5%)    1.6% (  -7% -   11%) 0.274
                    MedSloppyPhrase       13.19      (4.5%)       13.40      (5.0%)    1.6% (  -7% -   11%) 0.283
                       OrHighNotLow     1473.94      (6.7%)     1500.95      (6.6%)    1.8% ( -10% -   16%) 0.383
                         AndHighMed      201.69      (4.5%)      205.65      (9.1%)    2.0% ( -11% -   16%) 0.387
                           PKLookup      247.69     (11.3%)      253.24      (9.6%)    2.2% ( -16% -   26%) 0.499
                MedIntervalsOrdered       30.40      (8.1%)       31.13      (7.7%)    2.4% ( -12% -   19%) 0.338
                       OrHighNotMed     1534.55      (4.5%)     1571.83      (3.9%)    2.4% (  -5% -   11%) 0.068
                            Respell       90.55      (7.9%)       92.75      (8.8%)    2.4% ( -13% -   20%) 0.359
                        AndHighHigh       65.14      (7.1%)       67.16      (8.3%)    3.1% ( -11% -   19%) 0.206
          BrowseDayOfYearSSDVFacets       20.96      (9.7%)       21.65     (11.1%)    3.3% ( -15% -   26%) 0.320
                          LowPhrase       63.71      (6.9%)       65.86      (9.2%)    3.4% ( -11% -   20%) 0.191
              BrowseMonthSSDVFacets       22.49     (13.6%)       23.62     (14.8%)    5.0% ( -20% -   38%) 0.263
              BrowseMonthTaxoFacets       26.25     (43.5%)       34.10     (40.2%)   29.9% ( -37% -  200%) 0.024
               BrowseDateTaxoFacets       22.04     (40.4%)       29.87     (63.1%)   35.5% ( -48% -  233%) 0.034
          BrowseDayOfYearTaxoFacets       22.07     (39.3%)       31.04     (64.1%)   40.6% ( -45% -  236%) 0.016
                          OrHighMed       59.30      (9.3%)       84.18     (20.1%)   41.9% (  11% -   78%) 0.000
        BrowseRandomLabelTaxoFacets       20.38     (52.4%)       30.77     (88.6%)   50.9% ( -59% -  403%) 0.027
   ```
   
   This seems to suggest that the tasks may interfere with each other rather than 
run independently? Do you have any suggestions on where I should look next to 
confirm the performance impact of this change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

