[PR] Remove LeafSimScorer abstraction. [lucene]

via GitHub Fri, 25 Oct 2024 02:38:42 -0700


jpountz opened a new pull request, #13957:
URL: https://github.com/apache/lucene/pull/13957


   `LeafSimScorer` is a specialization of a `SimScorer` for a given segment. It 
doesn't add much value, but benchmarks suggest that it adds measurable overhead 
to queries sorted by score.
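
A minimal sketch of the kind of indirection being discussed, using simplified stand-in types rather than the real Lucene classes. The `SimScorer` and `Norms` interfaces, `PerSegmentScorer`, and `scoreDirect` are all hypothetical names for illustration, and the "direct" style is only an assumption about how a caller could do without the wrapper, not a description of this PR's diff.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins. These are NOT the real Lucene classes.
final class LeafSimScorerSketch {

  /** Stand-in for a similarity scorer: combines a term frequency with an encoded norm. */
  interface SimScorer {
    float score(float freq, long norm);
  }

  /** Stand-in for one segment's norms, looked up by segment-local doc ID. */
  interface Norms {
    long get(int doc);
  }

  /** Wrapper style: binds a SimScorer to a single segment's norms. */
  static final class PerSegmentScorer {
    private final SimScorer scorer;
    private final Norms norms; // null when the field has no norms

    PerSegmentScorer(SimScorer scorer, Norms norms) {
      this.scorer = Objects.requireNonNull(scorer);
      this.norms = norms;
    }

    float score(int doc, float freq) {
      long norm = norms == null ? 1L : norms.get(doc);
      return scorer.score(freq, norm); // one extra hop on every scored document
    }
  }

  /** Direct style (assumed): the caller holds the norms and calls the SimScorer itself. */
  static float scoreDirect(SimScorer scorer, Norms norms, int doc, float freq) {
    long norm = norms == null ? 1L : norms.get(doc);
    return scorer.score(freq, norm);
  }

  public static void main(String[] args) {
    SimScorer toy = (freq, norm) -> freq / (freq + norm); // toy formula, not BM25
    long[] segmentNorms = {4L, 8L};
    Norms norms = doc -> segmentNorms[doc];

    float wrapped = new PerSegmentScorer(toy, norms).score(1, 2f);
    float direct = scoreDirect(toy, norms, 1, 2f);
    System.out.println("wrapped=" + wrapped + " direct=" + direct);
  }
}
```

Both styles compute the same score; the only difference is whether the per-document call goes through an extra object on the hot path.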
   
   Here is a `luceneutil` run with `-searchConcurrency 0` on `wikibigall`:
   
   ```
                    Task  QPS baseline    StdDev  QPS my_modified_version    StdDev                Pct diff  p-value
         CountAndHighMed        148.80    (3.6%)                   146.79    (3.3%)   -1.4% (  -8% -    5%)    0.219
                 Prefix3        210.12    (3.4%)                   208.12    (3.1%)   -1.0% (  -7% -    5%)    0.355
            OrNotHighLow        930.49    (2.9%)                   922.26    (2.8%)   -0.9% (  -6% -    4%)    0.326
          CountOrHighMed        104.34    (1.6%)                   103.50    (1.5%)   -0.8% (  -3% -    2%)    0.099
        CountAndHighHigh         48.93    (3.6%)                    48.55    (3.4%)   -0.8% (  -7% -    6%)    0.485
       HighTermMonthSort       3011.98    (2.9%)                  2989.18    (4.1%)   -0.8% (  -7% -    6%)    0.498
              TermDTSort        342.40    (7.1%)                   340.02    (6.1%)   -0.7% ( -13% -   13%)    0.741
         CountOrHighHigh         49.93    (1.6%)                    49.76    (1.2%)   -0.3% (  -3% -    2%)    0.451
       HighTermTitleSort        111.58    (2.3%)                   111.22    (3.1%)   -0.3% (  -5% -    5%)    0.710
           OrNotHighHigh        308.36    (3.1%)                   307.70    (3.3%)   -0.2% (  -6% -    6%)    0.835
                  Fuzzy2         71.17    (1.6%)                    71.07    (2.2%)   -0.1% (  -3% -    3%)    0.824
               OrHighLow        726.98    (1.6%)                   727.36    (2.5%)    0.1% (  -4% -    4%)    0.939
   HighTermDayOfYearSort        764.56    (3.8%)                   765.85    (3.4%)    0.2% (  -6% -    7%)    0.882
            OrNotHighMed        350.64    (3.4%)                   351.46    (4.3%)    0.2% (  -7% -    8%)    0.848
                  Fuzzy1         75.46    (1.9%)                    75.80    (1.8%)    0.5% (  -3% -    4%)    0.448
                  IntNRQ        139.45   (13.7%)                   140.08   (14.5%)    0.5% ( -24% -   33%)    0.918
    HighTermTitleBDVSort         15.35    (5.7%)                    15.42    (5.5%)    0.5% ( -10% -   12%)    0.781
                PKLookup        265.51    (2.5%)                   267.01    (1.6%)    0.6% (  -3% -    4%)    0.389
              AndHighLow        989.77    (1.9%)                   995.39    (2.2%)    0.6% (  -3% -    4%)    0.387
               CountTerm       7984.92    (3.9%)                  8051.09    (5.0%)    0.8% (  -7% -   10%)    0.557
           OrHighNotHigh        321.43    (2.7%)                   324.15    (3.1%)    0.8% (  -4% -    6%)    0.357
                  OrMany         18.24    (2.4%)                    18.45    (2.1%)    1.1% (  -3% -    5%)    0.107
                Wildcard        117.97    (3.2%)                   119.40    (3.2%)    1.2% (  -5% -    7%)    0.230
              OrHighRare        269.54    (5.3%)                   273.78    (6.5%)    1.6% (  -9% -   14%)    0.401
               OrHighMed        219.25    (2.5%)                   222.89    (2.7%)    1.7% (  -3% -    7%)    0.044
     And2Terms2StopWords        151.65    (1.8%)                   154.21    (1.6%)    1.7% (  -1% -    5%)    0.002
      Or2Terms2StopWords        153.46    (3.1%)                   156.15    (2.8%)    1.8% (  -4% -    7%)    0.061
                Or3Terms        164.81    (2.4%)                   168.57    (2.9%)    2.3% (  -2% -    7%)    0.007
                 MedTerm        610.37    (3.5%)                   625.30    (3.7%)    2.4% (  -4% -   10%)    0.032
            OrHighNotMed        417.48    (2.8%)                   427.78    (3.1%)    2.5% (  -3% -    8%)    0.008
                 LowTerm        981.78    (2.8%)                  1008.35    (3.8%)    2.7% (  -3% -    9%)    0.010
               And3Terms        165.41    (1.8%)                   170.05    (1.7%)    2.8% (   0% -    6%)    0.000
            AndStopWords         30.15    (3.0%)                    31.07    (3.8%)    3.0% (  -3% -   10%)    0.005
                HighTerm        455.84    (3.4%)                   469.91    (4.0%)    3.1% (  -4% -   10%)    0.009
              OrHighHigh         68.52    (1.7%)                    70.69    (3.7%)    3.2% (  -2% -    8%)    0.000
            OrHighNotLow        412.63    (2.8%)                   427.86    (3.5%)    3.7% (  -2% -   10%)    0.000
             OrStopWords         33.50    (3.8%)                    34.75    (5.1%)    3.7% (  -4% -   13%)    0.009
              AndHighMed        165.41    (1.9%)                   171.81    (1.7%)    3.9% (   0% -    7%)    0.000
             AndHighHigh         72.22    (1.7%)                    76.11    (1.4%)    5.4% (   2% -    8%)    0.000
   ```
   
   I could reproduce these small speedups with a low p-value across several 
runs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

