[jira] [Created] (LUCENE-10639) WANDScorer performs better without two-phase

Greg Miller (Jira) Sat, 02 Jul 2022 06:20:04 -0700

Greg Miller created LUCENE-10639:
------------------------------------

             Summary: WANDScorer performs better without two-phase
                 Key: LUCENE-10639
                 URL: https://issues.apache.org/jira/browse/LUCENE-10639
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/search
            Reporter: Greg Miller



After looking at the recent improvement [~jpountz] made to WAND scoring in 
LUCENE-10634, which does additional work during match confirmation to avoid 
confirming a match whose score wouldn't be competitive, I wanted to see how 
performance would shift if we squashed the two-phase iteration completely and 
only returned true matches (that were also known to be competitive by score) in 
the "approximation" phase. I was a bit surprised to find that luceneutil 
benchmarks (run with {{wikimediumall}}) improve significantly on some 
disjunction tasks and don't show significant regressions anywhere else.
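
For readers less familiar with the two shapes being compared, here is a minimal toy sketch. It is not the WANDScorer code and not the linked diff: the class names, the {{confirm}} predicate, and the candidate doc IDs are all hypothetical stand-ins. It only shows that the two-phase shape lets the caller decide when to pay for confirmation, while the squashed shape confirms inside iteration and returns true matches only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: none of these classes exist in Lucene.
class TwoPhaseSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Two-phase shape: nextCandidate() is the cheap approximation; the caller
  // decides if and when to pay for matches() on the current candidate.
  static class TwoPhase {
    private final int[] candidates;     // sorted doc IDs from the approximation
    private final IntPredicate confirm; // stand-in for the costly confirmation
    private int idx = -1;

    TwoPhase(int[] candidates, IntPredicate confirm) {
      this.candidates = candidates;
      this.confirm = confirm;
    }

    int nextCandidate() {
      idx++;
      return idx < candidates.length ? candidates[idx] : NO_MORE_DOCS;
    }

    boolean matches() {
      return confirm.test(candidates[idx]);
    }
  }

  // Squashed shape: confirmation is folded into iteration, so every doc
  // returned is already a true match and callers cannot defer the check.
  static class Squashed {
    private final TwoPhase inner;

    Squashed(int[] candidates, IntPredicate confirm) {
      this.inner = new TwoPhase(candidates, confirm);
    }

    int nextDoc() {
      int doc = inner.nextCandidate();
      while (doc != NO_MORE_DOCS && !inner.matches()) {
        doc = inner.nextCandidate();
      }
      return doc;
    }
  }

  static List<Integer> collectTwoPhase(int[] candidates, IntPredicate confirm) {
    TwoPhase it = new TwoPhase(candidates, confirm);
    List<Integer> hits = new ArrayList<>();
    for (int doc = it.nextCandidate(); doc != NO_MORE_DOCS; doc = it.nextCandidate()) {
      if (it.matches()) {
        hits.add(doc);
      }
    }
    return hits;
  }

  static List<Integer> collectSquashed(int[] candidates, IntPredicate confirm) {
    Squashed it = new Squashed(candidates, confirm);
    List<Integer> hits = new ArrayList<>();
    for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
      hits.add(doc);
    }
    return hits;
  }

  public static void main(String[] args) {
    int[] candidates = {1, 4, 7, 9, 12};
    IntPredicate confirm = doc -> doc % 3 == 1; // arbitrary stand-in check
    System.out.println(collectTwoPhase(candidates, confirm));
    System.out.println(collectSquashed(candidates, confirm));
  }
}
```

Both collectors return the same hits when used alone. The difference only matters when the scorer is combined with other clauses, because the squashed shape always pays for confirmation before anything else can reject the candidate.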

Note that I used LUCENE-10634 as a baseline, and built my candidate change on 
top of that. The diff can be seen here: 
[DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]

A simple conclusion here might be that we shouldn't do two-phase iteration in 
WANDScorer, but I'm pretty sure that's not right. I wonder if what's really 
going on is that we're underestimating the cost of confirming a match? Right 
now we just return the tail size as the cost. While the cost of confirming a 
match is proportional to the tail size, the actual work involved can be quite 
significant (having to advance tail iterators to new blocks and decompress 
them). I wonder if the WAND second phase is being run too early on approximate 
candidates, and if less expensive (and possibly even more restrictive?) 
second phases could/should be running first?
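
A toy sketch of that concern, assuming confirmation checks are run in ascending order of their declared cost and that evaluation stops at the first failing check. Everything here is hypothetical: the {{Check}} class, the check names, and the cost and work numbers are invented for illustration and are not measurements. It shows how a check that declares a low cost but does a lot of real work ends up running ahead of a genuinely cheap check that would have rejected the candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model only: not Lucene code, and all numbers are made up.
class CostOrderingSketch {

  // One confirmation check with a declared cost estimate and the real
  // amount of work it performs when it is actually run.
  static final class Check {
    final String name;
    final float declaredCost; // the estimate used for ordering
    final int actualWork;     // hypothetical units of real work per call
    final boolean passes;     // whether the candidate passes this check

    Check(String name, float declaredCost, int actualWork, boolean passes) {
      this.name = name;
      this.declaredCost = declaredCost;
      this.actualWork = actualWork;
      this.passes = passes;
    }
  }

  // Runs checks cheapest-declared-first, stops at the first failure, and
  // returns the total real work spent on this candidate.
  static int confirm(List<Check> checks) {
    List<Check> ordered = new ArrayList<>(checks);
    ordered.sort(Comparator.comparingDouble(c -> c.declaredCost));
    int work = 0;
    for (Check c : ordered) {
      work += c.actualWork;
      if (!c.passes) {
        break;
      }
    }
    return work;
  }

  public static void main(String[] args) {
    Check cheapFilter = new Check("cheapFilter", 5f, 5, false);

    // Declared cost of 2 (think: tail size) but 50 units of real work.
    Check wandUnderestimated = new Check("wand", 2f, 50, true);
    System.out.println(confirm(Arrays.asList(wandUnderestimated, cheapFilter))); // prints 55

    // Same check with a declared cost that reflects the real work.
    Check wandRealistic = new Check("wand", 50f, 50, true);
    System.out.println(confirm(Arrays.asList(wandRealistic, cheapFilter))); // prints 5
  }
}
```

With the underestimated cost, the expensive check runs first and its work is wasted on a candidate the cheap check rejects anyway. With a more realistic cost, the cheap check runs first and the expensive one is skipped.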

I'm raising this here as more of a curiosity to see if it sparks ideas on how 
to move forward. Again, I'm not proposing we do away with two-phase iteration, 
but it seems we might be able to improve things. Maybe I'll explore changing 
the cost heuristic next. Also, maybe there's some different benchmarking that 
would be useful here that I may not be familiar with?

Benchmark results on wikimediumall:
{code:java}
                            Task    QPS baseline      StdDev   QPS candidate      StdDev                Pct diff p-value
            HighTermTitleBDVSort       22.52     (18.9%)       21.66     (15.6%)   -3.8% ( -32% -   37%) 0.485
                         Prefix3        9.38      (9.2%)        9.09     (10.6%)   -3.1% ( -20% -   18%) 0.326
               HighTermMonthSort       25.37     (16.0%)       24.87     (17.1%)   -2.0% ( -30% -   37%) 0.710
            MedTermDayTaxoFacets        9.62      (4.2%)        9.51      (4.1%)   -1.2% (  -9% -    7%) 0.368
                      TermDTSort       74.69     (18.0%)       74.13     (18.2%)   -0.7% ( -31% -   43%) 0.897
           HighTermDayOfYearSort       52.64     (16.1%)       52.32     (15.4%)   -0.6% ( -27% -   36%) 0.903
           BrowseMonthTaxoFacets        8.64     (19.1%)        8.59     (19.8%)   -0.6% ( -33% -   47%) 0.926
            BrowseDateSSDVFacets        0.86      (9.5%)        0.86     (13.1%)   -0.4% ( -20% -   24%) 0.914
                        PKLookup      147.18      (3.9%)      146.66      (3.3%)   -0.3% (  -7% -    7%) 0.759
       BrowseDayOfYearSSDVFacets        3.47      (4.5%)        3.45      (4.8%)   -0.3% (  -9% -    9%) 0.822
                        Wildcard       36.36      (4.4%)       36.26      (5.2%)   -0.3% (  -9% -    9%) 0.866
           BrowseMonthSSDVFacets        4.15     (12.7%)        4.13     (12.8%)   -0.3% ( -22% -   28%) 0.950
         AndHighMedDayTaxoFacets       15.21      (2.7%)       15.18      (2.9%)   -0.2% (  -5% -    5%) 0.819
                          Fuzzy1       68.33      (1.8%)       68.22      (2.0%)   -0.2% (  -3% -    3%) 0.783
          OrHighMedDayTaxoFacets        2.90      (4.1%)        2.89      (4.0%)   -0.1% (  -7% -    8%) 0.930
                       MedPhrase       52.81      (2.3%)       52.76      (1.8%)   -0.1% (  -4% -    4%) 0.878
                         Respell       36.80      (1.9%)       36.78      (1.9%)   -0.1% (  -3% -    3%) 0.933
                          Fuzzy2       63.06      (1.9%)       63.05      (2.1%)   -0.0% (  -3% -    4%) 0.971
                       LowPhrase       74.60      (1.9%)       74.61      (1.8%)    0.0% (  -3% -    3%) 0.987
        AndHighHighDayTaxoFacets        4.54      (2.3%)        4.55      (2.0%)    0.0% (  -4% -    4%) 0.960
                      HighPhrase      353.13      (2.6%)      353.28      (2.5%)    0.0% (  -4% -    5%) 0.958
                   OrNotHighHigh      761.72      (4.0%)      762.48      (3.6%)    0.1% (  -7% -    8%) 0.935
                    OrHighNotLow     1129.94      (4.1%)     1131.56      (3.6%)    0.1% (  -7% -    8%) 0.906
                         LowTerm     1315.90      (2.9%)     1318.61      (2.5%)    0.2% (  -5% -    5%) 0.810
                          IntNRQ      192.33      (2.8%)      192.93      (2.3%)    0.3% (  -4% -    5%) 0.701
                     LowSpanNear       23.60      (2.2%)       23.68      (1.6%)    0.3% (  -3% -    4%) 0.592
                    OrNotHighMed      867.21      (2.3%)      870.27      (2.8%)    0.4% (  -4% -    5%) 0.664
     BrowseRandomLabelSSDVFacets        2.53      (1.6%)        2.54      (1.9%)    0.4% (  -3% -    3%) 0.494
                      AndHighMed      105.33      (4.5%)      105.83      (4.6%)    0.5% (  -8% -    9%) 0.739
                        HighTerm     1030.35      (5.7%)     1035.54      (5.9%)    0.5% ( -10% -   12%) 0.783
                 MedSloppyPhrase       41.07      (3.0%)       41.28      (2.9%)    0.5% (  -5% -    6%) 0.581
                      AndHighLow      287.51      (3.2%)      289.03      (4.3%)    0.5% (  -6% -    8%) 0.657
                    OrHighNotMed      910.71      (3.9%)      915.93      (4.1%)    0.6% (  -7% -    8%) 0.651
                     AndHighHigh       28.96      (5.0%)       29.15      (5.3%)    0.6% (  -9% -   11%) 0.695
                    OrNotHighLow      679.21      (2.7%)      683.68      (4.1%)    0.7% (  -6% -    7%) 0.551
                         MedTerm     1425.49      (4.8%)     1435.41      (5.1%)    0.7% (  -8% -   11%) 0.657
                     MedSpanNear        8.74      (3.0%)        8.80      (2.8%)    0.7% (  -4% -    6%) 0.448
     BrowseRandomLabelTaxoFacets        6.11     (14.4%)        6.16     (15.2%)    0.7% ( -25% -   35%) 0.875
                   OrHighNotHigh      674.18      (4.1%)      679.40      (4.5%)    0.8% (  -7% -    9%) 0.569
                 LowSloppyPhrase        5.08      (3.3%)        5.12      (3.5%)    0.8% (  -5% -    7%) 0.445
                    HighSpanNear        2.22      (5.4%)        2.25      (4.2%)    1.3% (  -7% -   11%) 0.398
                HighSloppyPhrase        5.27      (7.8%)        5.34      (9.0%)    1.3% ( -14% -   19%) 0.622
             LowIntervalsOrdered       17.88      (4.8%)       18.21      (3.1%)    1.9% (  -5% -   10%) 0.144
            BrowseDateTaxoFacets        6.51     (14.4%)        6.65     (17.4%)    2.3% ( -25% -   39%) 0.652
       BrowseDayOfYearTaxoFacets        6.52     (14.4%)        6.68     (17.7%)    2.5% ( -25% -   40%) 0.624
             MedIntervalsOrdered       14.43      (7.8%)       14.80      (4.5%)    2.6% (  -9% -   16%) 0.205
                       OrHighLow      158.48      (3.2%)      162.94      (4.2%)    2.8% (  -4% -   10%) 0.017
            HighIntervalsOrdered        1.56      (9.4%)        1.60      (5.2%)    3.0% ( -10% -   19%) 0.215
                       OrHighMed       65.32      (4.2%)       71.62      (4.1%)    9.6% (   1% -   18%) 0.000
                      OrHighHigh       14.04      (4.5%)       15.68      (3.9%)   11.7% (   3% -   21%) 0.000
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
