[jira] [Commented] (LUCENE-10639) WANDScorer performs better without two-phase

Greg Miller (Jira) Tue, 05 Jul 2022 11:06:04 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562786#comment-17562786 ]


Greg Miller commented on LUCENE-10639:
--------------------------------------

As a quick update, I ran benchmarks with just the [livedoc checking broken out|https://github.com/gsmiller/lucene/commit/f4e9614a299523b57c854a3bd3371253f0a7fb17] in {{DefaultBulkScorer}}. Surprisingly, I didn't see any difference, so maybe something else is going on here?

Note that I ran this with {{wikimedium10m}} instead of {{wikimediumall}} to get a data point a bit more quickly:

{code:java}
                           Task QPS baseline     StdDev QPS candidate     StdDev                Pct diff p-value
                         Prefix3      118.98     (10.2%)      114.60      (9.9%)   -3.7% ( -21% -   18%) 0.247
                        Wildcard       40.69      (6.9%)       39.62      (7.2%)   -2.6% ( -15% -   12%) 0.236
                      TermDTSort       17.76     (20.4%)       17.33     (14.2%)   -2.4% ( -30% -   40%) 0.663
                   OrNotHighHigh      881.01      (4.4%)      861.34      (3.9%)   -2.2% ( -10% -    6%) 0.089
                     AndHighHigh        8.87      (5.0%)        8.70      (6.2%)   -1.8% ( -12% -    9%) 0.296
                         MedTerm     1771.40      (4.2%)     1740.50      (4.4%)   -1.7% (  -9% -    7%) 0.198
                      AndHighMed       30.59      (4.0%)       30.06      (5.6%)   -1.7% ( -10% -    8%) 0.267
                    OrHighNotLow      782.90      (4.8%)      769.92      (5.1%)   -1.7% ( -11% -    8%) 0.291
                      HighPhrase      392.18      (2.7%)      386.50      (2.7%)   -1.4% (  -6% -    4%) 0.087
                   OrHighNotHigh      830.76      (4.3%)      818.83      (4.3%)   -1.4% (  -9% -    7%) 0.295
                    OrNotHighMed      585.86      (2.6%)      578.07      (3.1%)   -1.3% (  -6% -    4%) 0.146
                    OrHighNotMed      966.75      (3.6%)      956.07      (3.9%)   -1.1% (  -8% -    6%) 0.352
                       LowPhrase      546.02      (2.1%)      540.42      (2.4%)   -1.0% (  -5% -    3%) 0.148
                       MedPhrase       24.65      (2.3%)       24.40      (3.0%)   -1.0% (  -6% -    4%) 0.225
                      AndHighLow      508.37      (3.7%)      503.84      (4.7%)   -0.9% (  -8% -    7%) 0.506
                    OrNotHighLow      672.15      (2.7%)      666.29      (2.8%)   -0.9% (  -6% -    4%) 0.313
           BrowseMonthTaxoFacets        8.92     (14.5%)        8.84     (13.9%)   -0.9% ( -25% -   32%) 0.846
         AndHighMedDayTaxoFacets       39.14      (2.2%)       38.82      (2.2%)   -0.8% (  -5% -    3%) 0.241
        AndHighHighDayTaxoFacets        8.01      (2.8%)        7.96      (2.8%)   -0.7% (  -6% -    4%) 0.416
                 LowSloppyPhrase        5.83      (3.8%)        5.79      (3.8%)   -0.7% (  -8% -    7%) 0.556
                       OrHighLow      128.01      (3.7%)      127.11      (3.8%)   -0.7% (  -7% -    7%) 0.554
                        HighTerm     1190.03      (4.4%)     1183.10      (4.1%)   -0.6% (  -8% -    8%) 0.663
                 MedSloppyPhrase       11.67      (2.1%)       11.61      (2.6%)   -0.5% (  -5% -    4%) 0.480
            MedTermDayTaxoFacets       14.09      (3.1%)       14.03      (4.1%)   -0.5% (  -7% -    6%) 0.686
                          IntNRQ      110.15      (2.3%)      109.69      (2.1%)   -0.4% (  -4% -    4%) 0.546
                HighSloppyPhrase        9.56      (4.5%)        9.53      (4.5%)   -0.4% (  -8% -    9%) 0.794
            BrowseDateSSDVFacets        0.85     (10.4%)        0.85     (10.8%)   -0.3% ( -19% -   23%) 0.939
                         Respell       33.65      (1.7%)       33.58      (1.7%)   -0.2% (  -3% -    3%) 0.684
                          Fuzzy2       74.16      (1.9%)       74.02      (1.7%)   -0.2% (  -3% -    3%) 0.740
                         LowTerm     1522.48      (2.9%)     1520.76      (3.3%)   -0.1% (  -6% -    6%) 0.909
             LowIntervalsOrdered       12.75      (3.3%)       12.74      (3.3%)   -0.1% (  -6% -    6%) 0.915
            HighIntervalsOrdered        6.30      (4.2%)        6.31      (4.0%)    0.1% (  -7% -    8%) 0.923
     BrowseRandomLabelSSDVFacets        2.57      (4.9%)        2.57      (4.9%)    0.1% (  -9% -   10%) 0.927
                          Fuzzy1       57.11      (1.9%)       57.26      (1.7%)    0.2% (  -3% -    3%) 0.666
     BrowseRandomLabelTaxoFacets        6.32      (9.3%)        6.34     (10.3%)    0.3% ( -17% -   21%) 0.911
                     LowSpanNear       15.95      (2.9%)       16.01      (2.7%)    0.4% (  -5% -    6%) 0.680
             MedIntervalsOrdered        1.61      (5.8%)        1.62      (5.8%)    0.4% ( -10% -   12%) 0.834
                    HighSpanNear        2.27      (4.2%)        2.28      (4.0%)    0.6% (  -7% -    9%) 0.636
                     MedSpanNear        8.99      (3.4%)        9.05      (3.3%)    0.7% (  -5% -    7%) 0.502
                       OrHighMed       60.81      (3.8%)       61.29      (3.3%)    0.8% (  -6% -    8%) 0.479
                      OrHighHigh       15.25      (4.7%)       15.38      (3.8%)    0.8% (  -7% -    9%) 0.548
            HighTermTitleBDVSort       59.77     (18.2%)       60.25     (14.7%)    0.8% ( -27% -   41%) 0.876
          OrHighMedDayTaxoFacets        2.42      (3.1%)        2.44      (3.6%)    0.9% (  -5% -    7%) 0.420
           BrowseMonthSSDVFacets        4.05      (7.7%)        4.09      (9.4%)    1.0% ( -14% -   19%) 0.717
       BrowseDayOfYearSSDVFacets        3.43      (5.5%)        3.46      (4.9%)    1.0% (  -8% -   12%) 0.523
               HighTermMonthSort       58.75     (20.4%)       59.43     (14.8%)    1.2% ( -28% -   45%) 0.836
                        PKLookup      147.01      (2.9%)      148.75      (3.8%)    1.2% (  -5% -    8%) 0.272
           HighTermDayOfYearSort       17.59     (17.0%)       17.96     (15.2%)    2.1% ( -25% -   41%) 0.681
            BrowseDateTaxoFacets        6.68     (10.4%)        6.84     (13.0%)    2.5% ( -18% -   28%) 0.503
       BrowseDayOfYearTaxoFacets        6.68     (10.4%)        6.86     (13.2%)    2.6% ( -18% -   29%) 0.485
{code}
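Rows like these can be screened mechanically. The sketch below is a minimal illustration, assuming the whitespace-separated column layout luceneutil prints above: it parses a row and keeps only tasks whose p-value falls below a cutoff. {{SignificantRows}}, {{Row}} and the 0.05 cutoff are illustrative choices, not part of luceneutil. Applied to this run nothing qualifies (the smallest p-value is 0.087, for HighPhrase), which is consistent with reading this as "no difference".

{code:java}
import java.util.ArrayList;
import java.util.List;

public class SignificantRows {
  // Parsed form of one luceneutil comparison row (only the fields used here).
  record Row(String task, double baselineQps, double candidateQps, double pctDiff, double pValue) {}

  // Parses a row such as
  // "Prefix3      118.98     (10.2%)      114.60      (9.9%)   -3.7% ( -21% -   18%) 0.247".
  // Assumes the column layout shown in the table: task, QPS, (stddev), QPS, (stddev), pct, range, p.
  static Row parse(String line) {
    String[] t = line.trim().split("\\s+");
    return new Row(
        t[0],
        Double.parseDouble(t[1]),
        Double.parseDouble(t[3]),
        Double.parseDouble(t[5].replace("%", "")),
        Double.parseDouble(t[t.length - 1]));
  }

  // Keeps rows whose p-value is below the cutoff (the cutoff itself is an assumed convention).
  static List<Row> significant(List<String> lines, double cutoff) {
    List<Row> out = new ArrayList<>();
    for (String line : lines) {
      Row r = parse(line);
      if (r.pValue() < cutoff) {
        out.add(r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> lines = List.of(
        "Prefix3      118.98     (10.2%)      114.60      (9.9%)   -3.7% ( -21% -   18%) 0.247",
        "OrNotHighHigh      881.01      (4.4%)      861.34      (3.9%)   -2.2% ( -10% -    6%) 0.089",
        "HighPhrase      392.18      (2.7%)      386.50      (2.7%)   -1.4% (  -6% -    4%) 0.087",
        "OrHighHigh       15.25      (4.7%)       15.38      (3.8%)    0.8% (  -7% -    9%) 0.548");
    List<Row> sig = significant(lines, 0.05);
    System.out.println("significant rows: " + sig.size()); // prints "significant rows: 0"
  }
}
{code}

The same filter run over the {{wikimediumall}} results quoted below would keep OrHighLow, OrHighMed and OrHighHigh.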


> WANDScorer performs better without two-phase
> --------------------------------------------
>
>                 Key: LUCENE-10639
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10639
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Greg Miller
>            Priority: Major
>
> After looking at the recent improvement [~jpountz] made to WAND scoring in 
> LUCENE-10634, which does additional work during match confirmation to avoid 
> confirming a match whose score wouldn't be competitive, I wanted to see how 
> performance would shift if we squashed the two-phase iteration completely and 
> only returned true matches (that were also known to be competitive by score) 
> in the "approximation" phase. I was a bit surprised to find that luceneutil 
> benchmarks (run with {{wikimediumall}}) improve significantly on some 
> disjunction tasks and don't show significant regressions anywhere else.
> Note that I used LUCENE-10634 as a baseline, and built my candidate change on 
> top of that. The diff can be seen here: 
> [DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]
> A simple conclusion here might be that we shouldn't do two-phase iteration in 
> WANDScorer, but I'm pretty sure that's not right. I wonder if what's really 
> going on is that we're underestimating the cost of confirming a match? Right 
> now we just return the tail size as the cost. While the cost of confirming a 
> match is proportional to the tail size, the actual work involved can be quite 
> significant (advancing tail iterators to new blocks and decompressing them). 
> I wonder if the WAND second phase is being run too early on approximate 
> candidates, and whether less expensive (and possibly even more restrictive?) 
> second phases could/should be running first?
> I'm raising this here as more of a curiosity to see if it sparks ideas on how 
> to move forward. Again, I'm not proposing we do away with two-phase 
> iteration, but it seems we might be able to improve things. Maybe I'll 
> explore changing the cost heuristic next. Also, maybe there's some different 
> benchmarking that would be useful here that I may not be familiar with?
> Benchmark results on wikimediumall:
> {code:java}
>                            Task QPS baseline     StdDev QPS candidate     StdDev                Pct diff p-value
>             HighTermTitleBDVSort       22.52     (18.9%)       21.66     (15.6%)   -3.8% ( -32% -   37%) 0.485
>                          Prefix3        9.38      (9.2%)        9.09     (10.6%)   -3.1% ( -20% -   18%) 0.326
>                HighTermMonthSort       25.37     (16.0%)       24.87     (17.1%)   -2.0% ( -30% -   37%) 0.710
>             MedTermDayTaxoFacets        9.62      (4.2%)        9.51      (4.1%)   -1.2% (  -9% -    7%) 0.368
>                       TermDTSort       74.69     (18.0%)       74.13     (18.2%)   -0.7% ( -31% -   43%) 0.897
>            HighTermDayOfYearSort       52.64     (16.1%)       52.32     (15.4%)   -0.6% ( -27% -   36%) 0.903
>            BrowseMonthTaxoFacets        8.64     (19.1%)        8.59     (19.8%)   -0.6% ( -33% -   47%) 0.926
>             BrowseDateSSDVFacets        0.86      (9.5%)        0.86     (13.1%)   -0.4% ( -20% -   24%) 0.914
>                         PKLookup      147.18      (3.9%)      146.66      (3.3%)   -0.3% (  -7% -    7%) 0.759
>        BrowseDayOfYearSSDVFacets        3.47      (4.5%)        3.45      (4.8%)   -0.3% (  -9% -    9%) 0.822
>                         Wildcard       36.36      (4.4%)       36.26      (5.2%)   -0.3% (  -9% -    9%) 0.866
>            BrowseMonthSSDVFacets        4.15     (12.7%)        4.13     (12.8%)   -0.3% ( -22% -   28%) 0.950
>          AndHighMedDayTaxoFacets       15.21      (2.7%)       15.18      (2.9%)   -0.2% (  -5% -    5%) 0.819
>                           Fuzzy1       68.33      (1.8%)       68.22      (2.0%)   -0.2% (  -3% -    3%) 0.783
>           OrHighMedDayTaxoFacets        2.90      (4.1%)        2.89      (4.0%)   -0.1% (  -7% -    8%) 0.930
>                        MedPhrase       52.81      (2.3%)       52.76      (1.8%)   -0.1% (  -4% -    4%) 0.878
>                          Respell       36.80      (1.9%)       36.78      (1.9%)   -0.1% (  -3% -    3%) 0.933
>                           Fuzzy2       63.06      (1.9%)       63.05      (2.1%)   -0.0% (  -3% -    4%) 0.971
>                        LowPhrase       74.60      (1.9%)       74.61      (1.8%)    0.0% (  -3% -    3%) 0.987
>         AndHighHighDayTaxoFacets        4.54      (2.3%)        4.55      (2.0%)    0.0% (  -4% -    4%) 0.960
>                       HighPhrase      353.13      (2.6%)      353.28      (2.5%)    0.0% (  -4% -    5%) 0.958
>                    OrNotHighHigh      761.72      (4.0%)      762.48      (3.6%)    0.1% (  -7% -    8%) 0.935
>                     OrHighNotLow     1129.94      (4.1%)     1131.56      (3.6%)    0.1% (  -7% -    8%) 0.906
>                          LowTerm     1315.90      (2.9%)     1318.61      (2.5%)    0.2% (  -5% -    5%) 0.810
>                           IntNRQ      192.33      (2.8%)      192.93      (2.3%)    0.3% (  -4% -    5%) 0.701
>                      LowSpanNear       23.60      (2.2%)       23.68      (1.6%)    0.3% (  -3% -    4%) 0.592
>                     OrNotHighMed      867.21      (2.3%)      870.27      (2.8%)    0.4% (  -4% -    5%) 0.664
>      BrowseRandomLabelSSDVFacets        2.53      (1.6%)        2.54      (1.9%)    0.4% (  -3% -    3%) 0.494
>                       AndHighMed      105.33      (4.5%)      105.83      (4.6%)    0.5% (  -8% -    9%) 0.739
>                         HighTerm     1030.35      (5.7%)     1035.54      (5.9%)    0.5% ( -10% -   12%) 0.783
>                  MedSloppyPhrase       41.07      (3.0%)       41.28      (2.9%)    0.5% (  -5% -    6%) 0.581
>                       AndHighLow      287.51      (3.2%)      289.03      (4.3%)    0.5% (  -6% -    8%) 0.657
>                     OrHighNotMed      910.71      (3.9%)      915.93      (4.1%)    0.6% (  -7% -    8%) 0.651
>                      AndHighHigh       28.96      (5.0%)       29.15      (5.3%)    0.6% (  -9% -   11%) 0.695
>                     OrNotHighLow      679.21      (2.7%)      683.68      (4.1%)    0.7% (  -6% -    7%) 0.551
>                          MedTerm     1425.49      (4.8%)     1435.41      (5.1%)    0.7% (  -8% -   11%) 0.657
>                      MedSpanNear        8.74      (3.0%)        8.80      (2.8%)    0.7% (  -4% -    6%) 0.448
>      BrowseRandomLabelTaxoFacets        6.11     (14.4%)        6.16     (15.2%)    0.7% ( -25% -   35%) 0.875
>                    OrHighNotHigh      674.18      (4.1%)      679.40      (4.5%)    0.8% (  -7% -    9%) 0.569
>                  LowSloppyPhrase        5.08      (3.3%)        5.12      (3.5%)    0.8% (  -5% -    7%) 0.445
>                     HighSpanNear        2.22      (5.4%)        2.25      (4.2%)    1.3% (  -7% -   11%) 0.398
>                 HighSloppyPhrase        5.27      (7.8%)        5.34      (9.0%)    1.3% ( -14% -   19%) 0.622
>              LowIntervalsOrdered       17.88      (4.8%)       18.21      (3.1%)    1.9% (  -5% -   10%) 0.144
>             BrowseDateTaxoFacets        6.51     (14.4%)        6.65     (17.4%)    2.3% ( -25% -   39%) 0.652
>        BrowseDayOfYearTaxoFacets        6.52     (14.4%)        6.68     (17.7%)    2.5% ( -25% -   40%) 0.624
>              MedIntervalsOrdered       14.43      (7.8%)       14.80      (4.5%)    2.6% (  -9% -   16%) 0.205
>                        OrHighLow      158.48      (3.2%)      162.94      (4.2%)    2.8% (  -4% -   10%) 0.017
>             HighIntervalsOrdered        1.56      (9.4%)        1.60      (5.2%)    3.0% ( -10% -   19%) 0.215
>                        OrHighMed       65.32      (4.2%)       71.62      (4.1%)    9.6% (   1% -   18%) 0.000
>                       OrHighHigh       14.04      (4.5%)       15.68      (3.9%)   11.7% (   3% -   21%) 0.000
> {code}
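The cost hypothesis in the quoted description can be illustrated with a toy model. In Lucene, a {{TwoPhaseIterator}} pairs a cheap {{approximation()}} with a {{matches()}} confirmation and reports a {{matchCost()}} estimate, which conjunctions use to decide which second phase to run first. The sketch below does not use Lucene classes: {{Check}}, {{confirmAll}}, the per-call work figures and the 15x correction factor are all hypothetical. It only shows how an underestimated cost (tail size alone) can schedule an expensive WAND-style confirmation ahead of a cheaper, more restrictive check, and how a larger estimate flips the order without changing the matches.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class MatchCostOrdering {
  // Hypothetical second-phase check: the cost it reports, the actual work each call does
  // (hidden from the scheduler), and the predicate it evaluates.
  record Check(String name, double reportedCost, int actualWork, IntPredicate predicate) {}

  // Confirms each approximate candidate by running checks in ascending reportedCost order,
  // stopping at the first check that rejects. Returns the total actual work performed.
  static long confirmAll(List<Integer> candidates, List<Check> checks, List<Integer> matchesOut) {
    List<Check> ordered = new ArrayList<>(checks);
    ordered.sort(Comparator.comparingDouble(Check::reportedCost));
    long work = 0;
    for (int doc : candidates) {
      boolean matched = true;
      for (Check c : ordered) {
        work += c.actualWork();
        if (!c.predicate().test(doc)) {
          matched = false;
          break;
        }
      }
      if (matched) {
        matchesOut.add(doc);
      }
    }
    return work;
  }

  public static void main(String[] args) {
    List<Integer> candidates = new ArrayList<>();
    for (int doc = 0; doc < 100; doc++) {
      candidates.add(doc);
    }
    int tailSize = 3;
    // WAND-style confirmation: reports only the tail size as its cost, but each call is
    // assumed to do far more work (e.g. advancing tail iterators into new blocks).
    IntPredicate wandCompetitive = doc -> doc % 2 == 0;
    // A cheaper and more restrictive check (hypothetical).
    IntPredicate cheapFilter = doc -> doc % 10 == 0;

    List<Integer> m1 = new ArrayList<>();
    long underestimated = confirmAll(candidates, List.of(
        new Check("wand", tailSize, 40, wandCompetitive),
        new Check("filter", 5, 5, cheapFilter)), m1);

    List<Integer> m2 = new ArrayList<>();
    long corrected = confirmAll(candidates, List.of(
        new Check("wand", tailSize * 15.0, 40, wandCompetitive),
        new Check("filter", 5, 5, cheapFilter)), m2);

    System.out.println(m1.equals(m2)); // prints "true": same matches either way
    System.out.println(underestimated); // prints "4250"
    System.out.println(corrected); // prints "900"
  }
}
{code}

With these made-up numbers both orderings return the same ten documents, but the cost-corrected ordering does 900 units of work instead of 4250. Whether any particular correction to the tail-size heuristic reflects the real block-decoding cost in WANDScorer is something only benchmarks like the ones above can show.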



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
