[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

Adrien Grand (Jira) Fri, 15 Jul 2022 07:46:12 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567268#comment-17567268
 ]


Adrien Grand commented on LUCENE-10633:
---------------------------------------

I played with a prototype that starts dynamically pruning matches as soon as 
128 or fewer competitive ordinals remain, by pulling postings to iterate over 
the remaining documents that have competitive values. I still need to think 
about simplifying the logic and improving the tests, but the initial benchmarks 
on wikimedium10m are very encouraging (assuming I didn't get anything wrong):

{noformat}
                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3      248.74      (6.1%)      242.61      (5.8%)   -2.5% ( -13% -   10%) 0.191
           BrowseMonthTaxoFacets       27.71     (10.1%)       27.34     (10.6%)   -1.3% ( -20% -   21%) 0.682
            BrowseDateSSDVFacets        4.99     (10.3%)        4.94      (8.4%)   -1.1% ( -17% -   19%) 0.707
            BrowseDateTaxoFacets       44.26     (12.2%)       43.97     (13.1%)   -0.7% ( -23% -   28%) 0.870
                        Wildcard      137.61      (3.0%)      136.97      (2.6%)   -0.5% (  -5% -    5%) 0.592
       BrowseDayOfYearTaxoFacets       45.53     (12.4%)       45.44     (13.4%)   -0.2% ( -23% -   29%) 0.963
                          IntNRQ      198.27      (8.1%)      197.94      (7.4%)   -0.2% ( -14% -   16%) 0.946
     BrowseRandomLabelSSDVFacets       14.51      (2.2%)       14.49      (2.4%)   -0.2% (  -4% -    4%) 0.835
        AndHighHighDayTaxoFacets        8.32      (5.1%)        8.31      (5.7%)   -0.1% ( -10% -   11%) 0.956
                     LowSpanNear       46.83      (1.6%)       46.82      (2.0%)   -0.0% (  -3% -    3%) 0.990
     BrowseRandomLabelTaxoFacets       36.18     (10.5%)       36.18     (12.6%)    0.0% ( -20% -   25%) 0.998
            MedTermDayTaxoFacets       73.59      (4.8%)       73.66      (5.7%)    0.1% (  -9% -   11%) 0.954
                   OrNotHighHigh     1476.08      (5.3%)     1477.58      (3.9%)    0.1% (  -8% -    9%) 0.945
                      TermDTSort      746.55      (2.4%)      747.70      (1.7%)    0.2% (  -3% -    4%) 0.817
                          Fuzzy2       96.18      (1.3%)       96.39      (1.4%)    0.2% (  -2% -    2%) 0.617
         AndHighMedDayTaxoFacets      154.89      (1.8%)      155.29      (1.6%)    0.3% (  -3% -    3%) 0.629
                      AndHighMed      378.38      (3.7%)      379.50      (4.4%)    0.3% (  -7% -    8%) 0.817
                        PKLookup      243.14      (1.9%)      243.99      (1.9%)    0.4% (  -3% -    4%) 0.552
                      HighPhrase      279.13      (2.1%)      280.21      (1.5%)    0.4% (  -3% -    4%) 0.510
                         Respell       71.59      (1.5%)       71.87      (1.5%)    0.4% (  -2% -    3%) 0.406
                      OrHighHigh       66.95      (6.5%)       67.21      (5.7%)    0.4% ( -11% -   13%) 0.837
                          Fuzzy1      101.53      (1.5%)      101.95      (1.5%)    0.4% (  -2% -    3%) 0.382
                       LowPhrase      101.76      (2.3%)      102.22      (2.6%)    0.5% (  -4% -    5%) 0.558
                 LowSloppyPhrase       21.14      (3.1%)       21.25      (4.1%)    0.5% (  -6% -    7%) 0.661
                       MedPhrase      173.45      (2.7%)      174.55      (2.6%)    0.6% (  -4% -    6%) 0.443
                     MedSpanNear       17.77      (4.5%)       17.88      (4.8%)    0.6% (  -8% -   10%) 0.661
                    OrHighNotLow     1396.26      (5.6%)     1406.85      (6.4%)    0.8% ( -10% -   13%) 0.692
                       OrHighMed      162.41      (5.3%)      163.69      (4.8%)    0.8% (  -8% -   11%) 0.625
           HighTermDayOfYearSort     1476.11      (2.7%)     1488.26      (2.4%)    0.8% (  -4% -    6%) 0.312
             MedIntervalsOrdered      113.65      (4.2%)      114.59      (7.0%)    0.8% (  -9% -   12%) 0.652
                       OrHighLow      828.13      (5.2%)      835.45      (4.7%)    0.9% (  -8% -   11%) 0.574
                         MedTerm     2356.21      (4.7%)     2377.47      (5.0%)    0.9% (  -8% -   11%) 0.554
                 MedSloppyPhrase       62.13      (3.4%)       62.72      (3.9%)    0.9% (  -6% -    8%) 0.420
            HighIntervalsOrdered       18.19      (5.7%)       18.37      (8.6%)    1.0% ( -12% -   16%) 0.673
                     AndHighHigh       54.46      (6.2%)       55.01      (6.3%)    1.0% ( -10% -   14%) 0.615
                         LowTerm     2247.13      (4.7%)     2270.19      (3.7%)    1.0% (  -7% -    9%) 0.446
                    OrNotHighLow     1728.71      (4.3%)     1748.19      (4.7%)    1.1% (  -7% -   10%) 0.427
            HighTermTitleBDVSort       14.31      (3.3%)       14.47      (5.7%)    1.2% (  -7% -   10%) 0.429
                   OrHighNotHigh     1328.26      (5.6%)     1345.40      (5.6%)    1.3% (  -9% -   13%) 0.467
          OrHighMedDayTaxoFacets       21.05      (3.4%)       21.32      (6.2%)    1.3% (  -8% -   11%) 0.412
                HighSloppyPhrase       13.58      (4.6%)       13.76      (5.2%)    1.3% (  -8% -   11%) 0.396
       BrowseDayOfYearSSDVFacets       20.03      (7.4%)       20.30     (10.3%)    1.3% ( -15% -   20%) 0.640
                        HighTerm     1696.02      (7.0%)     1720.12      (6.3%)    1.4% ( -11% -   15%) 0.500
             LowIntervalsOrdered        5.49      (4.9%)        5.57      (5.3%)    1.5% (  -8% -   12%) 0.359
                    OrHighNotMed     2042.56      (5.5%)     2075.38      (6.0%)    1.6% (  -9% -   13%) 0.378
                      AndHighLow     1604.98      (3.7%)     1632.93      (3.2%)    1.7% (  -5% -    9%) 0.115
           BrowseMonthSSDVFacets       22.20     (10.1%)       22.61     (12.0%)    1.9% ( -18% -   26%) 0.596
                    OrNotHighMed     1440.64      (4.3%)     1467.73      (2.6%)    1.9% (  -4% -    9%) 0.093
                    HighSpanNear       23.27      (6.2%)       24.09      (6.2%)    3.5% (  -8% -   16%) 0.071
               HighTermMonthSort      173.72     (15.7%)     3968.96     (90.7%) 2184.7% (1795% - 2719%) 0.000
               HighTermTitleSort       17.70     (14.4%)     1383.03    (288.2%) 7712.7% (6474% - 9368%) 0.000
{noformat}
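
For readers unfamiliar with the idea, the pruning step described above can be 
sketched roughly as follows. This is a toy model in plain Java, not the 
prototype itself and not Lucene API: the class OrdinalPruningSketch, the method 
competitiveDocs and the in-memory int[][] postings layout are all hypothetical, 
and an ascending sort is assumed so that the lowest ordinals are the 
competitive ones. Only the threshold of 128 comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: hypothetical names, no Lucene classes involved.
class OrdinalPruningSketch {

  // Threshold quoted in the comment above.
  static final int MAX_COMPETITIVE_ORDS = 128;

  /**
   * postings[ord] holds the sorted doc IDs that have the term with ordinal ord.
   * Assumes an ascending sort, so ordinals 0..maxCompetitiveOrd are competitive.
   * Returns the sorted, de-duplicated competitive doc IDs, or null when more
   * than MAX_COMPETITIVE_ORDS ordinals remain (the caller then keeps visiting
   * every hit, as it would without pruning).
   */
  static int[] competitiveDocs(int[][] postings, int maxCompetitiveOrd) {
    int numCompetitive = Math.min(maxCompetitiveOrd + 1, postings.length);
    if (numCompetitive > MAX_COMPETITIVE_ORDS) {
      return null;
    }
    // K-way merge of the postings lists; heap entries are {docId, ord, position}.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int ord = 0; ord < numCompetitive; ord++) {
      if (postings[ord].length > 0) {
        heap.add(new int[] {postings[ord][0], ord, 0});
      }
    }
    List<Integer> docs = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      // A doc can appear under several ordinals (SORTED_SET), so skip repeats.
      if (docs.isEmpty() || docs.get(docs.size() - 1) != top[0]) {
        docs.add(top[0]);
      }
      int next = top[2] + 1;
      if (next < postings[top[1]].length) {
        heap.add(new int[] {postings[top[1]][next], top[1], next});
      }
    }
    return docs.stream().mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    // Four terms; only ordinals 0 and 1 are still competitive.
    int[][] postings = {{4, 9}, {1, 9}, {0, 2}, {7}};
    System.out.println(java.util.Arrays.toString(competitiveDocs(postings, 1)));
  }
}
```

With the sample data in main, only documents 1, 4 and 9 would still need to be 
visited. A real implementation would presumably expose the merged postings as 
an iterator that the collector can advance, rather than materializing an array.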

> Dynamic pruning for queries sorted by SORTED(_SET) field
> --------------------------------------------------------
>
>                 Key: LUCENE-10633
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10633
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits 
> when sorting by a numeric field, by leveraging the points index to skip 
> documents that do not compare better than the top of the priority queue 
> maintained by the field comparator.
> However, queries sorted by a SORTED(_SET) field still look at all hits, which 
> is disappointing. Could we leverage the terms index to skip hits?



